From 04e70ea14eb11bacd58b675b3820f80cd715b3ec Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Mon, 6 Nov 2023 23:44:50 +0700
Subject: [PATCH 001/667] Add model 2023-11-06-bert_ner_biobert_ner_bc2gm_corpus_en

---
 ...06-bert_ner_biobert_ner_bc2gm_corpus_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_ner_bc2gm_corpus_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_ner_bc2gm_corpus_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_ner_bc2gm_corpus_en.md
new file mode 100644
index 00000000000000..743d73b5f5dd24
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_ner_bc2gm_corpus_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_ner_biobert_ner_bc2gm_corpus BertForTokenClassification from drAbreu
+author: John Snow Labs
+name: bert_ner_biobert_ner_bc2gm_corpus
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_biobert_ner_bc2gm_corpus` is an English model originally trained by drAbreu.
+
+{:.btn-box}
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_biobert_ner_bc2gm_corpus_en_5.2.0_3.0_1699288986237.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_biobert_ner_bc2gm_corpus_en_5.2.0_3.0_1699288986237.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
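+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer must produce it first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_biobert_ner_bc2gm_corpus","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark` (needed for toDF)
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+  .pretrained("bert_ner_biobert_ner_bc2gm_corpus", "en")
+  .setInputCols(Array("documents","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```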
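In a Spark NLP pipeline, each annotator reads the columns named in `setInputCols`, and an earlier stage must produce every one of them. `BertForTokenClassification` here needs a `token` column, and only a `Tokenizer` stage creates one. The Spark-free sketch below illustrates that wiring check. `Stage` and `missing_inputs` are hypothetical helpers written for this illustration. They are not part of the Spark NLP API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Stage:
    """Hypothetical stand-in for an annotator: name, input columns, output column."""
    name: str
    input_cols: List[str]
    output_col: str


def missing_inputs(stages: List[Stage], initial_cols: Iterable[str] = ("text",)) -> List[Tuple[str, str]]:
    """Return (stage name, column) pairs where a stage reads a column no earlier stage produced."""
    available = set(initial_cols)
    problems = []
    for stage in stages:
        for col in stage.input_cols:
            if col not in available:
                problems.append((stage.name, col))
        # The stage's output becomes visible to every later stage
        available.add(stage.output_col)
    return problems


# Wiring without a tokenizer: the classifier's "token" input is never produced
broken = [
    Stage("DocumentAssembler", ["text"], "documents"),
    Stage("BertForTokenClassification", ["documents", "token"], "ner"),
]
# Wiring with a Tokenizer between the two stages
fixed = [
    Stage("DocumentAssembler", ["text"], "documents"),
    Stage("Tokenizer", ["documents"], "token"),
    Stage("BertForTokenClassification", ["documents", "token"], "ner"),
]

print(missing_inputs(broken))  # [('BertForTokenClassification', 'token')]
print(missing_inputs(fixed))   # []
```

Consider a pipeline whose `DocumentAssembler` writes to `embeddings` while the classifier reads `documents`. The same check would report both `documents` and `token` as missing.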
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_biobert_ner_bc2gm_corpus|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.6 MB|
+
+## References
+
+https://huggingface.co/drAbreu/bioBERT-NER-BC2GM_corpus
\ No newline at end of file

From 504e1b29c731c48bfa359c21ab46a4e33338dbbb Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Mon, 6 Nov 2023 23:45:50 +0700
Subject: [PATCH 002/667] Add model 2023-11-06-bert_ner_bert_persian_farsi_base_uncased_ner_peyma_fa

---
 ...persian_farsi_base_uncased_ner_peyma_fa.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_persian_farsi_base_uncased_ner_peyma_fa.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_persian_farsi_base_uncased_ner_peyma_fa.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_persian_farsi_base_uncased_ner_peyma_fa.md
new file mode 100644
index 00000000000000..afd53a37430dc1
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_persian_farsi_base_uncased_ner_peyma_fa.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Persian bert_ner_bert_persian_farsi_base_uncased_ner_peyma BertForTokenClassification from HooshvareLab
+author: John Snow Labs
+name: bert_ner_bert_persian_farsi_base_uncased_ner_peyma
+date: 2023-11-06
+tags: [bert, fa, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: fa
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_bert_persian_farsi_base_uncased_ner_peyma` is a Persian model originally trained by HooshvareLab.
+
+{:.btn-box}
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_persian_farsi_base_uncased_ner_peyma_fa_5.2.0_3.0_1699288961564.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_persian_farsi_base_uncased_ner_peyma_fa_5.2.0_3.0_1699288961564.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
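+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer must produce it first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_persian_farsi_base_uncased_ner_peyma","fa") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark` (needed for toDF)
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+  .pretrained("bert_ner_bert_persian_farsi_base_uncased_ner_peyma", "fa")
+  .setInputCols(Array("documents","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```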
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_bert_persian_farsi_base_uncased_ner_peyma|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|fa|
+|Size:|606.6 MB|
+
+## References
+
+https://huggingface.co/HooshvareLab/bert-fa-base-uncased-ner-peyma
\ No newline at end of file

From 14a9bc0d88bf0dcd2819d382e971fde4e0e1b431 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Mon, 6 Nov 2023 23:46:50 +0700
Subject: [PATCH 003/667] Add model 2023-11-06-bert_ner_bert_base_german_cased_20000_ner_uncased_de

---
 ..._base_german_cased_20000_ner_uncased_de.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_german_cased_20000_ner_uncased_de.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_german_cased_20000_ner_uncased_de.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_german_cased_20000_ner_uncased_de.md
new file mode 100644
index 00000000000000..c828469b2d68ae
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_german_cased_20000_ner_uncased_de.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: German BertForTokenClassification Base Uncased model (from domischwimmbeck)
+author: John Snow Labs
+name: bert_ner_bert_base_german_cased_20000_ner_uncased
+date: 2023-11-06
+tags: [bert, ner, open_source, de, onnx]
+task: Named Entity Recognition
+language: de
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-base-german-cased-20000-ner-uncased` is a German model originally trained by `domischwimmbeck`.
+
+## Predicted Entities
+
+`PER`
+
+{:.btn-box}
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_base_german_cased_20000_ner_uncased_de_5.2.0_3.0_1699286426973.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_base_german_cased_20000_ner_uncased_de_5.2.0_3.0_1699286426973.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_base_german_cased_20000_ner_uncased","de") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["Ich liebe Spark NLP"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark` (needed for toDF)
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("sentence"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_base_german_cased_20000_ner_uncased","de")
+  .setInputCols(Array("sentence", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("Ich liebe Spark NLP").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("de.ner.bert.uncased_base").predict("""Ich liebe Spark NLP""")
+```
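The `ner` column holds one label per token, so a multi-token name comes back as several tagged tokens. Inside a pipeline, Spark NLP's `NerConverter` annotator merges such tokens into chunks. The pure-Python sketch below shows the same grouping idea without Spark. `group_entities` is a hypothetical helper. It assumes IOB-style tags (`B-PER`, `I-PER`, `O`), optionally with BILOU `L-`/`U-` prefixes or bare labels such as `PER`. Check the actual `result` values of the `ner` column to see which scheme a given model emits.

```python
from typing import List, Optional, Tuple


def group_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge per-token NER tags into (entity text, label) chunks.

    Hypothetical helper: accepts B-/I-/L-/U- prefixed tags, bare labels
    such as "PER" (treated as continuations), and "O" for non-entities.
    """
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    for token, tag in zip(tokens, tags):
        if len(tag) > 2 and tag[1] == "-" and tag[0] in "BILU":
            prefix, label = tag[0], tag[2:]
        elif tag == "O":
            prefix, label = "O", None
        else:
            prefix, label = "I", tag  # bare label: continue a same-label chunk

        # Close the open chunk if this token cannot extend it
        if current_tokens and (prefix in ("O", "B", "U") or label != current_label):
            chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None

        if prefix == "O":
            continue

        current_tokens.append(token)
        current_label = label

        # BILOU: L- (last) and U- (unit) tags end the entity at this token
        if prefix in ("L", "U"):
            chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None

    if current_tokens:
        chunks.append((" ".join(current_tokens), current_label))
    return chunks


# Illustrative tags only; not actual model output
print(group_entities(["Ich", "traf", "Angela", "Merkel", "gestern"],
                     ["O", "O", "B-PER", "I-PER", "O"]))
# [('Angela Merkel', 'PER')]
print(group_entities(["Mustafa", "Kemal", "Ankara", "gitti"],
                     ["B-PER", "L-PER", "U-LOC", "O"]))
# [('Mustafa Kemal', 'PER'), ('Ankara', 'LOC')]
```

Bare labels carry no boundary information. As a result, two adjacent entities of the same type that are tagged only as `PER` get merged into one chunk.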
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_bert_base_german_cased_20000_ner_uncased|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|de|
+|Size:|409.9 MB|
+|Case sensitive:|false|
+|Max sentence length:|128|
+
+## References
+
+- https://huggingface.co/domischwimmbeck/bert-base-german-cased-20000-ner-uncased
\ No newline at end of file

From 75e8c183f1db1ea51904aa7bce13316febfda04d Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Mon, 6 Nov 2023 23:47:51 +0700
Subject: [PATCH 004/667] Add model 2023-11-06-bert_ner_archeobertje_ner_en

---
 ...2023-11-06-bert_ner_archeobertje_ner_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_archeobertje_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_archeobertje_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_archeobertje_ner_en.md
new file mode 100644
index 00000000000000..a496112d3f3e62
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_archeobertje_ner_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_ner_archeobertje_ner BertForTokenClassification from alexbrandsen
+author: John Snow Labs
+name: bert_ner_archeobertje_ner
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_archeobertje_ner` is an English model originally trained by alexbrandsen.
+
+{:.btn-box}
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_archeobertje_ner_en_5.2.0_3.0_1699271484539.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_archeobertje_ner_en_5.2.0_3.0_1699271484539.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
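+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer must produce it first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_archeobertje_ner","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark` (needed for toDF)
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+  .pretrained("bert_ner_archeobertje_ner", "en")
+  .setInputCols(Array("documents","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```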
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_archeobertje_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|406.5 MB|
+
+## References
+
+https://huggingface.co/alexbrandsen/ArcheoBERTje-NER
\ No newline at end of file

From 83d5ac372b010ecd5cffd9a8cb1f481421088e2a Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Mon, 6 Nov 2023 23:48:51 +0700
Subject: [PATCH 005/667] Add model 2023-11-06-bert_ner_bert_finetuned_mutation_recognition_4_en

---
 ...ert_finetuned_mutation_recognition_4_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_mutation_recognition_4_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_mutation_recognition_4_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_mutation_recognition_4_en.md
new file mode 100644
index 00000000000000..567ae730a700f8
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_mutation_recognition_4_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from Salvatore)
+author: John Snow Labs
+name: bert_ner_bert_finetuned_mutation_recognition_4
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-mutation-recognition-4` is an English model originally trained by `Salvatore`.
+
+## Predicted Entities
+
+`SNP`, `ProteinMutation`, `DNAMutation`
+
+{:.btn-box}
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_finetuned_mutation_recognition_4_en_5.2.0_3.0_1699289226587.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_finetuned_mutation_recognition_4_en_5.2.0_3.0_1699289226587.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_finetuned_mutation_recognition_4","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark` (needed for toDF)
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("sentence"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_finetuned_mutation_recognition_4","en")
+  .setInputCols(Array("sentence", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.mutation_recognition_4.by_salvatore").predict("""PUT YOUR STRING HERE""")
+```
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_bert_finetuned_mutation_recognition_4|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+- https://huggingface.co/Salvatore/bert-finetuned-mutation-recognition-4
\ No newline at end of file

From c3df27427ffc89fe5d9055a09bce1fafc441beee Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Mon, 6 Nov 2023 23:49:51 +0700
Subject: [PATCH 006/667] Add model 2023-11-06-bert_ner_bert_finetuned_ner_en

---
 ...23-11-06-bert_ner_bert_finetuned_ner_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner_en.md
new file mode 100644
index 00000000000000..1401996a5a439b
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from caotianyu1996)
+author: John Snow Labs
+name: bert_ner_bert_finetuned_ner
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_ner` is an English model originally trained by `caotianyu1996`.
+
+## Predicted Entities
+
+`Disease`
+
+{:.btn-box}
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_finetuned_ner_en_5.2.0_3.0_1699285947988.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_finetuned_ner_en_5.2.0_3.0_1699285947988.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_finetuned_ner","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark` (needed for toDF)
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("sentence"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_finetuned_ner","en")
+  .setInputCols(Array("sentence", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.finetuned.by_caotianyu1996").predict("""PUT YOUR STRING HERE""")
+```
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_bert_finetuned_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|407.2 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+- https://huggingface.co/caotianyu1996/bert_finetuned_ner
\ No newline at end of file

From 9ce7987a287a2ebf77db987281cdea9da5af1d8c Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Mon, 6 Nov 2023 23:50:51 +0700
Subject: [PATCH 007/667] Add model 2023-11-06-bert_ner_craft_original_bluebert_384_en

---
 ...bert_ner_craft_original_bluebert_384_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_craft_original_bluebert_384_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_craft_original_bluebert_384_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_craft_original_bluebert_384_en.md
new file mode 100644
index 00000000000000..ebb634e152a2c1
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_craft_original_bluebert_384_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_ner_craft_original_bluebert_384 BertForTokenClassification from ghadeermobasher
+author: John Snow Labs
+name: bert_ner_craft_original_bluebert_384
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_craft_original_bluebert_384` is an English model originally trained by ghadeermobasher.
+
+{:.btn-box}
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_craft_original_bluebert_384_en_5.2.0_3.0_1699279342315.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_craft_original_bluebert_384_en_5.2.0_3.0_1699279342315.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
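+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer must produce it first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_craft_original_bluebert_384","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark` (needed for toDF)
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+  .pretrained("bert_ner_craft_original_bluebert_384", "en")
+  .setInputCols(Array("documents","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```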
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_craft_original_bluebert_384|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|407.1 MB|
+
+## References
+
+https://huggingface.co/ghadeermobasher/CRAFT-Original-BlueBERT-384
\ No newline at end of file

From e5c1d9c7e342308d0bda8a2e8c0a95a4281d504f Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Mon, 6 Nov 2023 23:51:51 +0700
Subject: [PATCH 008/667] Add model 2023-11-06-bert_ner_ner_english_vietnamese_italian_spanish_tinparadox_xx

---
 ...ietnamese_italian_spanish_tinparadox_xx.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_english_vietnamese_italian_spanish_tinparadox_xx.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_english_vietnamese_italian_spanish_tinparadox_xx.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_english_vietnamese_italian_spanish_tinparadox_xx.md
new file mode 100644
index 00000000000000..a642640fb50741
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_english_vietnamese_italian_spanish_tinparadox_xx.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Multilingual bert_ner_ner_english_vietnamese_italian_spanish_tinparadox BertForTokenClassification from tinparadox
+author: John Snow Labs
+name: bert_ner_ner_english_vietnamese_italian_spanish_tinparadox
+date: 2023-11-06
+tags: [bert, xx, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: xx
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_ner_english_vietnamese_italian_spanish_tinparadox` is a Multilingual model originally trained by tinparadox.
+
+{:.btn-box}
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_ner_english_vietnamese_italian_spanish_tinparadox_xx_5.2.0_3.0_1699281445339.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_ner_english_vietnamese_italian_spanish_tinparadox_xx_5.2.0_3.0_1699281445339.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
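+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer must produce it first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_ner_english_vietnamese_italian_spanish_tinparadox","xx") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark` (needed for toDF)
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+  .pretrained("bert_ner_ner_english_vietnamese_italian_spanish_tinparadox", "xx")
+  .setInputCols(Array("documents","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```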
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_ner_english_vietnamese_italian_spanish_tinparadox|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|xx|
+|Size:|665.0 MB|
+
+## References
+
+https://huggingface.co/tinparadox/NER-en-vi-it-es
\ No newline at end of file

From 911a3d0859730d318fd0fdaa5971fef6469e83d0 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Mon, 6 Nov 2023 23:52:52 +0700
Subject: [PATCH 009/667] Add model 2023-11-06-bert_ner_bertimbau_large_lener_breton_luciano_pt

---
 ...bertimbau_large_lener_breton_luciano_pt.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bertimbau_large_lener_breton_luciano_pt.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bertimbau_large_lener_breton_luciano_pt.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bertimbau_large_lener_breton_luciano_pt.md
new file mode 100644
index 00000000000000..b04f7d0da73c3e
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bertimbau_large_lener_breton_luciano_pt.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Portuguese bert_ner_bertimbau_large_lener_breton_luciano BertForTokenClassification from Luciano
+author: John Snow Labs
+name: bert_ner_bertimbau_large_lener_breton_luciano
+date: 2023-11-06
+tags: [bert, pt, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: pt
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_bertimbau_large_lener_breton_luciano` is a Portuguese model originally trained by Luciano.
+
+{:.btn-box}
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bertimbau_large_lener_breton_luciano_pt_5.2.0_3.0_1699289482461.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bertimbau_large_lener_breton_luciano_pt_5.2.0_3.0_1699289482461.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
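+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer must produce it first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bertimbau_large_lener_breton_luciano","pt") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark` (needed for toDF)
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+  .pretrained("bert_ner_bertimbau_large_lener_breton_luciano", "pt")
+  .setInputCols(Array("documents","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```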
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_bertimbau_large_lener_breton_luciano|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|pt|
+|Size:|1.2 GB|
+
+## References
+
+https://huggingface.co/Luciano/bertimbau-large-lener_br
\ No newline at end of file

From 6e199fd7b256e8b7b9cb7fe7c080d6625724fc51 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Mon, 6 Nov 2023 23:53:52 +0700
Subject: [PATCH 010/667] Add model 2023-11-06-bert_ner_bert_base_turkish_ner_cased_pretrained_tr

---
 ...rt_base_turkish_ner_cased_pretrained_tr.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_turkish_ner_cased_pretrained_tr.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_turkish_ner_cased_pretrained_tr.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_turkish_ner_cased_pretrained_tr.md
new file mode 100644
index 00000000000000..d5f08cf809e3f0
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_turkish_ner_cased_pretrained_tr.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: Turkish BertForTokenClassification Base Cased model (from beyhan)
+author: John Snow Labs
+name: bert_ner_bert_base_turkish_ner_cased_pretrained
+date: 2023-11-06
+tags: [bert, ner, open_source, tr, onnx]
+task: Named Entity Recognition
+language: tr
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-base-turkish-ner-cased-pretrained` is a Turkish model originally trained by `beyhan`.
+
+## Predicted Entities
+
+`LOC`, `U-ORG`, `PER`, `U-LOC`, `L-ORG`, `U-PER`, `ORG`, `L-LOC`, `L-PER`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_base_turkish_ner_cased_pretrained_tr_5.2.0_3.0_1699287757355.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_base_turkish_ner_cased_pretrained_tr_5.2.0_3.0_1699287757355.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_base_turkish_ner_cased_pretrained","tr") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["Spark NLP'yi seviyorum"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_base_turkish_ner_cased_pretrained","tr")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("Spark NLP'yi seviyorum").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("tr.ner.bert.cased_base.by_beyhan").predict("""Spark NLP'yi seviyorum""")
+```
+
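The Turkish card's entity list mixes bare types (`LOC`, `PER`, `ORG`) with `U-` and `L-` prefixed tags, which looks like a BILOU-style scheme (Unit, Last, and so on). The card does not document the scheme, so the helper below is only a sketch under that assumption; it also assumes non-entity tokens are tagged `O`, which the list above does not show. `group_entities` is a hypothetical helper, not a Spark NLP API: given the token strings and labels collected from the `token` and `ner` columns of `result` (their `result` fields should line up one-to-one, and the helper checks the lengths), it merges per-token tags into `(type, text)` spans.

```python
from typing import List, Optional, Tuple


def group_entities(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge per-token tags into (entity_type, text) spans.

    Assumed BILOU-like scheme: "U-X" is a one-token entity, "L-X" ends an
    entity, "B-X" starts one, and "I-X" or a bare "X" continues (or starts)
    an entity of type X. "O" marks tokens outside any entity.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    for token, label in zip(tokens, labels):
        if label == "O":
            # Outside any entity: close whatever span is open.
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
            continue

        # "U-LOC" -> ("U", "LOC"); a bare "LOC" -> ("", "LOC")
        if len(label) > 2 and label[1] == "-" and label[0] in "BILU":
            prefix, entity_type = label[0], label[2:]
        else:
            prefix, entity_type = "", label

        # A B-/U- tag or a change of type closes the open span first.
        if current_type is not None and (prefix in ("B", "U") or entity_type != current_type):
            spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []

        if current_type is None:
            current_type = entity_type
        current_tokens.append(token)

        # U- (single token) and L- (last token) close the span immediately.
        if prefix in ("U", "L"):
            spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []

    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans
```

Joining tokens with single spaces is a simplification; the annotations' `begin`/`end` offsets would recover the exact source text. For plain IOB/IOB2 tags, Spark NLP's `NerConverter` annotator is the usual in-pipeline way to build chunks; this pure-Python version is only meant for inspecting collected results.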
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_bert_base_turkish_ner_cased_pretrained|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|tr|
+|Size:|412.3 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+
+
+## References
+
+- https://huggingface.co/beyhan/bert-base-turkish-ner-cased-pretrained
\ No newline at end of file

From dd1e0e2ecf023e18228f285bca000cba5d53101b Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Mon, 6 Nov 2023 23:54:52 +0700
Subject: [PATCH 011/667] Add model 2023-11-06-bert_ner_bert_mt4ts_en

---
 .../2023-11-06-bert_ner_bert_mt4ts_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_mt4ts_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_mt4ts_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_mt4ts_en.md
new file mode 100644
index 00000000000000..1527b623f0ee42
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_mt4ts_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_ner_bert_mt4ts BertForTokenClassification from kevinjesse
+author: John Snow Labs
+name: bert_ner_bert_mt4ts
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_bert_mt4ts` is an English model originally trained by kevinjesse.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_mt4ts_en_5.2.0_3.0_1699286187357.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_mt4ts_en_5.2.0_3.0_1699286187357.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_mt4ts","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_bert_mt4ts", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_bert_mt4ts|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|549.8 MB|
+
+## References
+
+https://huggingface.co/kevinjesse/bert-MT4TS
\ No newline at end of file

From eddd53065f7cb4589fc9cb3312bca865cf9672c5 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Mon, 6 Nov 2023 23:55:52 +0700
Subject: [PATCH 012/667] Add model 2023-11-06-bert_ner_bertimbau_base_lener_breton_luciano_pt

---
 ..._bertimbau_base_lener_breton_luciano_pt.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bertimbau_base_lener_breton_luciano_pt.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bertimbau_base_lener_breton_luciano_pt.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bertimbau_base_lener_breton_luciano_pt.md
new file mode 100644
index 00000000000000..8f460182593e1f
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bertimbau_base_lener_breton_luciano_pt.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Portuguese bert_ner_bertimbau_base_lener_breton_luciano BertForTokenClassification from Luciano
+author: John Snow Labs
+name: bert_ner_bertimbau_base_lener_breton_luciano
+date: 2023-11-06
+tags: [bert, pt, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: pt
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_bertimbau_base_lener_breton_luciano` is a Portuguese model originally trained by Luciano.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bertimbau_base_lener_breton_luciano_pt_5.2.0_3.0_1699289631856.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bertimbau_base_lener_breton_luciano_pt_5.2.0_3.0_1699289631856.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bertimbau_base_lener_breton_luciano","pt") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_bertimbau_base_lener_breton_luciano", "pt")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_bertimbau_base_lener_breton_luciano|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|pt|
+|Size:|406.0 MB|
+
+## References
+
+https://huggingface.co/Luciano/bertimbau-base-lener_br
\ No newline at end of file

From 81ede8d45b693126e546281d8179f005abb700bb Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Mon, 6 Nov 2023 23:56:53 +0700
Subject: [PATCH 013/667] Add model 2023-11-06-bert_ner_bert_finetuned_ner2_en

---
 ...3-11-06-bert_ner_bert_finetuned_ner2_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner2_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner2_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner2_en.md
new file mode 100644
index 00000000000000..3c2dad5357bfb9
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner2_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from Lamine)
+author: John Snow Labs
+name: bert_ner_bert_finetuned_ner2
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner2` is an English model originally trained by `Lamine`.
+
+## Predicted Entities
+
+`geo`, `org`, `tim`, `gpe`, `per`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_finetuned_ner2_en_5.2.0_3.0_1699289789019.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_finetuned_ner2_en_5.2.0_3.0_1699289789019.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_finetuned_ner2","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_finetuned_ner2","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.sourcerecognition.v2.by_lamine").predict("""PUT YOUR STRING HERE""")
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_bert_finetuned_ner2|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|407.2 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+
+
+## References
+
+- https://huggingface.co/Lamine/bert-finetuned-ner2
\ No newline at end of file

From 7dac1e03b9a72c06766058f7917c34484748e817 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Mon, 6 Nov 2023 23:57:53 +0700
Subject: [PATCH 014/667] Add model 2023-11-06-bert_ner_ag_based_ner_en

---
 .../2023-11-06-bert_ner_ag_based_ner_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_ag_based_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ag_based_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ag_based_ner_en.md
new file mode 100644
index 00000000000000..23c2a51ce6c515
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ag_based_ner_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from Wanjiru)
+author: John Snow Labs
+name: bert_ner_ag_based_ner
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `ag_based_ner` is an English model originally trained by `Wanjiru`.
+
+## Predicted Entities
+
+`ITEM`, `REGION`, `METRIC`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_ag_based_ner_en_5.2.0_3.0_1699283645796.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_ag_based_ner_en_5.2.0_3.0_1699283645796.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_ag_based_ner","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_ag_based_ner","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.base.by_wanjiru").predict("""PUT YOUR STRING HERE""")
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_ag_based_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|409.9 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+
+
+## References
+
+- https://huggingface.co/Wanjiru/ag_based_ner
\ No newline at end of file

From 5ef11f5c09079d3a9d4c6287b90d20470100799a Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Mon, 6 Nov 2023 23:58:53 +0700
Subject: [PATCH 015/667] Add model 2023-11-06-bert_ner_carblacac_bert_finetuned_ner_en

---
 ...ert_ner_carblacac_bert_finetuned_ner_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_carblacac_bert_finetuned_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_carblacac_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_carblacac_bert_finetuned_ner_en.md
new file mode 100644
index 00000000000000..1edc7b39800505
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_carblacac_bert_finetuned_ner_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from carblacac)
+author: John Snow Labs
+name: bert_ner_carblacac_bert_finetuned_ner
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `carblacac`.
+
+## Predicted Entities
+
+`ORG`, `LOC`, `MISC`, `PER`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_carblacac_bert_finetuned_ner_en_5.2.0_3.0_1699289829935.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_carblacac_bert_finetuned_ner_en_5.2.0_3.0_1699289829935.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_carblacac_bert_finetuned_ner","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_carblacac_bert_finetuned_ner","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.finetuned.by_carblacac").predict("""PUT YOUR STRING HERE""")
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_carblacac_bert_finetuned_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+
+
+## References
+
+- https://huggingface.co/carblacac/bert-finetuned-ner
\ No newline at end of file

From f5fdea75ab2729e074badd58def7d59f3e72a1a2 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Mon, 6 Nov 2023 23:59:53 +0700
Subject: [PATCH 016/667] Add model 2023-11-06-bert_ner_bert_large_tweetner_2020_en

---
 ...06-bert_ner_bert_large_tweetner_2020_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_large_tweetner_2020_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_large_tweetner_2020_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_large_tweetner_2020_en.md
new file mode 100644
index 00000000000000..5ec096f0d7dd5c
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_large_tweetner_2020_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: English BertForTokenClassification Large Cased model (from tner)
+author: John Snow Labs
+name: bert_ner_bert_large_tweetner_2020
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-large-tweetner-2020` is an English model originally trained by `tner`.
+
+## Predicted Entities
+
+`corporation`, `product`, `location`, `person`, `creative_work`, `group`, `event`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_large_tweetner_2020_en_5.2.0_3.0_1699289944517.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_large_tweetner_2020_en_5.2.0_3.0_1699289944517.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_large_tweetner_2020","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_large_tweetner_2020","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.tweet.large").predict("""PUT YOUR STRING HERE""")
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_bert_large_tweetner_2020|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|1.2 GB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+
+
+## References
+
+- https://huggingface.co/tner/bert-large-tweetner-2020
\ No newline at end of file

From 9fb10c1c5eb9f41a7e6239c1400e4fe1d4e0359d Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 00:00:53 +0700
Subject: [PATCH 017/667] Add model 2023-11-06-bert_ner_craft_chem_modified_scibert_en

---
 ...bert_ner_craft_chem_modified_scibert_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_craft_chem_modified_scibert_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_craft_chem_modified_scibert_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_craft_chem_modified_scibert_en.md
new file mode 100644
index 00000000000000..9b44a16ae5813d
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_craft_chem_modified_scibert_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_ner_craft_chem_modified_scibert BertForTokenClassification from ghadeermobasher
+author: John Snow Labs
+name: bert_ner_craft_chem_modified_scibert
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_craft_chem_modified_scibert` is an English model originally trained by ghadeermobasher.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_craft_chem_modified_scibert_en_5.2.0_3.0_1699277322588.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_craft_chem_modified_scibert_en_5.2.0_3.0_1699277322588.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_craft_chem_modified_scibert","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_craft_chem_modified_scibert", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_craft_chem_modified_scibert|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|409.9 MB|
+
+## References
+
+https://huggingface.co/ghadeermobasher/CRAFT-Chem-Modified_SciBERT
\ No newline at end of file

From ef901158739ce55064d27339428c94def29d2204 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 00:01:53 +0700
Subject: [PATCH 018/667] Add model 2023-11-06-bert_ner_model_corsican_imb_en

---
 ...23-11-06-bert_ner_model_corsican_imb_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_model_corsican_imb_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_model_corsican_imb_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_model_corsican_imb_en.md
new file mode 100644
index 00000000000000..cd91093003a9db
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_model_corsican_imb_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_ner_model_corsican_imb BertForTokenClassification from ghadeermobasher
+author: John Snow Labs
+name: bert_ner_model_corsican_imb
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_model_corsican_imb` is an English model originally trained by ghadeermobasher.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_model_corsican_imb_en_5.2.0_3.0_1699281413219.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_model_corsican_imb_en_5.2.0_3.0_1699281413219.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
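+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer stage must run before it
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_model_corsican_imb","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+# Assumes an active SparkSession named `spark` (e.g. returned by sparknlp.start())
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes a SparkSession named `spark` (as in spark-shell) for toDF below
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_model_corsican_imb", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+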
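The Download and Copy S3 URI links in these cards point to archives whose names appear to follow the pattern `<name>_<lang>_<Spark NLP version>_<Spark version>_<epoch milliseconds>.zip`; for example, `bert_ner_model_corsican_imb_en_5.2.0_3.0_1699281413219.zip` lines up with `edition: Spark NLP 5.2.0` and `spark_version: 3.0` in the front matter. That pattern is inferred from the links, not documented in the cards, so the hypothetical `parse_archive_name` helper below is only a sketch under that assumption. It splits from the right because model names themselves contain underscores and, in some cases, dots.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class ModelArchive(NamedTuple):
    name: str
    lang: str
    sparknlp_version: str
    spark_version: str
    uploaded_at: datetime


def parse_archive_name(url_or_filename: str) -> ModelArchive:
    """Split a model archive name into its parts (hypothetical helper).

    Assumes '<name>_<lang>_<sparknlp>_<spark>_<epoch-ms>.zip', a pattern
    inferred from the Download links rather than a documented format.
    """
    filename = url_or_filename.rsplit("/", 1)[-1]
    if not filename.endswith(".zip"):
        raise ValueError(f"expected a .zip archive, got {filename!r}")
    stem = filename[: -len(".zip")]
    # Split from the right: model names themselves contain '_' (and sometimes '.').
    parts = stem.rsplit("_", 4)
    if len(parts) != 5:
        raise ValueError(f"unexpected archive name: {filename!r}")
    name, lang, sparknlp_version, spark_version, epoch_ms = parts
    # The last field looks like a Unix timestamp in milliseconds.
    uploaded_at = datetime.fromtimestamp(int(epoch_ms) / 1000, tz=timezone.utc)
    return ModelArchive(name, lang, sparknlp_version, spark_version, uploaded_at)


info = parse_archive_name(
    "https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/"
    "bert_ner_model_corsican_imb_en_5.2.0_3.0_1699281413219.zip"
)
print(info.name, info.lang, info.sparknlp_version, info.spark_version)
print(info.uploaded_at.date())
```

For this archive the trailing number decodes to 2023-11-06 (UTC), the same date as the card, which is consistent with the assumed layout.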
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_model_corsican_imb|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|408.1 MB|
+
+## References
+
+https://huggingface.co/ghadeermobasher/Model_co_imb
\ No newline at end of file
From 6873178c1728ee1b13a045a3b5b6a54b93c4781b Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 00:02:54 +0700
Subject: [PATCH 019/667] Add model 2023-11-06-bert_ner_bc5cdr_chemical_imbalanced_scibert_scivocab_cased_en

---
 ...al_imbalanced_scibert_scivocab_cased_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc5cdr_chemical_imbalanced_scibert_scivocab_cased_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc5cdr_chemical_imbalanced_scibert_scivocab_cased_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc5cdr_chemical_imbalanced_scibert_scivocab_cased_en.md
new file mode 100644
index 00000000000000..d1b086f78e5b05
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc5cdr_chemical_imbalanced_scibert_scivocab_cased_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_ner_bc5cdr_chemical_imbalanced_scibert_scivocab_cased BertForTokenClassification from ghadeermobasher
+author: John Snow Labs
+name: bert_ner_bc5cdr_chemical_imbalanced_scibert_scivocab_cased
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_bc5cdr_chemical_imbalanced_scibert_scivocab_cased` is an English model originally trained by ghadeermobasher.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bc5cdr_chemical_imbalanced_scibert_scivocab_cased_en_5.2.0_3.0_1699273050637.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bc5cdr_chemical_imbalanced_scibert_scivocab_cased_en_5.2.0_3.0_1699273050637.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
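+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer stage must run before it
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bc5cdr_chemical_imbalanced_scibert_scivocab_cased","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+# Assumes an active SparkSession named `spark` (e.g. returned by sparknlp.start())
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes a SparkSession named `spark` (as in spark-shell) for toDF below
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_bc5cdr_chemical_imbalanced_scibert_scivocab_cased", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+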
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_bc5cdr_chemical_imbalanced_scibert_scivocab_cased|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|409.9 MB|
+
+## References
+
+https://huggingface.co/ghadeermobasher/BC5CDR-Chemical_Imbalanced-scibert_scivocab_cased
\ No newline at end of file
From 4e5ec3383f617b2fe4a4a8a36b3428043af219c2 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 00:03:54 +0700
Subject: [PATCH 020/667] Add model 2023-11-06-bert_ner_hiner_original_muril_base_cased_en

---
 ..._ner_hiner_original_muril_base_cased_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_hiner_original_muril_base_cased_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_hiner_original_muril_base_cased_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_hiner_original_muril_base_cased_en.md
new file mode 100644
index 00000000000000..ed440b8f3bd84b
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_hiner_original_muril_base_cased_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_ner_hiner_original_muril_base_cased BertForTokenClassification from cfilt
+author: John Snow Labs
+name: bert_ner_hiner_original_muril_base_cased
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_hiner_original_muril_base_cased` is an English model originally trained by cfilt.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_hiner_original_muril_base_cased_en_5.2.0_3.0_1699276931660.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_hiner_original_muril_base_cased_en_5.2.0_3.0_1699276931660.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
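+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer stage must run before it
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_hiner_original_muril_base_cased","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+# Assumes an active SparkSession named `spark` (e.g. returned by sparknlp.start())
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes a SparkSession named `spark` (as in spark-shell) for toDF below
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_hiner_original_muril_base_cased", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+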
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_hiner_original_muril_base_cased|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|890.5 MB|
+
+## References
+
+https://huggingface.co/cfilt/HiNER-original-muril-base-cased
\ No newline at end of file
From 8e5c8b570d3a3344d37516284c520b45cf1c7e0e Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 00:04:54 +0700
Subject: [PATCH 021/667] Add model 2023-11-06-bert_ner_bert_finetuned_ades_model_1_en

---
 ...bert_ner_bert_finetuned_ades_model_1_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ades_model_1_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ades_model_1_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ades_model_1_en.md
new file mode 100644
index 00000000000000..b7f9d02c50ab53
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ades_model_1_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_ner_bert_finetuned_ades_model_1 BertForTokenClassification from ajtamayoh
+author: John Snow Labs
+name: bert_ner_bert_finetuned_ades_model_1
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_bert_finetuned_ades_model_1` is an English model originally trained by ajtamayoh.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_finetuned_ades_model_1_en_5.2.0_3.0_1699286687992.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_finetuned_ades_model_1_en_5.2.0_3.0_1699286687992.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
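+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer stage must run before it
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_finetuned_ades_model_1","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+# Assumes an active SparkSession named `spark` (e.g. returned by sparknlp.start())
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes a SparkSession named `spark` (as in spark-shell) for toDF below
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_bert_finetuned_ades_model_1", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+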
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_bert_finetuned_ades_model_1|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|410.0 MB|
+
+## References
+
+https://huggingface.co/ajtamayoh/bert-finetuned-ADEs_model_1
\ No newline at end of file
From b2d31ef7ec22e9034ac06224c5252ee569587ccb Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 00:05:54 +0700
Subject: [PATCH 022/667] Add model 2023-11-06-bert_ner_bert_ner_cased_sonar1_nld_en

---
 ...6-bert_ner_bert_ner_cased_sonar1_nld_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_ner_cased_sonar1_nld_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_ner_cased_sonar1_nld_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_ner_cased_sonar1_nld_en.md
new file mode 100644
index 00000000000000..8a9c9973be2ad8
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_ner_cased_sonar1_nld_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from proycon)
+author: John Snow Labs
+name: bert_ner_bert_ner_cased_sonar1_nld
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-ner-cased-sonar1-nld` is an English model originally trained by `proycon`.
+ +## Predicted Entities + +`misc`, `org`, `eve`, `pro`, `loc`, `per` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_ner_cased_sonar1_nld_en_5.2.0_3.0_1699290244203.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_ner_cased_sonar1_nld_en_5.2.0_3.0_1699290244203.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_ner_cased_sonar1_nld","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_ner_cased_sonar1_nld","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.cased").predict("""PUT YOUR STRING HERE""")
+```
+
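The card lists the entity types (`misc`, `org`, `eve`, `pro`, `loc`, `per`) but not how they are encoded per token. Assuming the `ner` column holds IOB-style tags such as `B-per`, `I-per` and `O` (an assumption; check the actual `result` values your pipeline produces), merging token-level tags into entity spans can be sketched in plain Python. In a Spark NLP pipeline this step is normally handled by the `NerConverter` annotator; the hypothetical `iob_to_spans` helper below only illustrates the logic.

```python
from typing import List, Optional, Tuple


def iob_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge per-token IOB tags into (entity_type, text) spans.

    Hypothetical helper. Assumes tags of the form 'B-<type>', 'I-<type>' or 'O'.
    An 'I-' tag that does not continue an open span of the same type starts a
    new span, so both IOB1- and IOB2-style sequences are accepted.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the open span (if any) and reset the state.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, _, entity_type = tag.partition("-")
        # A 'B-' tag, or a change of type, always starts a new span.
        if prefix == "B" or entity_type != current_type:
            flush()
            current_type = entity_type
        current_tokens.append(token)

    flush()
    return spans


tokens = ["Jan", "Jansen", "woont", "in", "Amsterdam", "."]
tags = ["B-per", "I-per", "O", "O", "B-loc", "O"]
print(iob_to_spans(tokens, tags))
```

Joining tokens with single spaces is a simplification: it does not reproduce the original spacing around punctuation, which an offset-based approach would preserve.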
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_bert_ner_cased_sonar1_nld|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|406.6 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+- https://huggingface.co/proycon/bert-ner-cased-sonar1-nld
\ No newline at end of file
From 2701eed652e63b26b549c06dd07c8bdf9a3306e8 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 00:06:55 +0700
Subject: [PATCH 023/667] Add model 2023-11-06-bert_ner_biobert_v1.1_pubmed_finetuned_ner_finetuned_ner_en

---
 ...1_pubmed_finetuned_ner_finetuned_ner_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_v1.1_pubmed_finetuned_ner_finetuned_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_v1.1_pubmed_finetuned_ner_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_v1.1_pubmed_finetuned_ner_finetuned_ner_en.md
new file mode 100644
index 00000000000000..b4c95407b43bb3
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_v1.1_pubmed_finetuned_ner_finetuned_ner_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from fidukm34)
+author: John Snow Labs
+name: bert_ner_biobert_v1.1_pubmed_finetuned_ner_finetuned_ner
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `biobert_v1.1_pubmed-finetuned-ner-finetuned-ner` is an English model originally trained by `fidukm34`.
+
+## Predicted Entities
+
+`Begin`, `Disease`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_biobert_v1.1_pubmed_finetuned_ner_finetuned_ner_en_5.2.0_3.0_1699290236029.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_biobert_v1.1_pubmed_finetuned_ner_finetuned_ner_en_5.2.0_3.0_1699290236029.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_biobert_v1.1_pubmed_finetuned_ner_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_biobert_v1.1_pubmed_finetuned_ner_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.biobert.pubmed.finetuned.by_fidukm34").predict("""PUT YOUR STRING HERE""") +``` +
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_biobert_v1.1_pubmed_finetuned_ner_finetuned_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.1 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+- https://huggingface.co/fidukm34/biobert_v1.1_pubmed-finetuned-ner-finetuned-ner
\ No newline at end of file
From 68d7367aebdd5cd7f8f9d87cec8e753458b87e1c Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 00:07:55 +0700
Subject: [PATCH 024/667] Add model 2023-11-06-bert_ner_bert_finetuned_mutation_recognition_1_en

---
 ...ert_finetuned_mutation_recognition_1_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_mutation_recognition_1_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_mutation_recognition_1_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_mutation_recognition_1_en.md
new file mode 100644
index 00000000000000..00aad0ac559587
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_mutation_recognition_1_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from Salvatore)
+author: John Snow Labs
+name: bert_ner_bert_finetuned_mutation_recognition_1
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-mutation-recognition-1` is an English model originally trained by `Salvatore`.
+
+## Predicted Entities
+
+`SNP`, `ProteinMutation`, `DNAMutation`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_finetuned_mutation_recognition_1_en_5.2.0_3.0_1699289221752.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_finetuned_mutation_recognition_1_en_5.2.0_3.0_1699289221752.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_finetuned_mutation_recognition_1","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_finetuned_mutation_recognition_1","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.mutation_recognition_1.by_salvatore").predict("""PUT YOUR STRING HERE""") +``` +
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_bert_finetuned_mutation_recognition_1|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+- https://huggingface.co/Salvatore/bert-finetuned-mutation-recognition-1
\ No newline at end of file
From e82fa44353582ddd5c0506b0b13b9ab90b8b1def Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 00:08:55 +0700
Subject: [PATCH 025/667] Add model 2023-11-06-bert_ner_bc4chemd_modified_pubmed_clinical_en

---
 ...er_bc4chemd_modified_pubmed_clinical_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc4chemd_modified_pubmed_clinical_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc4chemd_modified_pubmed_clinical_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc4chemd_modified_pubmed_clinical_en.md
new file mode 100644
index 00000000000000..108ad3017a3e85
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc4chemd_modified_pubmed_clinical_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_ner_bc4chemd_modified_pubmed_clinical BertForTokenClassification from ghadeermobasher
+author: John Snow Labs
+name: bert_ner_bc4chemd_modified_pubmed_clinical
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_bc4chemd_modified_pubmed_clinical` is an English model originally trained by ghadeermobasher.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bc4chemd_modified_pubmed_clinical_en_5.2.0_3.0_1699271493242.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bc4chemd_modified_pubmed_clinical_en_5.2.0_3.0_1699271493242.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
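+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer stage must run before it
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bc4chemd_modified_pubmed_clinical","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+# Assumes an active SparkSession named `spark` (e.g. returned by sparknlp.start())
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes a SparkSession named `spark` (as in spark-shell) for toDF below
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_bc4chemd_modified_pubmed_clinical", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+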
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_bc4chemd_modified_pubmed_clinical|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|407.1 MB|
+
+## References
+
+https://huggingface.co/ghadeermobasher/BC4CHEMD-Modified_pubmed_clinical
\ No newline at end of file
From bbf2d4f7e67428068ad642fdee6f785f7f29203d Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 00:09:55 +0700
Subject: [PATCH 026/667] Add model 2023-11-06-bert_ner_bert_persian_farsi_base_uncased_ner_arman_fa

---
 ...persian_farsi_base_uncased_ner_arman_fa.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_persian_farsi_base_uncased_ner_arman_fa.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_persian_farsi_base_uncased_ner_arman_fa.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_persian_farsi_base_uncased_ner_arman_fa.md
new file mode 100644
index 00000000000000..8a85947a609ca2
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_persian_farsi_base_uncased_ner_arman_fa.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Persian bert_ner_bert_persian_farsi_base_uncased_ner_arman BertForTokenClassification from HooshvareLab
+author: John Snow Labs
+name: bert_ner_bert_persian_farsi_base_uncased_ner_arman
+date: 2023-11-06
+tags: [bert, fa, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: fa
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_bert_persian_farsi_base_uncased_ner_arman` is a Persian model originally trained by HooshvareLab.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_persian_farsi_base_uncased_ner_arman_fa_5.2.0_3.0_1699288728231.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_persian_farsi_base_uncased_ner_arman_fa_5.2.0_3.0_1699288728231.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
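+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer stage must run before it
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_persian_farsi_base_uncased_ner_arman","fa") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+# Assumes an active SparkSession named `spark` (e.g. returned by sparknlp.start())
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes a SparkSession named `spark` (as in spark-shell) for toDF below
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_bert_persian_farsi_base_uncased_ner_arman", "fa")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+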
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bert_persian_farsi_base_uncased_ner_arman| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|fa| +|Size:|606.5 MB| + +## References + +https://huggingface.co/HooshvareLab/bert-fa-base-uncased-ner-arman \ No newline at end of file From ad8a0333ae60ce0d5439943fdbffe72538622d8c Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:10:55 +0700 Subject: [PATCH 027/667] Add model 2023-11-06-bert_ner_artemis13fowl_bert_finetuned_ner_accelerate_en --- ...13fowl_bert_finetuned_ner_accelerate_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_artemis13fowl_bert_finetuned_ner_accelerate_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_artemis13fowl_bert_finetuned_ner_accelerate_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_artemis13fowl_bert_finetuned_ner_accelerate_en.md new file mode 100644 index 00000000000000..e0d7e2bab8170c --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_artemis13fowl_bert_finetuned_ner_accelerate_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from artemis13fowl) +author: John Snow Labs +name: bert_ner_artemis13fowl_bert_finetuned_ner_accelerate +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. 
`bert-finetuned-ner-accelerate` is an English model originally trained by `artemis13fowl`. + +## Predicted Entities + +`LOC`, `PER`, `ORG`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_artemis13fowl_bert_finetuned_ner_accelerate_en_5.2.0_3.0_1699282963671.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_artemis13fowl_bert_finetuned_ner_accelerate_en_5.2.0_3.0_1699282963671.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_artemis13fowl_bert_finetuned_ner_accelerate","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_artemis13fowl_bert_finetuned_ner_accelerate","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.accelerate.by_artemis13fowl").predict("""PUT YOUR STRING HERE""")
+```
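The `ner` column holds one tag per token, so reading entity mentions out of it means merging consecutive tokens. Inside a Spark NLP pipeline this is usually the job of the `NerConverter` annotator; the pure-Python sketch below shows the same idea so the logic is easy to inspect. It assumes IOB-style tags (`B-PER`, `I-PER`, `O`; the card lists only the bare labels `LOC`, `PER`, `ORG`, `MISC`) and inclusive `begin`/`end` character offsets, as Spark NLP annotations carry. `Token`, `Chunk`, and `iob_to_chunks` are hypothetical names, and the tags in the example are hand-written rather than real model output.

```python
from typing import List, NamedTuple, Optional, Tuple


class Token(NamedTuple):
    text: str
    begin: int  # inclusive start offset, as in Spark NLP annotations
    end: int    # inclusive end offset
    tag: str    # assumed IOB tag, e.g. "B-PER", "I-PER" or "O"


class Chunk(NamedTuple):
    label: str
    begin: int
    end: int


def iob_to_chunks(tokens: List[Token]) -> List[Chunk]:
    """Hypothetical helper: merge B-/I- tagged tokens into entity chunks.
    An I- tag that does not continue an open chunk of the same label starts
    a new chunk; tags without a prefix are treated as outside ("O")."""
    chunks: List[Chunk] = []
    current: Optional[Tuple[str, int, int]] = None  # (label, begin, end) being built
    for tok in tokens:
        if tok.tag == "O" or "-" not in tok.tag:
            if current:
                chunks.append(Chunk(*current))
            current = None
            continue
        prefix, label = tok.tag.split("-", 1)
        if prefix == "I" and current and current[0] == label:
            current = (label, current[1], tok.end)  # extend the open chunk
        else:
            if current:
                chunks.append(Chunk(*current))
            current = (label, tok.begin, tok.end)
    if current:
        chunks.append(Chunk(*current))
    return chunks


text = "John Smith lives in New York"
tokens = [
    Token("John", 0, 3, "B-PER"),
    Token("Smith", 5, 9, "I-PER"),
    Token("lives", 11, 15, "O"),
    Token("in", 17, 18, "O"),
    Token("New", 20, 22, "B-LOC"),
    Token("York", 24, 27, "I-LOC"),
]
for chunk in iob_to_chunks(tokens):
    # end is inclusive, so slice up to end + 1
    print(chunk.label, text[chunk.begin : chunk.end + 1])
# PER John Smith
# LOC New York
```

Because unprefixed tags are treated as outside any entity, a label set without `B-`/`I-` prefixes needs a different merge rule.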
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_artemis13fowl_bert_finetuned_ner_accelerate| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/artemis13fowl/bert-finetuned-ner-accelerate \ No newline at end of file From 388b76c7bf56d0c14ad874309a3ca93e56029eb3 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:11:56 +0700 Subject: [PATCH 028/667] Add model 2023-11-06-bert_ner_bert_title_org_en --- .../2023-11-06-bert_ner_bert_title_org_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_title_org_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_title_org_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_title_org_en.md new file mode 100644 index 00000000000000..a7a92b2abf0228 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_title_org_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from pkushiqiang) +author: John Snow Labs +name: bert_ner_bert_title_org +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-title-org` is an English model originally trained by `pkushiqiang`.
+ +## Predicted Entities + +`major`, `org`, `job_title`, `degree` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_title_org_en_5.2.0_3.0_1699290598864.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_title_org_en_5.2.0_3.0_1699290598864.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_title_org","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_title_org","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.title_org.by_pkushiqiang").predict("""PUT YOUR STRING HERE""")
+```
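The labels listed for this model (`major`, `org`, `job_title`, `degree`) appear without `B-`/`I-` prefixes. If the `ner` column also uses plain labels (an assumption to confirm against real output), a simple way to summarize a result is to merge runs of adjacent tokens that share a label and group the phrases by label. In the sketch below, `group_by_label` is a hypothetical helper, `O` is assumed to be the outside label, and the (token, tag) pairs are hand-written rather than model output.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def group_by_label(pairs: Sequence[Tuple[str, str]], outside: str = "O") -> Dict[str, List[str]]:
    """Hypothetical helper: merge runs of adjacent tokens sharing a label
    (IO scheme) and group the resulting phrases by label. Tokens in a
    phrase are re-joined with single spaces."""
    groups: Dict[str, List[str]] = defaultdict(list)
    run_label, run_tokens = None, []
    # The trailing sentinel tagged `outside` flushes the last run.
    for token, label in list(pairs) + [("", outside)]:
        if label == run_label:
            run_tokens.append(token)
            continue
        if run_label is not None and run_label != outside:
            groups[run_label].append(" ".join(run_tokens))
        run_label, run_tokens = label, [token]
    return dict(groups)


pairs = [
    ("Senior", "job_title"), ("Data", "job_title"), ("Scientist", "job_title"),
    ("at", "O"), ("Acme", "org"), ("Corp", "org"), (",", "O"),
    ("MSc", "degree"), ("in", "O"), ("Statistics", "major"),
]
print(group_by_label(pairs))
# {'job_title': ['Senior Data Scientist'], 'org': ['Acme Corp'], 'degree': ['MSc'], 'major': ['Statistics']}
```

Because this scheme has no begin marker, two adjacent entities with the same label are merged into a single phrase.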
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bert_title_org| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/pkushiqiang/bert-title-org \ No newline at end of file From cf7b11012a76da5550cc22da4841e278feafa434 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:12:56 +0700 Subject: [PATCH 029/667] Add model 2023-11-06-bert_ner_codeswitch_hineng_lid_lince_hi --- ...bert_ner_codeswitch_hineng_lid_lince_hi.md | 110 ++++++++++++++++++ 1 file changed, 110 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_codeswitch_hineng_lid_lince_hi.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_codeswitch_hineng_lid_lince_hi.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_codeswitch_hineng_lid_lince_hi.md new file mode 100644 index 00000000000000..7b08fd6b289633 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_codeswitch_hineng_lid_lince_hi.md @@ -0,0 +1,110 @@ +--- +layout: model +title: Hindi Named Entity Recognition (from sagorsarker) +author: John Snow Labs +name: bert_ner_codeswitch_hineng_lid_lince +date: 2023-11-06 +tags: [bert, ner, token_classification, hi, open_source, onnx] +task: Named Entity Recognition +language: hi +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained Named Entity Recognition model, uploaded to Hugging Face, adapted and imported into Spark NLP. `codeswitch-hineng-lid-lince` is a Hindi model originally trained by `sagorsarker`.
+ +## Predicted Entities + +`mixed`, `hin`, `other`, `unk`, `en`, `ambiguous`, `ne`, `fw` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_codeswitch_hineng_lid_lince_hi_5.2.0_3.0_1699290564166.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_codeswitch_hineng_lid_lince_hi_5.2.0_3.0_1699290564166.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_codeswitch_hineng_lid_lince","hi") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["मुझे स्पार्क एनएलपी बहुत पसंद है"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_codeswitch_hineng_lid_lince","hi") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("मुझे स्पार्क एनएलपी बहुत पसंद है").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
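This model tags each token with a language-identification label (`hin`, `en`, `mixed`, `other`, and so on) rather than an entity span. A common follow-up on code-switched text is to find where the language changes. The sketch below does that, assuming only `hin` and `en` count as language labels and every other label is skipped. `switch_points` is a hypothetical helper, and the romanized tokens and their tags are hand-written; in a real run the tags would come from the `ner.result` field of the transformed DataFrame.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Assumption: only these labels identify a language; every other label
# (e.g. "other", "ne", "mixed") is skipped when looking for switches.
LANGUAGE_LABELS: FrozenSet[str] = frozenset({"hin", "en"})


def switch_points(tags: Sequence[str], languages: FrozenSet[str] = LANGUAGE_LABELS) -> List[Tuple[int, str, str]]:
    """Hypothetical helper: return (token_index, from_lang, to_lang) wherever
    the language changes between consecutive language-tagged tokens."""
    points = []
    previous = None
    for i, tag in enumerate(tags):
        if tag not in languages:
            continue
        if previous is not None and tag != previous:
            points.append((i, previous, tag))
        previous = tag
    return points


tokens = ["Kal", "meeting", "hai", ",", "please", "join", "karna"]
tags = ["hin", "en", "hin", "other", "en", "en", "hin"]
for index, src, dst in switch_points(tags):
    print(f"{tokens[index]!r}: {src} -> {dst}")
# 'meeting': hin -> en
# 'hai': en -> hin
# 'please': hin -> en
# 'karna': en -> hin
```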
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_codeswitch_hineng_lid_lince| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|hi| +|Size:|665.1 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/sagorsarker/codeswitch-hineng-lid-lince +- https://ritual.uh.edu/lince/home +- https://github.com/sagorbrur/codeswitch \ No newline at end of file From 8f9e68d19820d944e669151725ee2820802f0358 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:13:56 +0700 Subject: [PATCH 030/667] Add model 2023-11-06-bert_ner_bgc_accession_en --- .../2023-11-06-bert_ner_bgc_accession_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bgc_accession_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bgc_accession_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bgc_accession_en.md new file mode 100644 index 00000000000000..97466930cdbbc5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bgc_accession_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from Maaly) +author: John Snow Labs +name: bert_ner_bgc_accession +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bgc-accession` is an English model originally trained by `Maaly`.
+ +## Predicted Entities + +`bgc` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bgc_accession_en_5.2.0_3.0_1699289917120.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bgc_accession_en_5.2.0_3.0_1699289917120.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bgc_accession","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bgc_accession","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.bgc_accession.by_maaly").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bgc_accession| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.4 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/Maaly/bgc-accession +- https://gitlab.com/maaly7/emerald_bgcs_annotations \ No newline at end of file From 8e5d7b3910eab391bac5214d1a40ae6d6f135956 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:14:56 +0700 Subject: [PATCH 031/667] Add model 2023-11-06-bert_ner_batya66_bert_finetuned_ner_en --- ...-bert_ner_batya66_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_batya66_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_batya66_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_batya66_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..3808b518cf9253 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_batya66_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from batya66) +author: John Snow Labs +name: bert_ner_batya66_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `batya66`.
+ +## Predicted Entities + +`ORG`, `LOC`, `PER`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_batya66_bert_finetuned_ner_en_5.2.0_3.0_1699285160966.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_batya66_bert_finetuned_ner_en_5.2.0_3.0_1699285160966.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_batya66_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_batya66_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned.by_batya66").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_batya66_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/batya66/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From 698f2fa6027cd14779f0370233a20fe8e8c0688a Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:15:56 +0700 Subject: [PATCH 032/667] Add model 2023-11-06-bert_ner_craft_modified_pubmedbert_512_en --- ...rt_ner_craft_modified_pubmedbert_512_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_craft_modified_pubmedbert_512_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_craft_modified_pubmedbert_512_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_craft_modified_pubmedbert_512_en.md new file mode 100644 index 00000000000000..fff65932311023 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_craft_modified_pubmedbert_512_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_craft_modified_pubmedbert_512 BertForTokenClassification from ghadeermobasher +author: John Snow Labs +name: bert_ner_craft_modified_pubmedbert_512 +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark
NLP. `bert_ner_craft_modified_pubmedbert_512` is an English model originally trained by ghadeermobasher. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_craft_modified_pubmedbert_512_en_5.2.0_3.0_1699279132658.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_craft_modified_pubmedbert_512_en_5.2.0_3.0_1699279132658.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
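+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification reads a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_craft_modified_pubmedbert_512","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification reads a "token" column, so tokenize the documents first
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_craft_modified_pubmedbert_512", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```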
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_craft_modified_pubmedbert_512| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.2 MB| + +## References + +https://huggingface.co/ghadeermobasher/CRAFT-Modified-PubMedBERT-512 \ No newline at end of file From ae2beeb4f010ba53bd309dce8e2e9a73a59ee702 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:16:57 +0700 Subject: [PATCH 033/667] Add model 2023-11-06-bert_ner_biored_dis_original_pubmedbert_512_5_en --- ...biored_dis_original_pubmedbert_512_5_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_biored_dis_original_pubmedbert_512_5_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biored_dis_original_pubmedbert_512_5_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biored_dis_original_pubmedbert_512_5_en.md new file mode 100644 index 00000000000000..1927a111501ff1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biored_dis_original_pubmedbert_512_5_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_biored_dis_original_pubmedbert_512_5 BertForTokenClassification from ghadeermobasher +author: John Snow Labs +name: bert_ner_biored_dis_original_pubmedbert_512_5 +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_biored_dis_original_pubmedbert_512_5` is an English model originally
trained by ghadeermobasher. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_biored_dis_original_pubmedbert_512_5_en_5.2.0_3.0_1699278719798.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_biored_dis_original_pubmedbert_512_5_en_5.2.0_3.0_1699278719798.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
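+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification reads a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_biored_dis_original_pubmedbert_512_5","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification reads a "token" column, so tokenize the documents first
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_biored_dis_original_pubmedbert_512_5", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```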
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_biored_dis_original_pubmedbert_512_5| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.2 MB| + +## References + +https://huggingface.co/ghadeermobasher/BioRed-Dis-Original-PubMedBERT-512-5 \ No newline at end of file From 88e59eaaad7278acfcc9e514565aeb5466cd5afe Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:17:57 +0700 Subject: [PATCH 034/667] Add model 2023-11-06-bert_ner_tg_relation_model_en --- ...023-11-06-bert_ner_tg_relation_model_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_tg_relation_model_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_tg_relation_model_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_tg_relation_model_en.md new file mode 100644 index 00000000000000..90d725d40bbdaa --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_tg_relation_model_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_tg_relation_model BertForTokenClassification from alichte +author: John Snow Labs +name: bert_ner_tg_relation_model +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_tg_relation_model` is an English model originally trained by alichte.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_tg_relation_model_en_5.2.0_3.0_1699283849338.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_tg_relation_model_en_5.2.0_3.0_1699283849338.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
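+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification reads a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_tg_relation_model","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification reads a "token" column, so tokenize the documents first
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_tg_relation_model", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```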
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_tg_relation_model| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/alichte/TG-Relation-Model \ No newline at end of file From 74d4880eff899a8d97fbc79667a140093fc9d93c Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:18:57 +0700 Subject: [PATCH 035/667] Add model 2023-11-06-bert_ner_codeswitch_nepeng_lid_lince_en --- ...bert_ner_codeswitch_nepeng_lid_lince_en.md | 116 ++++++++++++++++++ 1 file changed, 116 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_codeswitch_nepeng_lid_lince_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_codeswitch_nepeng_lid_lince_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_codeswitch_nepeng_lid_lince_en.md new file mode 100644 index 00000000000000..1c7d391ccef70f --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_codeswitch_nepeng_lid_lince_en.md @@ -0,0 +1,116 @@ +--- +layout: model +title: English Named Entity Recognition (from sagorsarker) +author: John Snow Labs +name: bert_ner_codeswitch_nepeng_lid_lince +date: 2023-11-06 +tags: [bert, ner, token_classification, en, open_source, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained Named Entity Recognition model, uploaded to Hugging Face, adapted and imported into Spark NLP. `codeswitch-nepeng-lid-lince` is an English model originally trained by `sagorsarker`.
+ +## Predicted Entities + +`mixed`, `other`, `en`, `ambiguous`, `ne`, `nep` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_codeswitch_nepeng_lid_lince_en_5.2.0_3.0_1699290953334.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_codeswitch_nepeng_lid_lince_en_5.2.0_3.0_1699290953334.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_codeswitch_nepeng_lid_lince","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["I love Spark NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_codeswitch_nepeng_lid_lince","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("I love Spark NLP").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.codeswitch_nepeng_lid_lince.by_sagorsarker").predict("""I love Spark NLP""") +``` +
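The labels for this model (`nep`, `en`, `ne`, `mixed`, `ambiguous`, `other`) are per-token language-identification labels, so a quick summary of one sentence's output is the share of tokens carrying each label. The sketch below computes that with `collections.Counter`. `label_shares` is a hypothetical helper, and the tags are hand-written; in a real run they would come from the `ner.result` field of the transformed DataFrame.

```python
from collections import Counter
from typing import Dict, Sequence


def label_shares(tags: Sequence[str], ndigits: int = 2) -> Dict[str, float]:
    """Hypothetical helper: fraction of tokens per label, most frequent first
    (ties keep the order in which labels first appear)."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {label: round(n / total, ndigits) for label, n in counts.most_common()}


tags = ["nep", "nep", "en", "other", "nep", "en", "other", "nep"]
print(label_shares(tags))
# {'nep': 0.5, 'en': 0.25, 'other': 0.25}
```

An empty tag list returns an empty dictionary, since there is nothing to divide.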
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_codeswitch_nepeng_lid_lince| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|665.1 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/sagorsarker/codeswitch-nepeng-lid-lince +- https://ritual.uh.edu/lince/home +- https://github.com/sagorbrur/codeswitch \ No newline at end of file From fcb6165311367550f6839a0cbf7f0c1d3133efd8 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:19:57 +0700 Subject: [PATCH 036/667] Add model 2023-11-06-bert_ner_bigbio_mtl_en --- .../2023-11-06-bert_ner_bigbio_mtl_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bigbio_mtl_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bigbio_mtl_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bigbio_mtl_en.md new file mode 100644 index 00000000000000..f8de5679a54fd0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bigbio_mtl_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from bigscience-biomedical) +author: John Snow Labs +name: bert_ner_bigbio_mtl +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bigbio-mtl` is an English model originally trained by `bigscience-biomedical`.
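Unlike a single-dataset NER model, the labels listed under Predicted Entities combine many source tasks. Each one appears to consist of a task prefix such as `bc5cdr_ner` or `chemprot_RE`, a colon, and then the tag itself, followed by a trailing `)` as printed in the list. The sketch below uses hypothetical helpers (not part of Spark NLP) to split such labels and group the tags by task, for example to keep only predictions that come from one NER dataset. It splits at the first colon only, because some tags such as `CPR:10` contain a colon themselves. It strips the trailing `)` only when present, because it is an assumption whether the model's `ner.result` values carry it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_label(label: str) -> Tuple[str, str]:
    """Split e.g. 'bc5cdr_ner:B-Chemical)' into ('bc5cdr_ner', 'B-Chemical')."""
    cleaned = label.strip()
    # Drop the trailing ')' seen in the published label list, if present.
    if cleaned.endswith(")"):
        cleaned = cleaned[:-1]
    # Split at the first ':' only, so tags such as 'CPR:10' stay intact.
    task, sep, tag = cleaned.partition(":")
    if not sep:
        raise ValueError(f"label has no task prefix: {label!r}")
    return task, tag


def group_by_task(labels: Iterable[str]) -> Dict[str, List[str]]:
    """Collect tags under their task prefix, in first-seen order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for label in labels:
        task, tag = split_label(label)
        grouped[task].append(tag)
    return dict(grouped)


labels = ["bc5cdr_ner:B-Chemical)", "bc5cdr_ner:O)", "chemprot_RE:CPR:10)"]
by_task = group_by_task(labels)
print(by_task)  # {'bc5cdr_ner': ['B-Chemical', 'O'], 'chemprot_RE': ['CPR:10']}

# Keep only tasks whose prefix ends in '_ner' (a naming pattern assumed from the list).
ner_only = {task: tags for task, tags in by_task.items() if task.endswith("_ner")}
print(ner_only)  # {'bc5cdr_ner': ['B-Chemical', 'O']}
```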
+ +## Predicted Entities + +`medmentions_full_ner:B-T085)`, `pdr_EAE:Theme)`, `bionlp_shared_task_2009_ner:I-Entity)`, `pcr_ner:B-Herb)`, `gnormplus_ner:I-Gene)`, `bionlp_st_2013_cg_EAE:Participant)`, `pubmed_qa_labeled_fold0_CLF:yes)`, `bionlp_st_2013_gro_ner:B-Ribosome)`, `anat_em_ner:O)`, `seth_corpus_RE:Equals)`, `chemprot_RE:CPR:10)`, `medmentions_full_ner:B-T102)`, `medmentions_full_ner:I-T171)`, `medmentions_full_ner:I-T082)`, `bionlp_st_2013_cg_ED:B-Positive_regulation)`, `anat_em_ner:B-Multi-tissue_structure)`, `hprd50_ner:O)`, `bionlp_st_2013_gro_ner:B-OxidativeStress)`, `mlee_ED:I-Transcription)`, `cellfinder_ner:I-GeneProtein)`, `chia_ner:B-Reference_point)`, `medmentions_full_ner:B-T015)`, `ncbi_disease_ner:B-CompositeMention)`, `bionlp_st_2013_gro_ner:I-RNAPolymerase)`, `bionlp_st_2013_gro_ner:B-Virus)`, `bionlp_st_2013_gro_ED:B-Pathway)`, `medmentions_full_ner:B-T025)`, `chebi_nactem_abstr_ann1_ner:B-Metabolite)`, `bio_sim_verb_sts:7)`, `bionlp_st_2013_gro_ED:B-Maintenance)`, `medmentions_full_ner:I-T129)`, `scai_disease_ner:B-DISEASE)`, `chemprot_RE:CPR:9)`, `biorelex_ner:B-chemical)`, `bionlp_st_2013_gro_ED:I-TranscriptionOfGene)`, `bionlp_st_2013_gro_ED:I-BindingOfProteinToProteinBindingSiteOfProtein)`, `bionlp_st_2013_cg_ner:B-Amino_acid)`, `pubmed_qa_labeled_fold0_CLF:maybe)`, `bionlp_st_2013_gro_ner:I-Sequence)`, `pico_extraction_ner:O)`, `bc5cdr_ner:B-Chemical)`, `bionlp_st_2013_pc_ner:B-Simple_chemical)`, `bionlp_st_2011_id_ED:B-Gene_expression)`, `an_em_ner:B-Developing_anatomical_structure)`, `bionlp_st_2019_bb_ner:I-Phenotype)`, `genia_term_corpus_ner:B-DNA_family_or_group)`, `medmentions_st21pv_ner:I-T204)`, `bionlp_st_2013_gro_ner:B-bZIP)`, `bionlp_st_2013_gro_ner:I-Eukaryote)`, `bionlp_st_2013_pc_ner:I-Complex)`, `mlee_ner:I-Cell)`, `bionlp_shared_task_2009_ED:I-Localization)`, `hprd50_ner:I-protein)`, `mantra_gsc_en_patents_ner:B-PHYS)`, `bionlp_st_2013_gro_ED:B-RegulationOfGeneExpression)`, `medmentions_full_ner:B-T020)`, 
`genia_term_corpus_ner:B-ANDprotein_moleculeprotein_molecule)`, `bionlp_shared_task_2009_EAE:AtLoc)`, `genia_term_corpus_ner:B-protein_molecule)`, `bionlp_st_2013_gro_ner:B-Agonist)`, `mantra_gsc_en_medline_ner:B-PHEN)`, `medmentions_full_ner:B-T030)`, `biorelex_ner:I-RNA-family)`, `medmentions_full_ner:B-T169)`, `ddi_corpus_ner:B-BRAND)`, `medmentions_full_ner:B-T087)`, `genia_term_corpus_ner:I-nucleotide)`, `bionlp_st_2013_gro_ED:I-CellCyclePhaseTransition)`, `mantra_gsc_en_medline_ner:B-DEVI)`, `tmvar_v1_ner:O)`, `bionlp_st_2013_gro_ED:I-CellularComponentOrganizationAndBiogenesis)`, `bioscope_abstracts_ner:B-speculation)`, `ebm_pico_ner:B-Outcome_Adverse-effects)`, `bionlp_shared_task_2009_EAE:Site)`, `mantra_gsc_en_medline_ner:B-PHYS)`, `bionlp_st_2013_gro_ner:I-Lipid)`, `genia_term_corpus_ner:I-ANDprotein_substructureprotein_substructure)`, `medmentions_st21pv_ner:B-T007)`, `bionlp_st_2013_cg_ner:I-Pathological_formation)`, `bionlp_st_2013_gro_ner:B-Organism)`, `bc5cdr_ner:O)`, `bionlp_st_2011_id_EAE:Site)`, `bionlp_st_2013_gro_ner:I-NucleicAcid)`, `medmentions_full_ner:I-T040)`, `bionlp_st_2013_gro_ED:B-BindingOfProteinToProteinBindingSiteOfProtein)`, `mlee_ED:I-Blood_vessel_development)`, `bionlp_st_2013_gro_ner:B-ExpressionProfiling)`, `medmentions_full_ner:I-T044)`, `mantra_gsc_en_emea_ner:I-DEVI)`, `chia_ner:I-Person)`, `ebm_pico_ner:B-Intervention_Pharmacological)`, `scai_disease_ner:O)`, `medmentions_full_ner:I-T121)`, `bionlp_st_2011_epi_ner:I-Entity)`, `mantra_gsc_en_emea_ner:I-ANAT)`, `genia_term_corpus_ner:B-cell_component)`, `bionlp_st_2019_bb_RE:Lives_In)`, `bionlp_st_2013_gro_ED:B-CatabolicPathway)`, `mantra_gsc_en_medline_ner:B-ANAT)`, `medmentions_full_ner:I-T065)`, `bionlp_st_2013_gro_ner:B-TranscriptionCofactor)`, `bionlp_st_2013_gro_ner:B-ProteinBindingSiteOfDNA)`, `pdr_EAE:Cause)`, `anat_em_ner:I-Developing_anatomical_structure)`, `anat_em_ner:B-Cancer)`, `bionlp_st_2013_pc_ED:B-Gene_expression)`, 
`genia_term_corpus_ner:I-ORDNA_domain_or_regionDNA_domain_or_region)`, `scai_disease_ner:I-ADVERSE)`, `bionlp_st_2013_cg_ED:B-Dephosphorylation)`, `bionlp_st_2013_gro_ED:I-Heterodimerization)`, `mlee_ED:B-Catabolism)`, `biorelex_ner:I-protein-isoform)`, `bionlp_shared_task_2009_COREF:None)`, `bionlp_st_2013_gro_ED:B-RNASplicing)`, `bionlp_st_2013_gro_EAE:hasPatient)`, `mantra_gsc_en_medline_ner:I-ANAT)`, `medmentions_full_ner:I-T015)`, `bionlp_st_2013_pc_EAE:Product)`, `bionlp_st_2013_pc_EAE:AtLoc)`, `bionlp_st_2013_gro_ED:B-ProteinTargeting)`, `cellfinder_ner:B-CellComponent)`, `mantra_gsc_en_medline_ner:I-DISO)`, `bionlp_st_2013_gro_ED:I-Translation)`, `bionlp_st_2013_gro_ner:I-Prokaryote)`, `genia_term_corpus_ner:I-lipid)`, `bionlp_st_2013_pc_ED:B-Deacetylation)`, `biorelex_ner:B-RNA)`, `scai_chemical_ner:B-FAMILY)`, `bionlp_st_2013_gro_ED:I-Pathway)`, `bionlp_st_2013_gro_ner:B-ProteinIdentification)`, `bionlp_st_2011_ge_ner:O)`, `mlee_ner:B-Protein_domain_or_region)`, `bionlp_st_2011_id_ner:B-Organism)`, `bionlp_st_2013_gro_ner:I-ForkheadWingedHelixTF)`, `bionlp_st_2013_gro_ner:I-Chromatin)`, `mlee_ED:I-Binding)`, `mirna_ner:B-Relation_Trigger)`, `bionlp_st_2013_gro_ner:B-Nucleotide)`, `linnaeus_ner:I-species)`, `medmentions_full_ner:I-T024)`, `verspoor_2013_ner:I-body-part)`, `bionlp_st_2011_epi_EAE:Sidechain)`, `bionlp_st_2013_gro_ner:I-ReporterGeneConstruction)`, `bionlp_st_2013_gro_ner:B-DNAFragment)`, `bionlp_st_2013_gro_ner:B-PositiveTranscriptionRegulator)`, `medmentions_full_ner:I-T049)`, `medmentions_full_ner:I-T025)`, `verspoor_2013_ner:I-gene)`, `bionlp_st_2019_bb_RE:Exhibits)`, `bionlp_st_2013_cg_ED:B-Gene_expression)`, `bionlp_st_2013_ge_ner:O)`, `mlee_ner:I-Developing_anatomical_structure)`, `mlee_ED:B-Positive_regulation)`, `bionlp_st_2013_gro_ED:B-FormationOfTranscriptionInitiationComplex)`, `bionlp_st_2011_ge_ner:B-Entity)`, `ddi_corpus_ner:I-GROUP)`, `medmentions_full_ner:I-T017)`, `bionlp_st_2013_gro_ED:I-Mutation)`, 
`bionlp_st_2011_id_EAE:AtLoc)`, `bionlp_st_2011_ge_ED:B-Regulation)`, `bionlp_st_2011_ge_EAE:Theme)`, `bionlp_st_2013_gro_ner:I-ExperimentalMethod)`, `bionlp_st_2013_gro_ner:B-HMGTF)`, `chemdner_ner:B-Chemical)`, `ehr_rel_sts:1)`, `medmentions_full_ner:I-T196)`, `bioscope_papers_ner:B-negation)`, `bionlp_shared_task_2009_ED:I-Negative_regulation)`, `bionlp_st_2013_pc_ED:B-Phosphorylation)`, `biorelex_RE:bind)`, `bioinfer_ner:B-Protein_complex)`, `scai_chemical_ner:I-TRIVIALVAR)`, `bionlp_shared_task_2009_ED:I-Binding)`, `bionlp_st_2011_rel_ner:I-Entity)`, `anat_em_ner:B-Tissue)`, `bionlp_st_2013_cg_ED:I-Remodeling)`, `bionlp_st_2013_cg_ner:I-Cell)`, `medmentions_full_ner:I-T074)`, `sciq_SEQ:None)`, `mantra_gsc_en_medline_ner:I-PROC)`, `bionlp_st_2011_id_ED:I-Negative_regulation)`, `bionlp_st_2013_gro_ner:I-Agonist)`, `chia_ner:I-Reference_point)`, `medmentions_full_ner:B-T024)`, `bionlp_st_2013_gro_ner:B-Histone)`, `chia_ner:I-Negation)`, `lll_RE:None)`, `ncbi_disease_ner:I-DiseaseClass)`, `bionlp_st_2013_gro_ner:I-Chromosome)`, `scai_disease_ner:B-ADVERSE)`, `medmentions_full_ner:B-T130)`, `bionlp_st_2011_epi_ED:B-Catalysis)`, `bionlp_st_2011_epi_ner:O)`, `mlee_EAE:AtLoc)`, `bionlp_st_2013_gro_ED:B-RegulationOfPathway)`, `genia_term_corpus_ner:I-RNA_family_or_group)`, `biosses_sts:8)`, `bionlp_st_2013_gro_ner:I-MolecularFunction)`, `verspoor_2013_ner:B-gene)`, `an_em_ner:I-Cell)`, `bionlp_st_2011_id_ED:B-Localization)`, `bionlp_st_2011_ge_EAE:Site)`, `bionlp_st_2013_gro_ner:B-ZincCoordinatingDomainTF)`, `bionlp_st_2013_gro_EAE:hasAgent)`, `bionlp_st_2013_gro_ner:B-DNARegion)`, `bionlp_shared_task_2009_ED:O)`, `mlee_EAE:Cause)`, `bionlp_st_2011_epi_ED:B-Ubiquitination)`, `bionlp_st_2013_gro_ED:I-GeneExpression)`, `bionlp_st_2013_gro_ner:I-CatalyticActivity)`, `anat_em_ner:B-Anatomical_system)`, `lll_RE:genic_interaction)`, `bionlp_st_2013_gro_ner:B-Nucleus)`, `bionlp_st_2013_ge_ED:B-Acetylation)`, `ebm_pico_ner:B-Intervention_Educational)`, 
`medmentions_st21pv_ner:B-T005)`, `mlee_ner:I-Pathological_formation)`, `bionlp_st_2013_gro_ner:B-OrganicChemical)`, `medmentions_full_ner:I-T022)`, `gnormplus_ner:B-FamilyName)`, `bionlp_st_2013_gro_ED:I-NegativeRegulationOfTranscription)`, `bionlp_st_2013_gro_ner:I-ChromosomalDNA)`, `anat_em_ner:B-Cell)`, `bionlp_st_2013_gro_ner:I-TranscriptionCofactor)`, `chia_ner:I-Observation)`, `bioscope_abstracts_ner:I-negation)`, `medmentions_full_ner:I-T089)`, `bionlp_st_2013_gro_ner:B-AP2EREBPRelatedDomain)`, `bionlp_st_2013_gro_ner:I-ComplexMolecularEntity)`, `bionlp_st_2013_gro_ner:B-Lipid)`, `mlee_ED:B-Death)`, `biorelex_ner:I-gene)`, `bionlp_st_2011_id_ED:I-Positive_regulation)`, `medmentions_st21pv_ner:B-T058)`, `bionlp_st_2011_id_ED:O)`, `biorelex_ner:B-protein-region)`, `bionlp_st_2011_id_ED:B-Regulation)`, `verspoor_2013_RE:relatedTo)`, `bionlp_st_2011_id_ED:I-Gene_expression)`, `genia_term_corpus_ner:B-cell_line)`, `bionlp_st_2013_gro_ner:B-UpstreamRegulatorySequence)`, `genia_term_corpus_ner:B-polynucleotide)`, `genia_term_corpus_ner:I-cell_component)`, `medmentions_full_ner:B-T013)`, `bionlp_st_2011_ge_COREF:None)`, `ebm_pico_ner:B-Participant_Sample-size)`, `bionlp_st_2013_gro_ED:B-RNAMetabolism)`, `bionlp_st_2013_gro_ner:I-RNA)`, `ddi_corpus_RE:EFFECT)`, `medmentions_st21pv_ner:B-T031)`, `bionlp_st_2013_cg_ner:I-Immaterial_anatomical_entity)`, `ebm_pico_ner:I-Intervention_Physical)`, `bionlp_st_2013_gro_ner:B-MolecularStructure)`, `bionlp_st_2013_gro_ED:B-GeneExpression)`, `bionlp_st_2013_pc_ner:B-Complex)`, `medmentions_full_ner:I-T090)`, `medmentions_st21pv_ner:I-T005)`, `bionlp_st_2013_gro_ED:B-ProteinTransport)`, `bionlp_st_2013_gro_ner:I-ZincCoordinatingDomainTF)`, `bionlp_st_2013_gro_ner:I-CpGIsland)`, `bionlp_st_2013_gro_ner:B-AminoAcid)`, `bionlp_st_2013_gro_ED:B-SPhase)`, `bionlp_st_2011_epi_COREF:None)`, `bionlp_st_2013_pc_ner:I-Cellular_component)`, `genia_term_corpus_ner:B-ANDDNA_domain_or_regionDNA_domain_or_region)`, 
`bionlp_st_2013_gro_ner:B-Chromosome)`, `medmentions_full_ner:I-T010)`, `bionlp_st_2013_gro_ner:I-OxidativeStress)`, `bionlp_st_2013_cg_ner:I-Anatomical_system)`, `bionlp_st_2013_gro_ED:B-BindingOfTFToTFBindingSiteOfDNA)`, `medmentions_st21pv_ner:I-T062)`, `medmentions_full_ner:B-T081)`, `scai_chemical_ner:B-PARTIUPAC)`, `bionlp_st_2013_gro_ner:I-RibosomalRNA)`, `verspoor_2013_ner:O)`, `bionlp_st_2011_epi_ED:B-Methylation)`, `bionlp_shared_task_2009_ner:B-Entity)`, `bionlp_st_2013_pc_ED:B-Transport)`, `bio_sim_verb_sts:3)`, `bionlp_st_2013_gro_ED:I-Elongation)`, `medmentions_full_ner:B-T058)`, `biorelex_ner:B-protein)`, `mantra_gsc_en_patents_ner:B-DEVI)`, `bionlp_st_2013_gro_ner:I-BasicDomain)`, `medmentions_full_ner:I-T071)`, `bionlp_st_2013_gro_ED:I-DevelopmentalProcess)`, `bionlp_st_2013_cg_ED:B-Catabolism)`, `mlee_ED:B-Growth)`, `mlee_EAE:Theme)`, `ebm_pico_ner:I-Intervention_Surgical)`, `bionlp_st_2011_ge_ner:I-Entity)`, `an_em_ner:I-Organ)`, `bionlp_st_2013_ge_ED:B-Positive_regulation)`, `iepa_RE:PPI)`, `bionlp_st_2013_gro_ner:B-PhysicalContinuant)`, `chemprot_RE:CPR:4)`, `bionlp_st_2011_id_EAE:Theme)`, `bionlp_st_2013_cg_ED:B-Amino_acid_catabolism)`, `genia_term_corpus_ner:B-other_name)`, `medmentions_full_ner:I-T130)`, `bionlp_st_2011_id_ED:I-Process)`, `mantra_gsc_en_patents_ner:O)`, `bionlp_st_2013_pc_ED:B-Ubiquitination)`, `medmentions_full_ner:B-T018)`, `bionlp_st_2011_id_EAE:ToLoc)`, `bionlp_st_2013_cg_ner:B-Organism)`, `medmentions_full_ner:B-T014)`, `bionlp_st_2013_pc_ED:I-Activation)`, `mlee_ED:I-Death)`, `medmentions_full_ner:I-T047)`, `bionlp_st_2011_ge_EAE:ToLoc)`, `bionlp_st_2013_cg_ED:I-Gene_expression)`, `bionlp_st_2013_gro_ner:B-AntisenseRNA)`, `bionlp_st_2013_gro_ner:B-ProteinCodingDNARegion)`, `bionlp_st_2013_gro_ED:I-BindingOfTFToTFBindingSiteOfDNA)`, `bionlp_st_2013_pc_ED:B-Methylation)`, `bionlp_st_2013_gro_ED:B-GeneMutation)`, `mlee_EAE:None)`, `bionlp_shared_task_2009_EAE:CSite)`, `chebi_nactem_fullpaper_ner:I-Protein)`, 
`genia_term_corpus_ner:I-multi_cell)`, `bionlp_st_2013_cg_ED:B-Cell_division)`, `ncbi_disease_ner:B-DiseaseClass)`, `bionlp_st_2013_gro_ner:I-Gene)`, `ebm_pico_ner:B-Intervention_Surgical)`, `medmentions_full_ner:B-T042)`, `medmentions_full_ner:I-T051)`, `cellfinder_ner:B-GeneProtein)`, `bionlp_st_2011_id_COREF:None)`, `biorelex_ner:I-brand)`, `bionlp_st_2013_gro_ner:B-CatalyticActivity)`, `chebi_nactem_abstr_ann1_ner:I-Biological_Activity)`, `bionlp_st_2013_gro_ED:B-OrganismalProcess)`, `bionlp_st_2013_gro_EAE:hasAgent2)`, `chebi_nactem_abstr_ann1_ner:I-Species)`, `bionlp_st_2013_pc_ED:B-Deubiquitination)`, `bionlp_st_2013_gro_ner:I-GeneProduct)`, `mayosrs_sts:6)`, `anat_em_ner:B-Immaterial_anatomical_entity)`, `bio_sim_verb_sts:1)`, `bionlp_st_2011_epi_ner:B-Entity)`, `medmentions_full_ner:I-T169)`, `bionlp_st_2013_gro_ner:B-bZIPTF)`, `mlee_ner:B-Immaterial_anatomical_entity)`, `an_em_RE:None)`, `verspoor_2013_ner:B-Physiology)`, `sciq_SEQ:answer)`, `cellfinder_ner:I-CellType)`, `mlee_RE:frag)`, `medmentions_st21pv_ner:I-T103)`, `ddi_corpus_RE:None)`, `bionlp_st_2013_gro_ner:I-AntisenseRNA)`, `medmentions_st21pv_ner:I-T091)`, `bionlp_st_2011_epi_EAE:Cause)`, `bionlp_st_2013_gro_ED:I-BindingToRNA)`, `bionlp_st_2013_gro_ED:I-PositiveRegulationOfTranscription)`, `bionlp_st_2013_pc_COREF:coref)`, `medmentions_full_ner:I-T067)`, `medmentions_full_ner:B-T005)`, `bionlp_st_2013_gro_ED:I-CellularMetabolicProcess)`, `bionlp_st_2011_epi_ED:B-Acetylation)`, `osiris_ner:B-variant)`, `ncbi_disease_ner:O)`, `spl_adr_200db_train_ner:I-DrugClass)`, `mantra_gsc_en_patents_ner:I-CHEM)`, `bionlp_st_2013_gro_ED:B-CellHomeostasis)`, `mayosrs_sts:2)`, `mirna_ner:I-Species)`, `bionlp_st_2013_cg_ED:B-Reproduction)`, `medmentions_full_ner:I-T102)`, `medmentions_st21pv_ner:I-T033)`, `medmentions_full_ner:B-T097)`, `bionlp_st_2013_pc_ED:I-Negative_regulation)`, `bionlp_st_2013_gro_ED:B-Dimerization)`, `ebm_pico_ner:I-Participant_Age)`, `medmentions_full_ner:B-T095)`, 
`bionlp_st_2013_gro_ED:B-RegulationOfProcess)`, `medmentions_full_ner:B-T002)`, `bionlp_st_2013_gro_ED:B-Binding)`, `bionlp_st_2013_gro_ED:B-BindingOfProtein)`, `verspoor_2013_ner:I-Concepts_Ideas)`, `bionlp_st_2011_epi_ner:I-Protein)`, `ddi_corpus_ner:O)`, `bionlp_st_2013_gro_ED:I-RNAMetabolism)`, `an_em_ner:I-Multi-tissue_structure)`, `medmentions_full_ner:B-T062)`, `genia_term_corpus_ner:I-ANDDNA_family_or_groupDNA_family_or_group)`, `medmentions_full_ner:I-T080)`, `ebm_pico_ner:B-Outcome_Physical)`, `medmentions_st21pv_ner:B-T103)`, `bionlp_st_2013_gro_ner:I-TranscriptionFactor)`, `chia_ner:I-Qualifier)`, `genia_term_corpus_ner:B-protein_domain_or_region)`, `bionlp_st_2013_gro_ED:B-IntraCellularTransport)`, `bionlp_st_2013_gro_ner:I-ThreeDimensionalMolecularStructure)`, `bionlp_st_2013_gro_ner:I-TranscriptionCoactivator)`, `an_em_ner:I-Immaterial_anatomical_entity)`, `chebi_nactem_fullpaper_ner:I-Chemical)`, `mantra_gsc_en_emea_ner:B-PROC)`, `biosses_sts:5)`, `bionlp_st_2013_cg_ner:B-Cancer)`, `genia_term_corpus_ner:B-BUT_NOTother_nameother_name)`, `bionlp_st_2013_gro_ED:I-CellDivision)`, `bionlp_st_2013_gro_ED:I-TranscriptionTermination)`, `bionlp_st_2013_cg_ED:B-Acetylation)`, `mlee_ED:I-Localization)`, `ehr_rel_sts:2)`, `biorelex_ner:I-protein-DNA-complex)`, `bionlp_st_2011_id_COREF:coref)`, `bioinfer_RE:None)`, `nlm_gene_ner:B-Gene)`, `medmentions_full_ner:B-T104)`, `biosses_sts:6)`, `bionlp_st_2013_gro_ner:B-ReporterGene)`, `biosses_sts:1)`, `biorelex_ner:I-organism)`, `chia_ner:B-Value)`, `cellfinder_ner:B-Anatomy)`, `bionlp_st_2013_gro_ED:I-RegulatoryProcess)`, `verspoor_2013_ner:B-body-part)`, `bionlp_st_2013_gro_ED:I-Localization)`, `biorelex_ner:B-RNA-family)`, `ebm_pico_ner:B-Intervention_Control)`, `bionlp_st_2013_cg_ED:B-Binding)`, `bionlp_st_2013_gro_ED:B-BindingOfProteinToDNA)`, `bionlp_st_2013_ge_EAE:Cause)`, `chemprot_RE:CPR:3)`, `chia_RE:Has_mood)`, `pico_extraction_ner:I-outcome)`, `medmentions_st21pv_ner:B-T074)`, 
`bionlp_st_2013_cg_ner:I-Amino_acid)`, `bionlp_st_2013_cg_ED:B-Protein_processing)`, `bionlp_st_2013_cg_ED:B-Regulation)`, `medmentions_full_ner:B-T197)`, `bionlp_st_2013_gro_ED:I-NegativeRegulationOfTranscriptionOfGene)`, `bionlp_st_2013_cg_ED:I-Transcription)`, `bionlp_st_2013_ge_ED:B-Gene_expression)`, `mantra_gsc_en_patents_ner:I-PHYS)`, `bionlp_st_2013_gro_ner:B-NucleicAcid)`, `bionlp_st_2013_gro_ED:B-CellDivision)`, `medmentions_st21pv_ner:I-T017)`, `bionlp_st_2011_id_EAE:CSite)`, `medmentions_full_ner:I-T046)`, `medmentions_full_ner:B-T204)`, `bionlp_st_2013_pc_ED:I-Dissociation)`, `spl_adr_200db_train_ner:B-Negation)`, `bionlp_st_2013_gro_ED:I-MetabolicPathway)`, `bionlp_st_2013_ge_ED:B-Regulation)`, `nlm_gene_ner:B-GENERIF)`, `verspoor_2013_ner:I-Disorder)`, `bionlp_st_2013_gro_ner:I-ReporterGene)`, `bionlp_st_2013_gro_ner:B-Vitamin)`, `bionlp_st_2013_cg_ner:B-Immaterial_anatomical_entity)`, `bionlp_st_2013_pc_ED:B-Acetylation)`, `chia_ner:B-Visit)`, `mantra_gsc_en_medline_ner:I-OBJC)`, `mayosrs_sts:8)`, `bionlp_st_2013_cg_ner:I-DNA_domain_or_region)`, `osiris_ner:B-gene)`, `bionlp_st_2013_gro_ner:I-TranscriptionRepressor)`, `bionlp_st_2013_cg_ED:I-Regulation)`, `bionlp_st_2013_gro_ner:I-RNAMolecule)`, `bionlp_st_2011_ge_ner:I-Protein)`, `mlee_ED:I-Regulation)`, `mlee_COREF:coref)`, `bionlp_st_2013_cg_ED:B-Metastasis)`, `bionlp_st_2013_gro_ner:I-ForkheadWingedHelix)`, `bioinfer_ner:I-Gene)`, `bionlp_st_2013_gro_ner:B-TranscriptionActivatorActivity)`, `medmentions_full_ner:I-T131)`, `genia_term_corpus_ner:B-protein_family_or_group)`, `linnaeus_filtered_ner:I-species)`, `medmentions_st21pv_ner:I-T168)`, `medmentions_full_ner:B-T123)`, `genia_term_corpus_ner:B-cell_type)`, `chebi_nactem_fullpaper_ner:B-Chemical)`, `ddi_corpus_ner:I-DRUG_N)`, `scai_chemical_ner:I-FAMILY)`, `bionlp_st_2013_gro_ner:I-Locus)`, `biorelex_ner:B-DNA)`, `mlee_EAE:FromLoc)`, `mlee_ED:B-Synthesis)`, `bionlp_st_2013_pc_ED:I-Inactivation)`, `bionlp_st_2013_gro_EAE:hasPatient2)`, 
`bionlp_st_2013_gro_ner:B-Transcript)`, `anat_em_ner:B-Organ)`, `chebi_nactem_abstr_ann1_ner:I-Spectral_Data)`, `anat_em_ner:I-Organism_substance)`, `spl_adr_200db_train_ner:B-DrugClass)`, `bionlp_st_2013_gro_ED:I-Splicing)`, `bionlp_st_2013_pc_ED:B-Positive_regulation)`, `bionlp_st_2013_gro_ner:I-ProteinSubunit)`, `bionlp_st_2013_gro_ED:B-ResponseToChemicalStimulus)`, `bionlp_st_2013_gro_ner:B-MutantGene)`, `bionlp_st_2013_pc_ED:B-Binding)`, `bionlp_st_2019_bb_ner:B-Phenotype)`, `bionlp_st_2013_gro_ED:B-CellMotility)`, `diann_iber_eval_en_ner:I-Neg)`, `mantra_gsc_en_medline_ner:B-DISO)`, `mlee_ED:I-Growth)`, `ddi_corpus_ner:B-DRUG_N)`, `biorelex_ner:B-protein-domain)`, `bionlp_st_2013_gro_ner:B-Eukaryote)`, `ncbi_disease_ner:I-CompositeMention)`, `chebi_nactem_fullpaper_ner:I-Spectral_Data)`, `seth_corpus_ner:I-SNP)`, `bionlp_st_2013_gro_ED:B-Elongation)`, `bionlp_st_2013_cg_ner:B-Organ)`, `hprd50_ner:B-protein)`, `biorelex_ner:I-DNA)`, `bionlp_st_2013_gro_ED:I-CellDeath)`, `bionlp_st_2013_cg_ner:I-Organism_subdivision)`, `bionlp_st_2013_cg_ED:B-Planned_process)`, `bionlp_st_2013_cg_ner:B-Cellular_component)`, `bionlp_st_2013_pc_ner:B-Cellular_component)`, `bionlp_st_2019_bb_ner:B-Microorganism)`, `ddi_corpus_RE:INT)`, `medmentions_st21pv_ner:B-T038)`, `cellfinder_ner:B-CellLine)`, `bioinfer_ner:I-GeneproteinRNA)`, `bionlp_shared_task_2009_EAE:None)`, `bionlp_st_2011_id_ner:I-Chemical)`, `bionlp_st_2013_gro_ED:B-BindingOfTranscriptionFactorToDNA)`, `bionlp_st_2011_id_ED:B-Protein_catabolism)`, `bionlp_st_2013_cg_ED:B-Cell_differentiation)`, `bionlp_shared_task_2009_ED:B-Negative_regulation)`, `bionlp_st_2013_cg_ED:B-Ubiquitination)`, `nlm_gene_ner:O)`, `bionlp_st_2013_pc_ED:I-Regulation)`, `bionlp_st_2013_gro_ED:I-CellFateDetermination)`, `biorelex_ner:I-mutation)`, `bionlp_st_2013_gro_ner:B-TranscriptionFactorBindingSiteOfDNA)`, `mantra_gsc_en_emea_ner:I-LIVB)`, `biorelex_COREF:None)`, `bionlp_st_2013_gro_ED:I-CellHomeostasis)`, 
`bionlp_st_2013_gro_ner:B-PhysicalContact)`, `bionlp_st_2013_gro_ner:B-TranscriptionFactor)`, `medmentions_full_ner:B-T167)`, `medmentions_st21pv_ner:B-T091)`, `seth_corpus_ner:I-Gene)`, `bionlp_st_2013_gro_ED:I-ProteinCatabolism)`, `ebm_pico_ner:O)`, `bionlp_st_2011_ge_COREF:coref)`, `bionlp_st_2013_gro_ner:I-bHLHTF)`, `mlee_ner:B-Organ)`, `bionlp_st_2013_gro_ED:B-BindingToMolecularEntity)`, `pdr_ED:I-Cause_of_disease)`, `bionlp_st_2011_epi_ED:B-Glycosylation)`, `medmentions_full_ner:B-T031)`, `bionlp_st_2013_gro_ner:B-TranscriptionFactorComplex)`, `biorelex_ner:B-disease)`, `chebi_nactem_fullpaper_ner:I-Biological_Activity)`, `medmentions_st21pv_ner:I-T092)`, `bionlp_st_2013_cg_COREF:coref)`, `medmentions_full_ner:B-T168)`, `pcr_ner:I-Chemical)`, `mlee_ED:B-Dissociation)`, `genia_relation_corpus_RE:None)`, `medmentions_full_ner:B-T092)`, `genia_term_corpus_ner:I-ANDDNA_domain_or_regionDNA_domain_or_region)`, `bionlp_st_2013_gro_ED:I-FormationOfProteinDNAComplex)`, `mlee_ED:B-Development)`, `medmentions_full_ner:I-T032)`, `bionlp_st_2013_gro_ED:I-RNASplicing)`, `medmentions_full_ner:I-T167)`, `genia_term_corpus_ner:B-protein_NA)`, `bionlp_st_2013_gro_ner:I-TranscriptionActivator)`, `bionlp_st_2013_ge_ner:B-Entity)`, `chemprot_RE:CPR:5)`, `bionlp_shared_task_2009_ED:I-Transcription)`, `an_em_ner:B-Multi-tissue_structure)`, `minimayosrs_sts:2)`, `chia_ner:I-Measurement)`, `chia_RE:Has_temporal)`, `bionlp_shared_task_2009_EAE:Cause)`, `bionlp_st_2013_gro_ED:B-RegulationOfTranscription)`, `biorelex_ner:B-protein-DNA-complex)`, `cellfinder_ner:I-CellComponent)`, `bionlp_st_2013_gro_ED:B-MolecularInteraction)`, `bionlp_st_2013_cg_ED:B-Transcription)`, `medmentions_full_ner:I-UnknownType)`, `mlee_EAE:Site)`, `bionlp_st_2013_gro_ED:I-Homodimerization)`, `bionlp_st_2013_gro_ner:I-Phenotype)`, `chemprot_ner:I-GENE-N)`, `nlm_gene_ner:B-Other)`, `biorelex_ner:B-reagent)`, `genia_term_corpus_ner:B-ANDDNA_family_or_groupDNA_family_or_group)`, `medmentions_full_ner:I-T019)`, 
`bionlp_st_2013_gro_ner:B-DNABindingSite)`, `nlmchem_ner:O)`, `biorelex_ner:B-organism)`, `chebi_nactem_abstr_ann1_ner:B-Spectral_Data)`, `bionlp_st_2013_cg_ner:I-Multi-tissue_structure)`, `ebm_pico_ner:I-Outcome_Mental)`, `medmentions_full_ner:B-T010)`, `scai_disease_ner:I-DISEASE)`, `mantra_gsc_en_medline_ner:I-GEOG)`, `scai_chemical_ner:B-IUPAC)`, `bionlp_st_2013_gro_ner:I-ProteinBindingSiteOfProtein)`, `chebi_nactem_fullpaper_ner:O)`, `verspoor_2013_ner:B-mutation)`, `biorelex_ner:B-protein-isoform)`, `chemprot_ner:I-GENE-Y)`, `bionlp_st_2013_cg_EAE:CSite)`, `medmentions_full_ner:I-T095)`, `bionlp_st_2013_gro_ED:B-ResponseProcess)`, `mirna_ner:I-Diseases)`, `bionlp_st_2013_gro_ner:I-DNABindingSite)`, `an_em_ner:O)`, `biorelex_ner:O)`, `seth_corpus_RE:AssociatedTo)`, `mlee_EAE:Participant)`, `mlee_ED:B-Negative_regulation)`, `bioscope_abstracts_ner:B-negation)`, `chebi_nactem_fullpaper_ner:I-Metabolite)`, `bionlp_st_2011_epi_ED:B-Demethylation)`, `bionlp_st_2013_gro_ner:I-TranscriptionRepressorActivity)`, `bionlp_shared_task_2009_ner:O)`, `bionlp_shared_task_2009_EAE:Theme)`, `mlee_ED:B-Protein_processing)`, `medmentions_full_ner:B-T029)`, `medmentions_st21pv_ner:I-T058)`, `bionlp_st_2011_ge_ner:B-Protein)`, `bionlp_st_2013_ge_ner:B-Protein)`, `scicite_TEXT:background)`, `medmentions_full_ner:I-T029)`, `bionlp_st_2013_ge_ED:B-Negative_regulation)`, `genia_term_corpus_ner:B-ANDcell_typecell_type)`, `bionlp_st_2013_gro_ner:I-Tissue)`, `genia_term_corpus_ner:I-protein_substructure)`, `bionlp_st_2013_gro_ner:I-TranslationFactor)`, `scai_chemical_ner:B-SUM)`, `bionlp_st_2011_ge_ED:I-Gene_expression)`, `minimayosrs_sts:5)`, `medmentions_full_ner:B-T082)`, `bionlp_st_2011_epi_ED:B-Dehydroxylation)`, `genia_term_corpus_ner:B-mono_cell)`, `bionlp_st_2013_gro_ner:B-DNA)`, `medmentions_full_ner:I-T200)`, `medmentions_full_ner:I-T114)`, `ncbi_disease_ner:I-Modifier)`, `bionlp_st_2013_cg_EAE:Theme)`, `medmentions_full_ner:B-T079)`, 
`bionlp_st_2013_gro_ner:B-ComplexOfProteinAndRNA)`, `genetaggold_ner:B-NEWGENE)`, `mlee_ner:I-Organism_subdivision)`, `bionlp_st_2013_gro_ED:B-PositiveRegulation)`, `medmentions_full_ner:B-T196)`, `bio_sim_verb_sts:4)`, `bionlp_st_2013_gro_ner:B-Microorganism)`, `bionlp_st_2013_pc_ED:I-Binding)`, `biorelex_ner:B-process)`, `bionlp_st_2013_gro_RE:encodes)`, `biorelex_ner:B-fusion-protein)`, `mirna_ner:I-Non-Specific_miRNAs)`, `biorelex_ner:B-amino-acid)`, `bionlp_st_2013_ge_ED:I-Protein_catabolism)`, `bioinfer_ner:I-DNA_family_or_group)`, `mlee_COREF:None)`, `bionlp_st_2013_cg_ED:I-Positive_regulation)`, `mlee_ED:B-DNA_methylation)`, `bionlp_st_2013_gro_ner:I-Chemical)`, `bionlp_st_2013_gro_ner:B-ProteinBindingSiteOfProtein)`, `mantra_gsc_en_patents_ner:I-DEVI)`, `bionlp_st_2013_gro_ED:B-CellGrowth)`, `mantra_gsc_en_medline_ner:O)`, `medmentions_full_ner:B-T043)`, `chemprot_RE:CPR:7)`, `bionlp_st_2013_gro_ED:B-Heterodimerization)`, `chia_ner:I-Value)`, `medmentions_full_ner:B-T046)`, `medmentions_full_ner:I-T048)`, `bionlp_st_2013_cg_EAE:Site)`, `gnormplus_ner:O)`, `chemprot_ner:B-GENE-Y)`, `bionlp_st_2013_gro_ED:I-SignalingPathway)`, `scicite_TEXT:result)`, `bionlp_st_2011_id_ner:I-Regulon-operon)`, `bionlp_st_2013_gro_ED:B-BindingOfDNABindingDomainOfProteinToDNA)`, `cellfinder_ner:I-CellLine)`, `ebm_pico_ner:I-Outcome_Adverse-effects)`, `medmentions_full_ner:I-T116)`, `bionlp_st_2013_gro_ner:I-DNABindingDomainOfProtein)`, `genia_term_corpus_ner:I-protein_domain_or_region)`, `bionlp_st_2013_gro_ner:B-Nucleosome)`, `medmentions_st21pv_ner:B-T168)`, `chemprot_ner:B-CHEMICAL)`, `bionlp_st_2013_gro_ED:I-CatabolicPathway)`, `bioinfer_ner:B-DNA_family_or_group)`, `bionlp_st_2013_gro_ner:I-bZIPTF)`, `genia_term_corpus_ner:B-body_part)`, `mirna_ner:I-GenesProteins)`, `chebi_nactem_abstr_ann1_ner:B-Protein)`, `an_em_ner:B-Organ)`, `bionlp_st_2013_ge_ED:I-Negative_regulation)`, `genia_term_corpus_ner:B-ANDprotein_family_or_groupprotein_family_or_group)`, 
`biorelex_ner:I-process)`, `mlee_ner:B-Tissue)`, `medmentions_full_ner:B-T041)`, `mlee_ner:I-Tissue)`, `bionlp_st_2013_gro_RE:hasFunction)`, `bionlp_st_2013_gro_ner:I-TranscriptionFactorActivity)`, `bionlp_st_2011_ge_ED:B-Negative_regulation)`, `biorelex_ner:B-protein-family)`, `bionlp_st_2011_epi_ED:I-Deacetylation)`, `ebm_pico_ner:I-Participant_Condition)`, `genia_term_corpus_ner:B-DNA_domain_or_region)`, `medmentions_full_ner:B-T125)`, `bionlp_st_2013_gro_ED:B-DevelopmentalProcess)`, `bionlp_st_2013_ge_ED:I-Ubiquitination)`, `bionlp_st_2013_gro_ED:B-Cleavage)`, `bionlp_st_2013_gro_ner:I-TATAbox)`, `bionlp_st_2013_cg_ner:B-Gene_or_gene_product)`, `cellfinder_ner:O)`, `bionlp_st_2013_gro_ED:B-CellularComponentOrganizationAndBiogenesis)`, `bionlp_st_2013_ge_ED:I-Regulation)`, `bionlp_st_2013_gro_ner:I-MutatedProtein)`, `bionlp_st_2013_gro_ner:I-bZIP)`, `spl_adr_200db_train_ner:O)`, `bionlp_st_2013_gro_ner:B-LivingEntity)`, `bionlp_st_2011_ge_ED:B-Protein_catabolism)`, `bionlp_st_2013_pc_ED:B-Conversion)`, `mantra_gsc_en_medline_ner:B-CHEM)`, `medmentions_full_ner:I-T026)`, `chebi_nactem_abstr_ann1_ner:I-Protein)`, `medmentions_full_ner:I-T085)`, `bionlp_st_2013_cg_ner:I-Organism_substance)`, `medmentions_full_ner:I-T045)`, `medmentions_full_ner:B-T067)`, `tmvar_v1_ner:B-SNP)`, `biorelex_ner:I-drug)`, `bionlp_st_2013_gro_ner:B-ExperimentalMethod)`, `bionlp_st_2013_cg_ED:I-Cell_death)`, `bionlp_st_2013_pc_ED:B-Hydroxylation)`, `bionlp_st_2013_gro_ner:B-ReporterGeneConstruction)`, `bionlp_st_2013_gro_ED:B-CellularDevelopmentalProcess)`, `bionlp_st_2013_gro_ner:B-TranscriptionActivator)`, `bionlp_st_2013_gro_ED:I-CellCycle)`, `mantra_gsc_en_emea_ner:B-LIVB)`, `verspoor_2013_ner:B-disease)`, `mantra_gsc_en_patents_ner:B-PROC)`, `bc5cdr_ner:I-Chemical)`, `medmentions_full_ner:I-T056)`, `nlm_gene_ner:I-STARGENE)`, `medmentions_full_ner:B-T050)`, `scai_chemical_ner:B-TRIVIALVAR)`, `bionlp_st_2013_gro_ner:B-MolecularFunction)`, `medmentions_full_ner:B-T090)`, 
`bionlp_st_2013_pc_EAE:Theme)`, `bionlp_st_2013_gro_ED:B-CellCyclePhaseTransition)`, `chebi_nactem_fullpaper_ner:I-Species)`, `medmentions_full_ner:B-T170)`, `bionlp_st_2013_gro_ner:I-ZincCoordinatingDomain)`, `medmentions_full_ner:B-T060)`, `mlee_ED:I-Development)`, `medmentions_full_ner:I-T060)`, `bionlp_st_2013_gro_ner:B-Cell)`, `medmentions_full_ner:I-T037)`, `bionlp_st_2013_gro_ED:B-CellDeath)`, `bionlp_st_2013_gro_ner:B-ForkheadWingedHelix)`, `bionlp_st_2013_gro_ner:B-InorganicChemical)`, `medmentions_full_ner:B-T037)`, `bionlp_st_2013_cg_ner:B-Organism_subdivision)`, `genia_term_corpus_ner:B-RNA_NA)`, `bionlp_st_2013_cg_ED:B-Blood_vessel_development)`, `bionlp_st_2013_gro_ED:B-CellDifferentiation)`, `genia_term_corpus_ner:I-DNA_molecule)`, `bionlp_st_2013_gro_ED:B-IntraCellularProcess)`, `bionlp_st_2013_gro_ner:I-MessengerRNA)`, `bionlp_st_2013_pc_ED:B-Pathway)`, `medmentions_full_ner:I-T086)`, `bionlp_st_2013_ge_ED:I-Transcription)`, `bionlp_st_2019_bb_ner:O)`, `medmentions_full_ner:I-T001)`, `minimayosrs_sts:6)`, `medmentions_full_ner:I-T020)`, `an_em_RE:Part-of)`, `bionlp_shared_task_2009_ner:I-Protein)`, `an_em_ner:B-Pathological_formation)`, `bionlp_st_2013_gro_ner:B-Spliceosome)`, `chebi_nactem_fullpaper_ner:B-Species)`, `mirna_ner:O)`, `bioinfer_RE:PPI)`, `bionlp_st_2013_cg_ner:B-Protein_domain_or_region)`, `anat_em_ner:B-Organism_substance)`, `bionlp_st_2013_gro_ED:I-IntraCellularProcess)`, `bioscope_papers_ner:I-speculation)`, `ddi_corpus_ner:B-DRUG)`, `medmentions_full_ner:I-T078)`, `bionlp_st_2013_gro_ner:I-HMGTF)`, `medmentions_full_ner:B-T053)`, `bionlp_st_2013_gro_ner:B-HomeoBox)`, `minimayosrs_sts:3)`, `mlee_ner:B-Multi-tissue_structure)`, `biosses_sts:4)`, `mlee_ED:I-Gene_expression)`, `medmentions_full_ner:B-T004)`, `chia_ner:I-Drug)`, `bionlp_st_2013_gro_ner:B-FusionOfGeneWithReporterGene)`, `genia_term_corpus_ner:I-cell_line)`, `ddi_corpus_RE:ADVISE)`, `bioscope_abstracts_ner:I-speculation)`, `chebi_nactem_abstr_ann1_ner:I-Metabolite)`, 
`bionlp_st_2013_gro_ner:I-ExpressionProfiling)`, `medmentions_full_ner:B-T016)`, `bionlp_st_2013_gro_ner:I-Holoenzyme)`, `bionlp_st_2013_gro_ED:B-TranscriptionTermination)`, `bionlp_st_2013_cg_ner:I-Organ)`, `tmvar_v1_ner:B-DNAMutation)`, `bionlp_st_2013_ge_EAE:CSite)`, `genia_term_corpus_ner:B-RNA_substructure)`, `medmentions_full_ner:I-T170)`, `medmentions_full_ner:B-T093)`, `genia_term_corpus_ner:I-inorganic)`, `bionlp_st_2013_gro_ner:B-bHLH)`, `mlee_ED:B-Cell_proliferation)`, `bionlp_st_2013_gro_RE:hasPart)`, `bionlp_st_2013_cg_ED:B-Pathway)`, `bionlp_st_2013_gro_ner:B-BasicDomain)`, `bionlp_st_2013_gro_ED:I-PositiveRegulationOfGeneExpression)`, `mayosrs_sts:4)`, `medmentions_st21pv_ner:B-T037)`, `an_em_ner:B-Anatomical_system)`, `bionlp_st_2013_gro_ner:B-Conformation)`, `bionlp_st_2013_gro_ner:I-GeneRegion)`, `bionlp_st_2013_gro_ED:I-PosttranslationalModification)`, `genia_term_corpus_ner:I-RNA_NA)`, `bionlp_st_2011_ge_EAE:Cause)`, `medmentions_full_ner:B-T019)`, `medmentions_full_ner:I-T069)`, `scai_chemical_ner:B-TRIVIAL)`, `bionlp_st_2013_ge_ED:I-Protein_modification)`, `bionlp_st_2013_pc_ED:B-Degradation)`, `mlee_ner:B-Gene_or_gene_product)`, `bionlp_st_2013_gro_ED:I-Phosphorylation)`, `biosses_sts:3)`, `mlee_ED:B-Acetylation)`, `mlee_ED:I-Negative_regulation)`, `bionlp_st_2013_ge_ED:B-Protein_catabolism)`, `bionlp_st_2013_gro_ner:B-Promoter)`, `bionlp_shared_task_2009_ED:I-Phosphorylation)`, `medmentions_full_ner:B-T195)`, `bionlp_st_2013_cg_ED:I-Binding)`, `bionlp_st_2011_id_ner:I-Organism)`, `medmentions_full_ner:I-T073)`, `bionlp_st_2013_gro_ner:I-OrganicChemical)`, `ebm_pico_ner:B-Participant_Age)`, `verspoor_2013_ner:B-Concepts_Ideas)`, `biosses_sts:2)`, `bionlp_st_2013_cg_ED:B-Remodeling)`, `bionlp_st_2013_gro_ner:B-tRNA)`, `medmentions_full_ner:I-T043)`, `an_em_COREF:None)`, `bionlp_st_2011_epi_ED:B-Hydroxylation)`, `mlee_ner:I-Immaterial_anatomical_entity)`, `bionlp_st_2013_ge_ED:B-Ubiquitination)`, `medmentions_full_ner:B-T065)`, 
`bionlp_st_2019_bb_RE:None)`, `bionlp_st_2013_gro_ED:B-CellAging)`, `mlee_ED:B-Phosphorylation)`, `bionlp_st_2013_gro_ED:I-PositiveRegulationOfTranscriptionOfGene)`, `ebm_pico_ner:I-Participant_Sample-size)`, `biorelex_COREF:coref)`, `bionlp_shared_task_2009_ED:I-Protein_catabolism)`, `bionlp_st_2013_gro_ner:I-DNAMolecule)`, `bionlp_st_2013_gro_ner:I-Enzyme)`, `genia_term_corpus_ner:I-protein_family_or_group)`, `genia_term_corpus_ner:I-ANDprotein_moleculeprotein_molecule)`, `biorelex_ner:B-gene)`, `bionlp_st_2013_gro_ED:I-ProteinTransport)`, `bionlp_st_2013_gro_ED:B-MolecularProcess)`, `chebi_nactem_abstr_ann1_ner:O)`, `bionlp_st_2013_gro_ED:B-BindingOfProteinToProteinBindingSiteOfDNA)`, `chemprot_RE:None)`, `bionlp_st_2013_pc_ner:O)`, `mayosrs_sts:7)`, `bionlp_st_2013_pc_ED:B-Negative_regulation)`, `bionlp_st_2013_gro_ner:B-Sequence)`, `medmentions_full_ner:B-T103)`, `bionlp_st_2013_gro_ner:B-Gene)`, `chia_ner:B-Observation)`, `chia_ner:B-Scope)`, `an_em_COREF:coref)`, `ebm_pico_ner:B-Participant_Sex)`, `mlee_ED:B-Regulation)`, `bionlp_st_2013_gro_ner:B-ComplexOfProteinAndDNA)`, `bionlp_st_2013_gro_ner:B-Phenotype)`, `verspoor_2013_ner:I-age)`, `medmentions_full_ner:B-T120)`, `bionlp_st_2011_epi_ED:B-Deacetylation)`, `bionlp_st_2013_gro_ner:B-Tissue)`, `bionlp_st_2013_gro_ner:B-MolecularEntity)`, `bionlp_st_2013_ge_ED:I-Binding)`, `biorelex_ner:I-peptide)`, `medmentions_st21pv_ner:I-T097)`, `iepa_RE:None)`, `medmentions_full_ner:B-T001)`, `bionlp_shared_task_2009_ED:I-Regulation)`, `bionlp_st_2013_gro_ner:B-FusionProtein)`, `medmentions_full_ner:I-T194)`, `biorelex_ner:B-cell)`, `medmentions_full_ner:I-T096)`, `chebi_nactem_fullpaper_ner:I-Chemical_Structure)`, `medmentions_full_ner:I-T018)`, `medmentions_full_ner:B-T201)`, `chia_RE:None)`, `medmentions_full_ner:B-T054)`, `biorelex_RE:None)`, `ebm_pico_ner:I-Intervention_Pharmacological)`, `bionlp_st_2013_gro_ED:I-CellDifferentiation)`, `bionlp_st_2013_cg_ED:I-Cell_proliferation)`, 
`bionlp_st_2013_gro_EAE:hasPatient4)`, `bionlp_st_2011_id_EAE:Participant)`, `bionlp_st_2013_gro_ner:B-Substrate)`, `bionlp_st_2011_ge_ED:B-Transcription)`, `verspoor_2013_ner:B-cohort-patient)`, `ebm_pico_ner:B-Outcome_Other)`, `biorelex_ner:B-protein-motif)`, `bionlp_st_2013_gro_ner:B-Ion)`, `mlee_ED:B-Translation)`, `bionlp_st_2013_gro_ner:B-ZincCoordinatingDomain)`, `ebm_pico_ner:B-Participant_Condition)`, `bionlp_st_2011_ge_ED:B-Phosphorylation)`, `nlm_gene_ner:I-Gene)`, `bionlp_st_2013_gro_ner:B-Locus)`, `bionlp_st_2013_gro_ner:B-SecondMessenger)`, `bionlp_st_2013_cg_ED:B-Infection)`, `bionlp_st_2011_epi_EAE:Contextgene)`, `chia_ner:B-Drug)`, `bionlp_st_2019_bb_ner:I-Habitat)`, `bionlp_shared_task_2009_COREF:coref)`, `bionlp_st_2013_gro_ner:I-MolecularEntity)`, `mlee_ner:B-Cellular_component)`, `genia_term_corpus_ner:B-other_organic_compound)`, `bionlp_st_2013_gro_ED:I-CellAdhesion)`, `anat_em_ner:B-Cellular_component)`, `bionlp_st_2013_gro_ED:B-ProteinMetabolism)`, `seth_corpus_ner:B-SNP)`, `pcr_ner:O)`, `bionlp_st_2013_gro_ED:I-CellCyclePhase)`, `mlee_ner:B-DNA_domain_or_region)`, `mantra_gsc_en_emea_ner:B-PHYS)`, `bionlp_st_2013_cg_ner:B-Multi-tissue_structure)`, `genia_term_corpus_ner:I-virus)`, `bionlp_shared_task_2009_ED:I-Positive_regulation)`, `medmentions_full_ner:I-T122)`, `mantra_gsc_en_patents_ner:B-DISO)`, `bionlp_st_2013_gro_ner:B-Heterochromatin)`, `genia_term_corpus_ner:O)`, `mlee_ED:I-Positive_regulation)`, `an_em_ner:B-Cell)`, `bionlp_st_2013_cg_ner:B-Simple_chemical)`, `bionlp_st_2013_gro_ner:I-Peptide)`, `chemprot_RE:CPR:6)`, `chebi_nactem_abstr_ann1_ner:B-Chemical)`, `genia_term_corpus_ner:I-cell_type)`, `genia_term_corpus_ner:I-other_name)`, `bionlp_st_2013_cg_EAE:FromLoc)`, `bionlp_st_2013_gro_ner:B-RNAMolecule)`, `bionlp_st_2013_gro_ner:B-SequenceHomologyAnalysis)`, `medmentions_full_ner:I-T042)`, `tmvar_v1_ner:B-ProteinMutation)`, `pdr_ner:O)`, `bionlp_st_2013_gro_ED:B-MetabolicPathway)`, `medmentions_full_ner:I-T057)`, 
`bionlp_st_2011_ge_EAE:CSite)`, `bionlp_st_2013_gro_ED:B-BindingToProtein)`, `verspoor_2013_ner:B-size)`, `mlee_ED:B-Transcription)`, `bionlp_st_2013_gro_ner:I-BindingSiteOfProtein)`, `bionlp_st_2011_id_ner:B-Chemical)`, `bionlp_st_2013_gro_ner:I-Ribosome)`, `verspoor_2013_ner:B-Phenomena)`, `medmentions_st21pv_ner:B-T017)`, `medmentions_full_ner:B-T028)`, `chia_ner:B-Temporal)`, `chia_ner:I-Temporal)`, `biorelex_ner:B-assay)`, `bionlp_st_2013_cg_ED:I-Pathway)`, `genia_term_corpus_ner:B-tissue)`, `nlmchem_ner:I-Chemical)`, `mirna_ner:I-Specific_miRNAs)`, `bionlp_st_2013_cg_ED:B-Negative_regulation)`, `medmentions_full_ner:I-T012)`, `mlee_ner:B-Organism_substance)`, `bionlp_st_2013_gro_ner:B-TranscriptionCoactivator)`, `genia_term_corpus_ner:I-tissue)`, `genia_term_corpus_ner:B-amino_acid_monomer)`, `mantra_gsc_en_patents_ner:I-ANAT)`, `medmentions_st21pv_ner:I-T082)`, `mantra_gsc_en_emea_ner:B-DEVI)`, `bionlp_st_2013_gro_RE:None)`, `medmentions_full_ner:I-T052)`, `bionlp_st_2011_ge_ED:I-Phosphorylation)`, `mqp_sts:3)`, `bionlp_st_2013_cg_ED:B-Glycosylation)`, `an_em_ner:B-Immaterial_anatomical_entity)`, `bionlp_st_2013_gro_ner:B-Chemical)`, `bionlp_st_2013_gro_ED:B-GeneSilencing)`, `bionlp_shared_task_2009_ED:B-Transcription)`, `genia_term_corpus_ner:B-other_artificial_source)`, `medmentions_full_ner:B-T072)`, `mantra_gsc_en_medline_ner:B-GEOG)`, `mirna_ner:B-Specific_miRNAs)`, `medmentions_full_ner:B-T190)`, `medmentions_full_ner:I-T031)`, `bionlp_st_2013_gro_ED:B-TranscriptionInitiation)`, `bionlp_st_2013_gro_ner:I-DoubleStrandDNA)`, `bionlp_st_2013_gro_ED:B-Translation)`, `scai_chemical_ner:I-IUPAC)`, `chemdner_ner:O)`, `bionlp_st_2013_gro_ED:B-G1Phase)`, `genia_term_corpus_ner:B-peptide)`, `bionlp_st_2013_gro_ED:B-PosttranslationalModification)`, `bionlp_st_2011_epi_EAE:Site)`, `an_em_ner:I-Organism_subdivision)`, `bionlp_st_2013_cg_ner:B-Developing_anatomical_structure)`, `bionlp_st_2013_gro_EAE:hasPatient3)`, `bionlp_st_2013_gro_ner:B-MessengerRNA)`, 
`medmentions_full_ner:B-T171)`, `bionlp_st_2013_ge_EAE:Theme2)`, `bionlp_st_2013_gro_ner:B-RNA)`, `genia_term_corpus_ner:I-amino_acid_monomer)`, `an_em_ner:B-Organism_substance)`, `bionlp_st_2013_gro_ED:I-RNAProcessing)`, `genia_term_corpus_ner:I-body_part)`, `medmentions_full_ner:B-T052)`, `chia_ner:B-Procedure)`, `bionlp_st_2013_gro_ner:B-Prokaryote)`, `bionlp_st_2011_ge_ED:I-Positive_regulation)`, `medmentions_full_ner:I-T061)`, `genia_term_corpus_ner:B-ANDprotein_subunitprotein_subunit)`, `medmentions_full_ner:B-T096)`, `bionlp_st_2013_cg_ED:B-DNA_demethylation)`, `bionlp_st_2011_epi_ED:B-Deubiquitination)`, `medmentions_full_ner:B-T038)`, `medmentions_full_ner:I-T109)`, `bionlp_st_2013_gro_ED:I-SPhase)`, `bionlp_st_2013_gro_ner:I-EukaryoticCell)`, `pdr_ner:I-Plant)`, `bionlp_st_2013_gro_ED:I-Binding)`, `medmentions_full_ner:I-T092)`, `mantra_gsc_en_medline_ner:I-CHEM)`, `bionlp_st_2011_id_ED:B-Phosphorylation)`, `bionlp_st_2013_cg_ED:I-Metabolism)`, `bionlp_st_2013_gro_ED:B-PositiveRegulationOfGeneExpression)`, `chebi_nactem_fullpaper_ner:B-Biological_Activity)`, `ncbi_disease_ner:B-SpecificDisease)`, `mlee_ner:B-Organism)`, `medmentions_full_ner:B-T063)`, `bionlp_st_2013_cg_ED:B-Glycolysis)`, `medmentions_full_ner:I-T168)`, `medmentions_full_ner:I-T064)`, `bionlp_st_2013_gro_ner:B-DNAMolecule)`, `mlee_ED:B-Binding)`, `bioscope_abstracts_ner:O)`, `biorelex_ner:B-protein-complex)`, `bionlp_st_2013_gro_EAE:None)`, `mantra_gsc_en_medline_ner:I-PHEN)`, `bionlp_st_2013_cg_ner:B-Pathological_formation)`, `mlee_ED:I-Cell_proliferation)`, `bionlp_st_2013_pc_ner:I-Simple_chemical)`, `anat_em_ner:I-Cancer)`, `an_em_ner:I-Anatomical_system)`, `medmentions_full_ner:I-T072)`, `bionlp_st_2013_gro_ner:B-ProteinComplex)`, `bionlp_st_2013_gro_ED:I-NegativeRegulationOfGeneExpression)`, `bio_sim_verb_sts:2)`, `bionlp_st_2013_gro_ner:B-DoubleStrandDNA)`, `medmentions_full_ner:I-T066)`, `pdr_ED:B-Treatment_of_disease)`, `seth_corpus_ner:O)`, `bionlp_st_2013_ge_EAE:ToLoc)`, 
`bionlp_st_2013_gro_ED:B-Localization)`, `bionlp_st_2013_gro_ner:I-Exon)`, `medmentions_full_ner:B-T070)`, `biorelex_ner:I-experiment-tag)`, `medmentions_full_ner:B-T068)`, `medmentions_full_ner:I-T034)`, `cellfinder_ner:B-Species)`, `biorelex_ner:I-protein-RNA-complex)`, `medmentions_st21pv_ner:I-T201)`, `biosses_sts:0)`, `bionlp_st_2013_cg_ner:B-Organism_substance)`, `bionlp_st_2013_gro_ner:I-FusionGene)`, `genia_term_corpus_ner:B-protein_complex)`, `mantra_gsc_en_emea_ner:B-DISO)`, `bionlp_st_2013_gro_ED:I-RegulationOfGeneExpression)`, `medmentions_full_ner:I-T125)`, `bionlp_st_2013_ge_ner:I-Entity)`, `bionlp_st_2011_rel_ner:B-Entity)`, `medmentions_st21pv_ner:I-T031)`, `medmentions_full_ner:B-T099)`, `bionlp_st_2013_gro_ner:B-TATAbox)`, `bionlp_st_2013_gro_ner:I-BindingAssay)`, `bionlp_st_2019_bb_ner:I-Microorganism)`, `medmentions_full_ner:I-T059)`, `medmentions_full_ner:B-T114)`, `medmentions_st21pv_ner:I-T022)`, `bionlp_st_2013_pc_ED:B-Inactivation)`, `spl_adr_200db_train_ner:B-Factor)`, `bionlp_st_2013_gro_ner:B-Function)`, `bionlp_st_2013_gro_ner:B-GeneRegion)`, `medmentions_full_ner:I-T033)`, `bionlp_st_2013_cg_COREF:None)`, `bionlp_st_2013_gro_ner:B-HMG)`, `bionlp_shared_task_2009_ED:B-Binding)`, `bionlp_st_2013_gro_ner:B-Operon)`, `chemprot_ner:I-CHEMICAL)`, `ebm_pico_ner:I-Outcome_Pain)`, `medmentions_full_ner:I-T053)`, `bionlp_st_2013_gro_ner:B-Protein)`, `ebm_pico_ner:I-Outcome_Physical)`, `biorelex_ner:I-organelle)`, `verspoor_2013_ner:I-cohort-patient)`, `genia_term_corpus_ner:I-ANDprotein_family_or_groupprotein_family_or_group)`, `genia_term_corpus_ner:I-DNA_family_or_group)`, `bionlp_st_2013_gro_ner:I-ProteinBindingSiteOfDNA)`, `bionlp_st_2013_ge_ED:B-Protein_modification)`, `bionlp_st_2011_epi_ED:B-Dephosphorylation)`, `bionlp_st_2013_gro_ner:B-RNAPolymerase)`, `an_em_ner:I-Developing_anatomical_structure)`, `bionlp_st_2013_gro_ner:B-CellComponent)`, `biorelex_ner:I-chemical)`, `bionlp_st_2013_gro_ED:B-Mutation)`, `gnormplus_ner:B-DomainMotif)`, 
`bionlp_st_2013_gro_ner:B-Peptide)`, `bionlp_st_2013_pc_ED:B-Translation)`, `biorelex_ner:B-tissue)`, `bionlp_st_2011_ge_EAE:AtLoc)`, `biorelex_ner:I-RNA)`, `bionlp_st_2013_pc_ED:B-Regulation)`, `pico_extraction_ner:B-participant)`, `chia_RE:Has_qualifier)`, `chia_ner:I-Visit)`, `medmentions_full_ner:I-T008)`, `bionlp_st_2013_ge_ED:B-Phosphorylation)`, `medmentions_full_ner:I-T016)`, `pdr_ner:I-Disease)`, `pdr_ED:B-Cause_of_disease)`, `verspoor_2013_RE:has)`, `verspoor_2013_ner:I-ethnicity)`, `bionlp_st_2013_pc_EAE:Participant)`, `genia_term_corpus_ner:I-protein_NA)`, `ehr_rel_sts:7)`, `medmentions_full_ner:I-T079)`, `bionlp_st_2013_gro_ner:I-SmallInterferingRNA)`, `bionlp_st_2013_cg_ED:O)`, `pico_extraction_ner:I-intervention)`, `biorelex_ner:I-protein-domain)`, `chebi_nactem_abstr_ann1_ner:I-Chemical)`, `medmentions_full_ner:I-T011)`, `bionlp_st_2013_gro_ED:B-RegulationOfFunction)`, `mlee_ner:O)`, `mqp_sts:1)`, `bioscope_papers_ner:O)`, `chia_RE:Has_scope)`, `an_em_ner:I-Pathological_formation)`, `bc5cdr_ner:B-Disease)`, `gnormplus_ner:I-DomainMotif)`, `bionlp_st_2013_gro_ner:I-OpenReadingFrame)`, `mlee_ner:I-Cellular_component)`, `medmentions_full_ner:I-T195)`, `spl_adr_200db_train_ner:B-AdverseReaction)`, `bionlp_st_2011_ge_ED:B-Positive_regulation)`, `muchmore_en_ner:O)`, `bionlp_st_2013_gro_ner:I-Promoter)`, `bionlp_st_2013_gro_EAE:hasPatient5)`, `bionlp_st_2013_gro_ner:I-RegulatoryDNARegion)`, `bionlp_st_2013_gro_ner:I-RuntLikeDomain)`, `bionlp_st_2013_cg_ED:B-Carcinogenesis)`, `medmentions_full_ner:B-T040)`, `medmentions_full_ner:I-T103)`, `medmentions_st21pv_ner:I-T037)`, `mlee_EAE:ToLoc)`, `mlee_EAE:Instrument)`, `medmentions_full_ner:B-T008)`, `ebm_pico_ner:B-Intervention_Psychological)`, `bionlp_st_2013_gro_ner:B-Stress)`, `biorelex_ner:B-protein-RNA-complex)`, `bionlp_st_2013_gro_ED:B-RNAProcessing)`, `bionlp_st_2013_gro_ED:B-SignalingPathway)`, `genia_term_corpus_ner:B-multi_cell)`, `bionlp_st_2013_gro_ner:B-ChromosomalDNA)`, 
`anat_em_ner:I-Cellular_component)`, `spl_adr_200db_train_ner:I-Negation)`, `medmentions_full_ner:I-T087)`, `bionlp_st_2013_ge_ED:B-Deacetylation)`, `bionlp_st_2013_gro_ner:B-RegulatoryDNARegion)`, `ebm_pico_ner:B-Outcome_Pain)`, `bionlp_st_2011_ge_EAE:None)`, `bionlp_st_2013_gro_ED:I-RNABiosynthesis)`, `bionlp_st_2013_gro_ner:I-HomeoboxTF)`, `mantra_gsc_en_patents_ner:I-LIVB)`, `bionlp_st_2013_gro_ner:I-UpstreamRegulatorySequence)`, `ddi_corpus_ner:I-DRUG)`, `bionlp_st_2011_ge_ED:O)`, `mantra_gsc_en_medline_ner:B-OBJC)`, `bionlp_st_2013_gro_ED:I-ProteinBiosynthesis)`, `mayosrs_sts:3)`, `linnaeus_filtered_ner:O)`, `chia_RE:Has_multiplier)`, `bionlp_st_2011_ge_ED:B-Localization)`, `medmentions_full_ner:B-T116)`, `bionlp_st_2013_cg_EAE:ToLoc)`, `cellfinder_ner:B-CellType)`, `medmentions_full_ner:B-T007)`, `ehr_rel_sts:3)`, `anat_em_ner:B-Developing_anatomical_structure)`, `bionlp_st_2013_gro_ner:I-MutantProtein)`, `bionlp_st_2013_gro_ED:B-NegativeRegulationOfGeneExpression)`, `chemprot_ner:B-GENE-N)`, `mlee_ED:B-Blood_vessel_development)`, `medmentions_full_ner:I-T077)`, `bionlp_st_2013_gro_ner:B-TranscriptionRepressorActivity)`, `biorelex_ner:B-brand)`, `medmentions_full_ner:B-T091)`, `bionlp_st_2011_id_ED:B-Positive_regulation)`, `ebm_pico_ner:B-Outcome_Mental)`, `bionlp_st_2013_gro_ner:B-EukaryoticCell)`, `bionlp_st_2013_pc_ED:I-Positive_regulation)`, `genia_term_corpus_ner:I-ANDprotein_subunitprotein_subunit)`, `medmentions_full_ner:I-T184)`, `bionlp_st_2011_id_ner:B-Protein)`, `mayosrs_sts:1)`, `mantra_gsc_en_patents_ner:B-CHEM)`, `mlee_ED:B-Ubiquitination)`, `biorelex_ner:B-mutation)`, `mantra_gsc_en_medline_ner:I-DEVI)`, `bionlp_st_2013_ge_ED:I-Positive_regulation)`, `linnaeus_ner:O)`, `bionlp_st_2013_gro_ner:B-Enzyme)`, `medmentions_st21pv_ner:B-T201)`, `medmentions_full_ner:B-T056)`, `bionlp_st_2011_id_EAE:Cause)`, `bionlp_st_2013_gro_ED:B-BindingToRNA)`, `verspoor_2013_ner:B-Disorder)`, `tmvar_v1_ner:I-DNAMutation)`, `mantra_gsc_en_patents_ner:B-OBJC)`, 
`medmentions_full_ner:B-T073)`, `bionlp_st_2013_gro_ED:I-CellularProcess)`, `bionlp_st_2013_gro_ED:I-NegativeRegulation)`, `anat_em_ner:I-Tissue)`, `bioinfer_ner:I-Individual_protein)`, `medmentions_full_ner:B-T191)`, `cellfinder_ner:I-Anatomy)`, `chia_ner:I-Scope)`, `ncbi_disease_ner:B-Modifier)`, `bionlp_st_2013_cg_ED:I-Growth)`, `medmentions_st21pv_ner:B-T082)`, `bionlp_st_2013_gro_ED:I-GeneSilencing)`, `mlee_ED:B-Pathway)`, `bionlp_st_2013_cg_ner:I-Cellular_component)`, `medmentions_full_ner:I-T054)`, `chia_ner:B-Condition)`, `verspoor_2013_ner:B-ethnicity)`, `genia_term_corpus_ner:I-carbohydrate)`, `mlee_ner:B-Developing_anatomical_structure)`, `medmentions_full_ner:B-T012)`, `bionlp_st_2013_gro_ner:I-AP2EREBPRelatedDomain)`, `bionlp_st_2013_gro_ED:B-Silencing)`, `mayosrs_sts:5)`, `bionlp_st_2013_gro_ner:I-TranscriptionFactorComplex)`, `genia_term_corpus_ner:B-ANDprotein_substructureprotein_substructure)`, `bionlp_shared_task_2009_ED:B-Regulation)`, `medmentions_full_ner:B-T064)`, `bionlp_st_2013_cg_ner:I-Tissue)`, `bionlp_st_2013_gro_ner:B-Intron)`, `bionlp_st_2013_cg_ED:I-Catabolism)`, `mlee_ED:B-Localization)`, `genia_term_corpus_ner:I-DNA_domain_or_region)`, `chia_ner:B-Device)`, `medmentions_full_ner:B-T026)`, `genia_term_corpus_ner:B-carbohydrate)`, `nlmchem_ner:B-Chemical)`, `bionlp_st_2013_gro_ED:B-Disease)`, `anat_em_ner:I-Immaterial_anatomical_entity)`, `genia_term_corpus_ner:B-DNA_molecule)`, `medmentions_full_ner:I-T007)`, `bionlp_st_2013_gro_ner:I-DNAFragment)`, `genia_term_corpus_ner:I-RNA_domain_or_region)`, `bionlp_st_2013_gro_ner:B-MutatedProtein)`, `ebm_pico_ner:I-Outcome_Mortality)`, `bionlp_st_2013_gro_ner:B-ProteinCodingRegion)`, `ebm_pico_ner:I-Intervention_Educational)`, `genia_term_corpus_ner:B-ANDcell_linecell_line)`, `spl_adr_200db_train_ner:I-AdverseReaction)`, `bionlp_st_2013_ge_EAE:Site)`, `bionlp_st_2013_cg_ED:I-Cell_transformation)`, `genia_term_corpus_ner:B-protein_substructure)`, `chia_ner:B-Mood)`, 
`bionlp_st_2013_gro_ED:I-Transport)`, `bionlp_st_2011_ge_ED:I-Negative_regulation)`, `medmentions_full_ner:I-T058)`, `biorelex_ner:B-parameter)`, `medmentions_st21pv_ner:O)`, `bionlp_st_2013_ge_ED:O)`, `bionlp_st_2013_pc_EAE:ToLoc)`, `cellfinder_ner:I-Species)`, `medmentions_full_ner:B-T069)`, `bionlp_st_2013_gro_ED:B-TranscriptionOfGene)`, `chia_ner:I-Condition)`, `mirna_ner:I-Relation_Trigger)`, `bionlp_st_2013_gro_ED:B-FormationOfProteinDNAComplex)`, `bionlp_st_2013_gro_ner:I-InorganicChemical)`, `bionlp_st_2011_id_ner:B-Entity)`, `bionlp_st_2013_gro_ner:B-PrimaryStructure)`, `an_em_ner:I-Cellular_component)`, `medmentions_full_ner:B-T021)`, `mlee_ner:B-Anatomical_system)`, `bionlp_st_2013_pc_ED:B-Localization)`, `chebi_nactem_fullpaper_ner:B-Spectral_Data)`, `mlee_EAE:CSite)`, `bionlp_st_2013_cg_ED:I-Negative_regulation)`, `mlee_ED:I-Breakdown)`, `bionlp_shared_task_2009_ED:B-Localization)`, `bionlp_shared_task_2009_ED:B-Phosphorylation)`, `medmentions_st21pv_ner:I-T170)`, `pico_extraction_ner:I-participant)`, `bionlp_st_2013_cg_ED:B-Breakdown)`, `bionlp_st_2013_gro_ner:I-Nucleotide)`, `chia_ner:B-Person)`, `medmentions_full_ner:B-T194)`, `chia_RE:Subsumes)`, `mlee_ED:B-Metabolism)`, `medmentions_full_ner:I-T099)`, `bionlp_st_2013_gro_ner:I-Protein)`, `an_em_ner:B-Tissue)`, `bioscope_papers_ner:B-speculation)`, `medmentions_st21pv_ner:B-T170)`, `bionlp_st_2013_gro_ED:B-ExperimentalIntervention)`, `bionlp_st_2011_epi_ED:I-Glycosylation)`, `mlee_ED:B-Gene_expression)`, `bionlp_st_2013_gro_ner:B-TranscriptionFactorActivity)`, `bionlp_st_2011_epi_ED:B-Phosphorylation)`, `mlee_ED:B-Breakdown)`, `mlee_RE:None)`, `bionlp_st_2013_pc_ED:B-Dephosphorylation)`, `mlee_ner:B-Organism_subdivision)`, `bionlp_st_2013_cg_EAE:Cause)`, `bionlp_st_2013_gro_ner:B-RNAPolymeraseII)`, `medmentions_st21pv_ner:B-T098)`, `bionlp_st_2013_ge_ED:I-Phosphorylation)`, `chia_RE:Has_negation)`, `spl_adr_200db_train_ner:I-Factor)`, `bionlp_st_2013_gro_ED:I-OrganismalProcess)`, 
`bionlp_shared_task_2009_ED:B-Protein_catabolism)`, `verspoor_2013_ner:I-mutation)`, `bionlp_st_2013_gro_ED:B-Phosphorylation)`, `bionlp_st_2013_ge_EAE:Site2)`, `medmentions_full_ner:B-T129)`, `seth_corpus_ner:B-RS)`, `ebm_pico_ner:I-Participant_Sex)`, `genia_term_corpus_ner:I-protein_molecule)`, `medmentions_full_ner:B-T192)`, `bionlp_st_2013_pc_EAE:None)`, `medmentions_full_ner:I-T094)`, `bionlp_st_2013_ge_ED:I-Gene_expression)`, `bionlp_st_2013_cg_ED:B-Mutation)`, `medmentions_st21pv_ner:B-T033)`, `mlee_ner:B-Drug_or_compound)`, `medmentions_full_ner:B-T061)`, `pcr_ner:I-Herb)`, `bionlp_st_2013_gro_ner:I-MolecularStructure)`, `bionlp_st_2013_cg_ED:I-Development)`, `medmentions_full_ner:B-T032)`, `bionlp_st_2013_pc_ED:B-Dissociation)`, `bionlp_st_2013_pc_ED:I-Localization)`, `genia_term_corpus_ner:B-nucleotide)`, `ebm_pico_ner:B-Outcome_Mortality)`, `bionlp_st_2011_rel_ner:O)`, `bionlp_st_2013_gro_ner:I-Cell)`, `medmentions_full_ner:I-T014)`, `mantra_gsc_en_emea_ner:B-ANAT)`, `medmentions_full_ner:I-T055)`, `medmentions_full_ner:B-T101)`, `bionlp_st_2013_gro_ED:I-RegulationOfTranscription)`, `bionlp_st_2013_gro_ner:B-TranscriptionRepressor)`, `bionlp_st_2013_gro_ED:B-ProteinBiosynthesis)`, `biorelex_ner:I-cell)`, `verspoor_2013_RE:None)`, `bionlp_st_2013_cg_ED:I-Blood_vessel_development)`, `genia_term_corpus_ner:I-ANDcell_linecell_line)`, `bionlp_st_2011_id_ED:B-Transcription)`, `medmentions_full_ner:I-T204)`, `tmvar_v1_ner:I-SNP)`, `chia_RE:Has_value)`, `biorelex_ner:I-protein-family)`, `bionlp_st_2013_cg_ED:B-Death)`, `biorelex_ner:I-experimental-construct)`, `mantra_gsc_en_medline_ner:I-PHYS)`, `genia_term_corpus_ner:B-)`, `medmentions_full_ner:I-T203)`, `bionlp_st_2013_gro_ED:B-CellAdhesion)`, `bionlp_st_2013_gro_ner:B-TranslationFactor)`, `ebm_pico_ner:I-Intervention_Control)`, `bionlp_st_2011_ge_ED:I-Protein_catabolism)`, `bionlp_st_2013_gro_ner:B-BetaScaffoldDomain_WithMinorGrooveContacts)`, `bionlp_st_2013_gro_ED:I-BindingOfTFToTFBindingSiteOfProtein)`, 
`genia_term_corpus_ner:I-atom)`, `scai_chemical_ner:B-)`, `bionlp_st_2013_gro_ner:I-Stress)`, `bionlp_st_2013_pc_ED:I-Pathway)`, `bionlp_st_2011_epi_ED:I-Catalysis)`, `mlee_ner:B-Pathological_formation)`, `bionlp_st_2013_gro_ner:B-Exon)`, `medmentions_full_ner:I-T083)`, `bionlp_st_2013_cg_ED:B-Translation)`, `chia_ner:B-Measurement)`, `bionlp_st_2011_id_ner:B-Regulon-operon)`, `pdr_ED:I-Treatment_of_disease)`, `bionlp_st_2013_gro_ner:I-TranscriptionActivatorActivity)`, `bionlp_st_2011_epi_ED:I-DNA_methylation)`, `osiris_ner:I-gene)`, `bionlp_st_2013_cg_ner:O)`, `pdr_ner:B-Plant)`, `bionlp_st_2013_gro_ED:B-PositiveRegulationOfTranscription)`, `mantra_gsc_en_patents_ner:B-ANAT)`, `medmentions_full_ner:I-T101)`, `ncbi_disease_ner:I-SpecificDisease)`, `medmentions_full_ner:B-T034)`, `linnaeus_filtered_ner:B-species)`, `bionlp_st_2011_ge_ED:B-Binding)`, `bionlp_st_2013_gro_ner:I-Histone)`, `bionlp_st_2013_cg_ED:I-Carcinogenesis)`, `medmentions_full_ner:I-T192)`, `medmentions_full_ner:B-T080)`, `bionlp_st_2013_ge_EAE:None)`, `bionlp_st_2013_gro_ner:B-BindingSiteOfProtein)`, `bionlp_st_2013_gro_ner:B-TranscriptionCorepressor)`, `ehr_rel_sts:4)`, `mlee_ner:I-Gene_or_gene_product)`, `ddi_corpus_RE:MECHANISM)`, `bionlp_st_2011_ge_ED:I-Localization)`, `bionlp_st_2013_gro_ED:I-CellularDevelopmentalProcess)`, `medmentions_full_ner:B-T098)`, `genia_term_corpus_ner:B-protein_subunit)`, `mantra_gsc_en_emea_ner:I-PROC)`, `bionlp_st_2013_gro_ner:I-ProteinCodingDNARegion)`, `scicite_TEXT:method)`, `bionlp_st_2013_gro_ner:I-CellComponent)`, `genia_term_corpus_ner:I-peptide)`, `medmentions_full_ner:B-T100)`, `bionlp_st_2013_pc_EAE:Cause)`, `medmentions_full_ner:B-T049)`, `bionlp_st_2013_gro_ED:B-Transport)`, `scai_chemical_ner:O)`, `medmentions_full_ner:B-T083)`, `diann_iber_eval_en_ner:I-Disability)`, `bionlp_st_2013_pc_ED:I-Translation)`, `medmentions_full_ner:I-T039)`, `anat_em_ner:B-Organism_subdivision)`, `bionlp_st_2013_gro_ner:I-Ligand)`, `bionlp_st_2013_cg_ED:B-Metabolism)`, 
`bionlp_st_2013_pc_ED:I-Phosphorylation)`, `bionlp_st_2011_id_ner:O)`, `mantra_gsc_en_patents_ner:B-PHEN)`, `bionlp_st_2013_gro_ner:I-Nucleus)`, `biorelex_ner:I-fusion-protein)`, `bionlp_st_2013_gro_ED:B-Affecting)`, `bionlp_st_2013_gro_ner:I-ComplexOfProteinAndRNA)`, `bionlp_st_2013_gro_ED:B-Methylation)`, `bionlp_st_2013_gro_ner:I-NuclearReceptor)`, `bionlp_st_2013_gro_ED:B-Mitosis)`, `bionlp_st_2013_gro_ED:I-PositiveRegulation)`, `bionlp_st_2013_gro_ED:B-ModificationOfMolecularEntity)`, `pdr_ED:O)`, `bionlp_st_2013_cg_ner:B-Cell)`, `chia_RE:OR)`, `bionlp_st_2013_cg_ner:I-Gene_or_gene_product)`, `bionlp_st_2013_gro_ner:B-Holoenzyme)`, `bionlp_shared_task_2009_EAE:ToLoc)`, `verspoor_2013_ner:I-disease)`, `biorelex_ner:I-tissue)`, `muchmore_en_ner:B-umlsterm)`, `bionlp_st_2013_gro_ED:B-NegativeRegulationOfTranscriptionByTranscriptionRepressor)`, `ehr_rel_sts:5)`, `bionlp_shared_task_2009_ner:B-Protein)`, `mantra_gsc_en_patents_ner:B-LIVB)`, `medmentions_st21pv_ner:I-T038)`, `bionlp_st_2013_gro_ner:B-TranscriptionRegulator)`, `medmentions_full_ner:O)`, `medmentions_full_ner:I-T002)`, `bionlp_st_2013_gro_ner:I-DNARegion)`, `medmentions_full_ner:B-T089)`, `bionlp_st_2013_gro_ED:I-BindingToProtein)`, `bionlp_st_2013_cg_EAE:AtLoc)`, `medmentions_full_ner:B-T077)`, `mirna_ner:B-Species)`, `bionlp_st_2013_gro_ner:I-TranscriptionRegulator)`, `bionlp_st_2013_gro_ner:I-tRNA)`, `bionlp_st_2013_gro_ner:I-Operon)`, `bionlp_st_2011_epi_ED:B-Deglycosylation)`, `chemprot_ner:O)`, `mlee_ner:I-Multi-tissue_structure)`, `genia_term_corpus_ner:B-AND_NOTcell_typecell_type)`, `medmentions_full_ner:I-T023)`, `medmentions_full_ner:B-T094)`, `chemprot_RE:CPR:1)`, `mlee_ED:B-Planned_process)`, `scai_chemical_ner:B-ABBREVIATION)`, `bionlp_st_2013_gro_ner:B-HomeoboxTF)`, `bionlp_st_2011_id_ED:B-Process)`, `bionlp_st_2013_gro_ner:I-Virus)`, `genia_term_corpus_ner:B-atom)`, `bionlp_st_2013_gro_RE:fromSpecies)`, `bionlp_st_2011_id_ED:B-Binding)`, `bionlp_st_2011_id_EAE:None)`, 
`medmentions_full_ner:B-T203)`, `bionlp_st_2013_gro_ner:B-ThreeDimensionalMolecularStructure)`, `muchmore_en_ner:I-umlsterm)`, `bionlp_st_2013_cg_ner:I-Developing_anatomical_structure)`, `bionlp_st_2013_pc_EAE:FromLoc)`, `genetaggold_ner:I-NEWGENE)`, `bionlp_st_2013_ge_EAE:Theme)`, `bionlp_st_2013_gro_ner:I-Attenuator)`, `nlm_gene_ner:I-Other)`, `medmentions_full_ner:B-T109)`, `osiris_ner:I-variant)`, `chia_ner:I-Mood)`, `medmentions_full_ner:I-T068)`, `minimayosrs_sts:4)`, `bionlp_st_2013_gro_ED:B-CellCyclePhase)`, `bionlp_st_2019_bb_ner:B-Habitat)`, `medmentions_full_ner:I-T097)`, `ehr_rel_sts:6)`, `bionlp_st_2011_epi_ED:I-Methylation)`, `bioinfer_ner:I-Protein_family_or_group)`, `medmentions_st21pv_ner:I-T098)`, `bionlp_st_2013_gro_ner:I-BetaScaffoldDomain_WithMinorGrooveContacts)`, `medmentions_full_ner:B-T047)`, `mlee_ED:B-Dephosphorylation)`, `mantra_gsc_en_emea_ner:I-PHYS)`, `pdr_ner:B-Disease)`, `genia_term_corpus_ner:I-)`, `chemdner_ner:I-Chemical)`, `bionlp_st_2013_gro_ED:B-PositiveRegulationOfTranscriptionOfGene)`, `mlee_ner:I-Protein_domain_or_region)`, `medmentions_full_ner:I-T104)`, `medmentions_full_ner:B-T039)`, `bio_sim_verb_sts:5)`, `chebi_nactem_abstr_ann1_ner:B-Biological_Activity)`, `bionlp_st_2011_epi_ED:I-DNA_demethylation)`, `nlm_gene_ner:I-GENERIF)`, `bionlp_st_2013_gro_ED:B-NegativeRegulationOfTranscription)`, `mantra_gsc_en_emea_ner:I-PHEN)`, `chebi_nactem_fullpaper_ner:B-Chemical_Structure)`, `genia_term_corpus_ner:B-RNA_molecule)`, `mlee_ner:B-Cell)`, `chia_ner:B-Qualifier)`, `bionlp_shared_task_2009_ED:B-Gene_expression)`, `bionlp_st_2013_gro_ner:I-Vitamin)`, `medmentions_full_ner:I-T013)`, `ehr_rel_sts:8)`, `medmentions_full_ner:I-T030)`, `diann_iber_eval_en_ner:O)`, `an_em_RE:frag)`, `genia_term_corpus_ner:I-DNA_substructure)`, `bionlp_st_2013_pc_EAE:Site)`, `genia_term_corpus_ner:I-ANDprotein_complexprotein_complex)`, `bionlp_st_2013_gro_ED:I-TranscriptionInitiation)`, `bionlp_st_2013_gro_ner:B-Polymerase)`, 
`medmentions_full_ner:I-T004)`, `bionlp_st_2013_gro_ED:B-NegativeRegulationOfTranscriptionOfGene)`, `bionlp_st_2013_gro_ner:B-FusionGene)`, `bionlp_st_2011_ge_ED:I-Binding)`, `bionlp_st_2013_cg_ner:B-DNA_domain_or_region)`, `chia_ner:B-Negation)`, `bionlp_st_2013_gro_ner:I-FusionProtein)`, `minimayosrs_sts:8)`, `chebi_nactem_fullpaper_ner:B-Protein)`, `bionlp_st_2013_gro_ner:B-Enhancer)`, `bionlp_st_2013_gro_ED:B-NegativeRegulation)`, `medmentions_full_ner:I-T041)`, `mantra_gsc_en_emea_ner:O)`, `biorelex_ner:I-protein-motif)`, `bionlp_st_2011_epi_COREF:coref)`, `medmentions_full_ner:I-T093)`, `medmentions_full_ner:B-T200)`, `bionlp_st_2013_gro_ner:B-OpenReadingFrame)`, `bionlp_st_2013_cg_ED:I-Localization)`, `bionlp_st_2013_cg_ner:B-Tissue)`, `bionlp_st_2013_pc_COREF:None)`, `medmentions_full_ner:I-T123)`, `mlee_ED:O)`, `bionlp_st_2013_gro_ner:O)`, `bionlp_st_2013_gro_ner:B-ComplexMolecularEntity)`, `bionlp_st_2013_pc_ED:B-Transcription)`, `anat_em_ner:B-Pathological_formation)`, `diann_iber_eval_en_ner:B-Neg)`, `bionlp_st_2013_ge_ner:I-Protein)`, `scai_chemical_ner:I-TRIVIAL)`, `bionlp_st_2013_gro_ner:B-RibosomalRNA)`, `an_em_ner:B-Organism_subdivision)`, `mlee_ED:I-Remodeling)`, `genia_term_corpus_ner:B-RNA_domain_or_region)`, `bionlp_st_2013_gro_ner:B-BindingAssay)`, `medmentions_full_ner:B-T017)`, `mlee_ED:I-Translation)`, `bionlp_st_2013_gro_ner:B-CpGIsland)`, `bionlp_st_2013_pc_ner:I-Gene_or_gene_product)`, `bionlp_st_2013_gro_ner:I-HMG)`, `bionlp_st_2013_gro_ED:B-FormationOfTranscriptionFactorComplex)`, `mlee_ner:I-Organism_substance)`, `medmentions_full_ner:I-T075)`, `nlm_gene_ner:B-Domain)`, `anat_em_ner:I-Anatomical_system)`, `medmentions_full_ner:B-T057)`, `bionlp_st_2013_gro_ner:I-SecondMessenger)`, `bionlp_st_2013_gro_ner:B-GeneProduct)`, `ebm_pico_ner:I-Outcome_Other)`, `bionlp_st_2013_gro_ED:B-ProteinModification)`, `bionlp_st_2013_gro_ED:B-Modification)`, `bioinfer_ner:B-Protein_family_or_group)`, `medmentions_full_ner:B-T059)`, 
`bionlp_st_2013_gro_ner:B-Ligand)`, `gnormplus_ner:I-FamilyName)`, `mantra_gsc_en_emea_ner:B-CHEM)`, `bionlp_st_2013_gro_ED:I-CellGrowth)`, `genia_term_corpus_ner:B-DNA_NA)`, `mantra_gsc_en_medline_ner:B-LIVB)`, `verspoor_2013_ner:B-gender)`, `bio_sim_verb_sts:6)`, `spl_adr_200db_train_ner:B-Severity)`, `bionlp_st_2013_cg_ED:I-Breakdown)`, `ddi_corpus_ner:I-BRAND)`, `medmentions_st21pv_ner:B-T097)`, `biorelex_ner:B-experimental-construct)`, `bionlp_st_2013_ge_ED:B-Transcription)`, `chia_ner:I-Multiplier)`, `bionlp_st_2013_gro_ner:I-DNA)`, `geokhoj_v1_TEXT:0)`, `bionlp_st_2013_gro_RE:locatedIn)`, `genia_term_corpus_ner:B-virus)`, `bionlp_st_2013_gro_ner:I-SequenceHomologyAnalysis)`, `bionlp_st_2013_gro_ED:B-RegulatoryProcess)`, `bionlp_st_2013_pc_ED:B-Activation)`, `anat_em_ner:I-Organism_subdivision)`, `bionlp_st_2013_gro_ner:B-RuntLikeDomain)`, `bioinfer_ner:I-Protein_complex)`, `bionlp_st_2013_gro_ED:I-Increase)`, `anat_em_ner:I-Cell)`, `medmentions_full_ner:B-T131)`, `bionlp_st_2013_gro_ner:B-ProteinDomain)`, `bionlp_st_2013_gro_ner:I-ProteinCodingRegion)`, `bionlp_st_2013_gro_ner:I-PrimaryStructure)`, `seth_corpus_RE:None)`, `genia_term_corpus_ner:I-mono_cell)`, `bioscope_papers_ner:I-negation)`, `genia_term_corpus_ner:I-other_artificial_source)`, `medmentions_full_ner:I-T098)`, `bionlp_st_2013_gro_ner:I-Enhancer)`, `bionlp_st_2013_gro_ner:I-PositiveTranscriptionRegulator)`, `genia_term_corpus_ner:I-polynucleotide)`, `bionlp_st_2011_ge_ED:B-Gene_expression)`, `medmentions_full_ner:B-T121)`, `bionlp_st_2011_id_ED:I-Transcription)`, `biorelex_ner:I-protein-region)`, `chebi_nactem_fullpaper_ner:B-Metabolite)`, `diann_iber_eval_en_ner:B-Disability)`, `bionlp_st_2013_cg_ED:B-Dissociation)`, `medmentions_st21pv_ner:B-T204)`, `genia_term_corpus_ner:I-protein_subunit)`, `medmentions_full_ner:B-T023)`, `bionlp_st_2013_gro_ED:B-Splicing)`, `bionlp_st_2013_gro_ED:I-Silencing)`, `biorelex_ner:B-peptide)`, `bionlp_st_2013_gro_ED:B-BindingOfTFToTFBindingSiteOfProtein)`, 
`biorelex_ner:I-assay)`, `medmentions_full_ner:B-T048)`, `an_em_ner:I-Organism_substance)`, `bionlp_st_2013_gro_ner:I-Function)`, `spl_adr_200db_train_ner:B-Animal)`, `genia_term_corpus_ner:I-DNA_NA)`, `medmentions_full_ner:I-T070)`, `mlee_ner:I-Anatomical_system)`, `bioinfer_ner:B-Individual_protein)`, `biorelex_ner:B-organelle)`, `verspoor_2013_ner:I-Physiology)`, `bionlp_st_2013_gro_ner:I-ProteinComplex)`, `genia_term_corpus_ner:I-RNA_molecule)`, `mlee_ner:I-DNA_domain_or_region)`, `mlee_ED:I-Pathway)`, `bionlp_st_2013_gro_ED:B-ActivationOfProcess)`, `pico_extraction_ner:B-outcome)`, `minimayosrs_sts:7)`, `medmentions_full_ner:I-T038)`, `verspoor_2013_ner:I-size)`, `ebm_pico_ner:B-Intervention_Other)`, `bionlp_st_2013_gro_ED:B-RNABiosynthesis)`, `bionlp_st_2013_cg_ner:I-Simple_chemical)`, `mantra_gsc_en_medline_ner:I-LIVB)`, `seth_corpus_ner:B-Gene)`, `biorelex_ner:I-reagent)`, `bionlp_st_2013_cg_ED:B-Phosphorylation)`, `bionlp_st_2013_gro_ner:B-Attenuator)`, `pdr_EAE:None)`, `bionlp_st_2011_epi_ED:B-DNA_methylation)`, `bionlp_st_2013_cg_ED:I-Translation)`, `bionlp_st_2013_gro_ED:B-Transcription)`, `medmentions_st21pv_ner:I-T074)`, `bionlp_st_2013_gro_ED:B-ProteinCatabolism)`, `bionlp_st_2013_gro_ED:B-Growth)`, `chia_RE:AND)`, `bionlp_st_2013_pc_ED:I-Transcription)`, `medmentions_full_ner:I-T191)`, `medmentions_full_ner:I-T028)`, `bionlp_st_2013_cg_ED:I-Glycolysis)`, `bionlp_st_2013_ge_ED:B-Localization)`, `mlee_ner:I-Organ)`, `medmentions_full_ner:B-T033)`, `ebm_pico_ner:I-Intervention_Other)`, `bionlp_st_2013_gro_ner:B-NuclearReceptor)`, `genia_term_corpus_ner:B-ANDprotein_complexprotein_complex)`, `an_em_ner:B-Cellular_component)`, `medmentions_full_ner:I-T100)`, `geokhoj_v1_TEXT:1)`, `genia_term_corpus_ner:I-BUT_NOTother_nameother_name)`, `bionlp_st_2013_cg_ED:B-Cell_death)`, `gnormplus_ner:B-Gene)`, `genia_term_corpus_ner:I-RNA_substructure)`, `medmentions_full_ner:I-T190)`, `bionlp_st_2013_gro_ED:B-Homodimerization)`, `medmentions_full_ner:B-T051)`, 
`genia_term_corpus_ner:B-lipid)`, `bioinfer_ner:B-GeneproteinRNA)`, `bioinfer_ner:B-Gene)`, `medmentions_full_ner:B-T184)`, `anat_em_ner:I-Pathological_formation)`, `bionlp_st_2013_gro_ner:B-ForkheadWingedHelixTF)`, `bionlp_st_2013_cg_ner:I-Protein_domain_or_region)`, `genia_term_corpus_ner:I-other_organic_compound)`, `bionlp_st_2013_gro_ner:B-SmallInterferingRNA)`, `bionlp_st_2013_cg_ED:B-Growth)`, `bionlp_st_2013_cg_ED:B-Synthesis)`, `chia_RE:Has_index)`, `chia_ner:I-Device)`, `ddi_corpus_ner:B-GROUP)`, `bionlp_shared_task_2009_ED:I-Gene_expression)`, `bionlp_st_2013_gro_ner:B-MutantProtein)`, `genia_term_corpus_ner:B-DNA_substructure)`, `biorelex_ner:I-disease)`, `biorelex_ner:I-amino-acid)`, `medmentions_full_ner:B-T127)`, `ebm_pico_ner:I-Intervention_Psychological)`, `mlee_ED:I-Planned_process)`, `pubmed_qa_labeled_fold0_CLF:no)`, `mlee_ner:I-Drug_or_compound)`, `medmentions_full_ner:I-T185)`, `minimayosrs_sts:1)`, `bionlp_st_2011_epi_ED:B-DNA_demethylation)`, `bionlp_st_2013_gro_ner:I-TranscriptionFactorBindingSiteOfDNA)`, `bionlp_st_2013_gro_ED:I-ResponseProcess)`, `medmentions_full_ner:I-T201)`, `bionlp_st_2011_ge_ED:I-Transcription)`, `bionlp_st_2013_cg_ED:I-Mutation)`, `tmvar_v1_ner:I-ProteinMutation)`, `medmentions_full_ner:I-T063)`, `verspoor_2013_ner:I-Phenomena)`, `bionlp_st_2011_id_ED:B-Negative_regulation)`, `chemprot_RE:CPR:2)`, `bionlp_st_2013_gro_ner:B-ProteinSubunit)`, `medmentions_full_ner:B-T011)`, `genia_term_corpus_ner:I-ANDother_nameother_name)`, `an_em_ner:I-Tissue)`, `bionlp_st_2013_gro_ner:B-bHLHTF)`, `pico_extraction_ner:B-intervention)`, `bionlp_st_2013_gro_ED:B-Increase)`, `mlee_ner:I-Organism)`, `mantra_gsc_en_emea_ner:I-CHEM)`, `bionlp_st_2013_cg_ner:I-Organism)`, `bionlp_st_2013_gro_ner:I-ProteinDomain)`, `medmentions_full_ner:B-T185)`, `mantra_gsc_en_patents_ner:I-PROC)`, `medmentions_full_ner:I-T120)`, `bionlp_st_2013_gro_ED:B-CellularMetabolicProcess)`, `scai_chemical_ner:I-ABBREVIATION)`, 
`bionlp_st_2013_cg_ED:I-Planned_process)`, `bionlp_st_2013_cg_ner:B-Anatomical_system)`, `chia_ner:I-Procedure)`, `genia_term_corpus_ner:I-ANDcell_typecell_type)`, `scai_chemical_ner:I-)`, `biorelex_ner:B-experiment-tag)`, `genia_term_corpus_ner:B-ORDNA_domain_or_regionDNA_domain_or_region)`, `medmentions_full_ner:B-T044)`, `mirna_ner:B-Non-Specific_miRNAs)`, `mlee_ED:B-Cell_division)`, `bionlp_st_2011_id_ner:I-Entity)`, `bionlp_st_2013_cg_ED:B-Cell_proliferation)`, `bionlp_st_2011_epi_EAE:None)`, `bionlp_st_2013_cg_ED:B-DNA_methylation)`, `bionlp_st_2013_gro_ED:O)`, `bionlp_st_2013_gro_ED:B-Producing)`, `bionlp_st_2013_cg_EAE:Instrument)`, `bionlp_st_2013_gro_ED:B-Stabilization)`, `pcr_ner:B-Chemical)`, `bionlp_st_2013_cg_ED:B-Development)`, `ebm_pico_ner:B-Intervention_Physical)`, `bionlp_st_2011_ge_ED:I-Regulation)`, `bionlp_st_2013_pc_ED:B-Demethylation)`, `bionlp_st_2011_epi_ner:B-Protein)`, `chemprot_RE:CPR:0)`, `medmentions_full_ner:B-T055)`, `bionlp_st_2013_gro_ED:B-Decrease)`, `spl_adr_200db_train_ner:I-Severity)`, `bionlp_st_2013_gro_ner:I-Ion)`, `bionlp_st_2013_pc_ner:B-Gene_or_gene_product)`, `genia_term_corpus_ner:B-inorganic)`, `chia_ner:O)`, `linnaeus_ner:B-species)`, `biorelex_ner:I-protein)`, `mantra_gsc_en_medline_ner:B-PROC)`, `medmentions_full_ner:B-T078)`, `medmentions_full_ner:I-T062)`, `medmentions_full_ner:I-T081)`, `mantra_gsc_en_emea_ner:B-PHEN)`, `medmentions_st21pv_ner:B-T022)`, `bc5cdr_ner:I-Disease)`, `chia_ner:B-Multiplier)`, `bionlp_st_2013_gro_ner:I-bHLH)`, `bionlp_st_2013_gro_ED:B-CellularProcess)`, `bionlp_st_2013_gro_ED:B-Acetylation)`, `genia_term_corpus_ner:B-RNA_family_or_group)`, `bionlp_st_2013_gro_ED:I-IntraCellularTransport)`, `bionlp_st_2013_gro_ner:B-Chromatin)`, `bionlp_st_2013_ge_ED:B-Binding)`, `bionlp_st_2013_gro_ner:I-AminoAcid)`, `bionlp_st_2013_gro_ED:B-CellFateDetermination)`, `medmentions_full_ner:I-T091)`, `medmentions_full_ner:B-T066)`, `medmentions_full_ner:B-T022)`, `genetaggold_ner:O)`, 
`medmentions_full_ner:B-T074)`, `bionlp_st_2013_pc_ED:I-Gene_expression)`, `bionlp_st_2013_gro_ED:I-Disease)`, `biosses_sts:7)`, `medmentions_full_ner:B-T071)`, `medmentions_full_ner:B-T086)`, `biorelex_ner:I-protein-complex)`, `mlee_ED:B-Remodeling)`, `medmentions_st21pv_ner:I-T007)`, `bionlp_st_2011_id_ED:I-Regulation)`, `biorelex_ner:B-drug)`, `bionlp_st_2013_gro_ED:I-Transcription)`, `bionlp_st_2011_epi_EAE:Theme)`, `mantra_gsc_en_patents_ner:I-DISO)`, `anat_em_ner:I-Organ)`, `scai_chemical_ner:I-PARTIUPAC)`, `bionlp_st_2013_cg_ED:I-Metastasis)`, `medmentions_full_ner:I-T197)`, `bionlp_st_2013_pc_ED:O)`, `medmentions_st21pv_ner:B-T092)`, `bionlp_shared_task_2009_ED:B-Positive_regulation)`, `medmentions_full_ner:B-T045)`, `chemprot_RE:CPR:8)`, `bionlp_st_2013_cg_ED:B-Localization)`, `nlm_gene_ner:I-Domain)`, `verspoor_2013_ner:B-age)`, `bionlp_st_2011_epi_ED:O)`, `chebi_nactem_abstr_ann1_ner:B-Species)`, `medmentions_full_ner:B-T122)`, `bionlp_st_2011_id_ner:I-Protein)`, `bionlp_st_2013_gro_ED:I-BindingOfProteinToDNA)`, `bionlp_st_2013_gro_ner:I-RNAPolymeraseII)`, `medmentions_full_ner:I-T050)`, `genia_term_corpus_ner:B-ANDother_nameother_name)`, `nlm_gene_ner:B-STARGENE)`, `bionlp_st_2013_gro_ED:B-BindingOfMolecularEntity)`, `mirna_ner:B-GenesProteins)`, `scai_chemical_ner:B-MODIFIER)`, `mantra_gsc_en_emea_ner:B-OBJC)`, `mirna_ner:B-Diseases)`, `bionlp_st_2013_cg_ED:I-Death)`, `mantra_gsc_en_emea_ner:I-DISO)`, `bionlp_st_2013_gro_ED:I-Decrease)`, `bionlp_st_2013_gro_ner:B-DNABindingDomainOfProtein)`, `bioinfer_ner:O)`, `anat_em_ner:I-Multi-tissue_structure)`, `osiris_ner:O)`, `bionlp_st_2013_cg_EAE:None)`, `medmentions_st21pv_ner:B-T062)`, `medmentions_full_ner:B-T075)`, `genia_term_corpus_ner:I-AND_NOTcell_typecell_type)`, `bionlp_st_2013_gro_ED:B-CellCycle)`, `medmentions_full_ner:B-UnknownType)`, `bionlp_st_2013_cg_ner:I-Cancer)`, `medmentions_full_ner:I-T005)`, `genia_term_corpus_ner:I-protein_complex)`, `bionlp_st_2013_cg_ED:B-Cell_transformation)` + 
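In the `bigbio_mtl` labels listed above, the part before the first colon names the source dataset and task (for example `biorelex_ner` or `chemprot_RE`), and the rest is the tag itself. The sketch below groups predictions by task. It assumes the `ner` results have already been collected into a Python list of plain `task:tag` strings. `split_label` and `group_by_task` are hypothetical helpers, not Spark NLP APIs.

```python
from collections import defaultdict


def split_label(label):
    """Split a multi-task label such as 'biorelex_ner:I-assay' into (task, tag)."""
    # Split on the first colon only, so tags like 'CPR:2' stay intact.
    task, _, tag = label.partition(":")
    return task, tag


def group_by_task(labels):
    """Group tags by their task prefix, keeping their original order within each task."""
    groups = defaultdict(list)
    for label in labels:
        task, tag = split_label(label)
        groups[task].append(tag)
    return dict(groups)


# Illustrative labels taken from the list above, not real model output.
labels = ["biorelex_ner:I-assay", "chemprot_RE:CPR:2", "biorelex_ner:B-drug", "pdr_EAE:None"]
print(group_by_task(labels))
# {'biorelex_ner': ['I-assay', 'B-drug'], 'chemprot_RE': ['CPR:2'], 'pdr_EAE': ['None']}
```

Because the split happens only at the first colon, relation tags such as `CPR:2` survive unchanged.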
+{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bigbio_mtl_en_5.2.0_3.0_1699290919040.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bigbio_mtl_en_5.2.0_3.0_1699290919040.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bigbio_mtl","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bigbio_mtl","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.by_bigscience_biomedical").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bigbio_mtl| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.9 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/bigscience-biomedical/bigbio-mtl \ No newline at end of file From 73614e0a83e62ac478e52468c83fd6a80b97279c Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:20:57 +0700 Subject: [PATCH 037/667] Add model 2023-11-06-bert_ner_deid_bert_i2b2_en --- .../2023-11-06-bert_ner_deid_bert_i2b2_en.md | 121 ++++++++++++++++++ 1 file changed, 121 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_deid_bert_i2b2_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_deid_bert_i2b2_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_deid_bert_i2b2_en.md new file mode 100644 index 00000000000000..4e633768767a39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_deid_bert_i2b2_en.md @@ -0,0 +1,121 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from obi) +author: John Snow Labs +name: bert_ner_deid_bert_i2b2 +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `deid_bert_i2b2` is an English model originally trained by `obi`.
+ +## Predicted Entities + +`L-HOSP`, `L-DATE`, `L-AGE`, `HOSP`, `DATE`, `PATIENT`, `U-DATE`, `PHONE`, `U-HOSP`, `ID`, `U-LOC`, `U-OTHERPHI`, `U-ID`, `U-PATIENT`, `U-EMAIL`, `U-PHONE`, `LOC`, `L-EMAIL`, `U-PATORG`, `L-PHONE`, `EMAIL`, `AGE`, `L-PATIENT`, `L-OTHERPHI`, `L-LOC`, `U-STAFF`, `L-PATORG`, `L-STAFF`, `PATORG`, `U-AGE`, `L-ID`, `OTHERPHI`, `STAFF` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_deid_bert_i2b2_en_5.2.0_3.0_1699291149565.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_deid_bert_i2b2_en_5.2.0_3.0_1699291149565.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_deid_bert_i2b2","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_deid_bert_i2b2","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.by_obi").predict("""PUT YOUR STRING HERE""") +``` +
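The `deid_bert_i2b2` tag set above follows a BILOU-style scheme. `U-` marks a single-token entity and `L-` marks the last token of a multi-token entity. The bare names in the list (for example `HOSP` or `DATE`) appear to be the `B-`/`I-` tags with their prefixes stripped. The sketch below turns per-token tags into entity spans. It assumes the `ner` results have been collected into a Python list aligned with the tokens, and that the model emits full `B-`/`I-`/`L-`/`U-`/`O` tags. `bilou_to_spans` is a hypothetical helper, not a Spark NLP API.

```python
def bilou_to_spans(tokens, tags):
    """Merge BILOU token tags into (label, text) spans.

    B-X opens a span, I-X continues it, L-X closes it, and U-X is a
    single-token span. A B-/I- run that never reaches an L- tag is dropped.
    "O" or any tag without a recognised prefix closes an open span.
    """
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # Split on the first "-" only, so entity names may contain hyphens.
        prefix, _, entity = tag.partition("-")
        if prefix == "U":
            spans.append((entity, tokens[i]))
            start, label = None, None
        elif prefix == "B":
            start, label = i, entity
        elif prefix == "I":
            if label != entity:
                # Tolerate a missing B- tag by opening a span at this token.
                start, label = i, entity
        elif prefix == "L":
            if label == entity:
                spans.append((entity, " ".join(tokens[start:i + 1])))
            else:
                # An L- tag with no open span of the same type becomes a one-token span.
                spans.append((entity, tokens[i]))
            start, label = None, None
        else:
            start, label = None, None
    return spans


# Illustrative tokens and tags, not real model output.
tokens = ["Seen", "by", "Dr", "John", "Smith", "on", "12/03"]
tags = ["O", "O", "O", "B-STAFF", "L-STAFF", "O", "U-DATE"]
print(bilou_to_spans(tokens, tags))  # [('STAFF', 'John Smith'), ('DATE', '12/03')]
```

For de-identification, the resulting spans can then be masked or replaced in the source text. That step is outside the scope of this card.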
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_deid_bert_i2b2| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.4 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/obi/deid_bert_i2b2 +- https://github.com/obi-ml-public/ehr_deidentification/tree/master/steps/forward_pass +- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4978170/ +- https://arxiv.org/pdf/1904.03323.pdf +- https://github.com/obi-ml-public/ehr_deidentification/tree/master/steps/train +- https://github.com/obi-ml-public/ehr_deidentification/blob/master/AnnotationGuidelines.md +- https://www.hhs.gov/hipaa/for-professionals/privacy/laws-regulations/index.html +- https://github.com/obi-ml-public/ehr_deidentification \ No newline at end of file From e74d0385fbff83d07497d2ddbcee2d4ba143f7d5 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:21:58 +0700 Subject: [PATCH 038/667] Add model 2023-11-06-bert_ner_biobert_ner_ncbi_disease_en --- ...06-bert_ner_biobert_ner_ncbi_disease_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_ner_ncbi_disease_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_ner_ncbi_disease_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_ner_ncbi_disease_en.md new file mode 100644 index 00000000000000..8e49147f63faa2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_ner_ncbi_disease_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_biobert_ner_ncbi_disease BertForTokenClassification from drAbreu +author: John Snow Labs +name: bert_ner_biobert_ner_ncbi_disease +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: 
Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_biobert_ner_ncbi_disease` is an English model originally trained by drAbreu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_biobert_ner_ncbi_disease_en_5.2.0_3.0_1699291144471.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_biobert_ner_ncbi_disease_en_5.2.0_3.0_1699291144471.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
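+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# The classifier expects a "token" column, so tokenize the documents first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_biobert_ner_ncbi_disease","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# `spark` is an active Spark NLP session, e.g. from sparknlp.start() +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base.DocumentAssembler +import com.johnsnowlabs.nlp.annotators.Tokenizer +import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// The classifier expects a "token" column, so tokenize the documents first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_biobert_ner_ncbi_disease", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +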
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_biobert_ner_ncbi_disease| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/drAbreu/bioBERT-NER-NCBI_disease \ No newline at end of file From ffed322d85d95582ca057ba89547550f53fd0e62 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:22:58 +0700 Subject: [PATCH 039/667] Add model 2023-11-06-bert_ner_codeswitch_spaeng_lid_lince_en --- ...bert_ner_codeswitch_spaeng_lid_lince_en.md | 116 ++++++++++++++++++ 1 file changed, 116 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_codeswitch_spaeng_lid_lince_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_codeswitch_spaeng_lid_lince_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_codeswitch_spaeng_lid_lince_en.md new file mode 100644 index 00000000000000..7f7333c5a345ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_codeswitch_spaeng_lid_lince_en.md @@ -0,0 +1,116 @@ +--- +layout: model +title: English Named Entity Recognition (from sagorsarker) +author: John Snow Labs +name: bert_ner_codeswitch_spaeng_lid_lince +date: 2023-11-06 +tags: [bert, ner, token_classification, en, open_source, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained Named Entity Recognition model, uploaded to Hugging Face, adapted and imported into Spark NLP. `codeswitch-spaeng-lid-lince` is an English model originally trained by `sagorsarker`.
+ +## Predicted Entities + +`mixed`, `other`, `unk`, `en`, `ambiguous`, `spa`, `ne`, `fw` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_codeswitch_spaeng_lid_lince_en_5.2.0_3.0_1699291315591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_codeswitch_spaeng_lid_lince_en_5.2.0_3.0_1699291315591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_codeswitch_spaeng_lid_lince","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["I love Spark NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_codeswitch_spaeng_lid_lince","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("I love Spark NLP").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.codeswitch_spaeng_lid_lince.by_sagorsarker").predict("""I love Spark NLP""") +``` +
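Despite the NER framing, the `codeswitch_spaeng_lid_lince` tags above are language-identification labels (`en`, `spa`, `mixed`, `other`, `ne` and so on), not entity types. The sketch below finds the token positions where a sentence switches between English and Spanish. It assumes the `ner` results have been collected into a Python list with one tag per token. Tokens tagged with anything other than `en` or `spa` are skipped, so punctuation or names do not break a run. `switch_points` is a hypothetical helper, not a Spark NLP API.

```python
# Only these tags count as a language for switch detection (an assumption).
LANGUAGE_TAGS = {"en", "spa"}


def switch_points(tags):
    """Return token indices where the language changes between en and spa."""
    points = []
    previous = None
    for i, tag in enumerate(tags):
        if tag not in LANGUAGE_TAGS:
            continue  # e.g. "other", "ne" or "unk" do not interrupt a language run
        if previous is not None and tag != previous:
            points.append(i)
        previous = tag
    return points


# Illustrative tokens and tags, not real model output.
tokens = ["I", "love", "it", ",", "pero", "es", "caro", "honestly"]
tags = ["en", "en", "en", "other", "spa", "spa", "spa", "en"]
print(switch_points(tags))                          # [4, 7]
print([tokens[i] for i in switch_points(tags)])     # ['pero', 'honestly']
```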
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_codeswitch_spaeng_lid_lince| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|665.1 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/sagorsarker/codeswitch-spaeng-lid-lince +- https://ritual.uh.edu/lince/home +- https://github.com/sagorbrur/codeswitch \ No newline at end of file From 0385dea5b882a218b6f37122ee3e6800e18f82e4 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:23:58 +0700 Subject: [PATCH 040/667] Add model 2023-11-06-bert_ner_dshvadskiy_bert_finetuned_ner_en --- ...rt_ner_dshvadskiy_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_dshvadskiy_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_dshvadskiy_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_dshvadskiy_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..879e27d5a1ddc0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_dshvadskiy_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from dshvadskiy) +author: John Snow Labs +name: bert_ner_dshvadskiy_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. 
`bert-finetuned-ner` is an English model originally trained by `dshvadskiy`. + +## Predicted Entities + +`ORG`, `LOC`, `MISC`, `PER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_dshvadskiy_bert_finetuned_ner_en_5.2.0_3.0_1699291414329.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_dshvadskiy_bert_finetuned_ner_en_5.2.0_3.0_1699291414329.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_dshvadskiy_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_dshvadskiy_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned.by_dshvadskiy").predict("""PUT YOUR STRING HERE""") +``` +
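CoNLL-style models like this one typically emit the entity types above (`ORG`, `LOC`, `MISC`, `PER`) per token, with IOB prefixes such as `B-PER` and `I-PER`. Inside a Spark NLP pipeline, the `NerConverter` annotator is the usual way to merge such tags into chunks. The sketch below shows the same idea in plain Python. It assumes the tags have already been collected into a list aligned with the tokens. `bio_to_chunks` is a hypothetical helper, not a Spark NLP API.

```python
def bio_to_chunks(tokens, tags):
    """Merge IOB token tags into (label, text) chunks.

    B-X opens a chunk and I-X continues it. An I-X that does not follow
    a chunk of the same type opens a new chunk (IOB1-style tolerance).
    """
    chunks = []
    current = None  # (label, start_index) of the open chunk, if any
    # Append a sentinel "O" so the last open chunk is flushed.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and current is not None and current[0] == tag[2:]
        if continues:
            continue
        if current is not None:
            label, start = current
            chunks.append((label, " ".join(tokens[start:i])))
            current = None
        if tag.startswith(("B-", "I-")):
            current = (tag[2:], i)
    return chunks


# Illustrative tokens and tags, not real model output.
tokens = ["John", "Smith", "works", "at", "Acme", "Corp", "in", "Berlin"]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG", "O", "B-LOC"]
print(bio_to_chunks(tokens, tags))
# [('PER', 'John Smith'), ('ORG', 'Acme Corp'), ('LOC', 'Berlin')]
```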
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_dshvadskiy_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/dshvadskiy/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2002 \ No newline at end of file From 7d2e2220f83120783dc735544536e52652c014b0 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:24:58 +0700 Subject: [PATCH 041/667] Add model 2023-11-06-bert_ner_biobert_chemical_ner_en --- ...-11-06-bert_ner_biobert_chemical_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_chemical_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_chemical_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_chemical_ner_en.md new file mode 100644 index 00000000000000..c9567e87763b1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_chemical_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from alvaroalon2) +author: John Snow Labs +name: bert_ner_biobert_chemical_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `biobert_chemical_ner` is an English model originally trained by `alvaroalon2`.
+ +## Predicted Entities + +`CHEMICAL` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_biobert_chemical_ner_en_5.2.0_3.0_1699291347294.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_biobert_chemical_ner_en_5.2.0_3.0_1699291347294.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_biobert_chemical_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_biobert_chemical_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.biobert.chemical.").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_biobert_chemical_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.1 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/alvaroalon2/biobert_chemical_ner +- https://github.com/librairy/bio-ner \ No newline at end of file From eba3e86796b736b9e57ddf38f7abb952baaadcad Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:25:58 +0700 Subject: [PATCH 042/667] Add model 2023-11-06-bert_ner_animalthemuppet_bert_finetuned_ner_en --- ...r_animalthemuppet_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_animalthemuppet_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_animalthemuppet_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_animalthemuppet_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..d067ddb9922395 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_animalthemuppet_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from animalthemuppet) +author: John Snow Labs +name: bert_ner_animalthemuppet_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. 
`bert-finetuned-ner` is an English model originally trained by `animalthemuppet`. + +## Predicted Entities + +`ORG`, `LOC`, `MISC`, `PER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_animalthemuppet_bert_finetuned_ner_en_5.2.0_3.0_1699282678624.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_animalthemuppet_bert_finetuned_ner_en_5.2.0_3.0_1699282678624.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_animalthemuppet_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_animalthemuppet_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned.by_animalthemuppet").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_animalthemuppet_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/animalthemuppet/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From 48055da808a66a0795a97d31558e986508a0794e Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:26:59 +0700 Subject: [PATCH 043/667] Add model 2023-11-06-bert_ner_original_scibert_bc5cdr_chemical_t2_en --- ..._original_scibert_bc5cdr_chemical_t2_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_scibert_bc5cdr_chemical_t2_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_scibert_bc5cdr_chemical_t2_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_scibert_bc5cdr_chemical_t2_en.md new file mode 100644 index 00000000000000..4ce8e8204015e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_scibert_bc5cdr_chemical_t2_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_original_scibert_bc5cdr_chemical_t2 BertForTokenClassification from ghadeermobasher +author: John Snow Labs +name: bert_ner_original_scibert_bc5cdr_chemical_t2 +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to 
provide scalability and production-readiness using Spark NLP. `bert_ner_original_scibert_bc5cdr_chemical_t2` is an English model originally trained by ghadeermobasher. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_original_scibert_bc5cdr_chemical_t2_en_5.2.0_3.0_1699282572917.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_original_scibert_bc5cdr_chemical_t2_en_5.2.0_3.0_1699282572917.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_original_scibert_bc5cdr_chemical_t2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_original_scibert_bc5cdr_chemical_t2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_original_scibert_bc5cdr_chemical_t2| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/ghadeermobasher/Original-SciBERT-BC5CDR-Chemical-T2 \ No newline at end of file From 9b047a43feb7a3be0d46ff09413843524ebe75b2 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:27:59 +0700 Subject: [PATCH 044/667] Add model 2023-11-06-bert_ner_dshvadskiy_bert_finetuned_ner_accelerate_en --- ...adskiy_bert_finetuned_ner_accelerate_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_dshvadskiy_bert_finetuned_ner_accelerate_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_dshvadskiy_bert_finetuned_ner_accelerate_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_dshvadskiy_bert_finetuned_ner_accelerate_en.md new file mode 100644 index 00000000000000..8dde70f48428cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_dshvadskiy_bert_finetuned_ner_accelerate_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from dshvadskiy) +author: John Snow Labs +name: bert_ner_dshvadskiy_bert_finetuned_ner_accelerate +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner-accelerate` is an English model originally trained by `dshvadskiy`.
+ +## Predicted Entities + +`LOC`, `PER`, `ORG`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_dshvadskiy_bert_finetuned_ner_accelerate_en_5.2.0_3.0_1699291502933.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_dshvadskiy_bert_finetuned_ner_accelerate_en_5.2.0_3.0_1699291502933.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_dshvadskiy_bert_finetuned_ner_accelerate","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_dshvadskiy_bert_finetuned_ner_accelerate","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.finetuned.by_dshvadskiy").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_dshvadskiy_bert_finetuned_ner_accelerate| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + + + +- https://huggingface.co/dshvadskiy/bert-finetuned-ner-accelerate \ No newline at end of file From 2c546ff999e161753d8082c191b91d169b9701e6 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:28:59 +0700 Subject: [PATCH 045/667] Add model 2023-11-06-bert_ner_bert_degree_major_ner_1000_en --- ...-bert_ner_bert_degree_major_ner_1000_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_degree_major_ner_1000_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_degree_major_ner_1000_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_degree_major_ner_1000_en.md new file mode 100644 index 00000000000000..9a9f0385815879 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_degree_major_ner_1000_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from pkushiqiang) +author: John Snow Labs +name: bert_ner_bert_degree_major_ner_1000 +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-degree-major-ner-1000` is an English model originally trained by `pkushiqiang`.
+ +## Predicted Entities + +`degree`, `major` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_degree_major_ner_1000_en_5.2.0_3.0_1699286418066.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_degree_major_ner_1000_en_5.2.0_3.0_1699286418066.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_degree_major_ner_1000","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_degree_major_ner_1000","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.degree_major_ner_1000.by_pkushiqiang").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bert_degree_major_ner_1000| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + + + +- https://huggingface.co/pkushiqiang/bert-degree-major-ner-1000 \ No newline at end of file From b049a8ad11337d94d3aa8efd083ef15fd393742e Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:29:59 +0700 Subject: [PATCH 046/667] Add model 2023-11-06-bert_ner_gk07_wikineural_multilingual_ner_en --- ...ner_gk07_wikineural_multilingual_ner_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_gk07_wikineural_multilingual_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_gk07_wikineural_multilingual_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_gk07_wikineural_multilingual_ner_en.md new file mode 100644 index 00000000000000..9133f891275583 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_gk07_wikineural_multilingual_ner_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from gk07) +author: John Snow Labs +name: bert_ner_gk07_wikineural_multilingual_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `wikineural-multilingual-ner` is an English model originally trained by `gk07`.
+ +## Predicted Entities + +`ORG`, `PER`, `LOC`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_gk07_wikineural_multilingual_ner_en_5.2.0_3.0_1699291765022.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_gk07_wikineural_multilingual_ner_en_5.2.0_3.0_1699291765022.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_gk07_wikineural_multilingual_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_gk07_wikineural_multilingual_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.wikineural.multilingual.by_gk07").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_gk07_wikineural_multilingual_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + + + +- https://huggingface.co/gk07/wikineural-multilingual-ner \ No newline at end of file From 5e163a6cb23be05978961a1bce6a072089256217 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:31:00 +0700 Subject: [PATCH 047/667] Add model 2023-11-06-bert_ner_bert_base_irish_cased_v1_finetuned_ner_ga --- ...rt_base_irish_cased_v1_finetuned_ner_ga.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_irish_cased_v1_finetuned_ner_ga.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_irish_cased_v1_finetuned_ner_ga.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_irish_cased_v1_finetuned_ner_ga.md new file mode 100644 index 00000000000000..869334fbf67859 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_irish_cased_v1_finetuned_ner_ga.md @@ -0,0 +1,115 @@ +--- +layout: model +title: Irish BertForTokenClassification Base Cased model (from jimregan) +author: John Snow Labs +name: bert_ner_bert_base_irish_cased_v1_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, ga, onnx] +task: Named Entity Recognition +language: ga +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.
`bert-base-irish-cased-v1-finetuned-ner` is an Irish model originally trained by `jimregan`. + +## Predicted Entities + +`ORG`, `LOC`, `PER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_base_irish_cased_v1_finetuned_ner_ga_5.2.0_3.0_1699286896975.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_base_irish_cased_v1_finetuned_ner_ga_5.2.0_3.0_1699286896975.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_base_irish_cased_v1_finetuned_ner","ga") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["Is breá liom Spark NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_base_irish_cased_v1_finetuned_ner","ga") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("Is breá liom Spark NLP").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("ga.ner.bert.wikiann.cased_base_finetuned").predict("""Is breá liom Spark NLP""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bert_base_irish_cased_v1_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|ga| +|Size:|406.9 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + + + +- https://huggingface.co/jimregan/bert-base-irish-cased-v1-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=wikiann \ No newline at end of file From 62d34b4c8b4ad46246f052a6e9daf167c51b44b7 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:32:00 +0700 Subject: [PATCH 048/667] Add model 2023-11-06-bert_ner_biobert_ncbi_disease_ner_en --- ...06-bert_ner_biobert_ncbi_disease_ner_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_ncbi_disease_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_ncbi_disease_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_ncbi_disease_ner_en.md new file mode 100644 index 00000000000000..4d0d25b0ae66d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_ncbi_disease_ner_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from ugaray96) +author: John Snow Labs +name: bert_ner_biobert_ncbi_disease_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.
`biobert_ncbi_disease_ner` is an English model originally trained by `ugaray96`. + +## Predicted Entities + +`No Disease`, `Disease Continuation`, `Disease` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_biobert_ncbi_disease_ner_en_5.2.0_3.0_1699291798160.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_biobert_ncbi_disease_ner_en_5.2.0_3.0_1699291798160.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_biobert_ncbi_disease_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_biobert_ncbi_disease_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.biobert.ncbi.disease.by_ugaray96").predict("""PUT YOUR STRING HERE""") +``` +
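The labels listed for this model (`No Disease`, `Disease Continuation`, `Disease`) are not IOB prefixes. Assuming, from the label names alone, that `Disease` marks the first token of a mention, `Disease Continuation` the tokens that follow it, and `No Disease` everything else, contiguous disease mentions can be collected with a plain-Python sketch such as the one below. `LABEL_TO_IOB` and `disease_mentions` are hypothetical names, and the mapping should be checked against real model output before it is relied on.

```python
from typing import Dict, List

# Assumed meaning of this model's labels, inferred from their names only;
# verify against real model output before relying on it.
LABEL_TO_IOB: Dict[str, str] = {
    "Disease": "B",               # first token of a mention
    "Disease Continuation": "I",  # subsequent tokens of the same mention
    "No Disease": "O",            # outside any mention
}


def disease_mentions(tokens: List[str], labels: List[str]) -> List[str]:
    """Collect contiguous disease mentions from per-token labels."""
    mentions: List[str] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        tag = LABEL_TO_IOB.get(label, "O")  # unknown labels count as outside
        if tag == "B" or (tag == "I" and not current):
            # Start a new mention, closing any open one first.
            if current:
                mentions.append(" ".join(current))
            current = [token]
        elif tag == "I":
            current.append(token)
        else:
            if current:
                mentions.append(" ".join(current))
            current = []
    if current:
        mentions.append(" ".join(current))
    return mentions
```

A `Disease Continuation` label with no preceding `Disease` is treated as the start of a new mention, which is a lenient choice; a stricter variant could discard such tokens instead.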
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_biobert_ncbi_disease_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.4 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + + + +- https://huggingface.co/ugaray96/biobert_ncbi_disease_ner \ No newline at end of file From dc3a28d69df6c213ca0516fb1a09cfba41bbc48f Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:33:00 +0700 Subject: [PATCH 049/667] Add model 2023-11-06-bert_ner_gro_ner_2_en --- .../2023-11-06-bert_ner_gro_ner_2_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_gro_ner_2_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_gro_ner_2_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_gro_ner_2_en.md new file mode 100644 index 00000000000000..58bb605209444b --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_gro_ner_2_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from mirikwa) +author: John Snow Labs +name: bert_ner_gro_ner_2 +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `gro-ner-2` is an English model originally trained by `mirikwa`.
+ +## Predicted Entities + +`METRIC`, `REGION`, `ITEM` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_gro_ner_2_en_5.2.0_3.0_1699291798204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_gro_ner_2_en_5.2.0_3.0_1699291798204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_gro_ner_2","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_gro_ner_2","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.by_mirikwa").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_gro_ner_2| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|409.9 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + + + +- https://huggingface.co/mirikwa/gro-ner-2 \ No newline at end of file From 4b139d6736ce63a711504fde3d591a6b8a5ea5b2 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:34:00 +0700 Subject: [PATCH 050/667] Add model 2023-11-06-bert_ner_ner_rubert_per_loc_org_en --- ...1-06-bert_ner_ner_rubert_per_loc_org_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_rubert_per_loc_org_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_rubert_per_loc_org_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_rubert_per_loc_org_en.md new file mode 100644 index 00000000000000..667cbc823103c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_rubert_per_loc_org_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_ner_rubert_per_loc_org BertForTokenClassification from tesemnikov-av +author: John Snow Labs +name: bert_ner_ner_rubert_per_loc_org +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_ner_rubert_per_loc_org` is an English model originally trained by tesemnikov-av.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_ner_rubert_per_loc_org_en_5.2.0_3.0_1699278244013.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_ner_rubert_per_loc_org_en_5.2.0_3.0_1699278244013.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_ner_rubert_per_loc_org","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_ner_rubert_per_loc_org", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_ner_rubert_per_loc_org| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|43.8 MB| + +## References + +https://huggingface.co/tesemnikov-av/NER-RUBERT-Per-Loc-Org \ No newline at end of file From 94e93b080b540bf3819ab4f1751b7f9c3c40a3ba Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:35:00 +0700 Subject: [PATCH 051/667] Add model 2023-11-06-bert_ner_original_scibert_bc4chemd_en --- ...6-bert_ner_original_scibert_bc4chemd_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_scibert_bc4chemd_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_scibert_bc4chemd_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_scibert_bc4chemd_en.md new file mode 100644 index 00000000000000..d5587b267c960e --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_scibert_bc4chemd_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_original_scibert_bc4chemd BertForTokenClassification from ghadeermobasher +author: John Snow Labs +name: bert_ner_original_scibert_bc4chemd +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_original_scibert_bc4chemd` is an English model originally trained by ghadeermobasher.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_original_scibert_bc4chemd_en_5.2.0_3.0_1699282728006.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_original_scibert_bc4chemd_en_5.2.0_3.0_1699282728006.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
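+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads a "token" column, so a Tokenizer must run first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_original_scibert_bc4chemd","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification reads a "token" column, so a Tokenizer must run first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_original_scibert_bc4chemd", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +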
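The Download and Copy S3 URI links in these cards point to archives whose file names appear to follow the pattern `<name>_<lang>_<Spark NLP version>_<Spark version>_<epoch milliseconds>.zip`; for this card the parts line up with `edition: Spark NLP 5.2.0`, `spark_version: 3.0` and `date: 2023-11-06`. That pattern is inferred from the links, not from a published specification. The sketch below uses a hypothetical helper, `parse_model_archive`, to split such a name from the right, so that model names containing underscores or dots (for example `v1.2`) stay intact.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class ModelArchive(NamedTuple):
    name: str
    lang: str
    nlp_version: str
    spark_version: str
    built_at: datetime


def parse_model_archive(filename: str) -> ModelArchive:
    """Split '<name>_<lang>_<nlp>_<spark>_<millis>.zip' into its parts.

    Hypothetical helper: the layout is an assumption inferred from the
    download links. Parsing starts from the right because the model name
    itself may contain '_' and '.'.
    """
    stem = filename.rsplit("/", 1)[-1]  # drop any URL or S3 prefix
    if stem.endswith(".zip"):
        stem = stem[: -len(".zip")]
    name, lang, nlp_version, spark_version, millis = stem.rsplit("_", 4)
    # The last field looks like a Unix timestamp in milliseconds
    built_at = datetime.fromtimestamp(int(millis) / 1000, tz=timezone.utc)
    return ModelArchive(name, lang, nlp_version, spark_version, built_at)


info = parse_model_archive(
    "s3://auxdata.johnsnowlabs.com/public/models/"
    "bert_ner_original_scibert_bc4chemd_en_5.2.0_3.0_1699282728006.zip"
)
print(info.name, info.lang, info.nlp_version, info.spark_version)
print(info.built_at.date())  # 2023-11-06
```

Splitting from the right is the design choice that matters: only the model name can contain `_`, so peeling off the last four fields first keeps names such as `bert_ner_biobert_base_cased_v1.2_finetuned_ner_craft_english` whole.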
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_original_scibert_bc4chemd| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|409.9 MB| + +## References + +https://huggingface.co/ghadeermobasher/Original-SciBERT-BC4CHEMD \ No newline at end of file From 418f232db687def1ca8288e84787b86751815e1c Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:36:00 +0700 Subject: [PATCH 052/667] Add model 2023-11-06-bert_ner_ehelpbertpt_en --- .../2023-11-06-bert_ner_ehelpbertpt_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_ehelpbertpt_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ehelpbertpt_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ehelpbertpt_en.md new file mode 100644 index 00000000000000..2f0e7bfe090aec --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ehelpbertpt_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_ehelpbertpt BertForTokenClassification from pucpr +author: John Snow Labs +name: bert_ner_ehelpbertpt +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_ehelpbertpt` is an English model originally trained by pucpr.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_ehelpbertpt_en_5.2.0_3.0_1699292038956.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_ehelpbertpt_en_5.2.0_3.0_1699292038956.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
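+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads a "token" column, so a Tokenizer must run first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_ehelpbertpt","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification reads a "token" column, so a Tokenizer must run first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_ehelpbertpt", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +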
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_ehelpbertpt| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|665.2 MB| + +## References + +https://huggingface.co/pucpr/eHelpBERTpt \ No newline at end of file From 901e7a2ca99ffc5567f9d8a9c412a51ca251299a Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:37:01 +0700 Subject: [PATCH 053/667] Add model 2023-11-06-bert_ner_bert_finetuned_ner_swedish_small_set_health_and_standart_sv --- ...wedish_small_set_health_and_standart_sv.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner_swedish_small_set_health_and_standart_sv.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner_swedish_small_set_health_and_standart_sv.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner_swedish_small_set_health_and_standart_sv.md new file mode 100644 index 00000000000000..54f9e43f1d8a8a --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner_swedish_small_set_health_and_standart_sv.md @@ -0,0 +1,114 @@ +--- +layout: model +title: Swedish BertForTokenClassification Small Cased model (from Nonzerophilip) +author: John Snow Labs +name: bert_ner_bert_finetuned_ner_swedish_small_set_health_and_standart +date: 2023-11-06 +tags: [bert, ner, open_source, sv, onnx] +task: Named Entity Recognition +language: sv +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. 
`bert-finetuned-ner_swedish_small_set_health_and_standart` is a Swedish model originally trained by `Nonzerophilip`. + +## Predicted Entities + +`PER`, `ORG`, `LOC`, `HEALTH`, `relation`, `PHARMA_DRUGS` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_finetuned_ner_swedish_small_set_health_and_standart_sv_5.2.0_3.0_1699288719693.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_finetuned_ner_swedish_small_set_health_and_standart_sv_5.2.0_3.0_1699288719693.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_finetuned_ner_swedish_small_set_health_and_standart","sv") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["Jag älskar Spark NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_finetuned_ner_swedish_small_set_health_and_standart","sv") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("Jag älskar Spark NLP").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("sv.ner.bert.small_finetuned").predict("""Jag älskar Spark NLP""") +``` +
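The `ner` column produced by these pipelines holds one tag per token. Assuming the checkpoint uses the common IOB2 scheme (tags such as `B-PER`, `I-PER` and `O`, which is typical for Hugging Face token-classification models but not stated on this card), consecutive tags can be merged into entity spans. The pure-Python sketch below shows that grouping on plain lists; the helper name `group_entities` and the example tags are hypothetical and hand-written, not real model output. Inside a Spark NLP pipeline, the `NerConverter` annotator is the usual way to do this merging.

```python
from typing import List, Optional, Tuple


def group_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge IOB2-tagged tokens into (entity_text, label) pairs.

    Hypothetical helper. Assumes tags look like 'B-LABEL', 'I-LABEL' or 'O'.
    An 'I-' tag whose label differs from the open entity starts a new entity.
    Tokens are joined with single spaces, so the original spacing of the
    text may not be reproduced exactly.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Close the entity being built, if there is one
        nonlocal current_tokens, current_label
        if current_tokens:
            entities.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, _, label = tag.partition("-")
        # A 'B-' tag, or an 'I-' tag with a new label, starts a new entity
        if prefix == "B" or label != current_label:
            flush()
            current_label = label
        current_tokens.append(token)

    flush()
    return entities


# Hand-written tags for illustration only, not output of this model
print(group_entities(
    ["Anna", "Svensson", "bor", "i", "Stockholm"],
    ["B-PER", "I-PER", "O", "O", "B-LOC"],
))  # [('Anna Svensson', 'PER'), ('Stockholm', 'LOC')]
```

With the pipeline above, the two input lists for a single-row DataFrame could be obtained with something like `row = result.select("token.result", "ner.result").first()`, followed by `group_entities(row[0], row[1])`.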
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bert_finetuned_ner_swedish_small_set_health_and_standart| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|sv| +|Size:|465.2 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/Nonzerophilip/bert-finetuned-ner_swedish_small_set_health_and_standart \ No newline at end of file From 92dc71ee770f779784c87499bb9b21c1227bfd9d Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:38:01 +0700 Subject: [PATCH 054/667] Add model 2023-11-06-bert_ner_bert_mention_french_vera_pro_fr --- ...ert_ner_bert_mention_french_vera_pro_fr.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_mention_french_vera_pro_fr.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_mention_french_vera_pro_fr.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_mention_french_vera_pro_fr.md new file mode 100644 index 00000000000000..34f137a6d706f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_mention_french_vera_pro_fr.md @@ -0,0 +1,93 @@ +--- +layout: model +title: French bert_ner_bert_mention_french_vera_pro BertForTokenClassification from vera-pro +author: John Snow Labs +name: bert_ner_bert_mention_french_vera_pro +date: 2023-11-06 +tags: [bert, fr, open_source, token_classification, onnx] +task: Named Entity Recognition +language: fr +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark
NLP. `bert_ner_bert_mention_french_vera_pro` is a French model originally trained by vera-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_mention_french_vera_pro_fr_5.2.0_3.0_1699288386379.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_mention_french_vera_pro_fr_5.2.0_3.0_1699288386379.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
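+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads a "token" column, so a Tokenizer must run first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_mention_french_vera_pro","fr") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification reads a "token" column, so a Tokenizer must run first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_bert_mention_french_vera_pro", "fr") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +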
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bert_mention_french_vera_pro| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|fr| +|Size:|665.1 MB| + +## References + +https://huggingface.co/vera-pro/bert-mention-fr \ No newline at end of file From 949d6fda65c9ce1851472fff4d8b8138c2d623b8 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:39:01 +0700 Subject: [PATCH 055/667] Add model 2023-11-06-bert_ner_biobert_base_cased_v1.2_finetuned_ner_craft_english_xx --- ...sed_v1.2_finetuned_ner_craft_english_xx.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_base_cased_v1.2_finetuned_ner_craft_english_xx.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_base_cased_v1.2_finetuned_ner_craft_english_xx.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_base_cased_v1.2_finetuned_ner_craft_english_xx.md new file mode 100644 index 00000000000000..f3a7234037dd01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_base_cased_v1.2_finetuned_ner_craft_english_xx.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Multilingual bert_ner_biobert_base_cased_v1.2_finetuned_ner_craft_english BertForTokenClassification from StivenLancheros +author: John Snow Labs +name: bert_ner_biobert_base_cased_v1.2_finetuned_ner_craft_english +date: 2023-11-06 +tags: [bert, xx, open_source, token_classification, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using 
Spark NLP. `bert_ner_biobert_base_cased_v1.2_finetuned_ner_craft_english` is a Multilingual model originally trained by StivenLancheros. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_biobert_base_cased_v1.2_finetuned_ner_craft_english_xx_5.2.0_3.0_1699289677819.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_biobert_base_cased_v1.2_finetuned_ner_craft_english_xx_5.2.0_3.0_1699289677819.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
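+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads a "token" column, so a Tokenizer must run first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_biobert_base_cased_v1.2_finetuned_ner_craft_english","xx") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification reads a "token" column, so a Tokenizer must run first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_biobert_base_cased_v1.2_finetuned_ner_craft_english", "xx") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +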
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_biobert_base_cased_v1.2_finetuned_ner_craft_english| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|403.7 MB| + +## References + +https://huggingface.co/StivenLancheros/biobert-base-cased-v1.2-finetuned-ner-CRAFT_English \ No newline at end of file From 9e8f037fd11f8d4ee41852c35ba61c2614aaad0c Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:40:01 +0700 Subject: [PATCH 056/667] Add model 2023-11-06-bert_ner_bert_finetuned_ner_swedish_test_numb_2_sv --- ...rt_finetuned_ner_swedish_test_numb_2_sv.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner_swedish_test_numb_2_sv.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner_swedish_test_numb_2_sv.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner_swedish_test_numb_2_sv.md new file mode 100644 index 00000000000000..ed494845aacb23 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner_swedish_test_numb_2_sv.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Swedish bert_ner_bert_finetuned_ner_swedish_test_numb_2 BertForTokenClassification from Nonzerophilip +author: John Snow Labs +name: bert_ner_bert_finetuned_ner_swedish_test_numb_2 +date: 2023-11-06 +tags: [bert, sv, open_source, token_classification, onnx] +task: Named Entity Recognition +language: sv +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark 
NLP. `bert_ner_bert_finetuned_ner_swedish_test_numb_2` is a Swedish model originally trained by Nonzerophilip. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_finetuned_ner_swedish_test_numb_2_sv_5.2.0_3.0_1699289976673.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_finetuned_ner_swedish_test_numb_2_sv_5.2.0_3.0_1699289976673.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
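+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads a "token" column, so a Tokenizer must run first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_finetuned_ner_swedish_test_numb_2","sv") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification reads a "token" column, so a Tokenizer must run first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_bert_finetuned_ner_swedish_test_numb_2", "sv") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +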
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bert_finetuned_ner_swedish_test_numb_2| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|sv| +|Size:|465.2 MB| + +## References + +https://huggingface.co/Nonzerophilip/bert-finetuned-ner_swedish_test_NUMb_2 \ No newline at end of file From 2fa31188e59fef09c32c1b7da88bf6d208caaeb3 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:41:01 +0700 Subject: [PATCH 057/667] Add model 2023-11-06-bert_ner_envoy_en --- .../2023-11-06-bert_ner_envoy_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_envoy_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_envoy_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_envoy_en.md new file mode 100644 index 00000000000000..fc861107f13741 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_envoy_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from fagner) +author: John Snow Labs +name: bert_ner_envoy +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `envoy` is an English model originally trained by `fagner`.
+ +## Predicted Entities + +`Disease`, `Anatomy`, `Chemical` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_envoy_en_5.2.0_3.0_1699292316494.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_envoy_en_5.2.0_3.0_1699292316494.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_envoy","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_envoy","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.by_fagner").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_envoy| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.1 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/fagner/envoy \ No newline at end of file From 562408ffadd87701cd2c53795480a428f4a1ce1a Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:42:02 +0700 Subject: [PATCH 058/667] Add model 2023-11-06-bert_ner_bc5cdr_disease_imbalanced_biobert_v1.1_en --- ...5cdr_disease_imbalanced_biobert_v1.1_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc5cdr_disease_imbalanced_biobert_v1.1_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc5cdr_disease_imbalanced_biobert_v1.1_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc5cdr_disease_imbalanced_biobert_v1.1_en.md new file mode 100644 index 00000000000000..fb443d5feaf781 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc5cdr_disease_imbalanced_biobert_v1.1_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_bc5cdr_disease_imbalanced_biobert_v1.1 BertForTokenClassification from ghadeermobasher +author: John Snow Labs +name: bert_ner_bc5cdr_disease_imbalanced_biobert_v1.1 +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_bc5cdr_disease_imbalanced_biobert_v1.1` is an
English model originally trained by ghadeermobasher. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bc5cdr_disease_imbalanced_biobert_v1.1_en_5.2.0_3.0_1699274225547.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bc5cdr_disease_imbalanced_biobert_v1.1_en_5.2.0_3.0_1699274225547.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
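+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads a "token" column, so a Tokenizer must run first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bc5cdr_disease_imbalanced_biobert_v1.1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification reads a "token" column, so a Tokenizer must run first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_bc5cdr_disease_imbalanced_biobert_v1.1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +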
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bc5cdr_disease_imbalanced_biobert_v1.1| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.1 MB| + +## References + +https://huggingface.co/ghadeermobasher/BC5CDR-Disease-imbalanced-biobert-v1.1 \ No newline at end of file From 755000842b4758dedcae785833a6af47d9675e66 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:43:02 +0700 Subject: [PATCH 059/667] Add model 2023-11-06-bert_ner_original_bluebert_bc5cdr_chemical_en --- ...er_original_bluebert_bc5cdr_chemical_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_bluebert_bc5cdr_chemical_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_bluebert_bc5cdr_chemical_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_bluebert_bc5cdr_chemical_en.md new file mode 100644 index 00000000000000..3cfa47e065d061 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_bluebert_bc5cdr_chemical_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_original_bluebert_bc5cdr_chemical BertForTokenClassification from ghadeermobasher +author: John Snow Labs +name: bert_ner_original_bluebert_bc5cdr_chemical +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_original_bluebert_bc5cdr_chemical` is an English model originally
trained by ghadeermobasher. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_original_bluebert_bc5cdr_chemical_en_5.2.0_3.0_1699282192810.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_original_bluebert_bc5cdr_chemical_en_5.2.0_3.0_1699282192810.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
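+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import * +from sparknlp.annotator import * +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads a "token" column, so a Tokenizer must run first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_original_bluebert_bc5cdr_chemical","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification reads a "token" column, so a Tokenizer must run first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_original_bluebert_bc5cdr_chemical", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +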
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_original_bluebert_bc5cdr_chemical| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/ghadeermobasher/Original-BlueBERT-BC5CDR-Chemical \ No newline at end of file From e102862f187a3e05a5b41959f7e54aaf5be9312a Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:44:02 +0700 Subject: [PATCH 060/667] Add model 2023-11-06-bert_ner_bert_finetuned_ner_swedish_test_sv --- ..._ner_bert_finetuned_ner_swedish_test_sv.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner_swedish_test_sv.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner_swedish_test_sv.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner_swedish_test_sv.md new file mode 100644 index 00000000000000..cd8985bacb77a3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner_swedish_test_sv.md @@ -0,0 +1,114 @@ +--- +layout: model +title: Swedish BertForTokenClassification Cased model (from Nonzerophilip) +author: John Snow Labs +name: bert_ner_bert_finetuned_ner_swedish_test +date: 2023-11-06 +tags: [bert, ner, open_source, sv, onnx] +task: Named Entity Recognition +language: sv +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner_swedish_test` is a Swedish model originally trained by `Nonzerophilip`. 
+
+## Predicted Entities
+
+`LOC`, `PER`, `ORG`, `MISC`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_finetuned_ner_swedish_test_sv_5.2.0_3.0_1699286684416.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_finetuned_ner_swedish_test_sv_5.2.0_3.0_1699286684416.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div>
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_finetuned_ner_swedish_test","sv") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["Jag älskar Spark NLP"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_finetuned_ner_swedish_test","sv")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("Jag älskar Spark NLP").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("sv.ner.bert.finetuned").predict("""Jag älskar Spark NLP""")
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_bert_finetuned_ner_swedish_test|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|sv|
+|Size:|465.2 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+
+
+- https://huggingface.co/Nonzerophilip/bert-finetuned-ner_swedish_test
\ No newline at end of file
From 2504ed1abb7885e0765457819ed115f6ad2eb6f2 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 00:45:02 +0700
Subject: [PATCH 061/667] Add model 2023-11-06-bert_ner_bert_base_dutch_cased_finetuned_udlassy_ner_nl

---
 ...se_dutch_cased_finetuned_udlassy_ner_nl.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_dutch_cased_finetuned_udlassy_ner_nl.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_dutch_cased_finetuned_udlassy_ner_nl.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_dutch_cased_finetuned_udlassy_ner_nl.md
new file mode 100644
index 00000000000000..55d8b0a9073043
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_dutch_cased_finetuned_udlassy_ner_nl.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: Dutch BertForTokenClassification Base Cased model (from wietsedv)
+author: John Snow Labs
+name: bert_ner_bert_base_dutch_cased_finetuned_udlassy_ner
+date: 2023-11-06
+tags: [bert, ner, open_source, nl, onnx]
+task: Named Entity Recognition
+language: nl
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.
 `bert-base-dutch-cased-finetuned-udlassy-ner` is a Dutch model originally trained by `wietsedv`.
+
+## Predicted Entities
+
+`TIME`, `WORK_OF_ART`, `FAC`, `NORP`, `PERCENT`, `DATE`, `PRODUCT`, `LANGUAGE`, `CARDINAL`, `EVENT`, `MONEY`, `LAW`, `QUANTITY`, `GPE`, `ORDINAL`, `ORG`, `PERSON`, `LOC`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_base_dutch_cased_finetuned_udlassy_ner_nl_5.2.0_3.0_1699284199218.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_base_dutch_cased_finetuned_udlassy_ner_nl_5.2.0_3.0_1699284199218.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div>
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_base_dutch_cased_finetuned_udlassy_ner","nl") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["Ik hou van Spark NLP"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_base_dutch_cased_finetuned_udlassy_ner","nl")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("Ik hou van Spark NLP").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("nl.ner.bert.cased_base_finetuned.by_wietsedv").predict("""Ik hou van Spark NLP""")
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_bert_base_dutch_cased_finetuned_udlassy_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|nl|
+|Size:|406.7 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+
+
+- https://huggingface.co/wietsedv/bert-base-dutch-cased-finetuned-udlassy-ner
\ No newline at end of file
From 9b1bb7fee1552513320dcafd5049d6dd9add9a0a Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 00:46:03 +0700
Subject: [PATCH 062/667] Add model 2023-11-06-bert_ner_body_site_en

---
 .../2023-11-06-bert_ner_body_site_en.md | 115 ++++++++++++++++++
 1 file changed, 115 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_body_site_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_body_site_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_body_site_en.md
new file mode 100644
index 00000000000000..8f5dad3c26d028
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_body_site_en.md
@@ -0,0 +1,115 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from Maaly)
+author: John Snow Labs
+name: bert_ner_body_site
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `body-site` is an English model originally trained by `Maaly`.
+
+## Predicted Entities
+
+`anatomy`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_body_site_en_5.2.0_3.0_1699292606882.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_body_site_en_5.2.0_3.0_1699292606882.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div>
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_body_site","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_body_site","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.body_site.by_maaly").predict("""PUT YOUR STRING HERE""")
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_body_site|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.4 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+
+
+- https://huggingface.co/Maaly/body-site
+- https://gitlab.com/maaly7/emerald_metagenomics_annotations
\ No newline at end of file
From e74bf35cb5adfc22b6a7d24096702a5c3426d004 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 00:47:03 +0700
Subject: [PATCH 063/667] Add model 2023-11-06-bert_ner_orignal_scibert_ncbi_en

---
 ...-11-06-bert_ner_orignal_scibert_ncbi_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_orignal_scibert_ncbi_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_orignal_scibert_ncbi_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_orignal_scibert_ncbi_en.md
new file mode 100644
index 00000000000000..f227aa53b18e2b
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_orignal_scibert_ncbi_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_ner_orignal_scibert_ncbi BertForTokenClassification from ghadeermobasher
+author: John Snow Labs
+name: bert_ner_orignal_scibert_ncbi
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_orignal_scibert_ncbi` is an English model originally trained by ghadeermobasher.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_orignal_scibert_ncbi_en_5.2.0_3.0_1699282752411.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_orignal_scibert_ncbi_en_5.2.0_3.0_1699282752411.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div>
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_orignal_scibert_ncbi","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_orignal_scibert_ncbi", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_orignal_scibert_ncbi|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|410.0 MB|
+
+## References
+
+https://huggingface.co/ghadeermobasher/Orignal-SciBERT-NCBI
\ No newline at end of file
From cd78b0eb77715161e5d00e4fde6732557fea0cda Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 00:48:03 +0700
Subject: [PATCH 064/667] Add model 2023-11-06-bert_ner_dbert_ner_en

---
 .../2023-11-06-bert_ner_dbert_ner_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_dbert_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_dbert_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_dbert_ner_en.md
new file mode 100644
index 00000000000000..3dd53210908801
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_dbert_ner_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from deeq)
+author: John Snow Labs
+name: bert_ner_dbert_ner
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `dbert-ner` is an English model originally trained by `deeq`.
+
+## Predicted Entities
+
+`FLD-B`, `CVL-I`, `PLT-B`, `AFW-B`, `AFW-I`, `ORG-B`, `ORG-I`, `EVT-B`, `ANM-B`, `PER-I`, `NUM-B`, `MAT-I`, `PLT-I`, `PER-B`, `TIM-B`, `FLD-I`, `CVL-B`, `DAT-B`, `LOC-B`, `TRM-B`, `EVT-I`, `LOC-I`, `NUM-I`, `DAT-I`, `MAT-B`, `ANM-I`, `TRM-I`, `TIM-I`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_dbert_ner_en_5.2.0_3.0_1699292826245.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_dbert_ner_en_5.2.0_3.0_1699292826245.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div>
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_dbert_ner","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_dbert_ner","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.by_deeq").predict("""PUT YOUR STRING HERE""")
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_dbert_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|421.3 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+
+
+- https://huggingface.co/deeq/dbert-ner
\ No newline at end of file
From 48280848a7fbd8ae06ec2897a22de4d1c9a72cb7 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 00:49:03 +0700
Subject: [PATCH 065/667] Add model 2023-11-06-bert_ner_bert_german_ner_de

---
 .../2023-11-06-bert_ner_bert_german_ner_de.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_german_ner_de.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_german_ner_de.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_german_ner_de.md
new file mode 100644
index 00000000000000..dcc10cf71d02b4
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_german_ner_de.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: German bert_ner_bert_german_ner BertForTokenClassification from fhswf
+author: John Snow Labs
+name: bert_ner_bert_german_ner
+date: 2023-11-06
+tags: [bert, de, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: de
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_bert_german_ner` is a German model originally trained by fhswf.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_german_ner_de_5.2.0_3.0_1699288005409.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_german_ner_de_5.2.0_3.0_1699288005409.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div>
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_german_ner","de") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_bert_german_ner", "de")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_bert_german_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|de|
+|Size:|409.9 MB|
+
+## References
+
+https://huggingface.co/fhswf/bert_de_ner
\ No newline at end of file
From 26607cc9954b33b23636c6781e1c920ba2353031 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 00:50:03 +0700
Subject: [PATCH 066/667] Add model 2023-11-06-bert_ner_brjezierski_bert_finetuned_ner_en

---
 ...t_ner_brjezierski_bert_finetuned_ner_en.md | 115 ++++++++++++++++++
 1 file changed, 115 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_brjezierski_bert_finetuned_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_brjezierski_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_brjezierski_bert_finetuned_ner_en.md
new file mode 100644
index 00000000000000..878da66c8664d7
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_brjezierski_bert_finetuned_ner_en.md
@@ -0,0 +1,115 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from brjezierski)
+author: John Snow Labs
+name: bert_ner_brjezierski_bert_finetuned_ner
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `brjezierski`.
+
+## Predicted Entities
+
+`ORG`, `LOC`, `MISC`, `PER`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_brjezierski_bert_finetuned_ner_en_5.2.0_3.0_1699292901234.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_brjezierski_bert_finetuned_ner_en_5.2.0_3.0_1699292901234.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div>
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_brjezierski_bert_finetuned_ner","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_brjezierski_bert_finetuned_ner","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.conll.finetuned.by_brjezierski").predict("""PUT YOUR STRING HERE""")
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_brjezierski_bert_finetuned_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+
+
+- https://huggingface.co/brjezierski/bert-finetuned-ner
+- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003
\ No newline at end of file
From 228ef983d95dcd9b3ea73bf25c2b6eac38a8a78a Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 00:51:03 +0700
Subject: [PATCH 067/667] Add model 2023-11-06-bert_ner_host_en

---
 .../2023-11-06-bert_ner_host_en.md | 115 ++++++++++++++++++
 1 file changed, 115 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_host_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_host_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_host_en.md
new file mode 100644
index 00000000000000..f4b12d649e7fb4
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_host_en.md
@@ -0,0 +1,115 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from Maaly)
+author: John Snow Labs
+name: bert_ner_host
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `host` is an English model originally trained by `Maaly`.
+
+## Predicted Entities
+
+`host`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_host_en_5.2.0_3.0_1699293027968.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_host_en_5.2.0_3.0_1699293027968.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div>
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_host","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_host","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.host.by_maaly").predict("""PUT YOUR STRING HERE""")
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_host|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.4 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+
+
+- https://huggingface.co/Maaly/host
+- https://gitlab.com/maaly7/emerald_metagenomics_annotations
\ No newline at end of file
From 147f2732e68d4e5216d2621b9ae65ccbfe1dc720 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 00:52:04 +0700
Subject: [PATCH 068/667] Add model 2023-11-06-bert_ner_biored_dis_modified_pubmedbert_256_5_en

---
 ...biored_dis_modified_pubmedbert_256_5_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_biored_dis_modified_pubmedbert_256_5_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biored_dis_modified_pubmedbert_256_5_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biored_dis_modified_pubmedbert_256_5_en.md
new file mode 100644
index 00000000000000..30b90c9962ff02
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biored_dis_modified_pubmedbert_256_5_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_ner_biored_dis_modified_pubmedbert_256_5 BertForTokenClassification from ghadeermobasher
+author: John Snow Labs
+name: bert_ner_biored_dis_modified_pubmedbert_256_5
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark
NLP. `bert_ner_biored_dis_modified_pubmedbert_256_5` is an English model originally trained by ghadeermobasher.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_biored_dis_modified_pubmedbert_256_5_en_5.2.0_3.0_1699276417238.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_biored_dis_modified_pubmedbert_256_5_en_5.2.0_3.0_1699276417238.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div>
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_biored_dis_modified_pubmedbert_256_5","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_biored_dis_modified_pubmedbert_256_5", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_biored_dis_modified_pubmedbert_256_5| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.2 MB| + +## References + +https://huggingface.co/ghadeermobasher/BioRed-Dis-Modified-PubMedBERT-256-5 \ No newline at end of file From 2920a8eb5abe919d685c94eab8a2888c01867065 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:53:04 +0700 Subject: [PATCH 069/667] Add model 2023-11-06-bert_ner_kushaljoseph_bert_tonga_tonga_islands_distilbert_ner_en --- ...t_tonga_tonga_islands_distilbert_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_kushaljoseph_bert_tonga_tonga_islands_distilbert_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_kushaljoseph_bert_tonga_tonga_islands_distilbert_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_kushaljoseph_bert_tonga_tonga_islands_distilbert_ner_en.md new file mode 100644 index 00000000000000..8bd2b3547e4889 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_kushaljoseph_bert_tonga_tonga_islands_distilbert_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_kushaljoseph_bert_tonga_tonga_islands_distilbert_ner BertForTokenClassification from kushaljoseph +author: John Snow Labs +name: bert_ner_kushaljoseph_bert_tonga_tonga_islands_distilbert_ner +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide 
scalability and production-readiness using Spark NLP. `bert_ner_kushaljoseph_bert_tonga_tonga_islands_distilbert_ner` is an English model originally trained by kushaljoseph. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_kushaljoseph_bert_tonga_tonga_islands_distilbert_ner_en_5.2.0_3.0_1699293125116.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_kushaljoseph_bert_tonga_tonga_islands_distilbert_ner_en_5.2.0_3.0_1699293125116.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_kushaljoseph_bert_tonga_tonga_islands_distilbert_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_kushaljoseph_bert_tonga_tonga_islands_distilbert_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
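Spark NLP stages are connected only by column names. Each annotator reads the columns passed to `setInputCols` and writes the one passed to `setOutputCol`, so a stage can only consume columns produced by the input data or by an earlier stage. A mismatch, such as a classifier that reads `token` with no `Tokenizer` ahead of it, typically surfaces only once the pipeline is fitted or applied. The following sketch checks the wiring in plain Python before anything is fitted. The `(name, inputs, output)` tuples and the `missing_inputs` function are hypothetical, and the check ignores annotator types.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stage description: (stage name, input columns, output column).
Stage = Tuple[str, Sequence[str], str]


def missing_inputs(stages: List[Stage], source_columns: Sequence[str] = ("text",)) -> Dict[str, List[str]]:
    """Map each stage name to the input columns that nothing upstream produces."""
    available = set(source_columns)  # columns present in the raw DataFrame
    problems: Dict[str, List[str]] = {}
    for name, inputs, output in stages:
        missing = [col for col in inputs if col not in available]
        if missing:
            problems[name] = missing
        available.add(output)  # later stages may read this stage's output
    return problems


# No tokenizer, and the document column name does not match the classifier's input.
broken = [
    ("DocumentAssembler", ["text"], "embeddings"),
    ("BertForTokenClassification", ["documents", "token"], "ner"),
]
# Consistent document column name, with a tokenizer ahead of the classifier.
fixed = [
    ("DocumentAssembler", ["text"], "documents"),
    ("Tokenizer", ["documents"], "token"),
    ("BertForTokenClassification", ["documents", "token"], "ner"),
]

print(missing_inputs(broken))  # {'BertForTokenClassification': ['documents', 'token']}
print(missing_inputs(fixed))   # {}
```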
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_kushaljoseph_bert_tonga_tonga_islands_distilbert_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|244.3 MB| + +## References + +https://huggingface.co/kushaljoseph/bert-to-distilbert-NER \ No newline at end of file From ebb0c590de21d67dc3b78911dc36108f06c06981 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:54:04 +0700 Subject: [PATCH 070/667] Add model 2023-11-06-bert_ner_biobert_base_cased_v1.2_finetuned_ner_concat_craft_spanish_stivenlancheros_xx --- ...concat_craft_spanish_stivenlancheros_xx.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_base_cased_v1.2_finetuned_ner_concat_craft_spanish_stivenlancheros_xx.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_base_cased_v1.2_finetuned_ner_concat_craft_spanish_stivenlancheros_xx.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_base_cased_v1.2_finetuned_ner_concat_craft_spanish_stivenlancheros_xx.md new file mode 100644 index 00000000000000..34b4c76151bc5f --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_base_cased_v1.2_finetuned_ner_concat_craft_spanish_stivenlancheros_xx.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Multilingual bert_ner_biobert_base_cased_v1.2_finetuned_ner_concat_craft_spanish_stivenlancheros BertForTokenClassification from StivenLancheros +author: John Snow Labs +name: bert_ner_biobert_base_cased_v1.2_finetuned_ner_concat_craft_spanish_stivenlancheros +date: 2023-11-06 +tags: [bert, xx, open_source, token_classification, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover 
+use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_biobert_base_cased_v1.2_finetuned_ner_concat_craft_spanish_stivenlancheros` is a multilingual model originally trained by StivenLancheros. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_biobert_base_cased_v1.2_finetuned_ner_concat_craft_spanish_stivenlancheros_xx_5.2.0_3.0_1699289847969.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_biobert_base_cased_v1.2_finetuned_ner_concat_craft_spanish_stivenlancheros_xx_5.2.0_3.0_1699289847969.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_biobert_base_cased_v1.2_finetuned_ner_concat_craft_spanish_stivenlancheros","xx") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_biobert_base_cased_v1.2_finetuned_ner_concat_craft_spanish_stivenlancheros", "xx") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_biobert_base_cased_v1.2_finetuned_ner_concat_craft_spanish_stivenlancheros| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|403.7 MB| + +## References + +https://huggingface.co/StivenLancheros/biobert-base-cased-v1.2-finetuned-ner-Concat_CRAFT_es \ No newline at end of file From 227c14020cdf71cea7895ea6da63e38f0d93bb3a Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:55:05 +0700 Subject: [PATCH 071/667] Add model 2023-11-06-bert_ner_bc4chemd_imbalanced_biobert_base_casesd_v1.1_en --- ..._imbalanced_biobert_base_casesd_v1.1_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc4chemd_imbalanced_biobert_base_casesd_v1.1_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc4chemd_imbalanced_biobert_base_casesd_v1.1_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc4chemd_imbalanced_biobert_base_casesd_v1.1_en.md new file mode 100644 index 00000000000000..66b534f2c304aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc4chemd_imbalanced_biobert_base_casesd_v1.1_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Base Cased model (from ghadeermobasher) +author: John Snow Labs +name: bert_ner_bc4chemd_imbalanced_biobert_base_casesd_v1.1 +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. 
`bc4chemd-imbalanced-biobert-base-casesd-v1.1` is an English model originally trained by `ghadeermobasher`. + +## Predicted Entities + +`Chemical` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bc4chemd_imbalanced_biobert_base_casesd_v1.1_en_5.2.0_3.0_1699285462210.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bc4chemd_imbalanced_biobert_base_casesd_v1.1_en_5.2.0_3.0_1699285462210.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bc4chemd_imbalanced_biobert_base_casesd_v1.1","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bc4chemd_imbalanced_biobert_base_casesd_v1.1","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.biobert.chemical.base_imbalanced").predict("""PUT YOUR STRING HERE""") +``` +
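The Download and Copy S3 URI links follow a visible naming pattern: `<model name>_<language>_<Spark NLP version>_<Spark version>_<number>.zip`. Here `5.2.0` and `3.0` match the card's `edition` and `spark_version` fields. Reading the trailing number as a Unix timestamp in milliseconds is an assumption, but every such number in these cards decodes to 2023-11-06, the card's `date`. A minimal sketch of splitting such a name follows. The `parse_artifact` helper and `ArtifactName` type are hypothetical and are not a Spark NLP API.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class ArtifactName(NamedTuple):
    model: str
    lang: str
    sparknlp_version: str
    spark_version: str
    built_at: datetime  # assumes the last field is epoch milliseconds


def parse_artifact(url_or_name: str) -> ArtifactName:
    """Split '<model>_<lang>_<sparknlp>_<spark>_<millis>.zip' into its parts."""
    name = url_or_name.rsplit("/", 1)[-1]  # drop any s3:// or https:// prefix
    if name.endswith(".zip"):
        name = name[: -len(".zip")]
    # Split from the right: model names contain underscores (and dots, e.g. "v1.1").
    model, lang, nlp_version, spark_version, millis = name.rsplit("_", 4)
    built_at = datetime.fromtimestamp(int(millis) / 1000, tz=timezone.utc)
    return ArtifactName(model, lang, nlp_version, spark_version, built_at)


info = parse_artifact(
    "s3://auxdata.johnsnowlabs.com/public/models/"
    "bert_ner_bc4chemd_imbalanced_biobert_base_casesd_v1.1_en_5.2.0_3.0_1699285462210.zip"
)
print(info.model, info.lang, info.sparknlp_version, info.spark_version)
print(info.built_at.date())  # 2023-11-06
```

Splitting from the right keeps names such as `..._stivenlancheros_xx_5.2.0_3.0_...` intact, because only the last four underscores act as field separators.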
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bc4chemd_imbalanced_biobert_base_casesd_v1.1| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.1 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/ghadeermobasher/bc4chemd-imbalanced-biobert-base-casesd-v1.1 \ No newline at end of file From f089bcf1a22d2416bc40b8bdd1c0980d76d300d6 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:56:05 +0700 Subject: [PATCH 072/667] Add model 2023-11-06-bert_ner_jatinshah_bert_finetuned_ner_en --- ...ert_ner_jatinshah_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_jatinshah_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_jatinshah_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_jatinshah_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..cb98ac1c6980dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_jatinshah_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from jatinshah) +author: John Snow Labs +name: bert_ner_jatinshah_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `jatinshah`.
+ +## Predicted Entities + +`MISC`, `ORG`, `LOC`, `PER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_jatinshah_bert_finetuned_ner_en_5.2.0_3.0_1699293312284.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_jatinshah_bert_finetuned_ner_en_5.2.0_3.0_1699293312284.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_jatinshah_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_jatinshah_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned.by_jatinshah").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_jatinshah_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/jatinshah/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From 230650404d541e1a612ddc7b189b05f0646a37ad Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:57:05 +0700 Subject: [PATCH 073/667] Add model 2023-11-06-bert_ner_codeswitch_hineng_ner_lince_hi --- ...bert_ner_codeswitch_hineng_ner_lince_hi.md | 110 ++++++++++++++++++ 1 file changed, 110 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_codeswitch_hineng_ner_lince_hi.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_codeswitch_hineng_ner_lince_hi.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_codeswitch_hineng_ner_lince_hi.md new file mode 100644 index 00000000000000..91676e0beb872b --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_codeswitch_hineng_ner_lince_hi.md @@ -0,0 +1,110 @@ +--- +layout: model +title: Hindi Named Entity Recognition (from sagorsarker) +author: John Snow Labs +name: bert_ner_codeswitch_hineng_ner_lince +date: 2023-11-06 +tags: [bert, ner, token_classification, hi, open_source, onnx] +task: Named Entity Recognition +language: hi +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained Named Entity Recognition model, uploaded to Hugging Face, adapted and imported into Spark NLP. `codeswitch-hineng-ner-lince` is a Hindi model originally trained by `sagorsarker`.
+ +## Predicted Entities + +`PERSON`, `ORGANISATION`, `PLACE` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_codeswitch_hineng_ner_lince_hi_5.2.0_3.0_1699293356568.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_codeswitch_hineng_ner_lince_hi_5.2.0_3.0_1699293356568.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_codeswitch_hineng_ner_lince","hi") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["मुझे स्पार्क एनएलपी बहुत पसंद है"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_codeswitch_hineng_ner_lince","hi") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("मुझे स्पार्क एनएलपी बहुत पसंद है").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_codeswitch_hineng_ner_lince| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|hi| +|Size:|665.1 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/sagorsarker/codeswitch-hineng-ner-lince +- https://ritual.uh.edu/lince/home +- https://github.com/sagorbrur/codeswitch \ No newline at end of file From 76020cfd13755e33ba3dee87e546fe1ea2b41e33 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:58:06 +0700 Subject: [PATCH 074/667] Add model 2023-11-06-bert_ner_original_pubmedbert_ncbi_en --- ...06-bert_ner_original_pubmedbert_ncbi_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_pubmedbert_ncbi_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_pubmedbert_ncbi_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_pubmedbert_ncbi_en.md new file mode 100644 index 00000000000000..e70dc50389ddd0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_pubmedbert_ncbi_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_original_pubmedbert_ncbi BertForTokenClassification from ghadeermobasher +author: John Snow Labs +name: bert_ner_original_pubmedbert_ncbi +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark
NLP. `bert_ner_original_pubmedbert_ncbi` is an English model originally trained by ghadeermobasher. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_original_pubmedbert_ncbi_en_5.2.0_3.0_1699281818269.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_original_pubmedbert_ncbi_en_5.2.0_3.0_1699281818269.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_original_pubmedbert_ncbi","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_original_pubmedbert_ncbi", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_original_pubmedbert_ncbi| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.1 MB| + +## References + +https://huggingface.co/ghadeermobasher/Original-PubMedBERT-NCBI \ No newline at end of file From 2dcc3c069fb9d05fcce59adbd79d384d85a0a595 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 00:59:06 +0700 Subject: [PATCH 075/667] Add model 2023-11-06-bert_ner_bert_base_german_cased_fine_tuned_ner_de --- ...ert_base_german_cased_fine_tuned_ner_de.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_german_cased_fine_tuned_ner_de.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_german_cased_fine_tuned_ner_de.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_german_cased_fine_tuned_ner_de.md new file mode 100644 index 00000000000000..585921c7c11c12 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_german_cased_fine_tuned_ner_de.md @@ -0,0 +1,115 @@ +--- +layout: model +title: German BertForTokenClassification Base Cased model (from domischwimmbeck) +author: John Snow Labs +name: bert_ner_bert_base_german_cased_fine_tuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, de, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-base-german-cased-fine-tuned-ner` is a German model originally trained by `domischwimmbeck`. 
+ +## Predicted Entities + +`ORG`, `LOC`, `PER`, `OTH` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_base_german_cased_fine_tuned_ner_de_5.2.0_3.0_1699285743061.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_base_german_cased_fine_tuned_ner_de_5.2.0_3.0_1699285743061.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_base_german_cased_fine_tuned_ner","de") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["Ich liebe Spark NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_base_german_cased_fine_tuned_ner","de") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("Ich liebe Spark NLP").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("de.ner.bert.cased_base.by_domischwimmbeck").predict("""Ich liebe Spark NLP""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bert_base_german_cased_fine_tuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|406.9 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/domischwimmbeck/bert-base-german-cased-fine-tuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=germa_ner \ No newline at end of file From 4591bd8bf7e6ab9b889b5642fbfd944283791a27 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:00:06 +0700 Subject: [PATCH 076/667] Add model 2023-11-06-bert_ner_danish_bert_ner_da --- .../2023-11-06-bert_ner_danish_bert_ner_da.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_danish_bert_ner_da.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_danish_bert_ner_da.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_danish_bert_ner_da.md new file mode 100644 index 00000000000000..b03e2e1083cf81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_danish_bert_ner_da.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Danish bert_ner_danish_bert_ner BertForTokenClassification from DaNLP +author: John Snow Labs +name: bert_ner_danish_bert_ner +date: 2023-11-06 +tags: [bert, da, open_source, token_classification, onnx] +task: Named Entity Recognition +language: da +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_danish_bert_ner` is a Danish model originally trained by DaNLP.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_danish_bert_ner_da_5.2.0_3.0_1699292560480.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_danish_bert_ner_da_5.2.0_3.0_1699292560480.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_danish_bert_ner","da") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_danish_bert_ner", "da") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_danish_bert_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|da| +|Size:|412.3 MB| + +## References + +https://huggingface.co/DaNLP/da-bert-ner \ No newline at end of file From 15b97e6e2433fe97f7373b0df077008f44d850ca Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:01:06 +0700 Subject: [PATCH 077/667] Add model 2023-11-06-bert_ner_docusco_bert_en --- .../2023-11-06-bert_ner_docusco_bert_en.md | 122 ++++++++++++++++++ 1 file changed, 122 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_docusco_bert_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_docusco_bert_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_docusco_bert_en.md new file mode 100644 index 00000000000000..a925bf92e09560 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_docusco_bert_en.md @@ -0,0 +1,122 @@ +--- +layout: model +title: English Named Entity Recognition (from browndw) +author: John Snow Labs +name: bert_ner_docusco_bert +date: 2023-11-06 +tags: [bert, ner, token_classification, en, open_source, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained Named Entity Recognition model, uploaded to Hugging Face, adapted and imported into Spark NLP. `docusco-bert` is an English model originally trained by `browndw`.
+ +## Predicted Entities + +`Interactive`, `AcademicTerms`, `InformationChange`, `MetadiscourseCohesive`, `FirstPerson`, `InformationPlace`, `Updates`, `InformationChangeneritive`, `Reasoning`, `PublicTerms`, `Citation`, `Future`, `CitationHedged`, `InformationExnerition`, `Contingent`, `Strategic`, `PAD`, `CitationAuthority`, `Facilitate`, `Positive`, `ConfidenceHigh`, `InformationStates`, `AcademicWritingMoves`, `Uncertainty`, `SyntacticComplexity`, `Responsibility`, `Character`, `Narrative`, `MetadiscourseInteractive`, `InformationTopics`, `ConfidenceLow`, `ConfidenceHedged`, `ForceStressed`, `Negative`, `InformationChangeNegative`, `Description`, `Inquiry`, `InformationReportVerbs` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_docusco_bert_en_5.2.0_3.0_1699291798166.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_docusco_bert_en_5.2.0_3.0_1699291798166.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_docusco_bert","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["I love Spark NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_docusco_bert","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("I love Spark NLP").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.by_browndw").predict("""I love Spark NLP""") +``` +
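The references for this model include the Wikipedia article on IOB (inside-outside-beginning) tagging. Assume the `ner` column holds IOB-style tags such as `B-Narrative` / `I-Narrative` (inspect your own output to confirm). Under that assumption, consecutive tokens can be merged into labeled spans. In a Spark NLP pipeline this is normally done by adding a `NerConverter` stage. The plain-Python sketch below only illustrates the grouping logic. Its tokens and tags are hand-written and are not output of this model.

```python
from typing import List, Optional, Tuple


def iob_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge IOB-tagged tokens into (label, text) spans.

    B-X starts a span of type X, I-X extends an open span of the same type,
    and anything else ("O" or an unprefixed tag) closes the open span.
    A stray I-X with no matching open span is treated like B-X.
    """
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None
    words: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == label:
            words.append(token)  # continue the open span
            continue
        if label is not None:  # any other tag closes the open span
            spans.append((label, " ".join(words)))
        if tag.startswith(("B-", "I-")):
            label, words = tag[2:], [token]  # open a new span
        else:
            label, words = None, []  # outside any entity
    if label is not None:  # flush a span that runs to the end
        spans.append((label, " ".join(words)))
    return spans


# Hand-written tokens and tags for illustration only (not model output).
tokens = ["We", "may", "well", "see", "rain"]
tags = ["B-FirstPerson", "B-ConfidenceHedged", "I-ConfidenceHedged", "O", "O"]
print(iob_to_spans(tokens, tags))
# [('FirstPerson', 'We'), ('ConfidenceHedged', 'may well')]
```

Joining tokens with single spaces is a simplification. Spark NLP annotations also carry `begin`/`end` character offsets, which let you slice the original text instead.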
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_docusco_bert| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.8 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/browndw/docusco-bert +- https://www.english-corpora.org/coca/ +- https://www.cmu.edu/dietrich/english/research-and-publications/docuscope.html +- https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=docuscope&btnG= +- https://graphics.cs.wisc.edu/WP/vep/2017/02/14/guest-post-data-mining-king-lear/ +- https://journals.sagepub.com/doi/full/10.1177/2055207619844865 +- https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging) +- https://arxiv.org/pdf/1810.04805 \ No newline at end of file From f24f17a555988028812f941ee039ef37833a5d5e Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:02:06 +0700 Subject: [PATCH 078/667] Add model 2023-11-06-bert_ner_keyword_tag_model_4000_en --- ...1-06-bert_ner_keyword_tag_model_4000_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_4000_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_4000_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_4000_en.md new file mode 100644 index 00000000000000..e25f93983928d9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_4000_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from Media1129) +author: John Snow Labs +name: bert_ner_keyword_tag_model_4000 +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0
+supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `keyword-tag-model-4000` is an English model originally trained by `Media1129`. + +## Predicted Entities + +`occasion`, `cuisines`, `mealcourse`, `ingredient` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_keyword_tag_model_4000_en_5.2.0_3.0_1699292700679.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_keyword_tag_model_4000_en_5.2.0_3.0_1699292700679.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_keyword_tag_model_4000","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_keyword_tag_model_4000","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.keyword_tag_model_4000.by_media1129").predict("""PUT YOUR STRING HERE""") +``` +
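For single strings, you can wrap the fitted pipeline in `LightPipeline` (from `sparknlp.base`) and call `annotate(text)`. This returns a plain dict that maps each output column to a list of result strings, so `token` and `ner` come back as parallel lists. The sketch below groups tokens by predicted keyword type from such a dict, using a hypothetical helper `tokens_by_label`. It assumes one tag per token and an optional `B-`/`I-` prefix on each tag. The `sample` dict is hand-written to mimic that shape and is not real output of this model.

```python
from collections import defaultdict
from typing import Dict, List


def tokens_by_label(annotated: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Collect tokens per label from a LightPipeline.annotate()-style dict.

    Assumes parallel "token" and "ner" lists; an optional "B-"/"I-" prefix
    is stripped from each tag, and tokens tagged "O" are skipped.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for token, tag in zip(annotated["token"], annotated["ner"]):
        if tag == "O":
            continue
        label = tag[2:] if tag[:2] in ("B-", "I-") else tag
        grouped[label].append(token)
    return dict(grouped)


# Hand-written sample in the shape annotate() returns; NOT real model output.
sample = {
    "token": ["italian", "pasta", "with", "basil", "for", "christmas"],
    "ner": ["B-cuisines", "B-mealcourse", "O", "B-ingredient", "O", "B-occasion"],
}
print(tokens_by_label(sample))
# {'cuisines': ['italian'], 'mealcourse': ['pasta'], 'ingredient': ['basil'], 'occasion': ['christmas']}
```

With the pipeline above, the call would look roughly like `tokens_by_label(LightPipeline(pipeline.fit(data)).annotate("your text"))`.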
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_keyword_tag_model_4000| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/Media1129/keyword-tag-model-4000 \ No newline at end of file From 9810f062ea19ed9f2fbc970ad0f4ae15bb1b845e Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:03:07 +0700 Subject: [PATCH 079/667] Add model 2023-11-06-bert_ner_dsghrg_bert_finetuned_ner_en --- ...6-bert_ner_dsghrg_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_dsghrg_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_dsghrg_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_dsghrg_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..c9f7a622173510 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_dsghrg_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from dsghrg) +author: John Snow Labs +name: bert_ner_dsghrg_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `dsghrg`.
+ +## Predicted Entities + +`ORG`, `LOC`, `MISC`, `PER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_dsghrg_bert_finetuned_ner_en_5.2.0_3.0_1699293628203.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_dsghrg_bert_finetuned_ner_en_5.2.0_3.0_1699293628203.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_dsghrg_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_dsghrg_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned.by_dsghrg").predict("""PUT YOUR STRING HERE""") +``` +
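The second reference points to the CoNLL-2003 token-classification leaderboard on Papers with Code. Scores there are usually entity-level: a predicted entity counts only when its label and both boundaries exactly match a gold entity, as in the CoNLL `conlleval` script. The following is a minimal sketch of that metric, using a hypothetical helper `entity_prf1`. It assumes predictions and gold annotations are already available as `(label, start, end)` token spans, for example after merging IOB tags. The spans below are invented to show one boundary error.

```python
from typing import Set, Tuple

# (label, start token index, end token index); end is exclusive.
Span = Tuple[str, int, int]


def entity_prf1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1."""
    true_pos = len(gold & pred)  # same label and same boundaries
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented spans: the ORG prediction stops one token early, so it counts as wrong.
gold = {("PER", 0, 2), ("LOC", 5, 6), ("ORG", 8, 10)}
pred = {("PER", 0, 2), ("LOC", 5, 6), ("ORG", 8, 9)}
precision, recall, f1 = entity_prf1(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")  # P=0.667 R=0.667 F1=0.667
```

Exact matching is strict by design. A span that is off by a single token earns no partial credit, and it counts as both a false positive and a false negative.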
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_dsghrg_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/dsghrg/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From 5ab54380a0dcd9e85c324629044cd3132acd7fae Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:04:07 +0700 Subject: [PATCH 080/667] Add model 2023-11-06-bert_ner_marathi_ner_mr --- .../2023-11-06-bert_ner_marathi_ner_mr.md | 110 ++++++++++++++++++ 1 file changed, 110 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_marathi_ner_mr.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_marathi_ner_mr.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_marathi_ner_mr.md new file mode 100644 index 00000000000000..ff9d032e9d8eae --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_marathi_ner_mr.md @@ -0,0 +1,110 @@ +--- +layout: model +title: Marathi Named Entity Recognition (from l3cube-pune) +author: John Snow Labs +name: bert_ner_marathi_ner +date: 2023-11-06 +tags: [bert, ner, token_classification, mr, open_source, onnx] +task: Named Entity Recognition +language: mr +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained Named Entity Recognition model, uploaded to Hugging Face, adapted and imported into Spark NLP. `marathi-ner` is a Marathi model originally trained by `l3cube-pune`.
+ +## Predicted Entities + +`Location`, `Time`, `Organization`, `Designation`, `Person`, `Other`, `Measure`, `Date` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_marathi_ner_mr_5.2.0_3.0_1699293776206.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_marathi_ner_mr_5.2.0_3.0_1699293776206.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_marathi_ner","mr") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["मला स्पार्क एनएलपी आवडते"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_marathi_ner","mr") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("मला स्पार्क एनएलपी आवडते").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_marathi_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|mr| +|Size:|665.1 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/l3cube-pune/marathi-ner +- https://github.com/l3cube-pune/MarathiNLP +- https://arxiv.org/abs/2204.06029 \ No newline at end of file From c66e01ba5a5b23e98a6660ff4352ef8af505743e Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:05:07 +0700 Subject: [PATCH 081/667] Add model 2023-11-06-bert_ner_chandrasutrisnotjhong_bert_finetuned_ner_en --- ...drasutrisnotjhong_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_chandrasutrisnotjhong_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_chandrasutrisnotjhong_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_chandrasutrisnotjhong_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..caf6a70acd73dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_chandrasutrisnotjhong_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from chandrasutrisnotjhong) +author: John Snow Labs +name: bert_ner_chandrasutrisnotjhong_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness
using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `chandrasutrisnotjhong`. + +## Predicted Entities + +`MISC`, `ORG`, `LOC`, `PER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_chandrasutrisnotjhong_bert_finetuned_ner_en_5.2.0_3.0_1699290113067.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_chandrasutrisnotjhong_bert_finetuned_ner_en_5.2.0_3.0_1699290113067.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_chandrasutrisnotjhong_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_chandrasutrisnotjhong_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned.by_chandrasutrisnotjhong").predict("""PUT YOUR STRING HERE""") +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_chandrasutrisnotjhong_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/chandrasutrisnotjhong/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From 9a45182228abf6620bbfa8df7201d64787749871 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:06:07 +0700 Subject: [PATCH 082/667] Add model 2023-11-06-bert_ner_bert_finetuned_ner1_en --- ...3-11-06-bert_ner_bert_finetuned_ner1_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner1_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner1_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner1_en.md new file mode 100644 index 00000000000000..efefe8d6ffaf49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner1_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from Wende) +author: John Snow Labs +name: bert_ner_bert_finetuned_ner1 +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner1` is an English model originally trained by `Wende`.
+ +## Predicted Entities + +`LOC`, `PER`, `ORG`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_finetuned_ner1_en_5.2.0_3.0_1699289487602.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_finetuned_ner1_en_5.2.0_3.0_1699289487602.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_finetuned_ner1","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_finetuned_ner1","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned_v2.by_Wende").predict("""PUT YOUR STRING HERE""") +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bert_finetuned_ner1| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/Wende/bert-finetuned-ner1 +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From b6f7f1b904f8ca201f9044a12dfedcd05e058ab8 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:07:07 +0700 Subject: [PATCH 083/667] Add model 2023-11-06-bert_ner_jrubin01_bert_finetuned_ner_en --- ...bert_ner_jrubin01_bert_finetuned_ner_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_jrubin01_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_jrubin01_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_jrubin01_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..292f5adcac6b86 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_jrubin01_bert_finetuned_ner_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from jrubin01) +author: John Snow Labs +name: bert_ner_jrubin01_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.
`bert-finetuned-ner` is an English model originally trained by `jrubin01`. + +## Predicted Entities + +`MISC`, `ORG`, `LOC`, `PER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_jrubin01_bert_finetuned_ner_en_5.2.0_3.0_1699293922897.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_jrubin01_bert_finetuned_ner_en_5.2.0_3.0_1699293922897.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_jrubin01_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_jrubin01_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.finetuned.by_jrubin01").predict("""PUT YOUR STRING HERE""") +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_jrubin01_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/jrubin01/bert-finetuned-ner \ No newline at end of file From 8cdef2a5bb90b02aebd99c6fdab7902f77dc40ea Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:08:08 +0700 Subject: [PATCH 084/667] Add model 2023-11-06-bert_ner_bionlp13cg_chem_original_pubmedbert_512_en --- ...nlp13cg_chem_original_pubmedbert_512_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bionlp13cg_chem_original_pubmedbert_512_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bionlp13cg_chem_original_pubmedbert_512_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bionlp13cg_chem_original_pubmedbert_512_en.md new file mode 100644 index 00000000000000..4ab5ca7bf71e2b --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bionlp13cg_chem_original_pubmedbert_512_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_bionlp13cg_chem_original_pubmedbert_512 BertForTokenClassification from ghadeermobasher +author: John Snow Labs +name: bert_ner_bionlp13cg_chem_original_pubmedbert_512 +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark
NLP. `bert_ner_bionlp13cg_chem_original_pubmedbert_512` is an English model originally trained by ghadeermobasher. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bionlp13cg_chem_original_pubmedbert_512_en_5.2.0_3.0_1699275329371.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bionlp13cg_chem_original_pubmedbert_512_en_5.2.0_3.0_1699275329371.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bionlp13cg_chem_original_pubmedbert_512","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_bionlp13cg_chem_original_pubmedbert_512", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + + +```
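The download archives on these pages appear to follow the naming pattern `<model name>_<language>_<Spark NLP version>_<Spark version>_<build time in epoch milliseconds>.zip`. This pattern is inferred from the links here and is not a documented contract. Below is a small standard-library sketch, built around a hypothetical helper `parse_artifact`, that splits such a name back into the values listed under Model Information.

```python
import posixpath
from datetime import datetime, timedelta, timezone
from typing import NamedTuple
from urllib.parse import urlparse


class Artifact(NamedTuple):
    name: str
    language: str
    spark_nlp_version: str
    spark_version: str
    built_at: datetime


def parse_artifact(url: str) -> Artifact:
    """Split <name>_<lang>_<nlp version>_<spark version>_<epoch ms>.zip (inferred pattern)."""
    filename = posixpath.basename(urlparse(url).path)  # works for https:// and s3:// links
    if not filename.endswith(".zip"):
        raise ValueError(f"not a .zip archive: {filename}")
    stem = filename[: -len(".zip")]
    # Split from the right so underscores inside the model name stay intact.
    name, lang, nlp_version, spark_version, millis = stem.rsplit("_", 4)
    seconds, ms = divmod(int(millis), 1000)
    built_at = datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=ms)
    return Artifact(name, lang, nlp_version, spark_version, built_at)


url = ("https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/"
       "bert_ner_bionlp13cg_chem_original_pubmedbert_512_en_5.2.0_3.0_1699275329371.zip")
info = parse_artifact(url)
print(info.name, info.language, info.spark_nlp_version, info.spark_version)
# bert_ner_bionlp13cg_chem_original_pubmedbert_512 en 5.2.0 3.0
print(info.built_at.strftime("%Y-%m-%d %H:%M:%S %Z"))
# 2023-11-06 12:55:29 UTC
```

Parsing from the end with `rsplit("_", 4)` is what makes this work: model names such as `bert_ner_bionlp13cg_chem_original_pubmedbert_512` contain underscores of their own, but the four trailing fields do not.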
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bionlp13cg_chem_original_pubmedbert_512| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.2 MB| + +## References + +https://huggingface.co/ghadeermobasher/BioNLP13CG-Chem-Original-PubMedBERT-512 \ No newline at end of file From 88ab904318346bbc2cf700af8bc73bc92de6de27 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:09:08 +0700 Subject: [PATCH 085/667] Add model 2023-11-06-bert_ner_datauma_bert_finetuned_ner_en --- ...-bert_ner_datauma_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_datauma_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_datauma_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_datauma_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..ab7b2ed229ee6c --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_datauma_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from datauma) +author: John Snow Labs +name: bert_ner_datauma_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `datauma`.
+
+## Predicted Entities
+
+`ORG`, `LOC`, `PER`, `MISC`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_datauma_bert_finetuned_ner_en_5.2.0_3.0_1699293996752.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_datauma_bert_finetuned_ner_en_5.2.0_3.0_1699293996752.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_datauma_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_datauma_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned.by_datauma").predict("""PUT YOUR STRING HERE""") +``` +
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_datauma_bert_finetuned_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+
+
+- https://huggingface.co/datauma/bert-finetuned-ner
+- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003
\ No newline at end of file
From e8e40300f064d318b18d6cc68aa29d038d664259 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 01:10:08 +0700
Subject: [PATCH 086/667] Add model 2023-11-06-bert_ner_fancyerii_bert_finetuned_ner_en

---
 ...ert_ner_fancyerii_bert_finetuned_ner_en.md | 115 ++++++++++++++++++
 1 file changed, 115 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_fancyerii_bert_finetuned_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_fancyerii_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_fancyerii_bert_finetuned_ner_en.md
new file mode 100644
index 00000000000000..6351c22d2d0858
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_fancyerii_bert_finetuned_ner_en.md
@@ -0,0 +1,115 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from fancyerii)
+author: John Snow Labs
+name: bert_ner_fancyerii_bert_finetuned_ner
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `fancyerii`.
+
+## Predicted Entities
+
+`MISC`, `ORG`, `LOC`, `PER`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_fancyerii_bert_finetuned_ner_en_5.2.0_3.0_1699294144516.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_fancyerii_bert_finetuned_ner_en_5.2.0_3.0_1699294144516.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_fancyerii_bert_finetuned_ner","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_fancyerii_bert_finetuned_ner","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.conll.finetuned.by_fancyerii").predict("""PUT YOUR STRING HERE""")
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_fancyerii_bert_finetuned_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+
+
+- https://huggingface.co/fancyerii/bert-finetuned-ner
+- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003
\ No newline at end of file
From 42adcf5cf3033cf316eb2975888e9bb668dd1610 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 01:11:08 +0700
Subject: [PATCH 087/667] Add model 2023-11-06-bert_ner_keyword_tag_model_6000_en

---
 ...1-06-bert_ner_keyword_tag_model_6000_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_6000_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_6000_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_6000_en.md
new file mode 100644
index 00000000000000..e4ff00bf96f535
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_6000_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from Media1129)
+author: John Snow Labs
+name: bert_ner_keyword_tag_model_6000
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `keyword-tag-model-6000` is an English model originally trained by `Media1129`.
+
+## Predicted Entities
+
+`occasion`, `cuisines`, `mealcourse`, `ingredient`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_keyword_tag_model_6000_en_5.2.0_3.0_1699294209815.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_keyword_tag_model_6000_en_5.2.0_3.0_1699294209815.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_keyword_tag_model_6000","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_keyword_tag_model_6000","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.keyword_tag_model_6000.by_media1129").predict("""PUT YOUR STRING HERE""") +``` +
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_keyword_tag_model_6000|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|407.2 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+
+
+- https://huggingface.co/Media1129/keyword-tag-model-6000
\ No newline at end of file
From 0a10f1bd2583802e203b2e1af6d6b3a73809a93b Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 01:12:08 +0700
Subject: [PATCH 088/667] Add model 2023-11-06-bert_ner_bert_base_uncased_clinical_ner_en

---
 ...t_ner_bert_base_uncased_clinical_ner_en.md | 115 ++++++++++++++++++
 1 file changed, 115 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_uncased_clinical_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_uncased_clinical_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_uncased_clinical_ner_en.md
new file mode 100644
index 00000000000000..baa491c3b818fa
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_uncased_clinical_ner_en.md
@@ -0,0 +1,115 @@
+---
+layout: model
+title: English BertForTokenClassification Base Uncased model (from samrawal)
+author: John Snow Labs
+name: bert_ner_bert_base_uncased_clinical_ner
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-base-uncased_clinical-ner` is an English model originally trained by `samrawal`.
+
+## Predicted Entities
+
+`treatment`, `problem`, `test`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_base_uncased_clinical_ner_en_5.2.0_3.0_1699288511788.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_base_uncased_clinical_ner_en_5.2.0_3.0_1699288511788.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_base_uncased_clinical_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_base_uncased_clinical_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.clinical.uncased_base").predict("""PUT YOUR STRING HERE""") +``` +
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_bert_base_uncased_clinical_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|407.2 MB|
+|Case sensitive:|false|
+|Max sentence length:|128|
+
+## References
+
+
+
+- https://huggingface.co/samrawal/bert-base-uncased_clinical-ner
+- https://n2c2.dbmi.hms.harvard.edu
\ No newline at end of file
From 0204520f7832fc0f6191cceca67556f77e150aa4 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 01:13:08 +0700
Subject: [PATCH 089/667] Add model 2023-11-06-bert_ner_autonlp_prodigy_10_3362554_en

---
 ...-bert_ner_autonlp_prodigy_10_3362554_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_autonlp_prodigy_10_3362554_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_autonlp_prodigy_10_3362554_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_autonlp_prodigy_10_3362554_en.md
new file mode 100644
index 00000000000000..e48e8cbe021a91
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_autonlp_prodigy_10_3362554_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: English Named Entity Recognition (from abhishek)
+author: John Snow Labs
+name: bert_ner_autonlp_prodigy_10_3362554
+date: 2023-11-06
+tags: [bert, ner, token_classification, en, open_source, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained Named Entity Recognition model, uploaded to Hugging Face, adapted and imported into Spark NLP. `autonlp-prodigy-10-3362554` is an English model originally trained by `abhishek`.
+
+## Predicted Entities
+
+`LOCATION`, `PERSON`, `ORG`, `PRODUCT`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_autonlp_prodigy_10_3362554_en_5.2.0_3.0_1699285552337.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_autonlp_prodigy_10_3362554_en_5.2.0_3.0_1699285552337.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_autonlp_prodigy_10_3362554","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["I love Spark NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_autonlp_prodigy_10_3362554","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("I love Spark NLP").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.prodigy").predict("""I love Spark NLP""") +``` +
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_autonlp_prodigy_10_3362554|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|1.3 GB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+
+
+- https://huggingface.co/abhishek/autonlp-prodigy-10-3362554
\ No newline at end of file
From 1e69e911945b832e44a0c36327ebd60bb2844d44 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 01:14:09 +0700
Subject: [PATCH 090/667] Add model 2023-11-06-bert_ner_biobert_genetic_ner_en

---
 ...3-11-06-bert_ner_biobert_genetic_ner_en.md | 115 ++++++++++++++++++
 1 file changed, 115 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_genetic_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_genetic_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_genetic_ner_en.md
new file mode 100644
index 00000000000000..9287980ea28a6c
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_genetic_ner_en.md
@@ -0,0 +1,115 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from alvaroalon2)
+author: John Snow Labs
+name: bert_ner_biobert_genetic_ner
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `biobert_genetic_ner` is an English model originally trained by `alvaroalon2`.
+
+## Predicted Entities
+
+`GENETIC`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_biobert_genetic_ner_en_5.2.0_3.0_1699291599788.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_biobert_genetic_ner_en_5.2.0_3.0_1699291599788.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_biobert_genetic_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_biobert_genetic_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.biobert").predict("""PUT YOUR STRING HERE""") +``` +
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_biobert_genetic_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.1 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+
+
+- https://huggingface.co/alvaroalon2/biobert_genetic_ner
+- https://github.com/librairy/bio-ner
\ No newline at end of file
From ae9e2ea1d215570d9e2c2365e1434eac9fc0dcec Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 01:15:09 +0700
Subject: [PATCH 091/667] Add model 2023-11-06-bert_ner_dpuccine_bert_finetuned_ner_en

---
 ...bert_ner_dpuccine_bert_finetuned_ner_en.md | 115 ++++++++++++++++++
 1 file changed, 115 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_dpuccine_bert_finetuned_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_dpuccine_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_dpuccine_bert_finetuned_ner_en.md
new file mode 100644
index 00000000000000..7f1a7c5349e39f
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_dpuccine_bert_finetuned_ner_en.md
@@ -0,0 +1,115 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from dpuccine)
+author: John Snow Labs
+name: bert_ner_dpuccine_bert_finetuned_ner
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `dpuccine`.
+
+## Predicted Entities
+
+`ORG`, `LOC`, `MISC`, `PER`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_dpuccine_bert_finetuned_ner_en_5.2.0_3.0_1699294296104.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_dpuccine_bert_finetuned_ner_en_5.2.0_3.0_1699294296104.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_dpuccine_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_dpuccine_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned.by_dpuccine").predict("""PUT YOUR STRING HERE""") +``` +
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_dpuccine_bert_finetuned_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+
+
+- https://huggingface.co/dpuccine/bert-finetuned-ner
+- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003
\ No newline at end of file
From 1e6b5bfbb7da0f47ccf296cb7e92a1f6fdbe4d5d Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 01:16:09 +0700
Subject: [PATCH 092/667] Add model 2023-11-06-bert_ner_keyword_tag_model_2000_9_16_more_ingredient_en

---
 ..._tag_model_2000_9_16_more_ingredient_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_2000_9_16_more_ingredient_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_2000_9_16_more_ingredient_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_2000_9_16_more_ingredient_en.md
new file mode 100644
index 00000000000000..fd51b67c43d60e
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_2000_9_16_more_ingredient_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from Media1129)
+author: John Snow Labs
+name: bert_ner_keyword_tag_model_2000_9_16_more_ingredient
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `keyword-tag-model-2000-9-16_more_ingredient` is an English model originally trained by `Media1129`.
+
+## Predicted Entities
+
+`negingredient`, `occasion`, `mealcourse`, `cuisines`, `ingredient`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_keyword_tag_model_2000_9_16_more_ingredient_en_5.2.0_3.0_1699294455704.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_keyword_tag_model_2000_9_16_more_ingredient_en_5.2.0_3.0_1699294455704.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_keyword_tag_model_2000_9_16_more_ingredient","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_keyword_tag_model_2000_9_16_more_ingredient","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.ingredient.2000_9_16.by_media1129").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_keyword_tag_model_2000_9_16_more_ingredient| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/Media1129/keyword-tag-model-2000-9-16_more_ingredient \ No newline at end of file From 32ba8d74b9377f31b61120273cf3f1ea711c891a Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:17:09 +0700 Subject: [PATCH 093/667] Add model 2023-11-06-bert_ner_biored_dis_modified_pubmedbert_320_8_10_en --- ...red_dis_modified_pubmedbert_320_8_10_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_biored_dis_modified_pubmedbert_320_8_10_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biored_dis_modified_pubmedbert_320_8_10_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biored_dis_modified_pubmedbert_320_8_10_en.md new file mode 100644 index 00000000000000..b0a8781c86812c --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biored_dis_modified_pubmedbert_320_8_10_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_biored_dis_modified_pubmedbert_320_8_10 BertForTokenClassification from ghadeermobasher +author: John Snow Labs +name: bert_ner_biored_dis_modified_pubmedbert_320_8_10 +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and 
production-readiness using Spark NLP. `bert_ner_biored_dis_modified_pubmedbert_320_8_10` is an English model originally trained by ghadeermobasher. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_biored_dis_modified_pubmedbert_320_8_10_en_5.2.0_3.0_1699278243282.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_biored_dis_modified_pubmedbert_320_8_10_en_5.2.0_3.0_1699278243282.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + 
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_biored_dis_modified_pubmedbert_320_8_10","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineDF = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_biored_dis_modified_pubmedbert_320_8_10", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineDF = pipeline.fit(data).transform(data) +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_biored_dis_modified_pubmedbert_320_8_10| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.2 MB| + +## References + +https://huggingface.co/ghadeermobasher/BioRed-Dis-Modified-PubMedBERT-320-8-10 \ No newline at end of file From d82523bbaa3aae32a38c2c72157d3b9e7989a086 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:18:09 +0700 Subject: [PATCH 094/667] Add model 2023-11-06-bert_ner_keyword_tag_model_6000_v2_en --- ...6-bert_ner_keyword_tag_model_6000_v2_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_6000_v2_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_6000_v2_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_6000_v2_en.md new file mode 100644 index 00000000000000..e42d219f31a43f --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_6000_v2_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from Media1129) +author: John Snow Labs +name: bert_ner_keyword_tag_model_6000_v2 +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `keyword-tag-model-6000-v2` is an English model originally trained by `Media1129`. 
+ +## Predicted Entities + +`occasion`, `cuisines`, `mealcourse`, `ingredient` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_keyword_tag_model_6000_v2_en_5.2.0_3.0_1699294516084.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_keyword_tag_model_6000_v2_en_5.2.0_3.0_1699294516084.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_keyword_tag_model_6000_v2","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_keyword_tag_model_6000_v2","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.v2.6000_v2.by_media1129").predict("""PUT YOUR STRING HERE""") +``` +
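The Model Information table lists a maximum sentence length of 128. For BERT-style models, that limit is typically counted in word pieces, including the `[CLS]` and `[SEP]` markers. A long sentence can therefore exceed it with well under 128 whitespace tokens.

The sentence detector keeps most inputs short. For unusually long sentences, one workaround is to pre-split the tokens into windows. The plain-Python sketch below packs tokens greedily under that budget. The function `pack_tokens` and its `pieces` estimator are hypothetical: `pieces` counts one piece per token by default, and a real word-piece tokenizer would give accurate counts. The sketch does not reproduce how Spark NLP itself handles over-length input.

```python
from typing import Callable, List


def pack_tokens(tokens: List[str], max_len: int = 128,
                pieces: Callable[[str], int] = lambda t: 1,
                special: int = 2) -> List[List[str]]:
    """Greedily pack tokens into consecutive windows.

    Each window's estimated word-piece count, plus `special` slots
    (assumed to be [CLS] and [SEP]), stays within max_len, except when a
    single token alone exceeds the budget.
    """
    budget = max_len - special
    windows: List[List[str]] = []
    current: List[str] = []
    used = 0
    for tok in tokens:
        n = pieces(tok)
        # Start a new window when this token would overflow the current one.
        if current and used + n > budget:
            windows.append(current)
            current, used = [], 0
        current.append(tok)
        used += n
    if current:
        windows.append(current)
    return windows


print(pack_tokens(["ab", "cd", "e"], max_len=5, pieces=len))
```

A token whose own estimate exceeds the budget still gets a window to itself rather than being dropped, so callers may want to check for that case.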
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_keyword_tag_model_6000_v2| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/Media1129/keyword-tag-model-6000-v2 \ No newline at end of file From 83a914711c3af869107e27a2e514a7604ba46e7f Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:19:09 +0700 Subject: [PATCH 095/667] Add model 2023-11-06-bert_ner_german_press_bert_de --- ...023-11-06-bert_ner_german_press_bert_de.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_german_press_bert_de.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_german_press_bert_de.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_german_press_bert_de.md new file mode 100644 index 00000000000000..0f3b7cc039c348 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_german_press_bert_de.md @@ -0,0 +1,114 @@ +--- +layout: model +title: German BertForTokenClassification Cased model (from severinsimmler) +author: John Snow Labs +name: bert_ner_german_press_bert +date: 2023-11-06 +tags: [bert, ner, open_source, de, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `german-press-bert` is a German model originally trained by `severinsimmler`. 
+ +## Predicted Entities + +`PER`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_german_press_bert_de_5.2.0_3.0_1699294652431.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_german_press_bert_de_5.2.0_3.0_1699294652431.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_german_press_bert","de") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["Ich liebe Spark NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_german_press_bert","de") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("Ich liebe Spark NLP").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("de.ner.bert.by_severinsimmler").predict("""Ich liebe Spark NLP""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_german_press_bert| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|409.8 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/severinsimmler/german-press-bert \ No newline at end of file From 84937b06f1376b2b4ea72a34ba5a36be4d3a7ca5 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:20:10 +0700 Subject: [PATCH 096/667] Add model 2023-11-06-bert_ner_keyword_tag_model_3000_v2_en --- ...6-bert_ner_keyword_tag_model_3000_v2_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_3000_v2_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_3000_v2_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_3000_v2_en.md new file mode 100644 index 00000000000000..3abde52b5e774a --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_3000_v2_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from Media1129) +author: John Snow Labs +name: bert_ner_keyword_tag_model_3000_v2 +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `keyword-tag-model-3000-v2` is an English model originally trained by `Media1129`. 
+ +## Predicted Entities + +`occasion`, `cuisines`, `mealcourse`, `ingredient` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_keyword_tag_model_3000_v2_en_5.2.0_3.0_1699294714859.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_keyword_tag_model_3000_v2_en_5.2.0_3.0_1699294714859.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_keyword_tag_model_3000_v2","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_keyword_tag_model_3000_v2","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.v2.3000_v2.by_media1129").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_keyword_tag_model_3000_v2| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/Media1129/keyword-tag-model-3000-v2 \ No newline at end of file From 27cb0ddd2b35377853bdd81822ad86c2c5148541 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:21:10 +0700 Subject: [PATCH 097/667] Add model 2023-11-06-bert_ner_roberta_base_finetuned_cluener2020_chinese_zh --- ...a_base_finetuned_cluener2020_chinese_zh.md | 118 ++++++++++++++++++ 1 file changed, 118 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_roberta_base_finetuned_cluener2020_chinese_zh.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_roberta_base_finetuned_cluener2020_chinese_zh.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_roberta_base_finetuned_cluener2020_chinese_zh.md new file mode 100644 index 00000000000000..ab156bbaa3ca6c --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_roberta_base_finetuned_cluener2020_chinese_zh.md @@ -0,0 +1,118 @@ +--- +layout: model +title: Chinese Named Entity Recognition (from uer) +author: John Snow Labs +name: bert_ner_roberta_base_finetuned_cluener2020_chinese +date: 2023-11-06 +tags: [bert, ner, token_classification, zh, open_source, onnx] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained Named Entity Recognition model, uploaded to Hugging Face, adapted and imported into Spark NLP. 
`roberta-base-finetuned-cluener2020-chinese` is a Chinese model originally trained by `uer`. + +## Predicted Entities + +`position`, `company`, `address`, `movie`, `organization`, `game`, `name`, `book`, `government`, `scene` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_roberta_base_finetuned_cluener2020_chinese_zh_5.2.0_3.0_1699294710756.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_roberta_base_finetuned_cluener2020_chinese_zh_5.2.0_3.0_1699294710756.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + 
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_roberta_base_finetuned_cluener2020_chinese","zh") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_roberta_base_finetuned_cluener2020_chinese","zh") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("zh.ner.bert.base_finetuned").predict("""PUT YOUR STRING HERE""") +``` +
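Chinese BERT vocabularies, such as the one this model is likely built on, generally treat each Chinese character as a separate unit. The default `Tokenizer` in the pipeline above, by contrast, splits on whitespace. Unsegmented Chinese text may therefore reach the classifier as one long token and receive a single label.

The plain-Python sketch below contrasts the two splits. The helper names are hypothetical, and the inclusive `end` offset is assumed to mirror Spark NLP's annotation convention. Within Spark NLP, one option worth testing is a single-character target pattern set through `Tokenizer.setTargetPattern`; verify the resulting tokens before relying on it.

```python
import re
from typing import List, Tuple


def whitespace_tokens(text: str) -> List[str]:
    """Whitespace split, roughly what a default whitespace tokenizer yields."""
    return text.split()


def char_tokens(text: str) -> List[Tuple[str, int, int]]:
    """Split text into single non-whitespace characters.

    Returns (char, begin, end) triples with an inclusive `end` offset.
    """
    return [(m.group(), m.start(), m.end() - 1) for m in re.finditer(r"\S", text)]


print(whitespace_tokens("我爱北京"))  # one token for the whole string
print(char_tokens("我爱北京"))        # one triple per character
```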
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_roberta_base_finetuned_cluener2020_chinese| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|380.9 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/uer/roberta-base-finetuned-cluener2020-chinese +- https://github.com/dbiir/UER-py/wiki/Modelzoo +- https://github.com/CLUEbenchmark/CLUENER2020 +- https://github.com/dbiir/UER-py/ +- https://cloud.tencent.com/ \ No newline at end of file From 9775c5465c6fb0b187025e3a59c5cd0571f46a10 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:22:10 +0700 Subject: [PATCH 098/667] Add model 2023-11-06-bert_ner_hebert_ner_he --- .../2023-11-06-bert_ner_hebert_ner_he.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_hebert_ner_he.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_hebert_ner_he.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_hebert_ner_he.md new file mode 100644 index 00000000000000..75c720c073ec2d --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_hebert_ner_he.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Hebrew bert_ner_hebert_ner BertForTokenClassification from avichr +author: John Snow Labs +name: bert_ner_hebert_ner +date: 2023-11-06 +tags: [bert, he, open_source, token_classification, onnx] +task: Named Entity Recognition +language: he +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_hebert_ner` is a 
Hebrew model originally trained by avichr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_hebert_ner_he_5.2.0_3.0_1699294856711.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_hebert_ner_he_5.2.0_3.0_1699294856711.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_hebert_ner","he") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineDF = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_hebert_ner", "he") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineDF = pipeline.fit(data).transform(data) +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_hebert_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|he| +|Size:|408.1 MB| + +## References + +https://huggingface.co/avichr/heBERT_NER \ No newline at end of file From 8b24a500264134cea295c96bd53d0410d5fc5352 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:23:11 +0700 Subject: [PATCH 099/667] Add model 2023-11-06-bert_ner_scibert_scivocab_cased_sdu21_ai_en --- ..._ner_scibert_scivocab_cased_sdu21_ai_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_scibert_scivocab_cased_sdu21_ai_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_scibert_scivocab_cased_sdu21_ai_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_scibert_scivocab_cased_sdu21_ai_en.md new file mode 100644 index 00000000000000..03c896205037a3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_scibert_scivocab_cased_sdu21_ai_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_scibert_scivocab_cased_sdu21_ai BertForTokenClassification from napsternxg +author: John Snow Labs +name: bert_ner_scibert_scivocab_cased_sdu21_ai +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_scibert_scivocab_cased_sdu21_ai` is an English model originally trained by napsternxg. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_scibert_scivocab_cased_sdu21_ai_en_5.2.0_3.0_1699294901728.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_scibert_scivocab_cased_sdu21_ai_en_5.2.0_3.0_1699294901728.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_scibert_scivocab_cased_sdu21_ai","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineDF = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_scibert_scivocab_cased_sdu21_ai", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineDF = pipeline.fit(data).transform(data) +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_scibert_scivocab_cased_sdu21_ai| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|409.9 MB| + +## References + +https://huggingface.co/napsternxg/scibert_scivocab_cased_SDU21_AI \ No newline at end of file From 7f93139306ab4edc83781096001d2b648b532733 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:24:11 +0700 Subject: [PATCH 100/667] Add model 2023-11-06-bert_ner_ludoviciarraga_bert_finetuned_ner_en --- ...er_ludoviciarraga_bert_finetuned_ner_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_ludoviciarraga_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ludoviciarraga_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ludoviciarraga_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..0567b87b031ae6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ludoviciarraga_bert_finetuned_ner_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from ludoviciarraga) +author: John Snow Labs +name: bert_ner_ludoviciarraga_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `ludoviciarraga`. 
+ +## Predicted Entities + +`MISC`, `ORG`, `LOC`, `PER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_ludoviciarraga_bert_finetuned_ner_en_5.2.0_3.0_1699294875593.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_ludoviciarraga_bert_finetuned_ner_en_5.2.0_3.0_1699294875593.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_ludoviciarraga_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_ludoviciarraga_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.finetuned.by_ludoviciarraga").predict("""PUT YOUR STRING HERE""") +``` 
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_ludoviciarraga_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/ludoviciarraga/bert-finetuned-ner \ No newline at end of file From d51ca5b8f8e9efbb3796365f77932d68b9c8a7df Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:25:11 +0700 Subject: [PATCH 101/667] Add model 2023-11-06-bert_ner_bionlp13cg_modified_scibert_uncased_latest_en --- ...13cg_modified_scibert_uncased_latest_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bionlp13cg_modified_scibert_uncased_latest_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bionlp13cg_modified_scibert_uncased_latest_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bionlp13cg_modified_scibert_uncased_latest_en.md new file mode 100644 index 00000000000000..2f1ae54999e4a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bionlp13cg_modified_scibert_uncased_latest_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_bionlp13cg_modified_scibert_uncased_latest BertForTokenClassification from ghadeermobasher +author: John Snow Labs +name: bert_ner_bionlp13cg_modified_scibert_uncased_latest +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and 
production-readiness using Spark NLP. `bert_ner_bionlp13cg_modified_scibert_uncased_latest` is an English model originally trained by ghadeermobasher. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bionlp13cg_modified_scibert_uncased_latest_en_5.2.0_3.0_1699275701881.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bionlp13cg_modified_scibert_uncased_latest_en_5.2.0_3.0_1699275701881.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + 
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bionlp13cg_modified_scibert_uncased_latest","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineDF = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_bionlp13cg_modified_scibert_uncased_latest", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineDF = pipeline.fit(data).transform(data) +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bionlp13cg_modified_scibert_uncased_latest| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/ghadeermobasher/BioNLP13CG-Modified-scibert-uncased_latest \ No newline at end of file From 34b7484563b2e8c6cf46e6ede59cfdc08252c101 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:26:11 +0700 Subject: [PATCH 102/667] Add model 2023-11-06-bert_ner_original_pubmedbert_bc5cdr_disease_en --- ...r_original_pubmedbert_bc5cdr_disease_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_pubmedbert_bc5cdr_disease_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_pubmedbert_bc5cdr_disease_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_pubmedbert_bc5cdr_disease_en.md new file mode 100644 index 00000000000000..62d49b9fd3ea83 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_pubmedbert_bc5cdr_disease_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_original_pubmedbert_bc5cdr_disease BertForTokenClassification from ghadeermobasher +author: John Snow Labs +name: bert_ner_original_pubmedbert_bc5cdr_disease +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_original_pubmedbert_bc5cdr_disease` is an English model
originally trained by ghadeermobasher. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_original_pubmedbert_bc5cdr_disease_en_5.2.0_3.0_1699279985989.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_original_pubmedbert_bc5cdr_disease_en_5.2.0_3.0_1699279985989.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_original_pubmedbert_bc5cdr_disease","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_original_pubmedbert_bc5cdr_disease", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_original_pubmedbert_bc5cdr_disease| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.1 MB| + +## References + +https://huggingface.co/ghadeermobasher/Original-PubMedBERT-BC5CDR-disease \ No newline at end of file From 405d4866e967720d6ba557b205e8645c5d35f04d Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:27:11 +0700 Subject: [PATCH 103/667] Add model 2023-11-06-bert_ner_distilbert_finetuned_ner_en --- ...06-bert_ner_distilbert_finetuned_ner_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_distilbert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_distilbert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_distilbert_finetuned_ner_en.md new file mode 100644 index 00000000000000..7e983f81ecd10f --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_distilbert_finetuned_ner_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from EhsanYB) +author: John Snow Labs +name: bert_ner_distilbert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `distilbert-finetuned-ner` is an English model originally trained by `EhsanYB`.
+ +## Predicted Entities + +`PER`, `ORG`, `LOC`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_distilbert_finetuned_ner_en_5.2.0_3.0_1699293383077.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_distilbert_finetuned_ner_en_5.2.0_3.0_1699293383077.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_distilbert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_distilbert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.distilled_finetuned").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_distilbert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + + + +- https://huggingface.co/EhsanYB/distilbert-finetuned-ner \ No newline at end of file From ae6679cfabb4d40020c406901a80223310e3f58a Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:28:12 +0700 Subject: [PATCH 104/667] Add model 2023-11-06-bert_ner_mbert_base_uncased_kinyarwanda_kin --- ..._ner_mbert_base_uncased_kinyarwanda_kin.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_uncased_kinyarwanda_kin.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_uncased_kinyarwanda_kin.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_uncased_kinyarwanda_kin.md new file mode 100644 index 00000000000000..34159f08ea0c7f --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_uncased_kinyarwanda_kin.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Kinyarwanda bert_ner_mbert_base_uncased_kinyarwanda BertForTokenClassification from arnolfokam +author: John Snow Labs +name: bert_ner_mbert_base_uncased_kinyarwanda +date: 2023-11-06 +tags: [bert, kin, open_source, token_classification, onnx] +task: Named Entity Recognition +language: kin +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_mbert_base_uncased_kinyarwanda` is a Kinyarwanda model
originally trained by arnolfokam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_mbert_base_uncased_kinyarwanda_kin_5.2.0_3.0_1699295120508.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_mbert_base_uncased_kinyarwanda_kin_5.2.0_3.0_1699295120508.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_mbert_base_uncased_kinyarwanda","kin") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_mbert_base_uncased_kinyarwanda", "kin") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_mbert_base_uncased_kinyarwanda| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|kin| +|Size:|665.1 MB| + +## References + +https://huggingface.co/arnolfokam/mbert-base-uncased-kin \ No newline at end of file From 42fef5128bd160a3eaf3684c6591cf112ec7bd7c Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:29:12 +0700 Subject: [PATCH 105/667] Add model 2023-11-06-bert_ner_nlp_cic_wfu_clinical_cases_ner_mbert_cased_fine_tuned_en --- ...cal_cases_ner_mbert_cased_fine_tuned_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_nlp_cic_wfu_clinical_cases_ner_mbert_cased_fine_tuned_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_nlp_cic_wfu_clinical_cases_ner_mbert_cased_fine_tuned_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_nlp_cic_wfu_clinical_cases_ner_mbert_cased_fine_tuned_en.md new file mode 100644 index 00000000000000..e793378e8af1c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_nlp_cic_wfu_clinical_cases_ner_mbert_cased_fine_tuned_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_nlp_cic_wfu_clinical_cases_ner_mbert_cased_fine_tuned BertForTokenClassification from ajtamayoh +author: John Snow Labs +name: bert_ner_nlp_cic_wfu_clinical_cases_ner_mbert_cased_fine_tuned +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and 
production-readiness using Spark NLP. `bert_ner_nlp_cic_wfu_clinical_cases_ner_mbert_cased_fine_tuned` is an English model originally trained by ajtamayoh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_nlp_cic_wfu_clinical_cases_ner_mbert_cased_fine_tuned_en_5.2.0_3.0_1699280588888.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_nlp_cic_wfu_clinical_cases_ner_mbert_cased_fine_tuned_en_5.2.0_3.0_1699280588888.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_nlp_cic_wfu_clinical_cases_ner_mbert_cased_fine_tuned","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_nlp_cic_wfu_clinical_cases_ner_mbert_cased_fine_tuned", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_nlp_cic_wfu_clinical_cases_ner_mbert_cased_fine_tuned| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|665.1 MB| + +## References + +https://huggingface.co/ajtamayoh/NLP-CIC-WFU_Clinical_Cases_NER_mBERT_cased_fine_tuned \ No newline at end of file From b02180f2185b47d141e25a9aebc0654bfd00fbf6 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:30:12 +0700 Subject: [PATCH 106/667] Add model 2023-11-06-bert_ner_meddocan_beto_ner_es --- ...023-11-06-bert_ner_meddocan_beto_ner_es.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_meddocan_beto_ner_es.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_meddocan_beto_ner_es.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_meddocan_beto_ner_es.md new file mode 100644 index 00000000000000..91803145a4fe25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_meddocan_beto_ner_es.md @@ -0,0 +1,114 @@ +--- +layout: model +title: Spanish BertForTokenClassification Cased model (from rjuez00) +author: John Snow Labs +name: bert_ner_meddocan_beto_ner +date: 2023-11-06 +tags: [bert, ner, open_source, es, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `meddocan-beto-ner` is a Spanish model originally trained by `rjuez00`. 
+ +## Predicted Entities + +`CALLE`, `NUMERO_FAX`, `FECHAS`, `CENTRO_SALUD`, `INSTITUCION`, `PROFESION`, `ID_EMPLEO_PERSONAL_SANITARIO`, `SEXO_SUJETO_ASISTENCIA`, `PAIS`, `FAMILIARES_SUJETO_ASISTENCIA`, `EDAD_SUJETO_ASISTENCIA`, `CORREO_ELECTRONICO`, `NUMERO_TELEFONO`, `HOSPITAL`, `ID_CONTACTO_ASISTENCIAL`, `ID_ASEGURAMIENTO`, `OTROS_SUJETO_ASISTENCIA`, `NOMBRE_SUJETO_ASISTENCIA`, `ID_SUJETO_ASISTENCIA`, `NOMBRE_PERSONAL_SANITARIO`, `ID_TITULACION_PERSONAL_SANITARIO`, `TERRITORIO` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_meddocan_beto_ner_es_5.2.0_3.0_1699294266340.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_meddocan_beto_ner_es_5.2.0_3.0_1699294266340.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_meddocan_beto_ner","es") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["Amo Spark NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_meddocan_beto_ner","es") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("Amo Spark NLP").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("es.ner.beto_bert").predict("""Amo Spark NLP""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_meddocan_beto_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|409.6 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + + + +- https://huggingface.co/rjuez00/meddocan-beto-ner \ No newline at end of file From 2f5cd6600747d5597a1a1f20acc1b912c06947e5 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:31:12 +0700 Subject: [PATCH 107/667] Add model 2023-11-06-bert_ner_keyword_tag_model_2000_9_16_en --- ...bert_ner_keyword_tag_model_2000_9_16_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_2000_9_16_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_2000_9_16_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_2000_9_16_en.md new file mode 100644 index 00000000000000..a22f3291d44bf1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_2000_9_16_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from Media1129) +author: John Snow Labs +name: bert_ner_keyword_tag_model_2000_9_16 +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `keyword-tag-model-2000-9-16` is an English model originally trained by `Media1129`.
+ +## Predicted Entities + +`negingredient`, `occasion`, `mealcourse`, `cuisines`, `ingredient` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_keyword_tag_model_2000_9_16_en_5.2.0_3.0_1699295306624.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_keyword_tag_model_2000_9_16_en_5.2.0_3.0_1699295306624.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_keyword_tag_model_2000_9_16","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_keyword_tag_model_2000_9_16","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.keyword_tag_model_2000_9_16.by_media1129").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_keyword_tag_model_2000_9_16| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + + + +- https://huggingface.co/Media1129/keyword-tag-model-2000-9-16 \ No newline at end of file From d49e4f15ca8df5878f023011b6c623dec5a73837 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:32:12 +0700 Subject: [PATCH 108/667] Add model 2023-11-06-bert_ner_core_term_ner_v1_en --- ...2023-11-06-bert_ner_core_term_ner_v1_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_core_term_ner_v1_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_core_term_ner_v1_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_core_term_ner_v1_en.md new file mode 100644 index 00000000000000..bfed20f55dafe6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_core_term_ner_v1_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from leemeng) +author: John Snow Labs +name: bert_ner_core_term_ner_v1 +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `core-term-ner-v1` is an English model originally trained by `leemeng`.
+ +## Predicted Entities + +`CORE`, `E-CORE` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_core_term_ner_v1_en_5.2.0_3.0_1699293639702.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_core_term_ner_v1_en_5.2.0_3.0_1699293639702.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_core_term_ner_v1","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_core_term_ner_v1","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.by_leemeng").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_core_term_ner_v1| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|381.0 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + + + +- https://huggingface.co/leemeng/core-term-ner-v1 \ No newline at end of file From 746664ddd01e5ff3146075c22b5c3deae1864f08 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:33:13 +0700 Subject: [PATCH 109/667] Add model 2023-11-06-bert_ner_kaushalkhator_bert_tonga_tonga_islands_distilbert_ner_en --- ...t_tonga_tonga_islands_distilbert_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_kaushalkhator_bert_tonga_tonga_islands_distilbert_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_kaushalkhator_bert_tonga_tonga_islands_distilbert_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_kaushalkhator_bert_tonga_tonga_islands_distilbert_ner_en.md new file mode 100644 index 00000000000000..42a2996f6a4c34 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_kaushalkhator_bert_tonga_tonga_islands_distilbert_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_kaushalkhator_bert_tonga_tonga_islands_distilbert_ner BertForTokenClassification from kaushalkhator +author: John Snow Labs +name: bert_ner_kaushalkhator_bert_tonga_tonga_islands_distilbert_ner +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and
curated to provide scalability and production-readiness using Spark NLP. `bert_ner_kaushalkhator_bert_tonga_tonga_islands_distilbert_ner` is an English model originally trained by kaushalkhator. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_kaushalkhator_bert_tonga_tonga_islands_distilbert_ner_en_5.2.0_3.0_1699295523105.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_kaushalkhator_bert_tonga_tonga_islands_distilbert_ner_en_5.2.0_3.0_1699295523105.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_kaushalkhator_bert_tonga_tonga_islands_distilbert_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_kaushalkhator_bert_tonga_tonga_islands_distilbert_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_kaushalkhator_bert_tonga_tonga_islands_distilbert_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|244.3 MB| + +## References + +https://huggingface.co/kaushalkhator/bert-to-distilbert-NER \ No newline at end of file From 9f6715c84075197c918a5a8704fd50876b5e1fa6 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:34:13 +0700 Subject: [PATCH 110/667] Add model 2023-11-06-bert_ner_jdang_bert_finetuned_ner_en --- ...06-bert_ner_jdang_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_jdang_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_jdang_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_jdang_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..ca3db9b0b93850 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_jdang_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from jdang) +author: John Snow Labs +name: bert_ner_jdang_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `jdang`.
+ +## Predicted Entities + +`MISC`, `ORG`, `LOC`, `PER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_jdang_bert_finetuned_ner_en_5.2.0_3.0_1699293584567.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_jdang_bert_finetuned_ner_en_5.2.0_3.0_1699293584567.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_jdang_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_jdang_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned.by_jdang").predict("""PUT YOUR STRING HERE""") +``` +
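The `ner` column returned by the pipeline above holds one label per token rather than complete entity spans. Assuming the labels follow the IOB scheme common to CoNLL-2003 fine-tunes (for example `B-PER`, `I-PER`, `O`; the card lists only the entity types, so the tag format is an assumption), the pure-Python sketch below shows how consecutive token labels can be merged into entity chunks. It only illustrates the idea; inside a Spark NLP pipeline the `NerConverter` annotator is the usual tool for this step. The helper name `group_iob` and the sample sentence are hypothetical.

```python
from typing import List, Optional, Tuple


def group_iob(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge IOB-tagged tokens into (entity_text, entity_type) pairs.

    Assumes tags look like "B-PER", "I-PER" or "O". An "I-" tag whose type
    differs from the open chunk starts a new chunk (lenient IOB handling).
    """
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the open chunk, if any, and reset the accumulator.
        nonlocal current_tokens, current_type
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, _, entity = tag.partition("-")
        # A "B-" tag, or a change of entity type, starts a new chunk.
        if prefix == "B" or entity != current_type:
            flush()
            current_type = entity
        current_tokens.append(token)
    flush()
    return chunks


tokens = ["John", "Snow", "Labs", "is", "based", "in", "Delaware"]
tags = ["B-ORG", "I-ORG", "I-ORG", "O", "O", "O", "B-LOC"]
print(group_iob(tokens, tags))  # → [('John Snow Labs', 'ORG'), ('Delaware', 'LOC')]
```

Treating a stray `I-` tag as the start of a new chunk is a deliberate leniency: token classifiers sometimes emit `I-` without a preceding `B-`, and dropping those tokens would lose entities.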
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_jdang_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/jdang/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From 6a8295f5942869784c48571753d1933d1263e302 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:35:13 +0700 Subject: [PATCH 111/667] Add model 2023-11-06-bert_ner_codeswitch_spaeng_ner_lince_en --- ...bert_ner_codeswitch_spaeng_ner_lince_en.md | 116 ++++++++++++++++++ 1 file changed, 116 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_codeswitch_spaeng_ner_lince_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_codeswitch_spaeng_ner_lince_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_codeswitch_spaeng_ner_lince_en.md new file mode 100644 index 00000000000000..96356ddb2d18b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_codeswitch_spaeng_ner_lince_en.md @@ -0,0 +1,116 @@ +--- +layout: model +title: English Named Entity Recognition (from sagorsarker) +author: John Snow Labs +name: bert_ner_codeswitch_spaeng_ner_lince +date: 2023-11-06 +tags: [bert, ner, token_classification, en, open_source, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained Named Entity Recognition model, uploaded to Hugging Face, adapted and imported into Spark NLP. `codeswitch-spaeng-ner-lince` is an English model originally trained by `sagorsarker`.
+ +## Predicted Entities + +`LOC`, `TIME`, `PER`, `PROD`, `TITLE`, `OTHER`, `GROUP`, `ORG`, `EVENT` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_codeswitch_spaeng_ner_lince_en_5.2.0_3.0_1699292369140.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_codeswitch_spaeng_ner_lince_en_5.2.0_3.0_1699292369140.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_codeswitch_spaeng_ner_lince","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["I love Spark NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_codeswitch_spaeng_ner_lince","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("I love Spark NLP").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.codeswitch_spaeng_ner_lince.by_sagorsarker").predict("""I love Spark NLP""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_codeswitch_spaeng_ner_lince| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|665.1 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/sagorsarker/codeswitch-spaeng-ner-lince +- https://ritual.uh.edu/lince/home +- https://github.com/sagorbrur/codeswitch \ No newline at end of file From 1b72971bac539f682c58f0950ca22765200cd18f Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:36:13 +0700 Subject: [PATCH 112/667] Add model 2023-11-06-bert_ner_mbert_base_biomedical_ner_en --- ...6-bert_ner_mbert_base_biomedical_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_biomedical_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_biomedical_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_biomedical_ner_en.md new file mode 100644 index 00000000000000..53258ae77485ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_biomedical_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_mbert_base_biomedical_ner BertForTokenClassification from StivenLancheros +author: John Snow Labs +name: bert_ner_mbert_base_biomedical_ner +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark
NLP. `bert_ner_mbert_base_biomedical_ner` is an English model originally trained by StivenLancheros. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_mbert_base_biomedical_ner_en_5.2.0_3.0_1699295535389.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_mbert_base_biomedical_ner_en_5.2.0_3.0_1699295535389.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_mbert_base_biomedical_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_mbert_base_biomedical_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + + +``` +
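`pretrained()` fetches a model by name, but the Download and Copy S3 URI links on these cards can also be used to retrieve an archive manually. Judging only from the links on this page, archive names appear to follow `<model name>_<language>_<Spark NLP version>_<Spark version>_<numeric id>.zip` under `auxdata.johnsnowlabs.com/public/models/`. The sketch below rebuilds both links from those parts; the pattern is an inference rather than a documented contract, the trailing numeric id has to be copied from the card, and the function name `model_urls` is hypothetical.

```python
from typing import Tuple

# Bucket and prefix as they appear in the Download / Copy S3 URI links above.
BUCKET = "auxdata.johnsnowlabs.com"
PREFIX = "public/models"


def model_urls(name: str, lang: str, nlp_version: str,
               spark_version: str, build_id: str) -> Tuple[str, str]:
    """Return (https_url, s3_uri) for a model archive.

    The naming pattern is inferred from the card links, not documented;
    `build_id` (the trailing number) cannot be derived and must be copied
    from the card.
    """
    key = f"{PREFIX}/{name}_{lang}_{nlp_version}_{spark_version}_{build_id}.zip"
    return f"https://s3.amazonaws.com/{BUCKET}/{key}", f"s3://{BUCKET}/{key}"


https_url, s3_uri = model_urls(
    "bert_ner_mbert_base_biomedical_ner", "en", "5.2.0", "3.0", "1699295535389"
)
print(https_url)
print(s3_uri)
```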
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_mbert_base_biomedical_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|665.1 MB| + +## References + +https://huggingface.co/StivenLancheros/mBERT-base-Biomedical-NER \ No newline at end of file From 93ded4604ca128206d476fcbb6e641236d825ffc Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:37:13 +0700 Subject: [PATCH 113/667] Add model 2023-11-06-bert_ner_bert_spanish_cased_finetuned_ner_es --- ...ner_bert_spanish_cased_finetuned_ner_es.md | 118 ++++++++++++++++++ 1 file changed, 118 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_spanish_cased_finetuned_ner_es.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_spanish_cased_finetuned_ner_es.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_spanish_cased_finetuned_ner_es.md new file mode 100644 index 00000000000000..a9ce8c143c5aab --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_spanish_cased_finetuned_ner_es.md @@ -0,0 +1,118 @@ +--- +layout: model +title: Spanish Named Entity Recognition (from mrm8488) +author: John Snow Labs +name: bert_ner_bert_spanish_cased_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, token_classification, es, open_source, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained Named Entity Recognition model, uploaded to Hugging Face, adapted and imported into Spark NLP. `bert-spanish-cased-finetuned-ner` is a Spanish model originally trained by `mrm8488`.
+ +## Predicted Entities + +`LOC`, `PER`, `ORG`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_spanish_cased_finetuned_ner_es_5.2.0_3.0_1699290603269.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_spanish_cased_finetuned_ner_es_5.2.0_3.0_1699290603269.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_spanish_cased_finetuned_ner","es") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["Amo Spark NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_spanish_cased_finetuned_ner","es") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("Amo Spark NLP").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("es.ner.bert.cased_finetuned").predict("""Amo Spark NLP""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bert_spanish_cased_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|409.5 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/mrm8488/bert-spanish-cased-finetuned-ner +- https://www.kaggle.com/nltkdata/conll-corpora +- https://github.com/dccuchile/beto +- https://twitter.com/mrm8488 \ No newline at end of file From eaf3b80056fe8ae609f1f652ccfaa4ea22aacc12 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:38:14 +0700 Subject: [PATCH 114/667] Add model 2023-11-06-bert_ner_kurianbenoy_bert_finetuned_ner_en --- ...t_ner_kurianbenoy_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_kurianbenoy_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_kurianbenoy_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_kurianbenoy_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..0e802a1c85e9e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_kurianbenoy_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from kurianbenoy) +author: John Snow Labs +name: bert_ner_kurianbenoy_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and
curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `kurianbenoy`. + +## Predicted Entities + +`ORG`, `LOC`, `PER`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_kurianbenoy_bert_finetuned_ner_en_5.2.0_3.0_1699295792667.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_kurianbenoy_bert_finetuned_ner_en_5.2.0_3.0_1699295792667.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_kurianbenoy_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_kurianbenoy_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned.by_kurianbenoy").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_kurianbenoy_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/kurianbenoy/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From c43c4d59151e1f94bb48b99c77a37f2a51cbbadc Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:39:14 +0700 Subject: [PATCH 115/667] Add model 2023-11-06-bert_ner_mattchurgin_bert_finetuned_ner_en --- ...t_ner_mattchurgin_bert_finetuned_ner_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_mattchurgin_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mattchurgin_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mattchurgin_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..2cff29e54b2149 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mattchurgin_bert_finetuned_ner_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from mattchurgin) +author: John Snow Labs +name: bert_ner_mattchurgin_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.
`bert-finetuned-ner` is an English model originally trained by `mattchurgin`. + +## Predicted Entities + +`MISC`, `ORG`, `LOC`, `PER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_mattchurgin_bert_finetuned_ner_en_5.2.0_3.0_1699295792178.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_mattchurgin_bert_finetuned_ner_en_5.2.0_3.0_1699295792178.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_mattchurgin_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_mattchurgin_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.finetuned.by_mattchurgin").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_mattchurgin_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/mattchurgin/bert-finetuned-ner \ No newline at end of file From d0c17bc95420d0e8f7abee9ffd4d6e5de92e9878 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:40:14 +0700 Subject: [PATCH 116/667] Add model 2023-11-06-bert_ner_legalbert_clause_combined_en --- ...6-bert_ner_legalbert_clause_combined_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_legalbert_clause_combined_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_legalbert_clause_combined_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_legalbert_clause_combined_en.md new file mode 100644 index 00000000000000..88fa8df5211e1a --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_legalbert_clause_combined_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from Anery) +author: John Snow Labs +name: bert_ner_legalbert_clause_combined +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `legalbert_clause_combined` is an English model originally trained by `Anery`.
+ +## Predicted Entities + +`AC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_legalbert_clause_combined_en_5.2.0_3.0_1699293384321.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_legalbert_clause_combined_en_5.2.0_3.0_1699293384321.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_legalbert_clause_combined","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_legalbert_clause_combined","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.legal.by_anery").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_legalbert_clause_combined| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|130.6 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/Anery/legalbert_clause_combined \ No newline at end of file From 8ccd4f4e45adea40597bd9ebe852312df9d5f816 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:41:14 +0700 Subject: [PATCH 117/667] Add model 2023-11-06-bert_ner_bert_base_german_cased_own_data_ner_de --- ..._bert_base_german_cased_own_data_ner_de.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_german_cased_own_data_ner_de.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_german_cased_own_data_ner_de.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_german_cased_own_data_ner_de.md new file mode 100644 index 00000000000000..8aaf6d79db3a6f --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_german_cased_own_data_ner_de.md @@ -0,0 +1,114 @@ +--- +layout: model +title: German BertForTokenClassification Base Cased model (from domischwimmbeck) +author: John Snow Labs +name: bert_ner_bert_base_german_cased_own_data_ner +date: 2023-11-06 +tags: [bert, ner, open_source, de, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.
`bert-base-german-cased-own-data-ner` is a German model originally trained by `domischwimmbeck`. + +## Predicted Entities + +`PER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_base_german_cased_own_data_ner_de_5.2.0_3.0_1699286002849.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_base_german_cased_own_data_ner_de_5.2.0_3.0_1699286002849.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_base_german_cased_own_data_ner","de") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["Ich liebe Spark NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_base_german_cased_own_data_ner","de") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("Ich liebe Spark NLP").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("de.ner.bert.own_data.cased_base.by_domischwimmbeck").predict("""Ich liebe Spark NLP""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bert_base_german_cased_own_data_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|406.9 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/domischwimmbeck/bert-base-german-cased-own-data-ner \ No newline at end of file From 48bd7baf41e4ed7246dd3b1a15f14f254fa67762 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:42:14 +0700 Subject: [PATCH 118/667] Add model 2023-11-06-bert_ner_testingmodel_en --- .../2023-11-06-bert_ner_testingmodel_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_testingmodel_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_testingmodel_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_testingmodel_en.md new file mode 100644 index 00000000000000..40b81b76744f46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_testingmodel_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from superman) +author: John Snow Labs +name: bert_ner_testingmodel +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `testingmodel` is an English model originally trained by `superman`.
+ +## Predicted Entities + +`EPI`, `LOC`, `STAT` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_testingmodel_en_5.2.0_3.0_1699295175206.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_testingmodel_en_5.2.0_3.0_1699295175206.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_testingmodel","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_testingmodel","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.by_superman").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_testingmodel| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.1 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/superman/testingmodel \ No newline at end of file From 0a72e299b05a01df4f5c51116bc80c696352e77c Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:43:15 +0700 Subject: [PATCH 119/667] Add model 2023-11-06-bert_ner_keyword_tag_model_8000_9_16_more_ingredient_en --- ..._tag_model_8000_9_16_more_ingredient_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_8000_9_16_more_ingredient_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_8000_9_16_more_ingredient_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_8000_9_16_more_ingredient_en.md new file mode 100644 index 00000000000000..414d8530e79260 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_8000_9_16_more_ingredient_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from Media1129) +author: John Snow Labs +name: bert_ner_keyword_tag_model_8000_9_16_more_ingredient +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.
`keyword-tag-model-8000-9-16_more_ingredient` is an English model originally trained by `Media1129`. + +## Predicted Entities + +`negingredient`, `occasion`, `mealcourse`, `cuisines`, `ingredient` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_keyword_tag_model_8000_9_16_more_ingredient_en_5.2.0_3.0_1699296079463.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_keyword_tag_model_8000_9_16_more_ingredient_en_5.2.0_3.0_1699296079463.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_keyword_tag_model_8000_9_16_more_ingredient","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_keyword_tag_model_8000_9_16_more_ingredient","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.ingredient.8000_9_16.by_media1129").predict("""PUT YOUR STRING HERE""") +``` +
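This model predicts five lowercase recipe-query labels, including both `ingredient` and `negingredient`, so downstream code often keeps only some of them. A minimal sketch, assuming the predictions have already been grouped into `(chunk_text, label)` pairs (for example by a chunking step such as `NerConverter`); the helper names and sample chunks are hypothetical, not real model output. Matching is case-sensitive, so the lowercase label names from this card are used as-is.

```python
from typing import Dict, Iterable, List, Tuple

Chunk = Tuple[str, str]  # (chunk_text, entity_label)


def keep_entities(chunks: Iterable[Chunk], allowed: Iterable[str]) -> List[Chunk]:
    """Keep chunks whose label is in `allowed`; matching is case-sensitive."""
    allowed_set = set(allowed)
    return [chunk for chunk in chunks if chunk[1] in allowed_set]


def group_by_entity(chunks: Iterable[Chunk]) -> Dict[str, List[str]]:
    """Map each label to the chunk texts predicted for it, in input order."""
    grouped: Dict[str, List[str]] = {}
    for text, label in chunks:
        grouped.setdefault(label, []).append(text)
    return grouped


if __name__ == "__main__":
    # Hypothetical chunks, not real model output.
    chunks = [("chicken", "ingredient"), ("mushrooms", "negingredient"),
              ("italian", "cuisines"), ("dinner", "occasion")]
    print(keep_entities(chunks, ["ingredient", "negingredient"]))
    print(group_by_entity(chunks))
```

Inside a Spark pipeline, `NerConverter` also offers a `setWhiteList` parameter that restricts its output to the listed entity labels.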
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_keyword_tag_model_8000_9_16_more_ingredient| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/Media1129/keyword-tag-model-8000-9-16_more_ingredient \ No newline at end of file From 1c0e3bfd35beaaee04ea8c676608dbe7032ef173 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:44:15 +0700 Subject: [PATCH 120/667] Add model 2023-11-06-bert_ner_biobert_v1.1_pubmed_finetuned_ner_en --- ...er_biobert_v1.1_pubmed_finetuned_ner_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_v1.1_pubmed_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_v1.1_pubmed_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_v1.1_pubmed_finetuned_ner_en.md new file mode 100644 index 00000000000000..afd7350ea5d303 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biobert_v1.1_pubmed_finetuned_ner_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from fidukm34) +author: John Snow Labs +name: bert_ner_biobert_v1.1_pubmed_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.
`biobert_v1.1_pubmed-finetuned-ner` is an English model originally trained by `fidukm34`. + +## Predicted Entities + +`Disease` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_biobert_v1.1_pubmed_finetuned_ner_en_5.2.0_3.0_1699289492921.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_biobert_v1.1_pubmed_finetuned_ner_en_5.2.0_3.0_1699289492921.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_biobert_v1.1_pubmed_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_biobert_v1.1_pubmed_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.biobert.pubmed.finetuned").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_biobert_v1.1_pubmed_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.1 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/fidukm34/biobert_v1.1_pubmed-finetuned-ner \ No newline at end of file From b3c49a0b780c2084d9526b98a1361515516da423 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:45:15 +0700 Subject: [PATCH 121/667] Add model 2023-11-06-bert_ner_bert_large_cased_finetuned_ner_en --- ...t_ner_bert_large_cased_finetuned_ner_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_large_cased_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_large_cased_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_large_cased_finetuned_ner_en.md new file mode 100644 index 00000000000000..f0ba1c9ce21f35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_large_cased_finetuned_ner_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Large Cased model (from dpalominop) +author: John Snow Labs +name: bert_ner_bert_large_cased_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-large-cased-finetuned-ner` is an English model originally trained by `dpalominop`.
+ +## Predicted Entities + +`OCC`, `DIS`, `RES` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_large_cased_finetuned_ner_en_5.2.0_3.0_1699288071995.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_large_cased_finetuned_ner_en_5.2.0_3.0_1699288071995.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_large_cased_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_large_cased_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.cased_large_finetuned").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bert_large_cased_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.2 GB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/dpalominop/bert-large-cased-finetuned-ner \ No newline at end of file From c7be6349513c3b83f5aa75a061d2ccbaa7a0a1cf Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:46:16 +0700 Subject: [PATCH 122/667] Add model 2023-11-06-bert_ner_mbert_base_uncased_ner_kinyarwanda_kin --- ..._mbert_base_uncased_ner_kinyarwanda_kin.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_uncased_ner_kinyarwanda_kin.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_uncased_ner_kinyarwanda_kin.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_uncased_ner_kinyarwanda_kin.md new file mode 100644 index 00000000000000..5973c9818e2804 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_uncased_ner_kinyarwanda_kin.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Kinyarwanda bert_ner_mbert_base_uncased_ner_kinyarwanda BertForTokenClassification from arnolfokam +author: John Snow Labs +name: bert_ner_mbert_base_uncased_ner_kinyarwanda +date: 2023-11-06 +tags: [bert, kin, open_source, token_classification, onnx] +task: Named Entity Recognition +language: kin +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark
NLP. `bert_ner_mbert_base_uncased_ner_kinyarwanda` is a Kinyarwanda model originally trained by arnolfokam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_mbert_base_uncased_ner_kinyarwanda_kin_5.2.0_3.0_1699296325986.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_mbert_base_uncased_ner_kinyarwanda_kin_5.2.0_3.0_1699296325986.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
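+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_mbert_base_uncased_ner_kinyarwanda","kin") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_mbert_base_uncased_ner_kinyarwanda", "kin") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +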
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_mbert_base_uncased_ner_kinyarwanda| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|kin| +|Size:|665.1 MB| + +## References + +https://huggingface.co/arnolfokam/mbert-base-uncased-ner-kin \ No newline at end of file From 75939371e84cc9d307f724a45af03def5394fdcf Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:47:16 +0700 Subject: [PATCH 123/667] Add model 2023-11-06-bert_ner_keyword_tag_model_9000_v2_en --- ...6-bert_ner_keyword_tag_model_9000_v2_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_9000_v2_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_9000_v2_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_9000_v2_en.md new file mode 100644 index 00000000000000..85e3e2b091bd37 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_9000_v2_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from Media1129) +author: John Snow Labs +name: bert_ner_keyword_tag_model_9000_v2 +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `keyword-tag-model-9000-v2` is an English model originally trained by `Media1129`.
+ +## Predicted Entities + +`occasion`, `cuisines`, `mealcourse`, `ingredient` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_keyword_tag_model_9000_v2_en_5.2.0_3.0_1699296381549.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_keyword_tag_model_9000_v2_en_5.2.0_3.0_1699296381549.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_keyword_tag_model_9000_v2","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_keyword_tag_model_9000_v2","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.v2.9000_v2.by_media1129").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_keyword_tag_model_9000_v2| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/Media1129/keyword-tag-model-9000-v2 \ No newline at end of file From 6ff253ef6eb03ee27c506ebb301b24ed6d78f01c Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:48:16 +0700 Subject: [PATCH 124/667] Add model 2023-11-06-bert_ner_nepal_bhasa_test_model2_en --- ...-06-bert_ner_nepal_bhasa_test_model2_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_nepal_bhasa_test_model2_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_nepal_bhasa_test_model2_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_nepal_bhasa_test_model2_en.md new file mode 100644 index 00000000000000..f5ad024c4597df --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_nepal_bhasa_test_model2_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_nepal_bhasa_test_model2 BertForTokenClassification from kSaluja +author: John Snow Labs +name: bert_ner_nepal_bhasa_test_model2 +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_nepal_bhasa_test_model2` is an English model originally trained by kSaluja.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_nepal_bhasa_test_model2_en_5.2.0_3.0_1699296436373.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_nepal_bhasa_test_model2_en_5.2.0_3.0_1699296436373.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
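+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_nepal_bhasa_test_model2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_nepal_bhasa_test_model2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +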
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_nepal_bhasa_test_model2| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/kSaluja/new-test-model2 \ No newline at end of file From 50a8fb5e07cd9847f45327862b9666e65d7e782e Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:49:17 +0700 Subject: [PATCH 125/667] Add model 2023-11-06-bert_ner_tolgahanturker_bert_finetuned_ner_en --- ...er_tolgahanturker_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_tolgahanturker_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_tolgahanturker_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_tolgahanturker_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..309ae31b221619 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_tolgahanturker_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from tolgahanturker) +author: John Snow Labs +name: bert_ner_tolgahanturker_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `tolgahanturker`.
+ +## Predicted Entities + +`ORG`, `LOC`, `PER`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_tolgahanturker_bert_finetuned_ner_en_5.2.0_3.0_1699295510686.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_tolgahanturker_bert_finetuned_ner_en_5.2.0_3.0_1699295510686.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_tolgahanturker_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_tolgahanturker_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned.by_tolgahanturker").predict("""PUT YOUR STRING HERE""") +``` +
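With the CoNLL-2003-style labels listed above (`ORG`, `LOC`, `PER`, `MISC`), a common follow-up is to count how many entities of each type were found across a batch of sentences. The sketch below assumes IOB2 tags such as `B-PER` and `I-PER`, which this card does not state, and counts a mention at every `B-` tag or at an `I-` tag that does not continue a chunk of the same type. `count_mentions` and the sample tags are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def count_mentions(tag_sequences: Iterable[List[str]]) -> Counter:
    """Count entity mentions per type across sentences tagged in IOB2 format."""
    counts: Counter = Counter()
    for tags in tag_sequences:
        previous = "O"
        for tag in tags:
            if tag != "O":
                prefix, _, entity = tag.partition("-")
                # A new mention starts at "B-X", or at "I-X" when the previous
                # tag did not belong to an X chunk.
                if prefix == "B" or previous.partition("-")[2] != entity:
                    counts[entity] += 1
            previous = tag
    return counts


if __name__ == "__main__":
    sentences = [
        ["B-PER", "I-PER", "O", "B-LOC"],
        ["B-ORG", "O", "B-MISC", "I-MISC", "B-ORG"],
    ]  # hypothetical model output
    print(count_mentions(sentences))
```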
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_tolgahanturker_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/tolgahanturker/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From ec635f50a7af6912edf9241dc48589bba34fc0db Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:50:17 +0700 Subject: [PATCH 126/667] Add model 2023-11-06-bert_ner_deformer_en --- .../2023-11-06-bert_ner_deformer_en.md | 119 ++++++++++++++++++ 1 file changed, 119 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_deformer_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_deformer_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_deformer_en.md new file mode 100644 index 00000000000000..363d16fa8bd7a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_deformer_en.md @@ -0,0 +1,119 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from Lauler) +author: John Snow Labs +name: bert_ner_deformer +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `deformer` is an English model originally trained by `Lauler`.
+ +## Predicted Entities + +`DE`, `ord`, `DEM` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_deformer_en_5.2.0_3.0_1699293114150.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_deformer_en_5.2.0_3.0_1699293114150.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_deformer","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_deformer","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.by_lauler").predict("""PUT YOUR STRING HERE""") +``` +
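The archive names in the Download and Copy S3 URI links appear to follow a fixed layout: model name, language code, Spark NLP version, Spark version, and a trailing number that looks like an upload time in Unix epoch milliseconds (for `bert_ner_deformer` it decodes to 2023-11-06, the date on this card). That layout is inferred from the links on these pages, not from a documented specification, so the hypothetical `parse_model_archive` helper below is only a sketch under that assumption.

```python
from datetime import datetime, timezone


def parse_model_archive(filename: str) -> dict:
    """Split a model archive name into its parts.

    Assumed layout (inferred from the links on this page, not a documented spec):
        <name>_<lang>_<spark_nlp_version>_<spark_version>_<epoch_millis>.zip
    """
    stem = filename[: -len(".zip")] if filename.endswith(".zip") else filename
    # Split from the right: the model name itself may contain underscores and dots.
    name, lang, nlp_version, spark_version, millis = stem.rsplit("_", 4)
    # Assumption: the last field is a Unix timestamp in milliseconds.
    uploaded = datetime.fromtimestamp(int(millis) / 1000, tz=timezone.utc)
    return {
        "name": name,
        "lang": lang,
        "spark_nlp_version": nlp_version,
        "spark_version": spark_version,
        "uploaded_utc": uploaded,
    }


info = parse_model_archive("bert_ner_deformer_en_5.2.0_3.0_1699293114150.zip")
print(info["name"], info["lang"], info["spark_nlp_version"], info["uploaded_utc"].date())
# prints: bert_ner_deformer en 5.2.0 2023-11-06
```

Splitting from the right keeps names such as `bert_sayula_popoluca_13.05.2022.ssccvspantagger` intact, because only the last four underscore-separated fields are treated as metadata.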
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_deformer| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|465.2 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/Lauler/deformer +- https://opus.nlpl.eu/download.php?f=wikimedia/v20210402/mono/sv.txt.gz +- https://opus.nlpl.eu/download.php?f=JRC-Acquis/mono/JRC-Acquis.raw.sv.gz +- https://opus.nlpl.eu/ +- https://opus.nlpl.eu/download.php?f=Europarl/v8/mono/sv.txt.gz +- https://www4.isof.se/cgi-bin/srfl/visasvar.py?sok=dem%20som&svar=79718&log_id=705355 \ No newline at end of file From 6c8f032a35c7aec375b043e2a02ddb00ef8cca60 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:51:17 +0700 Subject: [PATCH 127/667] Add model 2023-11-06-bert_ner_mdroth_bert_finetuned_ner_en --- ...6-bert_ner_mdroth_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_mdroth_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mdroth_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mdroth_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..5d048e18d6abd2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mdroth_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from mdroth) +author: John Snow Labs +name: bert_ner_mdroth_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + 
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `mdroth`. + +## Predicted Entities + +`MISC`, `ORG`, `LOC`, `PER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_mdroth_bert_finetuned_ner_en_5.2.0_3.0_1699296633696.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_mdroth_bert_finetuned_ner_en_5.2.0_3.0_1699296633696.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_mdroth_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_mdroth_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned.by_mdroth").predict("""PUT YOUR STRING HERE""") +``` +
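`BertForTokenClassification` writes one label per token into the `ner` column. Assuming this CoNLL-2003 model emits IOB-style labels (`B-PER`, `I-PER`, `O`, and so on), the hypothetical pure-Python helper below sketches how consecutive labels are merged into entity chunks. The tokens and labels in the example are made up, not real model output, and inside a Spark NLP pipeline this step is usually left to the `NerConverter` annotator rather than written by hand.

```python
def iob_to_chunks(tokens, tags):
    """Merge token-level IOB labels (B-XXX, I-XXX, O) into (text, label) chunks."""
    chunks = []
    current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        # "B-PER" -> ("B", "-", "PER"); "O" -> ("O", "", "")
        prefix, _, label = tag.partition("-")

        # An I- tag that matches the open chunk's label extends that chunk.
        if prefix == "I" and current_tokens and label == current_label:
            current_tokens.append(token)
            continue

        # Any other tag closes the open chunk first.
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None

        # B- opens a new chunk; a stray I- with no matching open chunk is treated like B-.
        if prefix in ("B", "I"):
            current_tokens, current_label = [token], label

    if current_tokens:
        chunks.append((" ".join(current_tokens), current_label))
    return chunks


tokens = ["John", "Smith", "works", "at", "Acme", "Corp", "in", "Berlin"]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG", "O", "B-LOC"]
print(iob_to_chunks(tokens, tags))
# [('John Smith', 'PER'), ('Acme Corp', 'ORG'), ('Berlin', 'LOC')]
```

The sketch re-joins tokens with single spaces, so original spacing and character offsets are lost; the chunk annotations produced by `NerConverter` carry begin and end offsets instead.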
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_mdroth_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/mdroth/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From d7a21d0dc23513ad7860b491be6180e21e9ab1b5 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:52:17 +0700 Subject: [PATCH 128/667] Add model 2023-11-06-bert_ner_nielsr_bert_finetuned_ner_en --- ...6-bert_ner_nielsr_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_nielsr_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_nielsr_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_nielsr_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..35d5333b4e5803 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_nielsr_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from nielsr) +author: John Snow Labs +name: bert_ner_nielsr_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is a English model originally trained by `nielsr`. 
+ +## Predicted Entities + +`geo`, `org`, `per`, `tim`, `gpe` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_nielsr_bert_finetuned_ner_en_5.2.0_3.0_1699296703770.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_nielsr_bert_finetuned_ner_en_5.2.0_3.0_1699296703770.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_nielsr_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_nielsr_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.finetuned.by_nielsr").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_nielsr_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/nielsr/bert-finetuned-ner +- https://github.com/NielsRogge/Transformers-Tutorials/blob/master/BERT/Custom_Named_Entity_Recognition_with_BERT.ipynb \ No newline at end of file From c4b1b860c9829787d68d3e942bd8085c9ff33930 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:53:18 +0700 Subject: [PATCH 129/667] Add model 2023-11-06-bert_ner_literary_german_bert_de --- ...-11-06-bert_ner_literary_german_bert_de.md | 120 ++++++++++++++++++ 1 file changed, 120 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_literary_german_bert_de.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_literary_german_bert_de.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_literary_german_bert_de.md new file mode 100644 index 00000000000000..045464a979d465 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_literary_german_bert_de.md @@ -0,0 +1,120 @@ +--- +layout: model +title: German Named Entity Recognition (from severinsimmler) +author: John Snow Labs +name: bert_ner_literary_german_bert +date: 2023-11-06 +tags: [bert, ner, token_classification, de, open_source, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained Named Entity Recognition model, uploaded to Hugging Face, adapted and imported into Spark NLP. `literary-german-bert` is a German model orginally trained by `severinsimmler`. 
+ +## Predicted Entities + +`PER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_literary_german_bert_de_5.2.0_3.0_1699296715278.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_literary_german_bert_de_5.2.0_3.0_1699296715278.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_literary_german_bert","de") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["Ich liebe Spark NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_literary_german_bert","de") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("Ich liebe Spark NLP").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("de.ner.literary.bert.by_severinsimmler").predict("""Ich liebe Spark NLP""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_literary_german_bert| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|409.8 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/severinsimmler/literary-german-bert +- https://figshare.com/articles/Corpus_of_German-Language_Fiction_txt_/4524680/1 +- https://gitlab2.informatik.uni-wuerzburg.de/kallimachos/DROC-Release +- https://figshare.com/articles/Corpus_of_German-Language_Fiction_txt_/4524680/1 +- https://opus.bibliothek.uni-wuerzburg.de/opus4-wuerzburg/frontdoor/deliver/index/docId/14333/file/Jannidis_Figurenerkennung_Roman.pdf +- http://webdoc.sub.gwdg.de/pub/mon/dariah-de/dwp-2018-27.pdf +- https://opus.bibliothek.uni-wuerzburg.de/opus4-wuerzburg/frontdoor/deliver/index/docId/14333/file/Jannidis_Figurenerkennung_Roman.pdf \ No newline at end of file From 152e5309cbb62ee07b90045f06c8ea8412217417 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:54:18 +0700 Subject: [PATCH 130/667] Add model 2023-11-06-bert_ner_mbert_base_albanian_cased_ner_sq --- ...rt_ner_mbert_base_albanian_cased_ner_sq.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_albanian_cased_ner_sq.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_albanian_cased_ner_sq.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_albanian_cased_ner_sq.md new file mode 100644 index 00000000000000..599357f3068144 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_albanian_cased_ner_sq.md @@ -0,0 +1,115 @@ +--- +layout: model +title: Albanian BertForTokenClassification Base Cased model (from akdeniz27) +author: John Snow Labs +name: bert_ner_mbert_base_albanian_cased_ner +date: 
2023-11-06 +tags: [bert, ner, open_source, sq, onnx] +task: Named Entity Recognition +language: sq +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `mbert-base-albanian-cased-ner` is an Albanian model originally trained by `akdeniz27`. + +## Predicted Entities + +`PER`, `ORG`, `LOC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_mbert_base_albanian_cased_ner_sq_5.2.0_3.0_1699296753654.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_mbert_base_albanian_cased_ner_sq_5.2.0_3.0_1699296753654.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_mbert_base_albanian_cased_ner","sq") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["E dua shkëndijën nlp"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_mbert_base_albanian_cased_ner","sq") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("E dua shkëndijën nlp").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("sq.ner.bert.cased_base").predict("""E dua shkëndijën nlp""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_mbert_base_albanian_cased_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|sq| +|Size:|665.1 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/akdeniz27/mbert-base-albanian-cased-ner +- https://aclanthology.org/P17-1178.pdf \ No newline at end of file From 3063f99e6f28b827fbaf5dd60719cb6052c5d788 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:55:18 +0700 Subject: [PATCH 131/667] Add model 2023-11-06-bert_sayula_popoluca_13.05.2022.ssccvspantagger_en --- ..._popoluca_13.05.2022.ssccvspantagger_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_13.05.2022.ssccvspantagger_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_13.05.2022.ssccvspantagger_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_13.05.2022.ssccvspantagger_en.md new file mode 100644 index 00000000000000..3d5d4453365d0e --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_13.05.2022.ssccvspantagger_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_13.05.2022.ssccvspantagger BertForTokenClassification from RJ3vans +author: John Snow Labs +name: bert_sayula_popoluca_13.05.2022.ssccvspantagger +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and 
production-readiness using Spark NLP. `bert_sayula_popoluca_13.05.2022.ssccvspantagger` is an English model originally trained by `RJ3vans`. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_13.05.2022.ssccvspantagger_en_5.2.0_3.0_1699296736661.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_13.05.2022.ssccvspantagger_en_5.2.0_3.0_1699296736661.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
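+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +# The stages are connected by column name: text -> document -> token -> ner +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols(["document"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_13.05.2022.ssccvspantagger","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# `spark` is an active SparkSession, e.g. from sparknlp.start() +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +// The stages are connected by column name: text -> document -> token -> ner +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_13.05.2022.ssccvspantagger", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +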
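Spark NLP annotators are connected only through column names: each stage reads the columns named in `setInputCols` and writes the one named in `setOutputCol`, so a pipeline typically fails at fit or transform time when a stage asks for a column that no earlier stage produces. The hypothetical `missing_input_columns` helper below is plain Python, not a Spark NLP API; it sketches that ordering check on a list of stage descriptions, comparing names only, whereas Spark NLP also checks annotator types.

```python
def missing_input_columns(stages, source_columns=("text",)):
    """Return (stage, column) pairs whose input column is not produced by an earlier stage.

    `stages` is a list of (stage_name, input_columns, output_column) tuples that
    mirrors the setInputCol(s)/setOutputCol calls of a pipeline. Only names are
    compared; annotator types are not checked.
    """
    available = set(source_columns)
    problems = []
    for stage_name, input_columns, output_column in stages:
        for column in input_columns:
            if column not in available:
                problems.append((stage_name, column))
        available.add(output_column)
    return problems


stages = [
    ("DocumentAssembler", ["text"], "document"),
    ("Tokenizer", ["document"], "token"),
    ("BertForTokenClassification", ["document", "token"], "ner"),
]
print(missing_input_columns(stages))  # []
```

Run on a stage list where the token classifier reads `documents` and `token` but no stage writes either column, the helper would report both names for that stage.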
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_13.05.2022.ssccvspantagger| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/RJ3vans/13.05.2022.SSCCVspanTagger \ No newline at end of file From 28f6bfb19e816cfd897d1b8291ee142d6d0e2c15 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:56:18 +0700 Subject: [PATCH 132/667] Add model 2023-11-06-bert_ner_michojan_bert_finetuned_ner_en --- ...bert_ner_michojan_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_michojan_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_michojan_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_michojan_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..dd53acb5091ef9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_michojan_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from michojan) +author: John Snow Labs +name: bert_ner_michojan_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is a English model originally trained by `michojan`. 
+ +## Predicted Entities + +`MISC`, `ORG`, `LOC`, `PER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_michojan_bert_finetuned_ner_en_5.2.0_3.0_1699296913438.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_michojan_bert_finetuned_ner_en_5.2.0_3.0_1699296913438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_michojan_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_michojan_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned.by_michojan").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_michojan_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/michojan/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From c18a879859b6a6113e8757fdcd4b258ab085e160 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:57:19 +0700 Subject: [PATCH 133/667] Add model 2023-11-06-bert_ner_romainlhardy_bert_finetuned_ner_en --- ..._ner_romainlhardy_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_romainlhardy_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_romainlhardy_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_romainlhardy_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..69a604de05d958 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_romainlhardy_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from romainlhardy) +author: John Snow Labs +name: bert_ner_romainlhardy_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. 
`bert-finetuned-ner` is an English model originally trained by `romainlhardy`. + +## Predicted Entities + +`ORG`, `LOC`, `MISC`, `PER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_romainlhardy_bert_finetuned_ner_en_5.2.0_3.0_1699296980720.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_romainlhardy_bert_finetuned_ner_en_5.2.0_3.0_1699296980720.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_romainlhardy_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_romainlhardy_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned.by_romainlhardy").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_romainlhardy_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/romainlhardy/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From fa8aa824ded3b00c9e31edee523f1dd9010b9f4f Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:58:19 +0700 Subject: [PATCH 134/667] Add model 2023-11-06-bert_ner_hossay_biobert_base_cased_v1.2_finetuned_ner_en --- ...iobert_base_cased_v1.2_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_hossay_biobert_base_cased_v1.2_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_hossay_biobert_base_cased_v1.2_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_hossay_biobert_base_cased_v1.2_finetuned_ner_en.md new file mode 100644 index 00000000000000..5b5c43e5876312 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_hossay_biobert_base_cased_v1.2_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Base Cased model (from hossay) +author: John Snow Labs +name: bert_ner_hossay_biobert_base_cased_v1.2_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide 
scalability and production-readiness using Spark NLP. `biobert-base-cased-v1.2-finetuned-ner` is an English model originally trained by `hossay`. + +## Predicted Entities + +`Disease` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_hossay_biobert_base_cased_v1.2_finetuned_ner_en_5.2.0_3.0_1699292074948.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_hossay_biobert_base_cased_v1.2_finetuned_ner_en_5.2.0_3.0_1699292074948.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_hossay_biobert_base_cased_v1.2_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_hossay_biobert_base_cased_v1.2_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.biobert.cased_base_finetuned").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_hossay_biobert_base_cased_v1.2_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.6 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/hossay/biobert-base-cased-v1.2-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=ncbi_disease \ No newline at end of file From 41b0cfc9415e410d2b45a87b6b311715f38cee85 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 01:59:19 +0700 Subject: [PATCH 135/667] Add model 2023-11-06-bert_sayula_popoluca_cmv1spantagger_en --- ...-bert_sayula_popoluca_cmv1spantagger_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_cmv1spantagger_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_cmv1spantagger_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_cmv1spantagger_en.md new file mode 100644 index 00000000000000..790470970456f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_cmv1spantagger_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_cmv1spantagger BertForTokenClassification from RJ3vans +author: John Snow Labs +name: bert_sayula_popoluca_cmv1spantagger +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using 
Spark NLP. `bert_sayula_popoluca_cmv1spantagger` is an English model originally trained by RJ3vans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_cmv1spantagger_en_5.2.0_3.0_1699297054050.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_cmv1spantagger_en_5.2.0_3.0_1699297054050.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_cmv1spantagger","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_cmv1spantagger", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_cmv1spantagger| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/RJ3vans/CMV1spanTagger \ No newline at end of file From d928719a0d906e50e0ae685f53c906d35891e4f4 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:00:19 +0700 Subject: [PATCH 136/667] Add model 2023-11-06-bert_ner_rubert_tiny2_sentence_compression_en --- ...er_rubert_tiny2_sentence_compression_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_rubert_tiny2_sentence_compression_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_rubert_tiny2_sentence_compression_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_rubert_tiny2_sentence_compression_en.md new file mode 100644 index 00000000000000..3d93de08a9222e --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_rubert_tiny2_sentence_compression_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Tiny Cased model (from cointegrated) +author: John Snow Labs +name: bert_ner_rubert_tiny2_sentence_compression +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `rubert-tiny2-sentence-compression` is an English model originally trained by `cointegrated`.
+ +## Predicted Entities + +`drop`, `keep` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_rubert_tiny2_sentence_compression_en_5.2.0_3.0_1699297171069.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_rubert_tiny2_sentence_compression_en_5.2.0_3.0_1699297171069.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_rubert_tiny2_sentence_compression","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_rubert_tiny2_sentence_compression","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.tiny").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_rubert_tiny2_sentence_compression| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|109.1 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/cointegrated/rubert-tiny2-sentence-compression +- https://www.dialog-21.ru/media/5106/kuvshinovat-050.pdf \ No newline at end of file From cc62450c4f28353e0f85874594cab65adb92f53e Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:01:26 +0700 Subject: [PATCH 137/667] Add model 2023-11-06-bert_ner_yv_bert_finetuned_ner_accelerate_en --- ...ner_yv_bert_finetuned_ner_accelerate_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_yv_bert_finetuned_ner_accelerate_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_yv_bert_finetuned_ner_accelerate_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_yv_bert_finetuned_ner_accelerate_en.md new file mode 100644 index 00000000000000..42c772253c469d --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_yv_bert_finetuned_ner_accelerate_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_yv_bert_finetuned_ner_accelerate BertForTokenClassification from Yv +author: John Snow Labs +name: bert_ner_yv_bert_finetuned_ner_accelerate +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness 
using Spark NLP. `bert_ner_yv_bert_finetuned_ner_accelerate` is an English model originally trained by Yv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_yv_bert_finetuned_ner_accelerate_en_5.2.0_3.0_1699284471838.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_yv_bert_finetuned_ner_accelerate_en_5.2.0_3.0_1699284471838.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_yv_bert_finetuned_ner_accelerate","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_yv_bert_finetuned_ner_accelerate", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_yv_bert_finetuned_ner_accelerate| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Yv/bert-finetuned-ner-accelerate \ No newline at end of file From daab324826fddbe3b298e0c4bff899cb86b619e3 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:02:20 +0700 Subject: [PATCH 138/667] Add model 2023-11-06-bert_ner_nbailab_base_ner_scandi_xx --- ...-06-bert_ner_nbailab_base_ner_scandi_xx.md | 118 ++++++++++++++++++ 1 file changed, 118 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_nbailab_base_ner_scandi_xx.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_nbailab_base_ner_scandi_xx.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_nbailab_base_ner_scandi_xx.md new file mode 100644 index 00000000000000..0419c13ba61cdd --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_nbailab_base_ner_scandi_xx.md @@ -0,0 +1,118 @@ +--- +layout: model +title: Multilingual BertForTokenClassification Base Cased model (from saattrupdan) +author: John Snow Labs +name: bert_ner_nbailab_base_ner_scandi +date: 2023-11-06 +tags: [bert, ner, open_source, da, nb, nn, "no", sv, is, fo, xx, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `nbailab-base-ner-scandi` is a Multilingual model originally trained by `saattrupdan`. 
+ +## Predicted Entities + +`LOC`, `ORG`, `PER`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_nbailab_base_ner_scandi_xx_5.2.0_3.0_1699297224666.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_nbailab_base_ner_scandi_xx_5.2.0_3.0_1699297224666.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_nbailab_base_ner_scandi","xx") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_nbailab_base_ner_scandi","xx") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("xx.ner.bert.wikiann.base").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_nbailab_base_ner_scandi| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|666.2 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/saattrupdan/nbailab-base-ner-scandi +- https://aclanthology.org/P17-1178/ +- https://arxiv.org/abs/1911.12146 +- https://aclanthology.org/2020.lrec-1.565/ +- https://spraakbanken.gu.se/en/resources/suc3 \ No newline at end of file From 766a0da21017a808d65fb6f2f9be7b728e488a2b Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:03:20 +0700 Subject: [PATCH 139/667] Add model 2023-11-06-bert_ner_bert_finetuned_protagonist_en --- ...-bert_ner_bert_finetuned_protagonist_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_protagonist_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_protagonist_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_protagonist_en.md new file mode 100644 index 00000000000000..79b156884c88ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_protagonist_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from airi) +author: John Snow Labs +name: bert_ner_bert_finetuned_protagonist +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and 
production-readiness using Spark NLP. `bert-finetuned-protagonist` is an English model originally trained by `airi`. + +## Predicted Entities + +`LOC`, `PER`, `ORG`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_finetuned_protagonist_en_5.2.0_3.0_1699289429567.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_finetuned_protagonist_en_5.2.0_3.0_1699289429567.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_finetuned_protagonist","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_finetuned_protagonist","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.protagonist.by_airi").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bert_finetuned_protagonist| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|665.1 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/airi/bert-finetuned-protagonist \ No newline at end of file From 0b6a8972334cea03bf17a02753e77910c46b76c0 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:04:20 +0700 Subject: [PATCH 140/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_egy_ar --- ...ic_camelbert_mix_sayula_popoluca_egy_ar.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_egy_ar.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_egy_ar.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_egy_ar.md new file mode 100644 index 00000000000000..f360a64d29421d --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_egy_ar.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Arabic bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_egy BertForTokenClassification from CAMeL-Lab +author: John Snow Labs +name: bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_egy +date: 2023-11-06 +tags: [bert, ar, open_source, token_classification, onnx] +task: Named Entity Recognition +language: ar +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + 
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_egy` is an Arabic model originally trained by CAMeL-Lab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_egy_ar_5.2.0_3.0_1699297434626.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_egy_ar_5.2.0_3.0_1699297434626.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_egy","ar") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_egy", "ar") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_egy| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|ar| +|Size:|406.7 MB| + +## References + +https://huggingface.co/CAMeL-Lab/bert-base-arabic-camelbert-mix-pos-egy \ No newline at end of file From 829d46be69348872f4e606fdef2798981bb7e273 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:05:21 +0700 Subject: [PATCH 141/667] Add model 2023-11-06-bert_ner_russellc_bert_finetuned_ner_en --- ...bert_ner_russellc_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_russellc_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_russellc_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_russellc_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..77d72bb6f183ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_russellc_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from russellc) +author: John Snow Labs +name: bert_ner_russellc_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `russellc`.
+ +## Predicted Entities + +`LOC`, `PER`, `ORG`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_russellc_bert_finetuned_ner_en_5.2.0_3.0_1699297478470.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_russellc_bert_finetuned_ner_en_5.2.0_3.0_1699297478470.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_russellc_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_russellc_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned.by_russellc").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_russellc_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/russellc/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From 60626b2ff77fc1751b64b91c1daeb523da5164bc Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:06:21 +0700 Subject: [PATCH 142/667] Add model 2023-11-06-bert_ner_ner_2006_en --- .../2023-11-06-bert_ner_ner_2006_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_2006_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_2006_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_2006_en.md new file mode 100644 index 00000000000000..9d138906be4142 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_2006_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from yihahn) +author: John Snow Labs +name: bert_ner_ner_2006 +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `ner_2006` is an English model originally trained by `yihahn`.
+ +## Predicted Entities + +`PHONE`, `ID`, `PATIENT`, `DATE`, `AGE`, `LOCATION`, `HOSPITAL`, `DOCTOR` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_ner_2006_en_5.2.0_3.0_1699297522148.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_ner_2006_en_5.2.0_3.0_1699297522148.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_ner_2006","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_ner_2006","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.by_yihahn").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_ner_2006| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.3 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/yihahn/ner_2006 \ No newline at end of file From 284fc90971f0f7731cb836f5c21a9e87196cf192 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:07:21 +0700 Subject: [PATCH 143/667] Add model 2023-11-06-bert_ner_bioformer_cased_v1.0_ncbi_disease_en --- ...er_bioformer_cased_v1.0_ncbi_disease_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bioformer_cased_v1.0_ncbi_disease_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bioformer_cased_v1.0_ncbi_disease_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bioformer_cased_v1.0_ncbi_disease_en.md new file mode 100644 index 00000000000000..dc3895ac28b4aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bioformer_cased_v1.0_ncbi_disease_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from bioformers) +author: John Snow Labs +name: bert_ner_bioformer_cased_v1.0_ncbi_disease +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bioformer-cased-v1.0-ncbi-disease` is an English model originally trained by `bioformers`.
+ +## Predicted Entities + +`bio` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bioformer_cased_v1.0_ncbi_disease_en_5.2.0_3.0_1699290311864.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bioformer_cased_v1.0_ncbi_disease_en_5.2.0_3.0_1699290311864.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bioformer_cased_v1.0_ncbi_disease","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bioformer_cased_v1.0_ncbi_disease","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bioformer.ncbi.cased_disease").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bioformer_cased_v1.0_ncbi_disease| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|158.5 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/bioformers/bioformer-cased-v1.0-ncbi-disease +- https://doi.org/10.1016/j.jbi.2013.12.006 \ No newline at end of file From 9872fdf8006c173f72b81121497178c90799ac0e Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:08:21 +0700 Subject: [PATCH 144/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_msa_ar --- ...ic_camelbert_mix_sayula_popoluca_msa_ar.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_msa_ar.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_msa_ar.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_msa_ar.md new file mode 100644 index 00000000000000..a0b7c1c87b64d0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_msa_ar.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Arabic bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_msa BertForTokenClassification from CAMeL-Lab +author: John Snow Labs +name: bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_msa +date: 2023-11-06 +tags: [bert, ar, open_source, token_classification, onnx] +task: Named Entity Recognition +language: ar +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover 
+use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_msa` is an Arabic model originally trained by CAMeL-Lab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_msa_ar_5.2.0_3.0_1699297635239.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_msa_ar_5.2.0_3.0_1699297635239.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_msa","ar") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_msa", "ar") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_msa| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|ar| +|Size:|406.7 MB| + +## References + +https://huggingface.co/CAMeL-Lab/bert-base-arabic-camelbert-mix-pos-msa \ No newline at end of file From 5c138b6f5be609268b8d4afaa7b9c2b89de285f4 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:09:21 +0700 Subject: [PATCH 145/667] Add model 2023-11-06-bert_ner_bert_base_arabic_camelbert_msa_ner_ar --- ...r_bert_base_arabic_camelbert_msa_ner_ar.md | 119 ++++++++++++++++++ 1 file changed, 119 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_arabic_camelbert_msa_ner_ar.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_arabic_camelbert_msa_ner_ar.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_arabic_camelbert_msa_ner_ar.md new file mode 100644 index 00000000000000..e1ffc875df4fd5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_arabic_camelbert_msa_ner_ar.md @@ -0,0 +1,119 @@ +--- +layout: model +title: Arabic Named Entity Recognition (Modern Standard Arabic-MSA) +author: John Snow Labs +name: bert_ner_bert_base_arabic_camelbert_msa_ner +date: 2023-11-06 +tags: [bert, ner, token_classification, ar, open_source, onnx] +task: Named Entity Recognition +language: ar +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained Named Entity Recognition model, uploaded to Hugging Face, adapted and imported into Spark NLP. `bert-base-arabic-camelbert-msa-ner` is an Arabic model originally trained by `CAMeL-Lab`.
+ +## Predicted Entities + +`ORG`, `LOC`, `PERS`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_base_arabic_camelbert_msa_ner_ar_5.2.0_3.0_1699285984142.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_base_arabic_camelbert_msa_ner_ar_5.2.0_3.0_1699285984142.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ +.setInputCol("text") \ +.setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ +.setInputCols(["document"])\ +.setOutputCol("sentence") + +tokenizer = Tokenizer() \ +.setInputCols("sentence") \ +.setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_base_arabic_camelbert_msa_ner","ar") \ +.setInputCols(["sentence", "token"]) \ +.setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["أنا أحب الشرارة NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() +.setInputCol("text") +.setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") +.setInputCols(Array("document")) +.setOutputCol("sentence") + +val tokenizer = new Tokenizer() +.setInputCols(Array("sentence")) +.setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_base_arabic_camelbert_msa_ner","ar") +.setInputCols(Array("sentence", "token")) +.setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("أنا أحب الشرارة NLP").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("ar.ner.arabic_camelbert_msa_ner").predict("""أنا أحب الشرارة NLP""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bert_base_arabic_camelbert_msa_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|ar| +|Size:|406.4 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/CAMeL-Lab/bert-base-arabic-camelbert-msa-ner +- https://camel.abudhabi.nyu.edu/anercorp/ +- https://arxiv.org/abs/2103.06678 +- https://github.com/CAMeL-Lab/CAMeLBERT +- https://github.com/CAMeL-Lab/camel_tools +- https://github.com/CAMeL-Lab/camel_tools \ No newline at end of file From 99974d143da43ebe5254affe8dffecd62815e89f Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:10:21 +0700 Subject: [PATCH 146/667] Add model 2023-11-06-bert_ner_nlp_cic_wfu_clinical_cases_ner_sents_tokenized_mbert_cased_fine_tuned_en --- ...nts_tokenized_mbert_cased_fine_tuned_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_nlp_cic_wfu_clinical_cases_ner_sents_tokenized_mbert_cased_fine_tuned_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_nlp_cic_wfu_clinical_cases_ner_sents_tokenized_mbert_cased_fine_tuned_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_nlp_cic_wfu_clinical_cases_ner_sents_tokenized_mbert_cased_fine_tuned_en.md new file mode 100644 index 00000000000000..52c1b335b6b31d --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_nlp_cic_wfu_clinical_cases_ner_sents_tokenized_mbert_cased_fine_tuned_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_nlp_cic_wfu_clinical_cases_ner_sents_tokenized_mbert_cased_fine_tuned BertForTokenClassification from ajtamayoh +author: John Snow Labs +name: bert_ner_nlp_cic_wfu_clinical_cases_ner_sents_tokenized_mbert_cased_fine_tuned +date: 2023-11-06 +tags: [bert, en, open_source, 
token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_nlp_cic_wfu_clinical_cases_ner_sents_tokenized_mbert_cased_fine_tuned` is an English model originally trained by ajtamayoh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_nlp_cic_wfu_clinical_cases_ner_sents_tokenized_mbert_cased_fine_tuned_en_5.2.0_3.0_1699280341642.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_nlp_cic_wfu_clinical_cases_ner_sents_tokenized_mbert_cased_fine_tuned_en_5.2.0_3.0_1699280341642.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_nlp_cic_wfu_clinical_cases_ner_sents_tokenized_mbert_cased_fine_tuned","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_nlp_cic_wfu_clinical_cases_ner_sents_tokenized_mbert_cased_fine_tuned", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_nlp_cic_wfu_clinical_cases_ner_sents_tokenized_mbert_cased_fine_tuned| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|665.1 MB| + +## References + +https://huggingface.co/ajtamayoh/NLP-CIC-WFU_Clinical_Cases_NER_Sents_tokenized_mBERT_cased_fine_tuned \ No newline at end of file From 38b12be083656fb1d0a60e8839ff2b7a9ff6d899 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:11:22 +0700 Subject: [PATCH 147/667] Add model 2023-11-06-bert_ner_mcdzwil_bert_base_ner_finetuned_ner_en --- ..._mcdzwil_bert_base_ner_finetuned_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_mcdzwil_bert_base_ner_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mcdzwil_bert_base_ner_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mcdzwil_bert_base_ner_finetuned_ner_en.md new file mode 100644 index 00000000000000..909f77e03ec598 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mcdzwil_bert_base_ner_finetuned_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_mcdzwil_bert_base_ner_finetuned_ner BertForTokenClassification from mcdzwil +author: John Snow Labs +name: bert_ner_mcdzwil_bert_base_ner_finetuned_ner +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark 
NLP. `bert_ner_mcdzwil_bert_base_ner_finetuned_ner` is an English model originally trained by mcdzwil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_mcdzwil_bert_base_ner_finetuned_ner_en_5.2.0_3.0_1699296958384.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_mcdzwil_bert_base_ner_finetuned_ner_en_5.2.0_3.0_1699296958384.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_mcdzwil_bert_base_ner_finetuned_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_mcdzwil_bert_base_ner_finetuned_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_mcdzwil_bert_base_ner_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/mcdzwil/bert-base-NER-finetuned-ner \ No newline at end of file From 09cb8c2973bd8a64a0bdd24270c0602ceecb3cdc Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:12:22 +0700 Subject: [PATCH 148/667] Add model 2023-11-06-bert_ner_scibert_scivocab_cased_ner_jnlpba_en --- ...er_scibert_scivocab_cased_ner_jnlpba_en.md | 119 ++++++++++++++++++ 1 file changed, 119 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_scibert_scivocab_cased_ner_jnlpba_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_scibert_scivocab_cased_ner_jnlpba_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_scibert_scivocab_cased_ner_jnlpba_en.md new file mode 100644 index 00000000000000..7113d647645768 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_scibert_scivocab_cased_ner_jnlpba_en.md @@ -0,0 +1,119 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from fran-martinez) +author: John Snow Labs +name: bert_ner_scibert_scivocab_cased_ner_jnlpba +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `scibert_scivocab_cased_ner_jnlpba` is an English model originally trained by `fran-martinez`.
+ +## Predicted Entities + +`RNA`, `cell_type`, `protein`, `cell_line`, `DNA` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_scibert_scivocab_cased_ner_jnlpba_en_5.2.0_3.0_1699297794145.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_scibert_scivocab_cased_ner_jnlpba_en_5.2.0_3.0_1699297794145.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_scibert_scivocab_cased_ner_jnlpba","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_scibert_scivocab_cased_ner_jnlpba","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.scibert.scibert.cased.by_fran_martinez").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_scibert_scivocab_cased_ner_jnlpba| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|410.0 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/fran-martinez/scibert_scivocab_cased_ner_jnlpba +- https://github.com/fran-martinez/bio_ner_bert +- http://www.geniaproject.org/shared-tasks/bionlp-jnlpba-shared-task-2004 +- https://arxiv.org/pdf/1903.10676.pdf +- https://www.semanticscholar.org/ +- https://allenai.org/ \ No newline at end of file From d572b540ce105f7bc15471c22c1e2e36e5692654 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:13:22 +0700 Subject: [PATCH 149/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_frisian_xx --- ...base_dutch_cased_upos_alpino_frisian_xx.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_frisian_xx.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_frisian_xx.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_frisian_xx.md new file mode 100644 index 00000000000000..51be1bb8b9b721 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_frisian_xx.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Multilingual bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_frisian BertForTokenClassification from GroNLP +author: John Snow Labs +name: bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_frisian +date: 2023-11-06 +tags: [bert, xx, open_source, token_classification, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.2.0 
+spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_frisian` is a multilingual model originally trained by GroNLP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_frisian_xx_5.2.0_3.0_1699297970707.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_frisian_xx_5.2.0_3.0_1699297970707.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_frisian","xx") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_frisian", "xx") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_frisian| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|349.0 MB| + +## References + +https://huggingface.co/GroNLP/bert-base-dutch-cased-upos-alpino-frisian \ No newline at end of file From 63fc232ae40f42eea9ae43f10e59b6f331046d67 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:14:23 +0700 Subject: [PATCH 150/667] Add model 2023-11-06-bert_ner_mdroth_bert_finetuned_ner_accelerate_en --- ...mdroth_bert_finetuned_ner_accelerate_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_mdroth_bert_finetuned_ner_accelerate_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mdroth_bert_finetuned_ner_accelerate_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mdroth_bert_finetuned_ner_accelerate_en.md new file mode 100644 index 00000000000000..d91cd42e2d937a --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mdroth_bert_finetuned_ner_accelerate_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from mdroth) +author: John Snow Labs +name: bert_ner_mdroth_bert_finetuned_ner_accelerate +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner-accelerate` is an English model originally trained by `mdroth`.
+ +## Predicted Entities + +`LOC`, `PER`, `ORG`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_mdroth_bert_finetuned_ner_accelerate_en_5.2.0_3.0_1699298017829.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_mdroth_bert_finetuned_ner_accelerate_en_5.2.0_3.0_1699298017829.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_mdroth_bert_finetuned_ner_accelerate","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_mdroth_bert_finetuned_ner_accelerate","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.finetuned.by_mdroth").predict("""PUT YOUR STRING HERE""") +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_mdroth_bert_finetuned_ner_accelerate| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/mdroth/bert-finetuned-ner-accelerate \ No newline at end of file From 3bfba7938b20b0fbe1654b00a88ce65267a12f6d Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:15:23 +0700 Subject: [PATCH 151/667] Add model 2023-11-06-bert_ner_bert_finetuned_ner_swedish_test_large_set_sv --- ...finetuned_ner_swedish_test_large_set_sv.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner_swedish_test_large_set_sv.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner_swedish_test_large_set_sv.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner_swedish_test_large_set_sv.md new file mode 100644 index 00000000000000..cb20ddae9abb30 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_finetuned_ner_swedish_test_large_set_sv.md @@ -0,0 +1,114 @@ +--- +layout: model +title: Swedish BertForTokenClassification Large Cased model (from Nonzerophilip) +author: John Snow Labs +name: bert_ner_bert_finetuned_ner_swedish_test_large_set +date: 2023-11-06 +tags: [bert, ner, open_source, sv, onnx] +task: Named Entity Recognition +language: sv +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. 
`bert-finetuned-ner_swedish_test_large_set` is a Swedish model originally trained by `Nonzerophilip`. + +## Predicted Entities + +`MISC`, `inst`, `person`, `NAN`, `place` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_finetuned_ner_swedish_test_large_set_sv_5.2.0_3.0_1699288994399.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_finetuned_ner_swedish_test_large_set_sv_5.2.0_3.0_1699288994399.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_finetuned_ner_swedish_test_large_set","sv") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["Jag älskar Spark NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_finetuned_ner_swedish_test_large_set","sv") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("Jag älskar Spark NLP").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("sv.ner.bert.large_finetuned").predict("""Jag älskar Spark NLP""") +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bert_finetuned_ner_swedish_test_large_set| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|sv| +|Size:|465.2 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/Nonzerophilip/bert-finetuned-ner_swedish_test_large_set \ No newline at end of file From e9e165e8f121716bcfb4eb4e7f4fe873a20ab074 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:16:23 +0700 Subject: [PATCH 152/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_japanese_luw_upos_ja --- ...popoluca_bert_base_japanese_luw_upos_ja.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_japanese_luw_upos_ja.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_japanese_luw_upos_ja.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_japanese_luw_upos_ja.md new file mode 100644 index 00000000000000..8d680af6cf763c --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_japanese_luw_upos_ja.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Japanese bert_sayula_popoluca_bert_base_japanese_luw_upos BertForTokenClassification from KoichiYasuoka +author: John Snow Labs +name: bert_sayula_popoluca_bert_base_japanese_luw_upos +date: 2023-11-06 +tags: [bert, ja, open_source, token_classification, onnx] +task: Named Entity Recognition +language: ja +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and 
production-readiness using Spark NLP. `bert_sayula_popoluca_bert_base_japanese_luw_upos` is a Japanese model originally trained by KoichiYasuoka. + + {:.btn-box} + + + [Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_japanese_luw_upos_ja_5.2.0_3.0_1699298148873.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_japanese_luw_upos_ja_5.2.0_3.0_1699298148873.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_japanese_luw_upos","ja") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bert_base_japanese_luw_upos", "ja") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_base_japanese_luw_upos| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|ja| +|Size:|338.3 MB| + +## References + +https://huggingface.co/KoichiYasuoka/bert-base-japanese-luw-upos \ No newline at end of file From a88c259bc4fef43e4c5199a4bdc3f65bd2639b0e Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:17:23 +0700 Subject: [PATCH 153/667] Add model 2023-11-06-bert_ner_kalex_bert_finetuned_ner_en --- ...06-bert_ner_kalex_bert_finetuned_ner_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_kalex_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_kalex_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_kalex_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..18e7ba137121d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_kalex_bert_finetuned_ner_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from kalex) +author: John Snow Labs +name: bert_ner_kalex_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is a English model originally trained by `kalex`. 
+ +## Predicted Entities + +`Disease` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_kalex_bert_finetuned_ner_en_5.2.0_3.0_1699294162666.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_kalex_bert_finetuned_ner_en_5.2.0_3.0_1699294162666.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_kalex_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_kalex_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.finetuned.by_kalex").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_kalex_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/kalex/bert-finetuned-ner \ No newline at end of file From 53c1ee3e93d0076604df1fc3263b9a31d244d4a1 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:18:23 +0700 Subject: [PATCH 154/667] Add model 2023-11-06-bert_ner_biored_chem_modified_pubmedbert_384_8_10_en --- ...ed_chem_modified_pubmedbert_384_8_10_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_biored_chem_modified_pubmedbert_384_8_10_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biored_chem_modified_pubmedbert_384_8_10_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biored_chem_modified_pubmedbert_384_8_10_en.md new file mode 100644 index 00000000000000..fbde2f1d8e7998 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biored_chem_modified_pubmedbert_384_8_10_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_biored_chem_modified_pubmedbert_384_8_10 BertForTokenClassification from ghadeermobasher +author: John Snow Labs +name: bert_ner_biored_chem_modified_pubmedbert_384_8_10 +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark 
NLP. `bert_ner_biored_chem_modified_pubmedbert_384_8_10` is an English model originally trained by ghadeermobasher. + + {:.btn-box} + + + [Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_biored_chem_modified_pubmedbert_384_8_10_en_5.2.0_3.0_1699276859764.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_biored_chem_modified_pubmedbert_384_8_10_en_5.2.0_3.0_1699276859764.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_biored_chem_modified_pubmedbert_384_8_10","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_biored_chem_modified_pubmedbert_384_8_10", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_biored_chem_modified_pubmedbert_384_8_10| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.2 MB| + +## References + +https://huggingface.co/ghadeermobasher/BioRed-Chem-Modified-PubMedBERT-384-8-10 \ No newline at end of file From d356d82fc52d254614196b28ba4d6e1ee38aa807 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:19:24 +0700 Subject: [PATCH 155/667] Add model 2023-11-06-bert_ner_suonbo_bert_finetuned_ner_en --- ...6-bert_ner_suonbo_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_suonbo_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_suonbo_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_suonbo_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..dd6308a13eadfa --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_suonbo_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from suonbo) +author: John Snow Labs +name: bert_ner_suonbo_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is a English model originally trained by `suonbo`. 
+ +## Predicted Entities + +`LOC`, `PER`, `ORG`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_suonbo_bert_finetuned_ner_en_5.2.0_3.0_1699298273027.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_suonbo_bert_finetuned_ner_en_5.2.0_3.0_1699298273027.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_suonbo_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_suonbo_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned.by_suonbo").predict("""PUT YOUR STRING HERE""") +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_suonbo_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/suonbo/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From d2f3a11d152882b5b7de1976b9fdc63996366f97 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:20:24 +0700 Subject: [PATCH 156/667] Add model 2023-11-06-bert_ner_rubert_base_srl_seqlabeling_en --- ...bert_ner_rubert_base_srl_seqlabeling_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_rubert_base_srl_seqlabeling_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_rubert_base_srl_seqlabeling_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_rubert_base_srl_seqlabeling_en.md new file mode 100644 index 00000000000000..b88711426b8785 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_rubert_base_srl_seqlabeling_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Base Cased model (from Rexhaif) +author: John Snow Labs +name: bert_ner_rubert_base_srl_seqlabeling +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. 
`rubert-base-srl-seqlabeling` is an English model originally trained by `Rexhaif`. + + ## Predicted Entities + + `INSTRUMENT`, `OTHER`, `CAUSATOR`, `PREDICATE`, `EXPIRIENCER` + + {:.btn-box} + + + [Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_rubert_base_srl_seqlabeling_en_5.2.0_3.0_1699298254905.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_rubert_base_srl_seqlabeling_en_5.2.0_3.0_1699298254905.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_rubert_base_srl_seqlabeling","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_rubert_base_srl_seqlabeling","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.base.by_rexhaif").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_rubert_base_srl_seqlabeling| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|667.1 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/Rexhaif/rubert-base-srl-seqlabeling \ No newline at end of file From 28342b09e9ea0bcbc8bb0de95725f2b720108575 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:21:24 +0700 Subject: [PATCH 157/667] Add model 2023-11-06-bert_ner_huggingface_course_bert_finetuned_ner_en --- ...uggingface_course_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_huggingface_course_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_huggingface_course_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_huggingface_course_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..ba55171af48b9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_huggingface_course_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from huggingface-course) +author: John Snow Labs +name: bert_ner_huggingface_course_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. 
`bert-finetuned-ner` is an English model originally trained by `huggingface-course`. + + ## Predicted Entities + + `ORG`, `LOC`, `MISC`, `PER` + + {:.btn-box} + + + [Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_huggingface_course_bert_finetuned_ner_en_5.2.0_3.0_1699294557264.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_huggingface_course_bert_finetuned_ner_en_5.2.0_3.0_1699294557264.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_huggingface_course_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_huggingface_course_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned.by_huggingface_course").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_huggingface_course_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/huggingface-course/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From e1d12358c7a17b3f2e47e3d887f0b15c433f16d0 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:22:25 +0700 Subject: [PATCH 158/667] Add model 2023-11-06-bert_ner_deval_bert_base_ner_finetuned_ner_en --- ...er_deval_bert_base_ner_finetuned_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_deval_bert_base_ner_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_deval_bert_base_ner_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_deval_bert_base_ner_finetuned_ner_en.md new file mode 100644 index 00000000000000..fd489b11f424ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_deval_bert_base_ner_finetuned_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_deval_bert_base_ner_finetuned_ner BertForTokenClassification from deval +author: John Snow Labs +name: bert_ner_deval_bert_base_ner_finetuned_ner +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and 
production-readiness using Spark NLP. `bert_ner_deval_bert_base_ner_finetuned_ner` is an English model originally trained by deval. + + {:.btn-box} + + + [Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_deval_bert_base_ner_finetuned_ner_en_5.2.0_3.0_1699291236473.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_deval_bert_base_ner_finetuned_ner_en_5.2.0_3.0_1699291236473.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
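+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification consumes both documents and tokens
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_deval_bert_base_ner_finetuned_ner", "en") \
+    .setInputCols(["documents", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification consumes both documents and tokens
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_deval_bert_base_ner_finetuned_ner", "en")
+    .setInputCols(Array("documents", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+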
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_deval_bert_base_ner_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/deval/bert-base-NER-finetuned-ner \ No newline at end of file From 1d9786fff9a4762b8fcc13540141e26cc4fd7919 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:23:25 +0700 Subject: [PATCH 159/667] Add model 2023-11-06-bert_ner_sagerpascal_bert_finetuned_ner_en --- ...t_ner_sagerpascal_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_sagerpascal_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_sagerpascal_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_sagerpascal_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..75aeee20a7078e --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_sagerpascal_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from sagerpascal) +author: John Snow Labs +name: bert_ner_sagerpascal_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `sagerpascal`. 
+ +## Predicted Entities + +`LOC`, `PER`, `ORG`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_sagerpascal_bert_finetuned_ner_en_5.2.0_3.0_1699298557423.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_sagerpascal_bert_finetuned_ner_en_5.2.0_3.0_1699298557423.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_sagerpascal_bert_finetuned_ner","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_sagerpascal_bert_finetuned_ner","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.conll.finetuned.by_sagerpascal").predict("""PUT YOUR STRING HERE""")
+```
+
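The `ner` column produced above holds one label per token. This card does not say which labeling scheme the model uses. If it uses the IOB scheme common to CoNLL-2003 fine-tunes (labels such as `B-PER`, `I-PER` and `O`), adjacent labels can be merged into entity chunks. The following is a minimal plain-Python sketch of that merge. It works on token and label lists already pulled out of the `result` DataFrame. The function name `group_iob` is illustrative and is not part of Spark NLP. Inside a pipeline, Spark NLP's `NerConverter` annotator is the usual way to produce chunks.

```python
from typing import List, Optional, Tuple


def group_iob(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge IOB-labelled tokens into (entity_text, entity_label) chunks.

    Assumes labels look like "B-PER", "I-PER" or "O" (an assumption about
    this model's label set). An "I-" label that does not continue a chunk
    of the same type starts a new chunk.
    """
    chunks: List[Tuple[str, str]] = []
    current: List[str] = []
    label: Optional[str] = None

    for token, tag in zip(tokens, tags):
        prefix, _, tag_label = tag.partition("-")
        # A chunk ends on "O", on an explicit "B-", or on a change of type
        starts_new = tag == "O" or prefix == "B" or tag_label != label
        if starts_new and current:
            chunks.append((" ".join(current), label))
            current = []
        if tag == "O":
            label = None
        elif starts_new:
            current, label = [token], tag_label
        else:
            current.append(token)

    # Flush a chunk that runs to the end of the sentence
    if current:
        chunks.append((" ".join(current), label))
    return chunks


tokens = ["John", "Smith", "lives", "in", "New", "York", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "I-LOC", "O"]
print(group_iob(tokens, tags))
# [('John Smith', 'PER'), ('New York', 'LOC')]
```

Joining tokens with single spaces is a simplification. It will not reproduce the original spacing or punctuation attachment exactly.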
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_sagerpascal_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/sagerpascal/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From c05191a560b49cfa4aabd43752761991154c5473 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:24:25 +0700 Subject: [PATCH 160/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_russian_upos_ru --- ...yula_popoluca_bert_base_russian_upos_ru.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_russian_upos_ru.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_russian_upos_ru.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_russian_upos_ru.md new file mode 100644 index 00000000000000..5d2fa50fb54392 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_russian_upos_ru.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Russian bert_sayula_popoluca_bert_base_russian_upos BertForTokenClassification from KoichiYasuoka +author: John Snow Labs +name: bert_sayula_popoluca_bert_base_russian_upos +date: 2023-11-06 +tags: [bert, ru, open_source, token_classification, onnx] +task: Named Entity Recognition +language: ru +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and 
production-readiness using Spark NLP. `bert_sayula_popoluca_bert_base_russian_upos` is a Russian model originally trained by KoichiYasuoka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_russian_upos_ru_5.2.0_3.0_1699298638136.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_russian_upos_ru_5.2.0_3.0_1699298638136.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
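+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification consumes both documents and tokens
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_russian_upos", "ru") \
+    .setInputCols(["documents", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification consumes both documents and tokens
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_bert_base_russian_upos", "ru")
+    .setInputCols(Array("documents", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+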
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_base_russian_upos| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|ru| +|Size:|664.5 MB| + +## References + +https://huggingface.co/KoichiYasuoka/bert-base-russian-upos \ No newline at end of file From 5e349471c2e20b693d217225f8df4bf6a84c6be0 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:25:26 +0700 Subject: [PATCH 161/667] Add model 2023-11-06-bert_ner_ner_nerd_en --- .../2023-11-06-bert_ner_ner_nerd_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_nerd_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_nerd_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_nerd_en.md new file mode 100644 index 00000000000000..b6a7996fc37fed --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_nerd_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from ramybaly) +author: John Snow Labs +name: bert_ner_ner_nerd +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `ner_nerd` is an English model originally trained by `ramybaly`. 
+ +## Predicted Entities + +`ORG`, `EVENT`, `BUILDING`, `MISC`, `PER`, `PRODUCT`, `LOC`, `ART` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_ner_nerd_en_5.2.0_3.0_1699298675966.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_ner_nerd_en_5.2.0_3.0_1699298675966.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_ner_nerd","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_ner_nerd","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.nerd.by_ramybaly").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_ner_nerd| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/ramybaly/ner_nerd \ No newline at end of file From f01260b9f058961336ead96e3fd5fc931c8355b3 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:26:26 +0700 Subject: [PATCH 162/667] Add model 2023-11-06-bert_ner_t_202_bert_finetuned_ner_en --- ...06-bert_ner_t_202_bert_finetuned_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_t_202_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_t_202_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_t_202_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..f8b410153a76d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_t_202_bert_finetuned_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_t_202_bert_finetuned_ner BertForTokenClassification from T-202 +author: John Snow Labs +name: bert_ner_t_202_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_t_202_bert_finetuned_ner` is an English model originally trained by T-202. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_t_202_bert_finetuned_ner_en_5.2.0_3.0_1699283538281.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_t_202_bert_finetuned_ner_en_5.2.0_3.0_1699283538281.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
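+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification consumes both documents and tokens
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_t_202_bert_finetuned_ner", "en") \
+    .setInputCols(["documents", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification consumes both documents and tokens
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_t_202_bert_finetuned_ner", "en")
+    .setInputCols(Array("documents", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+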
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_t_202_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/T-202/bert-finetuned-ner \ No newline at end of file From d04106fd0a4c4694f93f91a88a8f8d1695cadb52 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:27:26 +0700 Subject: [PATCH 163/667] Add model 2023-11-06-bert_ner_bert_base_indonesian_ner_id --- ...06-bert_ner_bert_base_indonesian_ner_id.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_indonesian_ner_id.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_indonesian_ner_id.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_indonesian_ner_id.md new file mode 100644 index 00000000000000..3ff21bc6d664ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_indonesian_ner_id.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Indonesian bert_ner_bert_base_indonesian_ner BertForTokenClassification from cahya +author: John Snow Labs +name: bert_ner_bert_base_indonesian_ner +date: 2023-11-06 +tags: [bert, id, open_source, token_classification, onnx] +task: Named Entity Recognition +language: id +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_bert_base_indonesian_ner` is an Indonesian model originally trained by cahya. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_base_indonesian_ner_id_5.2.0_3.0_1699286388251.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_base_indonesian_ner_id_5.2.0_3.0_1699286388251.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
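+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification consumes both documents and tokens
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_base_indonesian_ner", "id") \
+    .setInputCols(["documents", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification consumes both documents and tokens
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_bert_base_indonesian_ner", "id")
+    .setInputCols(Array("documents", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+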
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bert_base_indonesian_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|id| +|Size:|412.7 MB| + +## References + +https://huggingface.co/cahya/bert-base-indonesian-NER \ No newline at end of file From a25c39f5da47b88cdadba756452d39b34c1a4be2 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:28:26 +0700 Subject: [PATCH 164/667] Add model 2023-11-06-bert_ner_ner_hungarian_model_2021_hu --- ...06-bert_ner_ner_hungarian_model_2021_hu.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_hungarian_model_2021_hu.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_hungarian_model_2021_hu.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_hungarian_model_2021_hu.md new file mode 100644 index 00000000000000..9411d816457201 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_hungarian_model_2021_hu.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Hungarian bert_ner_ner_hungarian_model_2021 BertForTokenClassification from fdominik98 +author: John Snow Labs +name: bert_ner_ner_hungarian_model_2021 +date: 2023-11-06 +tags: [bert, hu, open_source, token_classification, onnx] +task: Named Entity Recognition +language: hu +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_ner_hungarian_model_2021` is a Hungarian model originally trained by fdominik98. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_ner_hungarian_model_2021_hu_5.2.0_3.0_1699298022376.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_ner_hungarian_model_2021_hu_5.2.0_3.0_1699298022376.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
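+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification consumes both documents and tokens
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_ner_hungarian_model_2021", "hu") \
+    .setInputCols(["documents", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification consumes both documents and tokens
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_ner_hungarian_model_2021", "hu")
+    .setInputCols(Array("documents", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+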
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_ner_hungarian_model_2021| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|hu| +|Size:|412.5 MB| + +## References + +https://huggingface.co/fdominik98/ner-hu-model-2021 \ No newline at end of file From ed0e93b5b7a1ba8ad92ee10047ca68f4af28aec1 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:29:26 +0700 Subject: [PATCH 165/667] Add model 2023-11-06-bert_ner_scibert_scivocab_uncased_sdu21_ai_en --- ...er_scibert_scivocab_uncased_sdu21_ai_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_scibert_scivocab_uncased_sdu21_ai_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_scibert_scivocab_uncased_sdu21_ai_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_scibert_scivocab_uncased_sdu21_ai_en.md new file mode 100644 index 00000000000000..ed9ac856c9f94d --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_scibert_scivocab_uncased_sdu21_ai_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_scibert_scivocab_uncased_sdu21_ai BertForTokenClassification from napsternxg +author: John Snow Labs +name: bert_ner_scibert_scivocab_uncased_sdu21_ai +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_scibert_scivocab_uncased_sdu21_ai` is an English model originally trained by napsternxg. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_scibert_scivocab_uncased_sdu21_ai_en_5.2.0_3.0_1699298884772.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_scibert_scivocab_uncased_sdu21_ai_en_5.2.0_3.0_1699298884772.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
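+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification consumes both documents and tokens
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_scibert_scivocab_uncased_sdu21_ai", "en") \
+    .setInputCols(["documents", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification consumes both documents and tokens
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_scibert_scivocab_uncased_sdu21_ai", "en")
+    .setInputCols(Array("documents", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+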
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_scibert_scivocab_uncased_sdu21_ai| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/napsternxg/scibert_scivocab_uncased_SDU21_AI \ No newline at end of file From 99b9d74e5609a6b3db9b33c048af0fced38b2260 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:30:26 +0700 Subject: [PATCH 166/667] Add model 2023-11-06-bert_sayula_popoluca_amhariccacopostag_en --- ...rt_sayula_popoluca_amhariccacopostag_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_amhariccacopostag_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_amhariccacopostag_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_amhariccacopostag_en.md new file mode 100644 index 00000000000000..06b024535fb57f --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_amhariccacopostag_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_amhariccacopostag BertForTokenClassification from mitiku +author: John Snow Labs +name: bert_sayula_popoluca_amhariccacopostag +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_amhariccacopostag` is an English model originally trained by mitiku. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_amhariccacopostag_en_5.2.0_3.0_1699298884730.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_amhariccacopostag_en_5.2.0_3.0_1699298884730.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
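+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification consumes both documents and tokens
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_amhariccacopostag", "en") \
+    .setInputCols(["documents", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification consumes both documents and tokens
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_amhariccacopostag", "en")
+    .setInputCols(Array("documents", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+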
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_amhariccacopostag| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/mitiku/AmharicCacoPostag \ No newline at end of file From 8b9ddbb73ea3525958f5575036297ba26e24dd43 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:31:27 +0700 Subject: [PATCH 167/667] Add model 2023-11-06-bert_ner_peterhsu_bert_finetuned_ner_en --- ...bert_ner_peterhsu_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_peterhsu_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_peterhsu_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_peterhsu_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..5d598872e6814b --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_peterhsu_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from peterhsu) +author: John Snow Labs +name: bert_ner_peterhsu_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `peterhsu`. 
+ +## Predicted Entities + +`ORG`, `LOC`, `PER`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_peterhsu_bert_finetuned_ner_en_5.2.0_3.0_1699298921856.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_peterhsu_bert_finetuned_ner_en_5.2.0_3.0_1699298921856.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_peterhsu_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_peterhsu_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned.by_peterhsu").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_peterhsu_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/peterhsu/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From fad9afb414434d8160d8f4869a474d83ab41f10f Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:32:27 +0700 Subject: [PATCH 168/667] Add model 2023-11-06-bert_ner_original_biobert_bc2gm_en --- ...1-06-bert_ner_original_biobert_bc2gm_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_biobert_bc2gm_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_biobert_bc2gm_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_biobert_bc2gm_en.md new file mode 100644 index 00000000000000..538dd5c94f69c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_biobert_bc2gm_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_original_biobert_bc2gm BertForTokenClassification from ghadeermobasher +author: John Snow Labs +name: bert_ner_original_biobert_bc2gm +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_original_biobert_bc2gm` is an English
model originally trained by ghadeermobasher. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_original_biobert_bc2gm_en_5.2.0_3.0_1699281281190.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_original_biobert_bc2gm_en_5.2.0_3.0_1699281281190.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
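+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+# Assumes an active Spark NLP session bound to `spark`, e.g. spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_original_biobert_bc2gm","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark`
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_original_biobert_bc2gm", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+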
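The Download and Copy S3 URI links on these cards appear to share one naming pattern: `<model name>_<language>_<Spark NLP version>_<Spark version>_<number>.zip`. The trailing number looks like an upload time in Unix milliseconds; for this model it decodes to 6 November 2023, which matches the `date:` field. This pattern is inferred from the links and is not a documented contract. Under that assumption, the hypothetical helper `parse_artifact` below splits such a URL into its parts.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class ModelArtifact(NamedTuple):
    name: str
    language: str
    sparknlp_version: str
    spark_version: str
    uploaded: datetime


def parse_artifact(url: str) -> ModelArtifact:
    """Split a model archive URL into parts, assuming the inferred pattern
    <name>_<lang>_<sparknlp version>_<spark version>_<epoch millis>.zip."""
    filename = url.rsplit("/", 1)[-1]
    if filename.endswith(".zip"):
        filename = filename[: -len(".zip")]
    # Split from the right, because the model name itself contains underscores.
    name, language, nlp_version, spark_version, millis = filename.rsplit("_", 4)
    uploaded = datetime.fromtimestamp(int(millis) / 1000, tz=timezone.utc)
    return ModelArtifact(name, language, nlp_version, spark_version, uploaded)


info = parse_artifact(
    "https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/"
    "bert_ner_original_biobert_bc2gm_en_5.2.0_3.0_1699281281190.zip"
)
print(info.name, info.language, info.sparknlp_version, info.uploaded.date())
# bert_ner_original_biobert_bc2gm en 5.2.0 2023-11-06
```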
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_original_biobert_bc2gm| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.1 MB| + +## References + +https://huggingface.co/ghadeermobasher/Original-BioBERT-BC2GM \ No newline at end of file From 7606aabf46395d4e451bebf961be0e4a0c30da14 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:33:27 +0700 Subject: [PATCH 169/667] Add model 2023-11-06-bert_ner_bert_base_ner_finetuned_ner_isu_en --- ..._ner_bert_base_ner_finetuned_ner_isu_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_ner_finetuned_ner_isu_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_ner_finetuned_ner_isu_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_ner_finetuned_ner_isu_en.md new file mode 100644 index 00000000000000..07308f94eb5f07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_ner_finetuned_ner_isu_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_bert_base_ner_finetuned_ner_isu BertForTokenClassification from mcdzwil +author: John Snow Labs +name: bert_ner_bert_base_ner_finetuned_ner_isu +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_bert_base_ner_finetuned_ner_isu` is an English model originally trained by mcdzwil.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_base_ner_finetuned_ner_isu_en_5.2.0_3.0_1699283923478.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_base_ner_finetuned_ner_isu_en_5.2.0_3.0_1699283923478.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
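+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+# Assumes an active Spark NLP session bound to `spark`, e.g. spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_base_ner_finetuned_ner_isu","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark`
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_bert_base_ner_finetuned_ner_isu", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+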
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bert_base_ner_finetuned_ner_isu| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/mcdzwil/bert-base-NER-finetuned-ner-ISU \ No newline at end of file From bfb8b94d776b06463b2972626faa01ddca926a9e Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:34:27 +0700 Subject: [PATCH 170/667] Add model 2023-11-06-bert_sayula_popoluca_amharicwicpostag_en --- ...ert_sayula_popoluca_amharicwicpostag_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_amharicwicpostag_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_amharicwicpostag_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_amharicwicpostag_en.md new file mode 100644 index 00000000000000..1da102d37b94ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_amharicwicpostag_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_amharicwicpostag BertForTokenClassification from mitiku +author: John Snow Labs +name: bert_sayula_popoluca_amharicwicpostag +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_amharicwicpostag` is an English model originally trained by mitiku.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_amharicwicpostag_en_5.2.0_3.0_1699299094755.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_amharicwicpostag_en_5.2.0_3.0_1699299094755.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
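+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+# Assumes an active Spark NLP session bound to `spark`, e.g. spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_amharicwicpostag","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark`
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_amharicwicpostag", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+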
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_amharicwicpostag| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/mitiku/AmharicWICPostag \ No newline at end of file From 0ee229de7c2d48c1454eb0cac00f5397243e58b0 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:35:28 +0700 Subject: [PATCH 171/667] Add model 2023-11-06-bert_ner_keyword_tag_model_2000_en --- ...1-06-bert_ner_keyword_tag_model_2000_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_2000_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_2000_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_2000_en.md new file mode 100644 index 00000000000000..095debdb1db259 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_2000_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from Media1129) +author: John Snow Labs +name: bert_ner_keyword_tag_model_2000 +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `keyword-tag-model-2000` is an English model originally trained by `Media1129`.
+ +## Predicted Entities + +`occasion`, `cuisines`, `mealcourse`, `ingredient` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_keyword_tag_model_2000_en_5.2.0_3.0_1699295816871.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_keyword_tag_model_2000_en_5.2.0_3.0_1699295816871.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_keyword_tag_model_2000","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_keyword_tag_model_2000","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.keyword_tag_model_2000.by_media1129").predict("""PUT YOUR STRING HERE""") +``` +
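This model tags recipe-style keywords (`occasion`, `cuisines`, `mealcourse`, `ingredient`). One hypothetical use is turning those token labels into search facets. The card does not say whether labels arrive bare (`ingredient`) or with an IOB prefix (`B-ingredient`), so the sketch below accepts both. It also assumes that non-keyword tokens are tagged `O`. The helper `tags_to_facets` and the sample query and tags are illustrative only, not real model output.

```python
from collections import defaultdict
from typing import Dict, List


def tags_to_facets(tokens: List[str], tags: List[str]) -> Dict[str, List[str]]:
    """Collect tagged keywords into {label: [phrase, ...]}.

    Hypothetical helper: accepts bare labels ("ingredient") or IOB-prefixed
    ones ("B-ingredient", "I-ingredient"); "O" marks a non-keyword token.
    """
    facets: Dict[str, List[str]] = defaultdict(list)
    prev_label = None
    for token, tag in zip(tokens, tags):
        # Strip an optional B-/I- prefix to get the bare label.
        label = tag[2:] if tag.startswith(("B-", "I-")) else tag
        if label == "O":
            prev_label = None
            continue
        if label == prev_label and not tag.startswith("B-"):
            # Same label continues: extend the last phrase.
            facets[label][-1] += " " + token
        else:
            facets[label].append(token)
        prev_label = label
    return dict(facets)


query = ["vegan", "thai", "curry", "with", "coconut", "milk", "for", "dinner"]
tags = ["O", "B-cuisines", "B-mealcourse", "O", "B-ingredient", "I-ingredient", "O", "B-occasion"]
print(tags_to_facets(query, tags))
# {'cuisines': ['thai'], 'mealcourse': ['curry'], 'ingredient': ['coconut milk'], 'occasion': ['dinner']}
```

With bare labels, two adjacent but distinct keywords of the same type are merged into one phrase. IOB prefixes, if the model emits them, avoid that ambiguity.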
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_keyword_tag_model_2000| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/Media1129/keyword-tag-model-2000 \ No newline at end of file From c516f055af5ab1d0fceda7d19eeab8da21b4ae10 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:36:28 +0700 Subject: [PATCH 172/667] Add model 2023-11-06-bert_ner_rubert_ner_toxicity_en --- ...3-11-06-bert_ner_rubert_ner_toxicity_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_rubert_ner_toxicity_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_rubert_ner_toxicity_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_rubert_ner_toxicity_en.md new file mode 100644 index 00000000000000..baabb2de281c1a --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_rubert_ner_toxicity_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from tesemnikov-av) +author: John Snow Labs +name: bert_ner_rubert_ner_toxicity +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `rubert-ner-toxicity` is an English model originally trained by `tesemnikov-av`.
+ +## Predicted Entities + +`TOXIC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_rubert_ner_toxicity_en_5.2.0_3.0_1699299371188.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_rubert_ner_toxicity_en_5.2.0_3.0_1699299371188.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_rubert_ner_toxicity","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_rubert_ner_toxicity","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.toxic.by_tesemnikov_av").predict("""PUT YOUR STRING HERE""") +``` +
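Because this model has a single `TOXIC` label, a natural follow-up is to mask flagged tokens before displaying text. The sketch below is a hypothetical post-processing step that operates on token and label lists, for example those from `result.select("token.result", "ner.result")`. It assumes labels are `TOXIC` or `O`, possibly with a `B-`/`I-` prefix. The helper `mask_toxic` is illustrative, and the sample labels are hand-written rather than produced by the model.

```python
from typing import List


def mask_toxic(tokens: List[str], tags: List[str], mask_char: str = "*") -> str:
    """Replace tokens labelled TOXIC (optionally B-/I- prefixed) with a mask."""
    masked = []
    for token, tag in zip(tokens, tags):
        # Strip an optional IOB prefix before comparing labels.
        label = tag[2:] if tag.startswith(("B-", "I-")) else tag
        masked.append(mask_char * len(token) if label == "TOXIC" else token)
    return " ".join(masked)


print(mask_toxic(["you", "are", "an", "idiot"], ["O", "O", "O", "B-TOXIC"]))
# you are an *****
```

Joining with single spaces discards the original whitespace. To mask the raw text in place, use the `begin`/`end` offsets carried by each token annotation.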
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_rubert_ner_toxicity| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|43.8 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/tesemnikov-av/rubert-ner-toxicity \ No newline at end of file From ea2c30cb0fc910dbf6703efc38e9af2a9c88f941 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:37:28 +0700 Subject: [PATCH 173/667] Add model 2023-11-06-bert_ner_bert_small_finetuned_typo_detection_en --- ..._bert_small_finetuned_typo_detection_en.md | 117 ++++++++++++++++++ 1 file changed, 117 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_small_finetuned_typo_detection_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_small_finetuned_typo_detection_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_small_finetuned_typo_detection_en.md new file mode 100644 index 00000000000000..171b565bef559d --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_small_finetuned_typo_detection_en.md @@ -0,0 +1,117 @@ +--- +layout: model +title: English Named Entity Recognition (from mrm8488) +author: John Snow Labs +name: bert_ner_bert_small_finetuned_typo_detection +date: 2023-11-06 +tags: [bert, ner, token_classification, en, open_source, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained Named Entity Recognition model, uploaded to Hugging Face, adapted and imported into Spark NLP. `bert-small-finetuned-typo-detection` is an English model originally trained by `mrm8488`.
+ +## Predicted Entities + +`typo`, `ok` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_small_finetuned_typo_detection_en_5.2.0_3.0_1699290367344.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_small_finetuned_typo_detection_en_5.2.0_3.0_1699290367344.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_small_finetuned_typo_detection","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["I love Spark NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_small_finetuned_typo_detection","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("I love Spark NLP").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.small_finetuned").predict("""I love Spark NLP""") +``` +
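The card lists two labels, `typo` and `ok`. If each token receives exactly one of them with no IOB prefix (an assumption), one minimal way to surface the result is to bracket the flagged tokens. The helper `highlight_typos` below does this; it is hypothetical, and the labels in the example are hand-written rather than model output.

```python
from typing import List


def highlight_typos(tokens: List[str], tags: List[str]) -> str:
    """Wrap tokens labelled "typo" in brackets; "ok" tokens pass through."""
    return " ".join(f"[{token}]" if tag == "typo" else token
                    for token, tag in zip(tokens, tags))


print(highlight_typos(["I", "lvoe", "Spark", "NLP"], ["ok", "typo", "ok", "ok"]))
# I [lvoe] Spark NLP
```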
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bert_small_finetuned_typo_detection| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|41.9 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/mrm8488/bert-small-finetuned-typo-detection +- https://github.com/mhagiwara/github-typo-corpus +- https://twitter.com/mrm8488 \ No newline at end of file From 3e100f11b6a2adbadb346ac061f3a0a1bdf5a57f Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:38:29 +0700 Subject: [PATCH 174/667] Add model 2023-11-06-bert_ner_rdchambers_bert_finetuned_ner_en --- ...rt_ner_rdchambers_bert_finetuned_ner_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_rdchambers_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_rdchambers_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_rdchambers_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..49a02b3489f078 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_rdchambers_bert_finetuned_ner_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from rdchambers) +author: John Snow Labs +name: bert_ner_rdchambers_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and
production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `rdchambers`. + +## Predicted Entities + +`Filler`, `Null` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_rdchambers_bert_finetuned_ner_en_5.2.0_3.0_1699299487426.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_rdchambers_bert_finetuned_ner_en_5.2.0_3.0_1699299487426.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_rdchambers_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_rdchambers_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.rdchambers.by_rdchambers").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_rdchambers_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.6 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/rdchambers/bert-finetuned-ner \ No newline at end of file From 52a8aa6ed50f1dbe80783b13c2ef80aace1735f8 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:39:29 +0700 Subject: [PATCH 175/667] Add model 2023-11-06-bert_sayula_popoluca_bert_large_japanese_luw_upos_ja --- ...opoluca_bert_large_japanese_luw_upos_ja.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_large_japanese_luw_upos_ja.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_large_japanese_luw_upos_ja.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_large_japanese_luw_upos_ja.md new file mode 100644 index 00000000000000..1e1a8a3643440e --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_large_japanese_luw_upos_ja.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Japanese bert_sayula_popoluca_bert_large_japanese_luw_upos BertForTokenClassification from KoichiYasuoka +author: John Snow Labs +name: bert_sayula_popoluca_bert_large_japanese_luw_upos +date: 2023-11-06 +tags: [bert, ja, open_source, token_classification, onnx] +task: Named Entity Recognition +language: ja +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark
NLP. `bert_sayula_popoluca_bert_large_japanese_luw_upos` is a Japanese model originally trained by KoichiYasuoka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_large_japanese_luw_upos_ja_5.2.0_3.0_1699299518567.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_large_japanese_luw_upos_ja_5.2.0_3.0_1699299518567.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
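+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+# Assumes an active Spark NLP session bound to `spark`, e.g. spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so tokenize the documents first.
+# Japanese is written without spaces between words, so this whitespace-based
+# Tokenizer is only a placeholder; a word segmenter may give better tokens.
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_large_japanese_luw_upos","ja") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark`
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// Japanese is written without spaces between words, so this whitespace-based
+// Tokenizer is only a placeholder; a word segmenter may give better tokens.
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_bert_large_japanese_luw_upos", "ja")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+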
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_large_japanese_luw_upos| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|ja| +|Size:|1.2 GB| + +## References + +https://huggingface.co/KoichiYasuoka/bert-large-japanese-luw-upos \ No newline at end of file From 77672aab82deaf4495a916247c652a20ac6dd255 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 02:40:29 +0700 Subject: [PATCH 176/667] Add model 2023-11-06-bert_ner_wlt_bluebert_ncbi_en --- ...023-11-06-bert_ner_wlt_bluebert_ncbi_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_wlt_bluebert_ncbi_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_wlt_bluebert_ncbi_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_wlt_bluebert_ncbi_en.md new file mode 100644 index 00000000000000..f618f767685d6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_wlt_bluebert_ncbi_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_wlt_bluebert_ncbi BertForTokenClassification from ghadeermobasher +author: John Snow Labs +name: bert_ner_wlt_bluebert_ncbi +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_wlt_bluebert_ncbi` is an English model originally trained by ghadeermobasher.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_wlt_bluebert_ncbi_en_5.2.0_3.0_1699282747580.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_wlt_bluebert_ncbi_en_5.2.0_3.0_1699282747580.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_wlt_bluebert_ncbi","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_wlt_bluebert_ncbi", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_wlt_bluebert_ncbi|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|407.1 MB|
+
+## References
+
+https://huggingface.co/ghadeermobasher/WLT-BlueBERT-NCBI
\ No newline at end of file

From a4e4868939a23af6801b8ab746895eec42933c80 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 02:41:29 +0700
Subject: [PATCH 177/667] Add model 2023-11-06-bert_ner_leander_bert_finetuned_ner_en

---
 ...-bert_ner_leander_bert_finetuned_ner_en.md | 115 ++++++++++++++++++
 1 file changed, 115 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_leander_bert_finetuned_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_leander_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_leander_bert_finetuned_ner_en.md
new file mode 100644
index 00000000000000..c4679f53515049
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_leander_bert_finetuned_ner_en.md
@@ -0,0 +1,115 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from leander)
+author: John Snow Labs
+name: bert_ner_leander_bert_finetuned_ner
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `leander`.
+
+## Predicted Entities
+
+`MISC`, `ORG`, `LOC`, `PER`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_leander_bert_finetuned_ner_en_5.2.0_3.0_1699296106567.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_leander_bert_finetuned_ner_en_5.2.0_3.0_1699296106567.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_leander_bert_finetuned_ner","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_leander_bert_finetuned_ner","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.conll.finetuned.by_leander").predict("""PUT YOUR STRING HERE""")
+```
+
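+
+The `ner` column holds one tag per token. Assuming the labels follow the IOB2 scheme (`B-PER`, `I-PER`, `O`, ...), which is common for CoNLL-2003 fine-tunes but should be checked against this model's label set, consecutive tokens can be grouped into entity chunks. Inside a pipeline, Spark NLP's `NerConverter` annotator is the usual tool for this; the standalone sketch below only illustrates the grouping logic on plain Python lists, and `merge_iob` is a hypothetical helper rather than part of Spark NLP.
+
+```python
+from typing import List, Tuple
+
+
+def merge_iob(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
+    """Group IOB2-tagged tokens into (entity_text, label) chunks (hypothetical helper)."""
+    chunks: List[Tuple[str, str]] = []
+    current: List[str] = []
+    label = ""
+    for token, tag in zip(tokens, tags):
+        if tag.startswith("I-") and tag[2:] == label:
+            # Continuation of the open chunk.
+            current.append(token)
+            continue
+        # Any other tag closes the open chunk, if there is one.
+        if current:
+            chunks.append((" ".join(current), label))
+        if tag.startswith(("B-", "I-")):
+            # B-X opens a chunk; a stray I-X with no matching open chunk is treated the same way.
+            current, label = [token], tag[2:]
+        else:
+            current, label = [], ""
+    if current:
+        chunks.append((" ".join(current), label))
+    return chunks
+
+
+print(merge_iob(["John", "Smith", "works", "at", "Acme", "Corp", "."],
+                ["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG", "O"]))
+# [('John Smith', 'PER'), ('Acme Corp', 'ORG')]
+```
+
+Because the sketch joins tokens with single spaces, a chunk's text can differ from the original input, for example around punctuation or hyphens.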
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_leander_bert_finetuned_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+
+
+- https://huggingface.co/leander/bert-finetuned-ner
+- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003
\ No newline at end of file

From 9d5f8c3d1be3d470348443a880ecbddcc790c1b3 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 02:42:30 +0700
Subject: [PATCH 178/667] Add model 2023-11-06-bert_sayula_popoluca_clnspantagger_en

---
 ...6-bert_sayula_popoluca_clnspantagger_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_clnspantagger_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_clnspantagger_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_clnspantagger_en.md
new file mode 100644
index 00000000000000..c8ed41a6bccfff
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_clnspantagger_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_sayula_popoluca_clnspantagger BertForTokenClassification from RJ3vans
+author: John Snow Labs
+name: bert_sayula_popoluca_clnspantagger
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_clnspantagger` is an English model originally trained by RJ3vans.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_clnspantagger_en_5.2.0_3.0_1699299582578.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_clnspantagger_en_5.2.0_3.0_1699299582578.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_clnspantagger","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_clnspantagger", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_sayula_popoluca_clnspantagger|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|1.2 GB|
+
+## References
+
+https://huggingface.co/RJ3vans/CLNspanTagger
\ No newline at end of file

From 55f3deead23e7f35767de9780cbb51a4fb40603f Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 02:43:30 +0700
Subject: [PATCH 179/667] Add model 2023-11-06-bert_ner_russellc_bert_finetuned_ner_accelerate_en

---
 ...ssellc_bert_finetuned_ner_accelerate_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_russellc_bert_finetuned_ner_accelerate_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_russellc_bert_finetuned_ner_accelerate_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_russellc_bert_finetuned_ner_accelerate_en.md
new file mode 100644
index 00000000000000..4d026ae89a5c81
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_russellc_bert_finetuned_ner_accelerate_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from russellc)
+author: John Snow Labs
+name: bert_ner_russellc_bert_finetuned_ner_accelerate
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner-accelerate` is an English model originally trained by `russellc`.
+
+## Predicted Entities
+
+`LOC`, `PER`, `ORG`, `MISC`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_russellc_bert_finetuned_ner_accelerate_en_5.2.0_3.0_1699299752124.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_russellc_bert_finetuned_ner_accelerate_en_5.2.0_3.0_1699299752124.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_russellc_bert_finetuned_ner_accelerate","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_russellc_bert_finetuned_ner_accelerate","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.finetuned.by_russellc").predict("""PUT YOUR STRING HERE""")
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_russellc_bert_finetuned_ner_accelerate|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+
+
+- https://huggingface.co/russellc/bert-finetuned-ner-accelerate
\ No newline at end of file

From 23cb84b9fad5423a4e9c5b10bdd78b02ba2a916b Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 02:44:30 +0700
Subject: [PATCH 180/667] Add model 2023-11-06-bert_sayula_popoluca_bert_english_uncased_finetuned_chunk_en

---
 ...bert_english_uncased_finetuned_chunk_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_english_uncased_finetuned_chunk_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_english_uncased_finetuned_chunk_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_english_uncased_finetuned_chunk_en.md
new file mode 100644
index 00000000000000..2e604f2d541c88
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_english_uncased_finetuned_chunk_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_sayula_popoluca_bert_english_uncased_finetuned_chunk BertForTokenClassification from vblagoje
+author: John Snow Labs
+name: bert_sayula_popoluca_bert_english_uncased_finetuned_chunk
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_english_uncased_finetuned_chunk` is an English model originally trained by vblagoje.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_english_uncased_finetuned_chunk_en_5.2.0_3.0_1699298884738.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_english_uncased_finetuned_chunk_en_5.2.0_3.0_1699298884738.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_english_uncased_finetuned_chunk","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_bert_english_uncased_finetuned_chunk", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_sayula_popoluca_bert_english_uncased_finetuned_chunk|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|407.2 MB|
+
+## References
+
+https://huggingface.co/vblagoje/bert-english-uncased-finetuned-chunk
\ No newline at end of file

From d1bc14dcf61ae58145d05933779a4bcbe38c56dd Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 02:45:31 +0700
Subject: [PATCH 181/667] Add model 2023-11-06-bert_sayula_popoluca_bert_punct_restoration_danish_alvenir_da

---
 ...ert_punct_restoration_danish_alvenir_da.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_punct_restoration_danish_alvenir_da.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_punct_restoration_danish_alvenir_da.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_punct_restoration_danish_alvenir_da.md
new file mode 100644
index 00000000000000..2c28dffc714c5a
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_punct_restoration_danish_alvenir_da.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Danish bert_sayula_popoluca_bert_punct_restoration_danish_alvenir BertForTokenClassification from Alvenir
+author: John Snow Labs
+name: bert_sayula_popoluca_bert_punct_restoration_danish_alvenir
+date: 2023-11-06
+tags: [bert, da, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: da
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_punct_restoration_danish_alvenir` is a Danish model originally trained by Alvenir.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_punct_restoration_danish_alvenir_da_5.2.0_3.0_1699299878516.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_punct_restoration_danish_alvenir_da_5.2.0_3.0_1699299878516.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_punct_restoration_danish_alvenir","da") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_bert_punct_restoration_danish_alvenir", "da")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_sayula_popoluca_bert_punct_restoration_danish_alvenir|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|da|
+|Size:|412.3 MB|
+
+## References
+
+https://huggingface.co/Alvenir/bert-punct-restoration-da
\ No newline at end of file

From 493f9b8340e66e69bee32982c965432a2a97f313 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 02:46:31 +0700
Subject: [PATCH 182/667] Add model 2023-11-06-bert_ner_scibert_scivocab_uncased_ft_sdu21_ai_en

---
 ...scibert_scivocab_uncased_ft_sdu21_ai_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_scibert_scivocab_uncased_ft_sdu21_ai_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_scibert_scivocab_uncased_ft_sdu21_ai_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_scibert_scivocab_uncased_ft_sdu21_ai_en.md
new file mode 100644
index 00000000000000..8e753a79277ca8
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_scibert_scivocab_uncased_ft_sdu21_ai_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_ner_scibert_scivocab_uncased_ft_sdu21_ai BertForTokenClassification from napsternxg
+author: John Snow Labs
+name: bert_ner_scibert_scivocab_uncased_ft_sdu21_ai
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_scibert_scivocab_uncased_ft_sdu21_ai` is an English model originally trained by napsternxg.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_scibert_scivocab_uncased_ft_sdu21_ai_en_5.2.0_3.0_1699299937740.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_scibert_scivocab_uncased_ft_sdu21_ai_en_5.2.0_3.0_1699299937740.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_scibert_scivocab_uncased_ft_sdu21_ai","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_scibert_scivocab_uncased_ft_sdu21_ai", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_scibert_scivocab_uncased_ft_sdu21_ai|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|410.0 MB|
+
+## References
+
+https://huggingface.co/napsternxg/scibert_scivocab_uncased_ft_SDU21_AI
\ No newline at end of file

From ff770b109100cd5016798c903be7432ea6fe808c Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 02:47:31 +0700
Subject: [PATCH 183/667] Add model 2023-11-06-bert_ner_bert_base_uncased_swahili_macrolanguage_sw

---
 ...t_base_uncased_swahili_macrolanguage_sw.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_uncased_swahili_macrolanguage_sw.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_uncased_swahili_macrolanguage_sw.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_uncased_swahili_macrolanguage_sw.md
new file mode 100644
index 00000000000000..676fb66eae9131
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_uncased_swahili_macrolanguage_sw.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Swahili (macrolanguage) bert_ner_bert_base_uncased_swahili_macrolanguage BertForTokenClassification from arnolfokam
+author: John Snow Labs
+name: bert_ner_bert_base_uncased_swahili_macrolanguage
+date: 2023-11-06
+tags: [bert, sw, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: sw
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_bert_base_uncased_swahili_macrolanguage` is a Swahili (macrolanguage) model originally trained by arnolfokam.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_base_uncased_swahili_macrolanguage_sw_5.2.0_3.0_1699286179382.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_base_uncased_swahili_macrolanguage_sw_5.2.0_3.0_1699286179382.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_base_uncased_swahili_macrolanguage","sw") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_bert_base_uncased_swahili_macrolanguage", "sw")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_bert_base_uncased_swahili_macrolanguage|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|sw|
+|Size:|403.7 MB|
+
+## References
+
+https://huggingface.co/arnolfokam/bert-base-uncased-swa
\ No newline at end of file

From 564d1a03e9b7664c757acc3d6142c9913f6acf4d Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 02:48:31 +0700
Subject: [PATCH 184/667] Add model 2023-11-06-bert_ner_phijve_bert_finetuned_ner_en

---
 ...6-bert_ner_phijve_bert_finetuned_ner_en.md | 115 ++++++++++++++++++
 1 file changed, 115 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_phijve_bert_finetuned_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_phijve_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_phijve_bert_finetuned_ner_en.md
new file mode 100644
index 00000000000000..1c6a9ccd496e39
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_phijve_bert_finetuned_ner_en.md
@@ -0,0 +1,115 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from phijve)
+author: John Snow Labs
+name: bert_ner_phijve_bert_finetuned_ner
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `phijve`.
+
+## Predicted Entities
+
+`ORG`, `LOC`, `MISC`, `PER`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_phijve_bert_finetuned_ner_en_5.2.0_3.0_1699299193636.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_phijve_bert_finetuned_ner_en_5.2.0_3.0_1699299193636.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_phijve_bert_finetuned_ner","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_phijve_bert_finetuned_ner","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.conll.finetuned.by_phijve").predict("""PUT YOUR STRING HERE""")
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_phijve_bert_finetuned_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+
+
+- https://huggingface.co/phijve/bert-finetuned-ner
+- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003
\ No newline at end of file

From ceabc6c617b0afbc209bfd58b2371e997c7f7925 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 02:49:31 +0700
Subject: [PATCH 185/667] Add model 2023-11-06-bert_ner_spanbert_large_cased_finetuned_ade_corpus_v2_en

---
 ..._large_cased_finetuned_ade_corpus_v2_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_spanbert_large_cased_finetuned_ade_corpus_v2_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_spanbert_large_cased_finetuned_ade_corpus_v2_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_spanbert_large_cased_finetuned_ade_corpus_v2_en.md
new file mode 100644
index 00000000000000..9cf848012d0767
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_spanbert_large_cased_finetuned_ade_corpus_v2_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: English Named Entity Recognition (from abhibisht89)
+author: John Snow Labs
+name: bert_ner_spanbert_large_cased_finetuned_ade_corpus_v2
+date: 2023-11-06
+tags: [bert, ner, token_classification, en, open_source, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained Named Entity Recognition model, uploaded to Hugging Face, adapted and imported into Spark NLP. `spanbert-large-cased-finetuned-ade_corpus_v2` is an English model originally trained by `abhibisht89`.
+
+## Predicted Entities
+
+`DRUG`, `ADR`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_spanbert_large_cased_finetuned_ade_corpus_v2_en_5.2.0_3.0_1699300075479.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_spanbert_large_cased_finetuned_ade_corpus_v2_en_5.2.0_3.0_1699300075479.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_spanbert_large_cased_finetuned_ade_corpus_v2","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["I love Spark NLP"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_spanbert_large_cased_finetuned_ade_corpus_v2","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("I love Spark NLP").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.span_bert.cased_v2_large_finetuned_adverse_drug_event").predict("""I love Spark NLP""")
+```
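This model's listed entity types are `DRUG` and `ADR`. Spark NLP annotations also carry `begin` and `end` character offsets, so an entity's text can be sliced from the original string instead of being rebuilt from tokens, which keeps inner spacing and punctuation exactly as written. The sketch below assumes inclusive `end` offsets and `B-`/`I-` prefixed tags; both are assumptions worth checking against your own `ner` output. The offsets in the example were computed by hand for the sample sentence.

```python
from typing import List, NamedTuple, Optional, Tuple


class Chunk(NamedTuple):
    label: str
    begin: int  # inclusive character offset
    end: int    # inclusive character offset
    text: str


def merge_offsets(text: str, tokens: List[Tuple[int, int, str]]) -> List[Chunk]:
    """Merge (begin, end, tag) token triples into entity chunks.

    Assumes inclusive `end` offsets and IOB2 tags such as 'B-DRUG' or 'I-ADR'.
    Chunk text is sliced from `text`, so it matches the input exactly.
    """
    chunks: List[Chunk] = []
    open_label: Optional[str] = None  # label of the chunk that may still grow
    for begin, end, tag in tokens:
        if tag == "O":
            open_label = None
            continue
        prefix, _, label = tag.partition("-")
        if prefix == "I" and label == open_label:
            prev = chunks[-1]  # extend the open chunk to this token's end
            chunks[-1] = Chunk(label, prev.begin, end, text[prev.begin:end + 1])
        else:
            chunks.append(Chunk(label, begin, end, text[begin:end + 1]))
            open_label = label
    return chunks


text = "Severe rash after taking amoxicillin."
tokens = [(0, 5, "B-ADR"), (7, 10, "I-ADR"), (12, 16, "O"),
          (18, 23, "O"), (25, 35, "B-DRUG"), (36, 36, "O")]
for chunk in merge_offsets(text, tokens):
    print(chunk.label, chunk.begin, chunk.end, repr(chunk.text))
```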
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_spanbert_large_cased_finetuned_ade_corpus_v2|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|1.2 GB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+- https://huggingface.co/abhibisht89/spanbert-large-cased-finetuned-ade_corpus_v2
\ No newline at end of file
From 6a55b0c5f46e430d7661d9fdf5aefba260e0bcbb Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 02:50:32 +0700
Subject: [PATCH 186/667] Add model 2023-11-06-bert_sayula_popoluca_bert_punct_restoration_german_alvenir_de

---
 ...ert_punct_restoration_german_alvenir_de.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_punct_restoration_german_alvenir_de.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_punct_restoration_german_alvenir_de.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_punct_restoration_german_alvenir_de.md
new file mode 100644
index 00000000000000..2165d37f60533c
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_punct_restoration_german_alvenir_de.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: German bert_sayula_popoluca_bert_punct_restoration_german_alvenir BertForTokenClassification from Alvenir
+author: John Snow Labs
+name: bert_sayula_popoluca_bert_punct_restoration_german_alvenir
+date: 2023-11-06
+tags: [bert, de, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: de
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_punct_restoration_german_alvenir` is a German model originally trained by Alvenir.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_punct_restoration_german_alvenir_de_5.2.0_3.0_1699300067718.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_punct_restoration_german_alvenir_de_5.2.0_3.0_1699300067718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_punct_restoration_german_alvenir","de") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_bert_punct_restoration_german_alvenir", "de")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
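Although this card is filed under Named Entity Recognition, its Hugging Face reference (`Alvenir/bert-punct-restoration-de`) is a punctuation-restoration model: each token most likely receives a label naming the punctuation mark that should follow it. The label names in the sketch below (`O`, `COMMA`, `PERIOD`, `QUESTION`) are hypothetical placeholders, not the model's confirmed label set; check the real labels before reusing the mapping. The sketch re-attaches punctuation to already-extracted tokens and does not restore capitalisation.

```python
from typing import Dict, List

# Hypothetical label-to-punctuation mapping; replace with the model's real labels.
PUNCT_FOR_LABEL: Dict[str, str] = {"O": "", "COMMA": ",", "PERIOD": ".", "QUESTION": "?"}


def restore_punctuation(tokens: List[str], labels: List[str]) -> str:
    """Append the punctuation predicted for each token and rebuild the text.

    Unknown labels are treated like 'O' (no punctuation appended).
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    words = [token + PUNCT_FOR_LABEL.get(label, "") for token, label in zip(tokens, labels)]
    return " ".join(words)


tokens = ["ich", "glaube", "das", "stimmt", "kommst", "du", "mit"]
labels = ["O", "COMMA", "O", "PERIOD", "O", "O", "QUESTION"]
print(restore_punctuation(tokens, labels))  # ich glaube, das stimmt. kommst du mit?
```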
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_sayula_popoluca_bert_punct_restoration_german_alvenir|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|de|
+|Size:|409.8 MB|
+
+## References
+
+https://huggingface.co/Alvenir/bert-punct-restoration-de
\ No newline at end of file
From eb42f865411cb730185a37eb2bdb82d5499b5bce Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 02:51:32 +0700
Subject: [PATCH 187/667] Add model 2023-11-06-bert_ner_ner_nerd_fine_en

---
 .../2023-11-06-bert_ner_ner_nerd_fine_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_nerd_fine_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_nerd_fine_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_nerd_fine_en.md
new file mode 100644
index 00000000000000..01c6b58e537748
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_nerd_fine_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from ramybaly)
+author: John Snow Labs
+name: bert_ner_ner_nerd_fine
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `ner_nerd_fine` is an English model originally trained by `ramybaly`.
+
+## Predicted Entities
+
+`MISC_educationaldegree`, `ORG_other`, `BUILDING_restaurant`, `MISC_law`, `LOC_mountain`, `ART_other`, `MISC_medical`, `LOC_other`, `PER_athlete`, `PRODUCT_food`, `MISC_god`, `BUILDING_theater`, `LOC_GPE`, `ORG_media/newspaper`, `PRODUCT_other`, `ORG_government/governmentagency`, `PRODUCT_airplane`, `PRODUCT_software`, `BUILDING_other`, `ART_film`, `LOC_park`, `LOC_road/railway/highway/transit`, `PER_soldier`, `PRODUCT_weapon`, `EVENT_other`, `ORG_sportsleague`, `PRODUCT_train`, `PER_other`, `PER_politician`, `EVENT_election`, `ORG_company`, `PER_director`, `BUILDING_sportsfacility`, `ART_painting`, `BUILDING_airport`, `ART_music`, `LOC_island`, `ORG_politicalparty`, `MISC_award`, `PRODUCT_ship`, `BUILDING_hospital`, `ORG_sportsteam`, `MISC_livingthing`, `MISC_astronomything`, `BUILDING_hotel`, `MISC_language`, `EVENT_attack/battle/war/militaryconflict`, `LOC_bodiesofwater`, `EVENT_sportsevent`, `ORG_religion`, `PRODUCT_car`, `BUILDING_library`, `ORG_education`, `MISC_disease`, `MISC_currency`, `PER_scholar`, `EVENT_disaster`, `PRODUCT_game`, `PER_artist/author`, `ART_writtenart`, `EVENT_protest`, `MISC_chemicalthing`, `PER_actor`, `MISC_biologything`, `ART_broadcastprogram`, `ORG_showorganization`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_ner_nerd_fine_en_5.2.0_3.0_1699295857916.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_ner_nerd_fine_en_5.2.0_3.0_1699295857916.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_ner_nerd_fine","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_ner_nerd_fine","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.nerd_fine.by_ramybaly").predict("""PUT YOUR STRING HERE""")
+```
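The Predicted Entities list for this model uses labels that appear to follow the Few-NERD scheme, packing a coarse type and a fine type into one string, for example `LOC_road/railway/highway/transit`. If you only need the coarse type (`LOC`, `PER`, `ORG`, and so on), you can split on the first underscore. The sketch assumes that `COARSE_fine` format, plus an optional `B-`/`I-` prefix in case the emitted tags carry one; the card does not say whether they do.

```python
from collections import Counter
from typing import Iterable, Tuple


def split_nerd_label(tag: str) -> Tuple[str, str]:
    """Split a 'COARSE_fine' label into (coarse, fine).

    An optional IOB prefix such as 'B-' or 'I-' is stripped first (assumption).
    """
    if tag[:2] in ("B-", "I-"):
        tag = tag[2:]
    coarse, _, fine = tag.partition("_")
    return coarse, fine


def coarse_counts(tags: Iterable[str]) -> Counter:
    """Count tagged tokens per coarse type, ignoring 'O'."""
    return Counter(split_nerd_label(tag)[0] for tag in tags if tag != "O")


print(split_nerd_label("LOC_road/railway/highway/transit"))  # ('LOC', 'road/railway/highway/transit')
print(coarse_counts(["PER_actor", "O", "ORG_company", "PER_director"]))
```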
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_ner_nerd_fine|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|407.4 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+- https://huggingface.co/ramybaly/ner_nerd_fine
\ No newline at end of file
From aa9a588fbc55eef736e9c221fc21c0666a666412 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 02:52:32 +0700
Subject: [PATCH 188/667] Add model 2023-11-06-bert_ner_wende_bert_finetuned_ner_en

---
 ...06-bert_ner_wende_bert_finetuned_ner_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_wende_bert_finetuned_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_wende_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_wende_bert_finetuned_ner_en.md
new file mode 100644
index 00000000000000..22cd78142ac9a9
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_wende_bert_finetuned_ner_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_ner_wende_bert_finetuned_ner BertForTokenClassification from Wende
+author: John Snow Labs
+name: bert_ner_wende_bert_finetuned_ner
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_wende_bert_finetuned_ner` is an English model originally trained by Wende.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_wende_bert_finetuned_ner_en_5.2.0_3.0_1699284307641.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_wende_bert_finetuned_ner_en_5.2.0_3.0_1699284307641.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_wende_bert_finetuned_ner","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_wende_bert_finetuned_ner", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
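Each stage in these pipelines reads the columns named in `setInputCols` and writes the one named in `setOutputCol`, so every input column must be produced by an earlier stage or already exist in the input DataFrame. A DocumentAssembler that writes to `embeddings`, or a pipeline with no Tokenizer, breaks that chain for a classifier reading `documents` and `token`. The helper below is a hypothetical plain-Python check, not part of the Spark NLP API; the stage tuples are hand-written descriptions of a pipeline, not objects read from Spark.

```python
from typing import List, Sequence, Tuple

# (stage name, input columns, output column), written out by hand
Stage = Tuple[str, Sequence[str], str]


def missing_inputs(stages: List[Stage], source_cols: Sequence[str] = ("text",)) -> List[Tuple[str, str]]:
    """Return (stage, column) pairs whose input column no earlier stage produces."""
    available = set(source_cols)
    problems: List[Tuple[str, str]] = []
    for name, inputs, output in stages:
        for col in inputs:
            if col not in available:
                problems.append((name, col))
        available.add(output)  # later stages may read this column
    return problems


broken = [
    ("DocumentAssembler", ["text"], "embeddings"),
    ("BertForTokenClassification", ["documents", "token"], "ner"),
]
fixed = [
    ("DocumentAssembler", ["text"], "documents"),
    ("Tokenizer", ["documents"], "token"),
    ("BertForTokenClassification", ["documents", "token"], "ner"),
]
print(missing_inputs(broken))  # [('BertForTokenClassification', 'documents'), ('BertForTokenClassification', 'token')]
print(missing_inputs(fixed))   # []
```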
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_wende_bert_finetuned_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+
+## References
+
+https://huggingface.co/Wende/bert-finetuned-ner
\ No newline at end of file
From 8ffe8c121d3601ee5c0d449eb8d3fd393ae67c81 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 02:53:32 +0700
Subject: [PATCH 189/667] Add model 2023-11-06-bert_ner_siegelou_bert_finetuned_ner_en

---
 ...bert_ner_siegelou_bert_finetuned_ner_en.md | 115 ++++++++++++++++++
 1 file changed, 115 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_siegelou_bert_finetuned_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_siegelou_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_siegelou_bert_finetuned_ner_en.md
new file mode 100644
index 00000000000000..8b60d2ff1fc763
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_siegelou_bert_finetuned_ner_en.md
@@ -0,0 +1,115 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from siegelou)
+author: John Snow Labs
+name: bert_ner_siegelou_bert_finetuned_ner
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `siegelou`.
+
+## Predicted Entities
+
+`LOC`, `PER`, `ORG`, `MISC`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_siegelou_bert_finetuned_ner_en_5.2.0_3.0_1699299179962.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_siegelou_bert_finetuned_ner_en_5.2.0_3.0_1699299179962.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_siegelou_bert_finetuned_ner","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_siegelou_bert_finetuned_ner","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.conll.finetuned.by_siegelou").predict("""PUT YOUR STRING HERE""")
+```
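The Model Information table for this model lists `Max sentence length: 128`. For BERT-style models such a limit is normally counted in WordPiece sub-tokens rather than words, so a sentence with fewer than 128 words can still exceed it; splitting text into sentences first, as this pipeline does, keeps most inputs short. If you need to feed very long token sequences yourself, one option is to cut them into overlapping windows before classification. The sketch below counts whole tokens as a rough proxy for sub-tokens, so leave headroom with a smaller `max_len`; the default overlap of 16 is an arbitrary choice, not a recommended value.

```python
from typing import List, Sequence


def windows(tokens: Sequence[str], max_len: int = 128, overlap: int = 16) -> List[List[str]]:
    """Split a token sequence into overlapping windows of at most max_len items.

    Consecutive windows share `overlap` tokens, so a short entity that falls on
    a window edge is more likely to appear whole in one of the two windows.
    """
    if max_len <= 0 or not 0 <= overlap < max_len:
        raise ValueError("need max_len > 0 and 0 <= overlap < max_len")
    step = max_len - overlap
    result: List[List[str]] = []
    for start in range(0, len(tokens), step):
        result.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the sequence
    return result


tokens = [f"w{i}" for i in range(300)]
print([len(w) for w in windows(tokens)])  # [128, 128, 76]
```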
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_siegelou_bert_finetuned_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+- https://huggingface.co/siegelou/bert-finetuned-ner
+- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003
\ No newline at end of file
From c1fb68cd7245f01b84a857cd64693790f47a9da4 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 02:54:32 +0700
Subject: [PATCH 190/667] Add model 2023-11-06-bert_ner_spasis_bert_finetuned_ner_accelerate_en

---
 ...spasis_bert_finetuned_ner_accelerate_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_spasis_bert_finetuned_ner_accelerate_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_spasis_bert_finetuned_ner_accelerate_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_spasis_bert_finetuned_ner_accelerate_en.md
new file mode 100644
index 00000000000000..a8648009c8c1e5
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_spasis_bert_finetuned_ner_accelerate_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from spasis)
+author: John Snow Labs
+name: bert_ner_spasis_bert_finetuned_ner_accelerate
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner-accelerate` is an English model originally trained by `spasis`.
+
+## Predicted Entities
+
+`LOC`, `PER`, `ORG`, `MISC`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_spasis_bert_finetuned_ner_accelerate_en_5.2.0_3.0_1699300386668.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_spasis_bert_finetuned_ner_accelerate_en_5.2.0_3.0_1699300386668.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_spasis_bert_finetuned_ner_accelerate","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_spasis_bert_finetuned_ner_accelerate","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.finetuned.by_spasis").predict("""PUT YOUR STRING HERE""")
+```
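The Download and Copy S3 URI links on these cards all appear to share one file-name pattern: model name, language code, Spark NLP version, Spark version, and a millisecond Unix timestamp, joined by underscores. That pattern is inferred from the links in this patch series, not a documented contract. Under that assumption, the sketch below decodes an archive URL; because model names themselves contain underscores, the four trailing fields are split off from the right.

```python
import posixpath
from datetime import datetime, timedelta, timezone
from typing import NamedTuple
from urllib.parse import urlparse


class ModelArchive(NamedTuple):
    name: str
    lang: str
    sparknlp_version: str
    spark_version: str
    built_at: datetime  # UTC, decoded from the trailing millisecond timestamp


def parse_model_archive(uri: str) -> ModelArchive:
    """Split a model archive URL into its (inferred) naming components.

    Assumes '<name>_<lang>_<sparknlp>_<spark>_<epoch_ms>.zip'; works for both
    https:// and s3:// URIs because only the path's last segment is used.
    """
    filename = posixpath.basename(urlparse(uri).path)
    if filename.endswith(".zip"):
        filename = filename[: -len(".zip")]
    name, lang, sparknlp_version, spark_version, epoch_ms = filename.rsplit("_", 4)
    built_at = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=int(epoch_ms))
    return ModelArchive(name, lang, sparknlp_version, spark_version, built_at)


archive = parse_model_archive(
    "s3://auxdata.johnsnowlabs.com/public/models/"
    "bert_ner_spasis_bert_finetuned_ner_accelerate_en_5.2.0_3.0_1699300386668.zip"
)
print(archive.name, archive.lang, archive.sparknlp_version, archive.built_at.isoformat())
# bert_ner_spasis_bert_finetuned_ner_accelerate en 5.2.0 2023-11-06T19:53:06.668000+00:00
```

The decoded build date agrees with the `date: 2023-11-06` front-matter field of this card, which is some evidence for the inferred pattern, though not proof.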
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_spasis_bert_finetuned_ner_accelerate|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+- https://huggingface.co/spasis/bert-finetuned-ner-accelerate
\ No newline at end of file
From dbca1ba378561fc9085ab73bb5b9e684d9a72de1 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 02:55:32 +0700
Subject: [PATCH 191/667] Add model 2023-11-06-bert_sayula_popoluca_tetra_tag_english_kitaev_en

---
 ...la_popoluca_tetra_tag_english_kitaev_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tetra_tag_english_kitaev_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tetra_tag_english_kitaev_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tetra_tag_english_kitaev_en.md
new file mode 100644
index 00000000000000..25c77c04833882
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tetra_tag_english_kitaev_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_sayula_popoluca_tetra_tag_english_kitaev BertForTokenClassification from kitaev
+author: John Snow Labs
+name: bert_sayula_popoluca_tetra_tag_english_kitaev
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_tetra_tag_english_kitaev` is an English model originally trained by kitaev.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_tetra_tag_english_kitaev_en_5.2.0_3.0_1699300423265.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_tetra_tag_english_kitaev_en_5.2.0_3.0_1699300423265.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_tetra_tag_english_kitaev","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_tetra_tag_english_kitaev", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_sayula_popoluca_tetra_tag_english_kitaev|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|1.3 GB|
+
+## References
+
+https://huggingface.co/kitaev/tetra-tag-en
\ No newline at end of file
From 662eb2a8e3995a5c68891419fc81dbe134678855 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 02:56:33 +0700
Subject: [PATCH 192/667] Add model 2023-11-06-bert_ner_spasis_bert_finetuned_ner_en

---
 ...6-bert_ner_spasis_bert_finetuned_ner_en.md | 115 ++++++++++++++++++
 1 file changed, 115 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_spasis_bert_finetuned_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_spasis_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_spasis_bert_finetuned_ner_en.md
new file mode 100644
index 00000000000000..41679ca52beb96
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_spasis_bert_finetuned_ner_en.md
@@ -0,0 +1,115 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from spasis)
+author: John Snow Labs
+name: bert_ner_spasis_bert_finetuned_ner
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `spasis`.
+
+## Predicted Entities
+
+`LOC`, `PER`, `ORG`, `MISC`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_spasis_bert_finetuned_ner_en_5.2.0_3.0_1699300428300.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_spasis_bert_finetuned_ner_en_5.2.0_3.0_1699300428300.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_spasis_bert_finetuned_ner","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_spasis_bert_finetuned_ner","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.conll.finetuned.by_spasis").predict("""PUT YOUR STRING HERE""")
+```
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_spasis_bert_finetuned_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+- https://huggingface.co/spasis/bert-finetuned-ner
+- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003
\ No newline at end of file
From 0820adb760f220dcd6b131cba14e63f5abd67f88 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 02:57:33 +0700
Subject: [PATCH 193/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_glf_ar

---
 ...ic_camelbert_mix_sayula_popoluca_glf_ar.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_glf_ar.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_glf_ar.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_glf_ar.md
new file mode 100644
index 00000000000000..b481bc253a7576
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_glf_ar.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Arabic bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_glf BertForTokenClassification from CAMeL-Lab
+author: John Snow Labs
+name: bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_glf
+date: 2023-11-06
+tags: [bert, ar, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: ar
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_glf` is an Arabic model originally trained by CAMeL-Lab.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_glf_ar_5.2.0_3.0_1699300623702.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_glf_ar_5.2.0_3.0_1699300623702.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
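+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_glf","ar") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_glf", "ar")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+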
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_sayula_popoluca_bert_base_arabic_camelbert_mix_sayula_popoluca_glf|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|ar|
+|Size:|406.7 MB|
+
+## References
+
+https://huggingface.co/CAMeL-Lab/bert-base-arabic-camelbert-mix-pos-glf
\ No newline at end of file

From 0cf3fbc89df2bd4dd06823e5198083165224c3bb Mon Sep 17 00:00:00 2001
From: ahmedlone127 
Date: Tue, 7 Nov 2023 02:58:33 +0700
Subject: [PATCH 194/667] Add model
 2023-11-06-bert_sayula_popoluca_tiny_lr_kazakh_kktoto_en

---
 ...ayula_popoluca_tiny_lr_kazakh_kktoto_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_lr_kazakh_kktoto_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_lr_kazakh_kktoto_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_lr_kazakh_kktoto_en.md
new file mode 100644
index 00000000000000..49bc45967fe5f6
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_lr_kazakh_kktoto_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_sayula_popoluca_tiny_lr_kazakh_kktoto BertForTokenClassification from kktoto
+author: John Snow Labs
+name: bert_sayula_popoluca_tiny_lr_kazakh_kktoto
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_tiny_lr_kazakh_kktoto` is an English model originally trained by kktoto.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_tiny_lr_kazakh_kktoto_en_5.2.0_3.0_1699300652184.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_tiny_lr_kazakh_kktoto_en_5.2.0_3.0_1699300652184.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
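+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_tiny_lr_kazakh_kktoto","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_tiny_lr_kazakh_kktoto", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+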
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_sayula_popoluca_tiny_lr_kazakh_kktoto|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|42.7 MB|
+
+## References
+
+https://huggingface.co/kktoto/tiny_lr_kk
\ No newline at end of file

From 19da5727e57f163a0e16e26e26b4d9595243a77b Mon Sep 17 00:00:00 2001
From: ahmedlone127 
Date: Tue, 7 Nov 2023 02:59:34 +0700
Subject: [PATCH 195/667] Add model 2023-11-06-bert_ner_sysformbatches2acs_en

---
 ...23-11-06-bert_ner_sysformbatches2acs_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_sysformbatches2acs_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_sysformbatches2acs_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_sysformbatches2acs_en.md
new file mode 100644
index 00000000000000..eca13680420fb6
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_sysformbatches2acs_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from blckwdw61)
+author: John Snow Labs
+name: bert_ner_sysformbatches2acs
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `sysformbatches2acs` is an English model originally trained by `blckwdw61`.
+ +## Predicted Entities + +`SYSTEMATIC`, `FORMULA` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_sysformbatches2acs_en_5.2.0_3.0_1699300725442.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_sysformbatches2acs_en_5.2.0_3.0_1699300725442.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_sysformbatches2acs","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_sysformbatches2acs","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.by_blckwdw61").predict("""PUT YOUR STRING HERE""") +``` +
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_sysformbatches2acs|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.1 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+
+
+- https://huggingface.co/blckwdw61/sysformbatches2acs
\ No newline at end of file

From 9c09546923184798fdc60e469489a46e4684fb2c Mon Sep 17 00:00:00 2001
From: ahmedlone127 
Date: Tue, 7 Nov 2023 03:00:34 +0700
Subject: [PATCH 196/667] Add model
 2023-11-06-bert_ner_stefan_jo_bert_finetuned_ner_en

---
 ...ert_ner_stefan_jo_bert_finetuned_ner_en.md | 115 ++++++++++++++++++
 1 file changed, 115 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_stefan_jo_bert_finetuned_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_stefan_jo_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_stefan_jo_bert_finetuned_ner_en.md
new file mode 100644
index 00000000000000..8fe63876725192
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_stefan_jo_bert_finetuned_ner_en.md
@@ -0,0 +1,115 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from stefan-jo)
+author: John Snow Labs
+name: bert_ner_stefan_jo_bert_finetuned_ner
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `stefan-jo`.
+ +## Predicted Entities + +`LOC`, `PER`, `ORG`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_stefan_jo_bert_finetuned_ner_en_5.2.0_3.0_1699300785655.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_stefan_jo_bert_finetuned_ner_en_5.2.0_3.0_1699300785655.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
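The `ner` column holds one label per token. If those labels follow the IOB2 scheme that CoNLL-2003-style models commonly use (for example `B-PER`, `I-PER`, `O`; this is an assumption, so check the model's actual label set), consecutive tags can be merged into entity spans. Spark NLP's `NerConverter` annotator does this inside a pipeline using character offsets; the standard-library sketch below only illustrates the merging logic on hypothetical token and tag lists, such as the arrays you might collect from `token.result` and `ner.result`. The function name `group_iob` is illustrative and not part of any library.

```python
from typing import List, Tuple


def group_iob(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge IOB2 tags (B-X, I-X, O) into (label, text) spans.

    Hypothetical helper: tokens are joined with single spaces, which is a
    simplification of how NerConverter rebuilds chunks from offsets.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[str, str]] = []
    current_label = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the span being built, if any.
        if current_label is not None:
            spans.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_label):
            # A B- tag, or an I- tag that does not continue the current entity,
            # starts a new span.
            flush()
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continuation of the current entity.
            current_tokens.append(token)
        else:
            # "O" (or any other tag) ends the current span.
            flush()
            current_label, current_tokens = None, []

    flush()
    return spans


tokens = ["Angela", "Merkel", "visited", "New", "York", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O"]
print(group_iob(tokens, tags))
# [('PER', 'Angela Merkel'), ('LOC', 'New York')]
```

A stray `I-` tag that does not continue the current entity opens a new span here, which is a lenient choice; a stricter decoder could drop it instead.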
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_stefan_jo_bert_finetuned_ner","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_stefan_jo_bert_finetuned_ner","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.conll.finetuned.by_stefan_jo").predict("""PUT YOUR STRING HERE""")
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_stefan_jo_bert_finetuned_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+
+
+- https://huggingface.co/stefan-jo/bert-finetuned-ner
+- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003
\ No newline at end of file

From fff98b9ed9aa2cb02231307494d151b2e9caeb9d Mon Sep 17 00:00:00 2001
From: ahmedlone127 
Date: Tue, 7 Nov 2023 03:01:34 +0700
Subject: [PATCH 197/667] Add model 2023-11-06-bert_ner_bert_base_ner_en

---
 .../2023-11-06-bert_ner_bert_base_ner_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_ner_en.md
new file mode 100644
index 00000000000000..d3bc1a0aef52b8
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_ner_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_ner_bert_base_ner BertForTokenClassification from dslim
+author: John Snow Labs
+name: bert_ner_bert_base_ner
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_bert_base_ner` is an English model originally trained by dslim.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_base_ner_en_5.2.0_3.0_1699283745489.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_base_ner_en_5.2.0_3.0_1699283745489.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
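The Download and Copy S3 URI links on these pages appear to share one naming pattern, `{model name}_{language}_{Spark NLP version}_{Spark version}_{timestamp}.zip`, where the trailing number looks like a Unix timestamp in milliseconds. That pattern is inferred from the links themselves, not from a documented specification. Because model names contain underscores, the hypothetical helper `parse_model_archive` below splits the file name from the right using only the standard library.

```python
from datetime import datetime, timedelta, timezone
from posixpath import basename
from urllib.parse import urlparse


def parse_model_archive(url: str) -> dict:
    """Split a model archive URL into its (assumed) naming fields.

    Works for both https:// and s3:// links; the field layout is inferred
    from the links on these model cards, not from an official spec.
    """
    filename = basename(urlparse(url).path)
    if not filename.endswith(".zip"):
        raise ValueError(f"not a .zip archive: {filename}")
    stem = filename[: -len(".zip")]

    # Split from the right: the model name may contain underscores,
    # but the four trailing fields are assumed not to.
    name, lang, nlp_version, spark_version, millis = stem.rsplit("_", 4)

    # Interpret the last field as milliseconds since the Unix epoch (assumption).
    uploaded = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=int(millis))
    return {
        "name": name,
        "lang": lang,
        "sparknlp_version": nlp_version,
        "spark_version": spark_version,
        "uploaded": uploaded,
    }


info = parse_model_archive(
    "https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/"
    "bert_ner_bert_base_ner_en_5.2.0_3.0_1699283745489.zip"
)
print(info["name"], info["lang"], info["sparknlp_version"], info["uploaded"].isoformat())
# bert_ner_bert_base_ner en 5.2.0 2023-11-06T15:15:45.489000+00:00
```

If you fetch an archive manually (for example on a cluster without internet access), the unzipped folder can usually be loaded with `BertForTokenClassification.load(path)` instead of `pretrained()`.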
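+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_base_ner","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_bert_base_ner", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+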
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_bert_base_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+
+## References
+
+https://huggingface.co/dslim/bert-base-NER
\ No newline at end of file

From 3fa49bdb91834b595391071d451b3921614a8ef5 Mon Sep 17 00:00:00 2001
From: ahmedlone127 
Date: Tue, 7 Nov 2023 03:02:35 +0700
Subject: [PATCH 198/667] Add model
 2023-11-06-bert_ner_craft_chem_imbalanced_scibert_en

---
 ...rt_ner_craft_chem_imbalanced_scibert_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_craft_chem_imbalanced_scibert_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_craft_chem_imbalanced_scibert_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_craft_chem_imbalanced_scibert_en.md
new file mode 100644
index 00000000000000..feabaca066f5d9
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_craft_chem_imbalanced_scibert_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_ner_craft_chem_imbalanced_scibert BertForTokenClassification from ghadeermobasher
+author: John Snow Labs
+name: bert_ner_craft_chem_imbalanced_scibert
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_craft_chem_imbalanced_scibert` is an English model originally trained by ghadeermobasher.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_craft_chem_imbalanced_scibert_en_5.2.0_3.0_1699279571759.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_craft_chem_imbalanced_scibert_en_5.2.0_3.0_1699279571759.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
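+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_craft_chem_imbalanced_scibert","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_craft_chem_imbalanced_scibert", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+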
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_craft_chem_imbalanced_scibert| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|409.9 MB| + +## References + +https://huggingface.co/ghadeermobasher/CRAFT-Chem_Imbalanced-SciBERT \ No newline at end of file From f156f1b638fa574a91f499fad99bb4f8376fe50f Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:03:35 +0700 Subject: [PATCH 199/667] Add model 2023-11-06-bert_ner_mbert_base_uncased_swahili_macrolanguage_swa --- ..._base_uncased_swahili_macrolanguage_swa.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_uncased_swahili_macrolanguage_swa.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_uncased_swahili_macrolanguage_swa.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_uncased_swahili_macrolanguage_swa.md new file mode 100644 index 00000000000000..a23ebb6c0de0c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_uncased_swahili_macrolanguage_swa.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Swahili (macrolanguage) bert_ner_mbert_base_uncased_swahili_macrolanguage BertForTokenClassification from arnolfokam +author: John Snow Labs +name: bert_ner_mbert_base_uncased_swahili_macrolanguage +date: 2023-11-06 +tags: [bert, swa, open_source, token_classification, onnx] +task: Named Entity Recognition +language: swa +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark 
NLP. `bert_ner_mbert_base_uncased_swahili_macrolanguage` is a Swahili (macrolanguage) model originally trained by arnolfokam.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_mbert_base_uncased_swahili_macrolanguage_swa_5.2.0_3.0_1699297744554.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_mbert_base_uncased_swahili_macrolanguage_swa_5.2.0_3.0_1699297744554.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
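+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_mbert_base_uncased_swahili_macrolanguage","swa") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_mbert_base_uncased_swahili_macrolanguage", "swa")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+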
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_mbert_base_uncased_swahili_macrolanguage|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|swa|
+|Size:|665.1 MB|
+
+## References
+
+https://huggingface.co/arnolfokam/mbert-base-uncased-swa
\ No newline at end of file

From 2559c5a257de51d0004ef504288790b90b7b175b Mon Sep 17 00:00:00 2001
From: ahmedlone127 
Date: Tue, 7 Nov 2023 03:04:35 +0700
Subject: [PATCH 200/667] Add model
 2023-11-06-bert_ner_temporal_tagger_bert_tokenclassifier_en

---
 ...temporal_tagger_bert_tokenclassifier_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_temporal_tagger_bert_tokenclassifier_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_temporal_tagger_bert_tokenclassifier_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_temporal_tagger_bert_tokenclassifier_en.md
new file mode 100644
index 00000000000000..60afd0b7c6cd7a
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_temporal_tagger_bert_tokenclassifier_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_ner_temporal_tagger_bert_tokenclassifier BertForTokenClassification from satyaalmasian
+author: John Snow Labs
+name: bert_ner_temporal_tagger_bert_tokenclassifier
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_temporal_tagger_bert_tokenclassifier` is an English model originally trained by satyaalmasian.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_temporal_tagger_bert_tokenclassifier_en_5.2.0_3.0_1699300992865.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_temporal_tagger_bert_tokenclassifier_en_5.2.0_3.0_1699300992865.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
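+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_temporal_tagger_bert_tokenclassifier","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_temporal_tagger_bert_tokenclassifier", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+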
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_temporal_tagger_bert_tokenclassifier|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|407.1 MB|
+
+## References
+
+https://huggingface.co/satyaalmasian/temporal_tagger_BERT_tokenclassifier
\ No newline at end of file

From 9d8c2508a9e7532165d2cf262488e9df7c2e4dd2 Mon Sep 17 00:00:00 2001
From: ahmedlone127 
Date: Tue, 7 Nov 2023 03:05:36 +0700
Subject: [PATCH 201/667] Add model
 2023-11-06-bert_ner_wikineural_multilingual_ner_nl

---
 ...bert_ner_wikineural_multilingual_ner_nl.md | 117 ++++++++++++++++++
 1 file changed, 117 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_wikineural_multilingual_ner_nl.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_wikineural_multilingual_ner_nl.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_wikineural_multilingual_ner_nl.md
new file mode 100644
index 00000000000000..aae7953686a0db
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_wikineural_multilingual_ner_nl.md
@@ -0,0 +1,117 @@
+---
+layout: model
+title: Dutch Named Entity Recognition (from Babelscape)
+author: John Snow Labs
+name: bert_ner_wikineural_multilingual_ner
+date: 2023-11-06
+tags: [bert, ner, token_classification, nl, open_source, onnx]
+task: Named Entity Recognition
+language: nl
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained Named Entity Recognition model, uploaded to Hugging Face, adapted and imported into Spark NLP. `wikineural-multilingual-ner` is a Dutch model originally trained by `Babelscape`.
+ +## Predicted Entities + +`LOC`, `PER`, `ORG`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_wikineural_multilingual_ner_nl_5.2.0_3.0_1699300983486.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_wikineural_multilingual_ner_nl_5.2.0_3.0_1699300983486.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_wikineural_multilingual_ner","nl") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["Ik hou van Spark NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_wikineural_multilingual_ner","nl") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("Ik hou van Spark NLP").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("nl.ner.bert.wikineural.multilingual").predict("""Ik hou van Spark NLP""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_wikineural_multilingual_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|nl| +|Size:|665.1 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/Babelscape/wikineural-multilingual-ner +- https://github.com/Babelscape/wikineural +- https://aclanthology.org/2021.findings-emnlp.215/ +- https://creativecommons.org/licenses/by-nc-sa/4.0/ \ No newline at end of file From a0abbcc966220f287d819588ff1d4363923ddcb7 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:06:35 +0700 Subject: [PATCH 202/667] Add model 2023-11-06-bert_ner_tinybert_fincorp_en --- ...2023-11-06-bert_ner_tinybert_fincorp_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_tinybert_fincorp_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_tinybert_fincorp_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_tinybert_fincorp_en.md new file mode 100644 index 00000000000000..f31d9de4fc7404 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_tinybert_fincorp_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Tiny Cased model (from satyamrajawat1994) +author: John Snow Labs +name: bert_ner_tinybert_fincorp +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. 
`tinybert-fincorp` is an English model originally trained by `satyamrajawat1994`. + +## Predicted Entities + +`Fin_Corp` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_tinybert_fincorp_en_5.2.0_3.0_1699301146413.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_tinybert_fincorp_en_5.2.0_3.0_1699301146413.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_tinybert_fincorp","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_tinybert_fincorp","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.tiny.by_satyamrajawat1994").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_tinybert_fincorp| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|16.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/satyamrajawat1994/tinybert-fincorp \ No newline at end of file From dd31f7aff6ca6a20d2aaff6f85a19dc5bc65c628 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:07:36 +0700 Subject: [PATCH 203/667] Add model 2023-11-06-bert_sayula_popoluca_autonlp_sayula_popoluca_tag_bosque_pt --- ...a_autonlp_sayula_popoluca_tag_bosque_pt.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_autonlp_sayula_popoluca_tag_bosque_pt.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_autonlp_sayula_popoluca_tag_bosque_pt.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_autonlp_sayula_popoluca_tag_bosque_pt.md new file mode 100644 index 00000000000000..7ca79df5cab69e --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_autonlp_sayula_popoluca_tag_bosque_pt.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Portuguese bert_sayula_popoluca_autonlp_sayula_popoluca_tag_bosque BertForTokenClassification from Emanuel +author: John Snow Labs +name: bert_sayula_popoluca_autonlp_sayula_popoluca_tag_bosque +date: 2023-11-06 +tags: [bert, pt, open_source, token_classification, onnx] +task: Named Entity Recognition +language: pt +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and 
production-readiness using Spark NLP. `bert_sayula_popoluca_autonlp_sayula_popoluca_tag_bosque` is a Portuguese model originally trained by Emanuel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_autonlp_sayula_popoluca_tag_bosque_pt_5.2.0_3.0_1699300286479.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_autonlp_sayula_popoluca_tag_bosque_pt_5.2.0_3.0_1699300286479.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_autonlp_sayula_popoluca_tag_bosque","pt") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_autonlp_sayula_popoluca_tag_bosque", "pt") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_autonlp_sayula_popoluca_tag_bosque| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|pt| +|Size:|406.0 MB| + +## References + +https://huggingface.co/Emanuel/autonlp-pos-tag-bosque \ No newline at end of file From 801fb12e6deb9c15f57dc26c1048b2d8b4ed578d Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:08:36 +0700 Subject: [PATCH 204/667] Add model 2023-11-06-bert_token_classifier_autotrain_gro_ner_en --- ...t_token_classifier_autotrain_gro_ner_en.md | 100 ++++++++++++++++++ 1 file changed, 100 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_autotrain_gro_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_autotrain_gro_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_autotrain_gro_ner_en.md new file mode 100644 index 00000000000000..3c5f8dd644ffce --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_autotrain_gro_ner_en.md @@ -0,0 +1,100 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from Wanjiru) +author: John Snow Labs +name: bert_token_classifier_autotrain_gro_ner +date: 2023-11-06 +tags: [en, open_source, bert, token_classification, ner, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `autotrain_gro_ner` is an English model originally trained by `Wanjiru`.
+ +## Predicted Entities + +`METRIC`, `ITEM` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_autotrain_gro_ner_en_5.2.0_3.0_1699301248192.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_autotrain_gro_ner_en_5.2.0_3.0_1699301248192.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_autotrain_gro_ner","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_autotrain_gro_ner","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_autotrain_gro_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/Wanjiru/autotrain_gro_ner \ No newline at end of file From 9fd9c717d1e1a6f8a8a81629f3ee36fab0e748c6 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:09:37 +0700 Subject: [PATCH 205/667] Add model 2023-11-06-bert_sayula_popoluca_bert_large_japanese_upos_ja --- ...la_popoluca_bert_large_japanese_upos_ja.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_large_japanese_upos_ja.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_large_japanese_upos_ja.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_large_japanese_upos_ja.md new file mode 100644 index 00000000000000..6dee7380b9cdf0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_large_japanese_upos_ja.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Japanese bert_sayula_popoluca_bert_large_japanese_upos BertForTokenClassification from KoichiYasuoka +author: John Snow Labs +name: bert_sayula_popoluca_bert_large_japanese_upos +date: 2023-11-06 +tags: [bert, ja, open_source, token_classification, onnx] +task: Named Entity Recognition +language: ja +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark 
NLP. `bert_sayula_popoluca_bert_large_japanese_upos` is a Japanese model originally trained by KoichiYasuoka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_large_japanese_upos_ja_5.2.0_3.0_1699301302690.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_large_japanese_upos_ja_5.2.0_3.0_1699301302690.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +wordSegmenter = WordSegmenterModel.pretrained("wordseg_gsd_ud", "ja") \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_large_japanese_upos","ja") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, wordSegmenter, tokenClassifier]) +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val wordSegmenter = WordSegmenterModel.pretrained("wordseg_gsd_ud", "ja") + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bert_large_japanese_upos", "ja") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, wordSegmenter, tokenClassifier)) +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_large_japanese_upos| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|ja| +|Size:|1.2 GB| + +## References + +https://huggingface.co/KoichiYasuoka/bert-large-japanese-upos \ No newline at end of file From 69ccf887f327c2910c54ac633e34f7bb1418bf51 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:10:37 +0700 Subject: [PATCH 206/667] Add model 2023-11-06-bert_ner_silpa_wikineural_multilingual_ner_en --- ...er_silpa_wikineural_multilingual_ner_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_silpa_wikineural_multilingual_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_silpa_wikineural_multilingual_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_silpa_wikineural_multilingual_ner_en.md new file mode 100644 index 00000000000000..b261af9e09aa1a --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_silpa_wikineural_multilingual_ner_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from silpa) +author: John Snow Labs +name: bert_ner_silpa_wikineural_multilingual_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `wikineural-multilingual-ner` is an English model originally trained by `silpa`.
+ +## Predicted Entities + +`ORG`, `PER`, `LOC`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_silpa_wikineural_multilingual_ner_en_5.2.0_3.0_1699299492292.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_silpa_wikineural_multilingual_ner_en_5.2.0_3.0_1699299492292.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_silpa_wikineural_multilingual_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_silpa_wikineural_multilingual_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.wikineural.multilingual.by_silpa").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_silpa_wikineural_multilingual_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/silpa/wikineural-multilingual-ner \ No newline at end of file From 895e9eaad0be748c2f8a94404a2d89f619bb63a7 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:11:37 +0700 Subject: [PATCH 207/667] Add model 2023-11-06-bert_ner_simple_transformer_en --- ...23-11-06-bert_ner_simple_transformer_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_simple_transformer_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_simple_transformer_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_simple_transformer_en.md new file mode 100644 index 00000000000000..e62d01344c4c58 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_simple_transformer_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from kunalr63) +author: John Snow Labs +name: bert_ner_simple_transformer +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `simple_transformer` is an English model originally trained by `kunalr63`.
+ +## Predicted Entities + +`L-CLG`, `U-LOC`, `L-SKILLS`, `U-DESIG`, `U-SKILLS`, `L-ADDRESS`, `WORK_EXP`, `U-COMPANY`, `U-PER`, `L-EMAIL`, `DESIG`, `L-PER`, `L-LOC`, `LOC`, `COMPANY`, `L-QUALI`, `L-TRAIN`, `L-COMPANY`, `SCH`, `SKILLS`, `L-DESIG`, `L-WORK_EXP`, `L-SCH`, `U-SCH`, `CLG`, `L-HOBBI`, `L-EXPERIENCE`, `TRAIN`, `CERTIFICATION`, `QUALI`, `PHONE`, `U-CLG`, `U-EXPERIENCE`, `EMAIL`, `U-PHONE`, `PER`, `U-QUALI`, `L-CERTIFICATION`, `L-PHONE`, `HOBBI`, `U-EMAIL`, `ADDRESS`, `EXPERIENCE` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_simple_transformer_en_5.2.0_3.0_1699300440938.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_simple_transformer_en_5.2.0_3.0_1699300440938.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_simple_transformer","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_simple_transformer","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.by_kunalr63").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_simple_transformer| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.3 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/kunalr63/simple_transformer \ No newline at end of file From 22f4dd2843d73363dc6d5811ed10b7615d1d2fc6 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:12:37 +0700 Subject: [PATCH 208/667] Add model 2023-11-06-bert_ner_vikasaeta_bert_finetuned_ner_en --- ...ert_ner_vikasaeta_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_vikasaeta_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_vikasaeta_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_vikasaeta_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..2ad8946ae56d13 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_vikasaeta_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from vikasaeta) +author: John Snow Labs +name: bert_ner_vikasaeta_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `vikasaeta`.
+ +## Predicted Entities + +`LOC`, `PER`, `ORG`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_vikasaeta_bert_finetuned_ner_en_5.2.0_3.0_1699301417405.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_vikasaeta_bert_finetuned_ner_en_5.2.0_3.0_1699301417405.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_vikasaeta_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_vikasaeta_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned.by_vikasaeta").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_vikasaeta_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/vikasaeta/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From 7a75883b222fda1ff4cbad9bdaeb92ecf5403635 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:13:37 +0700 Subject: [PATCH 209/667] Add model 2023-11-06-bert_ner_bc5cdr_chem_modified_bluebert_pubmed_uncased_l_12_h_768_a_12_latest_en --- ...ubmed_uncased_l_12_h_768_a_12_latest_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc5cdr_chem_modified_bluebert_pubmed_uncased_l_12_h_768_a_12_latest_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc5cdr_chem_modified_bluebert_pubmed_uncased_l_12_h_768_a_12_latest_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc5cdr_chem_modified_bluebert_pubmed_uncased_l_12_h_768_a_12_latest_en.md new file mode 100644 index 00000000000000..f6b3c4ae4f01e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc5cdr_chem_modified_bluebert_pubmed_uncased_l_12_h_768_a_12_latest_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_bc5cdr_chem_modified_bluebert_pubmed_uncased_l_12_h_768_a_12_latest BertForTokenClassification from ghadeermobasher +author: John Snow Labs +name: bert_ner_bc5cdr_chem_modified_bluebert_pubmed_uncased_l_12_h_768_a_12_latest +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: 
BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_bc5cdr_chem_modified_bluebert_pubmed_uncased_l_12_h_768_a_12_latest` is an English model originally trained by ghadeermobasher. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bc5cdr_chem_modified_bluebert_pubmed_uncased_l_12_h_768_a_12_latest_en_5.2.0_3.0_1699272396685.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bc5cdr_chem_modified_bluebert_pubmed_uncased_l_12_h_768_a_12_latest_en_5.2.0_3.0_1699272396685.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bc5cdr_chem_modified_bluebert_pubmed_uncased_l_12_h_768_a_12_latest","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_bc5cdr_chem_modified_bluebert_pubmed_uncased_l_12_h_768_a_12_latest", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bc5cdr_chem_modified_bluebert_pubmed_uncased_l_12_h_768_a_12_latest| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/ghadeermobasher/BC5CDR-Chem-Modified_bluebert_pubmed_uncased_L-12_H-768_A-12_latest \ No newline at end of file From f5d4bd5f03ff26a0ae68c857e769ea5a2cfc2263 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:14:38 +0700 Subject: [PATCH 210/667] Add model 2023-11-06-bert_ner_mbateman_bert_finetuned_ner_en --- ...bert_ner_mbateman_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbateman_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbateman_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbateman_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..1ba324a177a793 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbateman_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from mbateman) +author: John Snow Labs +name: bert_ner_mbateman_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `mbateman`.
+ +## Predicted Entities + +`MISC`, `ORG`, `LOC`, `PER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_mbateman_bert_finetuned_ner_en_5.2.0_3.0_1699297078432.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_mbateman_bert_finetuned_ner_en_5.2.0_3.0_1699297078432.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_mbateman_bert_finetuned_ner","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_mbateman_bert_finetuned_ner","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.conll.finetuned.by_mbateman").predict("""PUT YOUR STRING HERE""")
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_mbateman_bert_finetuned_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+
+
+- https://huggingface.co/mbateman/bert-finetuned-ner
+- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003
\ No newline at end of file
From f44ae084f46bf2fea96e3b894e0ab7bcced66282 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 03:15:38 +0700
Subject: [PATCH 211/667] Add model 2023-11-06-bert_ner_tushar_rishav_bert_finetuned_ner_en

---
 ...ner_tushar_rishav_bert_finetuned_ner_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_tushar_rishav_bert_finetuned_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_tushar_rishav_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_tushar_rishav_bert_finetuned_ner_en.md
new file mode 100644
index 00000000000000..c2ed9d3c70b73b
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_tushar_rishav_bert_finetuned_ner_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from tushar-rishav)
+author: John Snow Labs
+name: bert_ner_tushar_rishav_bert_finetuned_ner
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `tushar-rishav`.
+
+## Predicted Entities
+
+`LOC`, `PER`, `ORG`, `MISC`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_tushar_rishav_bert_finetuned_ner_en_5.2.0_3.0_1699301712518.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_tushar_rishav_bert_finetuned_ner_en_5.2.0_3.0_1699301712518.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_tushar_rishav_bert_finetuned_ner","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_tushar_rishav_bert_finetuned_ner","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.finetuned.by_tushar_rishav").predict("""PUT YOUR STRING HERE""")
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_tushar_rishav_bert_finetuned_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+
+
+- https://huggingface.co/tushar-rishav/bert-finetuned-ner
\ No newline at end of file
From 93856468b5734ff48372cc26ca7a9540502c417c Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 03:16:38 +0700
Subject: [PATCH 212/667] Add model 2023-11-06-bert_sayula_popoluca_4l_weight_decay_en

---
 ...bert_sayula_popoluca_4l_weight_decay_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_4l_weight_decay_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_4l_weight_decay_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_4l_weight_decay_en.md
new file mode 100644
index 00000000000000..b4822704c3ea43
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_4l_weight_decay_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_sayula_popoluca_4l_weight_decay BertForTokenClassification from kktoto
+author: John Snow Labs
+name: bert_sayula_popoluca_4l_weight_decay
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_4l_weight_decay` is an English model originally trained by kktoto.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_4l_weight_decay_en_5.2.0_3.0_1699301742751.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_4l_weight_decay_en_5.2.0_3.0_1699301742751.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_4l_weight_decay","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_4l_weight_decay", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_sayula_popoluca_4l_weight_decay|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|42.7 MB|
+
+## References
+
+https://huggingface.co/kktoto/4L_weight_decay
\ No newline at end of file
From eeb1f5521c3dd6056ff467f0bb9ae0511210be22 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 03:17:39 +0700
Subject: [PATCH 213/667] Add model 2023-11-06-bert_token_classifier_base_turkish_cased_ner_tr

---
 ...en_classifier_base_turkish_cased_ner_tr.md | 102 ++++++++++++++++++
 1 file changed, 102 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_turkish_cased_ner_tr.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_turkish_cased_ner_tr.md b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_turkish_cased_ner_tr.md
new file mode 100644
index 00000000000000..3cbf0c84bdf90b
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_turkish_cased_ner_tr.md
@@ -0,0 +1,102 @@
+---
+layout: model
+title: Turkish BertForTokenClassification Base Cased model (from akdeniz27)
+author: John Snow Labs
+name: bert_token_classifier_base_turkish_cased_ner
+date: 2023-11-06
+tags: [tr, open_source, bert, token_classification, ner, onnx]
+task: Named Entity Recognition
+language: tr
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-base-turkish-cased-ner` is a Turkish model originally trained by `akdeniz27`.
+
+## Predicted Entities
+
+`LOC`, `ORG`, `PER`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_turkish_cased_ner_tr_5.2.0_3.0_1699301799959.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_turkish_cased_ner_tr_5.2.0_3.0_1699301799959.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_base_turkish_cased_ner","tr") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_base_turkish_cased_ner","tr")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_token_classifier_base_turkish_cased_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|tr|
+|Size:|412.3 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+
+
+- https://huggingface.co/akdeniz27/bert-base-turkish-cased-ner
+- https://github.com/stefan-it/turkish-bert/files/4558187/nerdata.txt
+- https://ieeexplore.ieee.org/document/7495744
\ No newline at end of file
From d25095e29a7b28772b3892341b37d3e6406171fb Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 03:18:39 +0700
Subject: [PATCH 214/667] Add model 2023-11-06-bert_sayula_popoluca_bert_italian_cased_finetuned_sayula_popoluca_it

---
 ...lian_cased_finetuned_sayula_popoluca_it.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_italian_cased_finetuned_sayula_popoluca_it.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_italian_cased_finetuned_sayula_popoluca_it.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_italian_cased_finetuned_sayula_popoluca_it.md
new file mode 100644
index 00000000000000..e9a4c032f2ab8b
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_italian_cased_finetuned_sayula_popoluca_it.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Italian bert_sayula_popoluca_bert_italian_cased_finetuned_sayula_popoluca BertForTokenClassification from sachaarbonel
+author: John Snow Labs
+name: bert_sayula_popoluca_bert_italian_cased_finetuned_sayula_popoluca
+date: 2023-11-06
+tags: [bert, it, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: it
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_italian_cased_finetuned_sayula_popoluca` is an Italian model originally trained by sachaarbonel.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_italian_cased_finetuned_sayula_popoluca_it_5.2.0_3.0_1699299194965.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_italian_cased_finetuned_sayula_popoluca_it_5.2.0_3.0_1699299194965.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_italian_cased_finetuned_sayula_popoluca","it") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_bert_italian_cased_finetuned_sayula_popoluca", "it")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_sayula_popoluca_bert_italian_cased_finetuned_sayula_popoluca|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|it|
+|Size:|409.8 MB|
+
+## References
+
+https://huggingface.co/sachaarbonel/bert-italian-cased-finetuned-pos
\ No newline at end of file
From 31cb9b4723c83765743abe17d15d42c2f675973b Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 03:19:39 +0700
Subject: [PATCH 215/667] Add model 2023-11-06-bert_sayula_popoluca_bert_finetuned_conll2003_pos_en

---
 ...opoluca_bert_finetuned_conll2003_pos_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_finetuned_conll2003_pos_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_finetuned_conll2003_pos_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_finetuned_conll2003_pos_en.md
new file mode 100644
index 00000000000000..4e5e694ce4f9a5
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_finetuned_conll2003_pos_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_sayula_popoluca_bert_finetuned_conll2003_pos BertForTokenClassification from Tahsin
+author: John Snow Labs
+name: bert_sayula_popoluca_bert_finetuned_conll2003_pos
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_finetuned_conll2003_pos` is an English model originally trained by Tahsin.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_finetuned_conll2003_pos_en_5.2.0_3.0_1699301917061.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_finetuned_conll2003_pos_en_5.2.0_3.0_1699301917061.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_finetuned_conll2003_pos","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_bert_finetuned_conll2003_pos", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_sayula_popoluca_bert_finetuned_conll2003_pos|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.8 MB|
+
+## References
+
+https://huggingface.co/Tahsin/BERT-finetuned-conll2003-POS
\ No newline at end of file
From 042c26eeb72664302e2110b6e70d7cbaafe0687b Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 03:20:40 +0700
Subject: [PATCH 216/667] Add model 2023-11-06-bert_ner_xkang_bert_finetuned_ner_accelerate_en

---
 ..._xkang_bert_finetuned_ner_accelerate_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_xkang_bert_finetuned_ner_accelerate_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_xkang_bert_finetuned_ner_accelerate_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_xkang_bert_finetuned_ner_accelerate_en.md
new file mode 100644
index 00000000000000..76c95bd3cecca0
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_xkang_bert_finetuned_ner_accelerate_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from xkang)
+author: John Snow Labs
+name: bert_ner_xkang_bert_finetuned_ner_accelerate
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner-accelerate` is an English model originally trained by `xkang`.
+
+## Predicted Entities
+
+`LOC`, `PER`, `ORG`, `MISC`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_xkang_bert_finetuned_ner_accelerate_en_5.2.0_3.0_1699302026564.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_xkang_bert_finetuned_ner_accelerate_en_5.2.0_3.0_1699302026564.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_xkang_bert_finetuned_ner_accelerate","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_xkang_bert_finetuned_ner_accelerate","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.finetuned.by_xkang").predict("""PUT YOUR STRING HERE""")
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_xkang_bert_finetuned_ner_accelerate|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+
+
+- https://huggingface.co/xkang/bert-finetuned-ner-accelerate
\ No newline at end of file
From c38739840cb68e3afb9a13499c696c0bc0c93c0a Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 03:21:40 +0700
Subject: [PATCH 217/667] Add model 2023-11-06-bert_sayula_popoluca_ccvspantagger_en

---
 ...6-bert_sayula_popoluca_ccvspantagger_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_ccvspantagger_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_ccvspantagger_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_ccvspantagger_en.md
new file mode 100644
index 00000000000000..500bf7071655fb
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_ccvspantagger_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_sayula_popoluca_ccvspantagger BertForTokenClassification from RJ3vans
+author: John Snow Labs
+name: bert_sayula_popoluca_ccvspantagger
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_ccvspantagger` is an English model originally trained by RJ3vans.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_ccvspantagger_en_5.2.0_3.0_1699302072883.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_ccvspantagger_en_5.2.0_3.0_1699302072883.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_ccvspantagger","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_ccvspantagger", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_sayula_popoluca_ccvspantagger|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|1.2 GB|
+
+## References
+
+https://huggingface.co/RJ3vans/CCVspanTagger
\ No newline at end of file
From 9aedd13f0aa94d4614312d3be444a50bf5bed084 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 03:22:40 +0700
Subject: [PATCH 218/667] Add model 2023-11-06-bert_sayula_popoluca_codeswitch_hineng_sayula_popoluca_lince_hi

---
 ...eswitch_hineng_sayula_popoluca_lince_hi.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_codeswitch_hineng_sayula_popoluca_lince_hi.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_codeswitch_hineng_sayula_popoluca_lince_hi.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_codeswitch_hineng_sayula_popoluca_lince_hi.md
new file mode 100644
index 00000000000000..2516c992015d07
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_codeswitch_hineng_sayula_popoluca_lince_hi.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Hindi bert_sayula_popoluca_codeswitch_hineng_sayula_popoluca_lince BertForTokenClassification from sagorsarker
+author: John Snow Labs
+name: bert_sayula_popoluca_codeswitch_hineng_sayula_popoluca_lince
+date: 2023-11-06
+tags: [bert, hi, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: hi
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_codeswitch_hineng_sayula_popoluca_lince` is a Hindi model originally trained by sagorsarker.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_codeswitch_hineng_sayula_popoluca_lince_hi_5.2.0_3.0_1699302088739.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_codeswitch_hineng_sayula_popoluca_lince_hi_5.2.0_3.0_1699302088739.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_codeswitch_hineng_sayula_popoluca_lince","hi") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_codeswitch_hineng_sayula_popoluca_lince", "hi")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_sayula_popoluca_codeswitch_hineng_sayula_popoluca_lince|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|hi|
+|Size:|665.1 MB|
+
+## References
+
+https://huggingface.co/sagorsarker/codeswitch-hineng-pos-lince
\ No newline at end of file
From 0063a213a9f8deaed1573e9501780f4ed58d734b Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 03:23:40 +0700
Subject: [PATCH 219/667] Add model 2023-11-06-bert_ner_bert_ner_i2b2_en

---
 .../2023-11-06-bert_ner_bert_ner_i2b2_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_ner_i2b2_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_ner_i2b2_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_ner_i2b2_en.md
new file mode 100644
index 00000000000000..07363e8e2c0da3
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_ner_i2b2_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from connorboyle)
+author: John Snow Labs
+name: bert_ner_bert_ner_i2b2
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-ner-i2b2` is an English model originally trained by `connorboyle`.
+
+## Predicted Entities
+
+`STATE`, `ORGANIZATION`, `BIOID`, `HEALTHPLAN`, `PATIENT`, `COUNTRY`, `AGE`, `FAX`, `LOCATION`, `PHONE`, `IDNUM`, `DOCTOR`, `URL`, `DEVICE`, `STREET`, `DATE`, `ZIP`, `CITY`, `EMAIL`, `MEDICALRECORD`, `USERNAME`, `HOSPITAL`, `PROFESSION`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_ner_i2b2_en_5.2.0_3.0_1699290313908.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_ner_i2b2_en_5.2.0_3.0_1699290313908.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_ner_i2b2","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+# Assumes an active Spark NLP session in `spark`
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("sentence"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_ner_i2b2","en")
+  .setInputCols(Array("sentence", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.by_connorboyle").predict("""PUT YOUR STRING HERE""")
+```
+</div>
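Every Download and Copy S3 URI link in these cards uses the same file-name pattern: model name, language code, Spark NLP version, Spark version and a numeric suffix, joined by underscores. The plain-Python sketch below rebuilds and splits such names. The helper names are hypothetical. Reading the suffix as a Unix timestamp in milliseconds is an assumption, although for this card it decodes to 2023-11-06, which matches the card's `date` field.

```python
from datetime import datetime, timezone

# Link prefixes as they appear in the Download and Copy S3 URI buttons of these cards
HTTPS_PREFIX = "https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/"
S3_PREFIX = "s3://auxdata.johnsnowlabs.com/public/models/"


def archive_name(model, lang, nlp_version, spark_version, stamp):
    """Join card fields into the archive file name used by both links (hypothetical helper)."""
    return f"{model}_{lang}_{nlp_version}_{spark_version}_{stamp}.zip"


def parse_archive_name(filename):
    """Split an archive file name back into its fields (hypothetical helper).

    Model names contain underscores themselves, so split from the right:
    the last four fields are language, Spark NLP version, Spark version, suffix.
    """
    if filename.endswith(".zip"):
        filename = filename[: -len(".zip")]
    model, lang, nlp_version, spark_version, stamp = filename.rsplit("_", 4)
    # Assumption: the numeric suffix is a Unix timestamp in milliseconds
    built = datetime.fromtimestamp(int(stamp) / 1000, tz=timezone.utc)
    return {
        "model": model,
        "lang": lang,
        "nlp_version": nlp_version,
        "spark_version": spark_version,
        "built_utc": built,
    }


name = archive_name("bert_ner_bert_ner_i2b2", "en", "5.2.0", "3.0", 1699290313908)
print(HTTPS_PREFIX + name)
print(S3_PREFIX + name)
print(parse_archive_name(name)["built_utc"].date())
```

Splitting from the right with `rsplit("_", 4)` handles names such as `bert_token_classifier_est_morph_128` without needing to know how many underscores the model name contains.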
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_bert_ner_i2b2|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.8 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+- https://huggingface.co/connorboyle/bert-ner-i2b2
\ No newline at end of file

From 5ea3a48e71d9065088f3ce8a0970bc28a2107cae Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 03:24:41 +0700
Subject: [PATCH 220/667] Add model 2023-11-06-bert_ner_wlt_scibert_linnaeus_en

---
 ...-11-06-bert_ner_wlt_scibert_linnaeus_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_wlt_scibert_linnaeus_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_wlt_scibert_linnaeus_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_wlt_scibert_linnaeus_en.md
new file mode 100644
index 00000000000000..55896db7f0e3fd
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_wlt_scibert_linnaeus_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_ner_wlt_scibert_linnaeus BertForTokenClassification from ghadeermobasher
+author: John Snow Labs
+name: bert_ner_wlt_scibert_linnaeus
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_wlt_scibert_linnaeus` is an English model originally trained by ghadeermobasher.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_wlt_scibert_linnaeus_en_5.2.0_3.0_1699284109584.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_wlt_scibert_linnaeus_en_5.2.0_3.0_1699284109584.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
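+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assumes an active Spark NLP session in `spark` (e.g. spark = sparknlp.start())
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a TOKEN column, so split the documents into tokens first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_wlt_scibert_linnaeus","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+
+// The classifier reads a TOKEN column, so split the documents into tokens first
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+  .pretrained("bert_ner_wlt_scibert_linnaeus", "en")
+  .setInputCols(Array("documents","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>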
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_wlt_scibert_linnaeus|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|410.0 MB|
+
+## References
+
+https://huggingface.co/ghadeermobasher/WLT-SciBERT-Linnaeus
\ No newline at end of file

From 5366b6297e20e2440f79565dc1a2126687935481 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 03:25:41 +0700
Subject: [PATCH 221/667] Add model 2023-11-06-bert_ner_lewtun_bert_finetuned_ner_en

---
 ...6-bert_ner_lewtun_bert_finetuned_ner_en.md | 115 ++++++++++++++++++
 1 file changed, 115 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_lewtun_bert_finetuned_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_lewtun_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_lewtun_bert_finetuned_ner_en.md
new file mode 100644
index 00000000000000..4c904da90c8b41
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_lewtun_bert_finetuned_ner_en.md
@@ -0,0 +1,115 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from lewtun)
+author: John Snow Labs
+name: bert_ner_lewtun_bert_finetuned_ner
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `lewtun`.
+
+## Predicted Entities
+
+`MISC`, `ORG`, `LOC`, `PER`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_lewtun_bert_finetuned_ner_en_5.2.0_3.0_1699295206231.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_lewtun_bert_finetuned_ner_en_5.2.0_3.0_1699295206231.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_lewtun_bert_finetuned_ner","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+# Assumes an active Spark NLP session in `spark`
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+  .setInputCols(Array("document"))
+  .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("sentence"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_lewtun_bert_finetuned_ner","en")
+  .setInputCols(Array("sentence", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.conll.finetuned.by_lewtun").predict("""PUT YOUR STRING HERE""")
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_lewtun_bert_finetuned_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+- https://huggingface.co/lewtun/bert-finetuned-ner
+- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003
\ No newline at end of file

From 07ae31a14bff2b4792b4bb441139ee4835836ebf Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 03:26:41 +0700
Subject: [PATCH 222/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_egy_ar

---
 ...camelbert_danish_sayula_popoluca_egy_ar.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_egy_ar.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_egy_ar.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_egy_ar.md
new file mode 100644
index 00000000000000..26a95fda9d1db5
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_egy_ar.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Arabic bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_egy BertForTokenClassification from CAMeL-Lab
+author: John Snow Labs
+name: bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_egy
+date: 2023-11-06
+tags: [bert, ar, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: ar
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_egy` is an Arabic model originally trained by CAMeL-Lab.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_egy_ar_5.2.0_3.0_1699302329367.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_egy_ar_5.2.0_3.0_1699302329367.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
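+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assumes an active Spark NLP session in `spark` (e.g. spark = sparknlp.start())
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a TOKEN column, so split the documents into tokens first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_egy","ar") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+
+// The classifier reads a TOKEN column, so split the documents into tokens first
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+  .pretrained("bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_egy", "ar")
+  .setInputCols(Array("documents","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>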
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_egy|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|ar|
+|Size:|406.8 MB|
+
+## References
+
+https://huggingface.co/CAMeL-Lab/bert-base-arabic-camelbert-da-pos-egy
\ No newline at end of file

From b78a6bdd34880c603efecbcb6118cd0e3a3c4b5d Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 03:27:41 +0700
Subject: [PATCH 223/667] Add model 2023-11-06-bert_ner_original_scibert_bc4chemd_o_en

---
 ...bert_ner_original_scibert_bc4chemd_o_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_scibert_bc4chemd_o_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_scibert_bc4chemd_o_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_scibert_bc4chemd_o_en.md
new file mode 100644
index 00000000000000..8e9603157c1ff9
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_scibert_bc4chemd_o_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_ner_original_scibert_bc4chemd_o BertForTokenClassification from ghadeermobasher
+author: John Snow Labs
+name: bert_ner_original_scibert_bc4chemd_o
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_original_scibert_bc4chemd_o` is an English model originally trained by ghadeermobasher.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_original_scibert_bc4chemd_o_en_5.2.0_3.0_1699281473885.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_original_scibert_bc4chemd_o_en_5.2.0_3.0_1699281473885.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
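+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assumes an active Spark NLP session in `spark` (e.g. spark = sparknlp.start())
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a TOKEN column, so split the documents into tokens first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_original_scibert_bc4chemd_o","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+
+// The classifier reads a TOKEN column, so split the documents into tokens first
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+  .pretrained("bert_ner_original_scibert_bc4chemd_o", "en")
+  .setInputCols(Array("documents","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>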
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_original_scibert_bc4chemd_o|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|410.0 MB|
+
+## References
+
+https://huggingface.co/ghadeermobasher/Original-SciBERT-BC4CHEMD-O
\ No newline at end of file

From 734128a5da4122ebab49bcbefd6c80c3027d3a0c Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 03:28:41 +0700
Subject: [PATCH 224/667] Add model 2023-11-06-bert_sayula_popoluca_cmn1spantagger_en

---
 ...-bert_sayula_popoluca_cmn1spantagger_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_cmn1spantagger_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_cmn1spantagger_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_cmn1spantagger_en.md
new file mode 100644
index 00000000000000..389cf7ebf5b507
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_cmn1spantagger_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_sayula_popoluca_cmn1spantagger BertForTokenClassification from RJ3vans
+author: John Snow Labs
+name: bert_sayula_popoluca_cmn1spantagger
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_cmn1spantagger` is an English model originally trained by RJ3vans.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_cmn1spantagger_en_5.2.0_3.0_1699302391956.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_cmn1spantagger_en_5.2.0_3.0_1699302391956.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
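+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assumes an active Spark NLP session in `spark` (e.g. spark = sparknlp.start())
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a TOKEN column, so split the documents into tokens first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_cmn1spantagger","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+
+// The classifier reads a TOKEN column, so split the documents into tokens first
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+  .pretrained("bert_sayula_popoluca_cmn1spantagger", "en")
+  .setInputCols(Array("documents","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>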
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_sayula_popoluca_cmn1spantagger|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|1.2 GB|
+
+## References
+
+https://huggingface.co/RJ3vans/CMN1spanTagger
\ No newline at end of file

From f2b332bee1153b20003b28f398fb9e9aad44b7ac Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 03:29:42 +0700
Subject: [PATCH 225/667] Add model 2023-11-06-bert_token_classifier_est_morph_128_et

---
 ...-bert_token_classifier_est_morph_128_et.md | 100 ++++++++++++++++++
 1 file changed, 100 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_est_morph_128_et.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_est_morph_128_et.md b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_est_morph_128_et.md
new file mode 100644
index 00000000000000..509b220a206806
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_est_morph_128_et.md
@@ -0,0 +1,100 @@
+---
+layout: model
+title: Estonian BertForTokenClassification Cased model (from tartuNLP)
+author: John Snow Labs
+name: bert_token_classifier_est_morph_128
+date: 2023-11-06
+tags: [et, open_source, bert, token_classification, ner, onnx]
+task: Named Entity Recognition
+language: et
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `EstBERT_Morph_128` is an Estonian model originally trained by `tartuNLP`.
+
+## Predicted Entities
+
+`AdpType=Prep`, `VerbForm=Part`, `Case=Ade`, `PronType=Rel`, `Polarity=Neg`, `Degree=Pos`, `VerbForm=Inf`, `PronType=Ind`, `PronType=Tot`, `Case=Par`, `Abbr=Yes`, `Case=Nom`, `Foreign=Yes`, `_`, `PronType=Dem`, `NumType=Ord`, `Hyph=Yes`, `Connegative=Yes`, `AdpType=Post`, `NumType=Card`, `Number=Sing`, `VerbForm=Conv`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_est_morph_128_et_5.2.0_3.0_1699302426581.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_est_morph_128_et_5.2.0_3.0_1699302426581.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_est_morph_128","et") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_est_morph_128","et")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+</div>
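The labels listed under Predicted Entities for this Estonian model follow Universal Dependencies FEATS notation: `Feature=Value`, with `_` meaning "no features". Below is a minimal plain-Python sketch for turning predicted label strings into feature dictionaries. `parse_feats` and `group_by_feature` are hypothetical helpers. Treating `|` as a separator follows the general UD convention and is not stated on this card, since every label listed here carries a single feature.

```python
from collections import defaultdict


def parse_feats(label):
    """Parse a UD FEATS-style label such as 'Case=Nom' into {'Case': 'Nom'}.

    '_' is the UD placeholder for "no features". Several features, if a label
    ever carried them, would be joined with '|' (e.g. 'Case=Nom|Number=Sing').
    """
    if label == "_":
        return {}
    feats = {}
    for pair in label.split("|"):
        name, _sep, value = pair.partition("=")
        feats[name] = value
    return feats


def group_by_feature(labels):
    """Collect the values seen for each feature name, e.g. all Case values together."""
    groups = defaultdict(list)
    for label in labels:
        for name, value in parse_feats(label).items():
            groups[name].append(value)
    return dict(groups)


# A subset of the labels listed under Predicted Entities
labels = ["Case=Ade", "Case=Par", "Case=Nom", "VerbForm=Part", "VerbForm=Inf", "_", "Number=Sing"]
print(parse_feats("Case=Nom"))
print(group_by_feature(labels))
```

The helpers work on plain strings. They could be applied, for example, to the `result` values collected from the `ner` column after running the pipeline.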
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_token_classifier_est_morph_128|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|et|
+|Size:|465.5 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+- https://huggingface.co/tartuNLP/EstBERT_Morph_128
\ No newline at end of file

From 82c4220dbb4c987f79ab5b2ee93fb8eeab01fa34 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 03:30:42 +0700
Subject: [PATCH 226/667] Add model 2023-11-06-bert_token_classifier_base_chinese_ner_zh

---
 ...rt_token_classifier_base_chinese_ner_zh.md | 105 ++++++++++++++++++
 1 file changed, 105 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_chinese_ner_zh.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_chinese_ner_zh.md b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_chinese_ner_zh.md
new file mode 100644
index 00000000000000..3c88b3e4a617fa
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_chinese_ner_zh.md
@@ -0,0 +1,105 @@
+---
+layout: model
+title: Chinese BertForTokenClassification Base Cased model (from ckiplab)
+author: John Snow Labs
+name: bert_token_classifier_base_chinese_ner
+date: 2023-11-06
+tags: [zh, open_source, bert, token_classification, ner, onnx]
+task: Named Entity Recognition
+language: zh
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-base-chinese-ner` is a Chinese model originally trained by `ckiplab`.
+
+## Predicted Entities
+
+`S-WORK_OF_ART`, `S-TIME`, `E-FAC`, `S-PERCENT`, `S-PRODUCT`, `E-LANGUAGE`, `S-NORP`, `S-QUANTITY`, `S-PERSON`, `E-DATE`, `S-LOC`, `S-CARDINAL`, `E-QUANTITY`, `S-GPE`, `S-FAC`, `MONEY`, `S-ORG`, `E-NORP`, `E-GPE`, `E-TIME`, `EVENT`, `DATE`, `CARDINAL`, `FAC`, `E-PERCENT`, `E-PERSON`, `S-ORDINAL`, `NORP`, `LOC`, `E-ORG`, `E-MONEY`, `S-LAW`, `LAW`, `E-LOC`, `S-EVENT`, `ORG`, `TIME`, `ORDINAL`, `E-WORK_OF_ART`, `LANGUAGE`, `S-MONEY`, `E-ORDINAL`, `PERCENT`, `E-EVENT`, `S-LANGUAGE`, `E-PRODUCT`, `QUANTITY`, `WORK_OF_ART`, `E-LAW`, `S-DATE`, `PRODUCT`, `E-CARDINAL`, `PERSON`, `GPE`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_chinese_ner_zh_5.2.0_3.0_1699302560261.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_chinese_ner_zh_5.2.0_3.0_1699302560261.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_base_chinese_ner","zh") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_base_chinese_ner","zh")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_token_classifier_base_chinese_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|zh|
+|Size:|381.1 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+- https://huggingface.co/ckiplab/bert-base-chinese-ner
+- https://github.com/ckiplab/ckip-transformers
+- https://muyang.pro
+- https://ckip.iis.sinica.edu.tw
\ No newline at end of file

From de85af98528702ae3c43186aa9df4e1e515c98bc Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 03:31:42 +0700
Subject: [PATCH 227/667] Add model 2023-11-06-bert_sayula_popoluca_tahitian_punctuator_en

---
 ..._sayula_popoluca_tahitian_punctuator_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tahitian_punctuator_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tahitian_punctuator_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tahitian_punctuator_en.md
new file mode 100644
index 00000000000000..fa745c8d203c46
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tahitian_punctuator_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_sayula_popoluca_tahitian_punctuator BertForTokenClassification from kktoto
+author: John Snow Labs
+name: bert_sayula_popoluca_tahitian_punctuator
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_tahitian_punctuator` is an English model originally trained by kktoto.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_tahitian_punctuator_en_5.2.0_3.0_1699302308368.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_tahitian_punctuator_en_5.2.0_3.0_1699302308368.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
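+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assumes an active Spark NLP session in `spark` (e.g. spark = sparknlp.start())
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a TOKEN column, so split the documents into tokens first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_tahitian_punctuator","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+
+// The classifier reads a TOKEN column, so split the documents into tokens first
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+  .pretrained("bert_sayula_popoluca_tahitian_punctuator", "en")
+  .setInputCols(Array("documents","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>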
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_tahitian_punctuator| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|381.1 MB| + +## References + +https://huggingface.co/kktoto/ty_punctuator \ No newline at end of file From 1324d8ce3610cc9633afebc29ba56fdb68e396c7 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:32:42 +0700 Subject: [PATCH 228/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_msa_ar --- ...camelbert_danish_sayula_popoluca_msa_ar.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_msa_ar.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_msa_ar.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_msa_ar.md new file mode 100644 index 00000000000000..d75e4e76ab380d --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_msa_ar.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Arabic bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_msa BertForTokenClassification from CAMeL-Lab +author: John Snow Labs +name: bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_msa +date: 2023-11-06 +tags: [bert, ar, open_source, token_classification, onnx] +task: Named Entity Recognition +language: ar +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted 
from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_msa` is an Arabic model originally trained by CAMeL-Lab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_msa_ar_5.2.0_3.0_1699302654159.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_msa_ar_5.2.0_3.0_1699302654159.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
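+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# the classifier consumes a TOKEN column, so a Tokenizer must run first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_msa","ar") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// assumes a SparkSession named `spark` is in scope (e.g. spark-shell)
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_msa", "ar")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+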
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_msa| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|ar| +|Size:|406.8 MB| + +## References + +https://huggingface.co/CAMeL-Lab/bert-base-arabic-camelbert-da-pos-msa \ No newline at end of file From 465bc3263d5fdad7fc56f0c25e95796fc8090968 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:33:42 +0700 Subject: [PATCH 229/667] Add model 2023-11-06-bert_ner_yannis95_bert_finetuned_ner_en --- ...bert_ner_yannis95_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_yannis95_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_yannis95_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_yannis95_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..228ffd49d292bb --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_yannis95_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from yannis95) +author: John Snow Labs +name: bert_ner_yannis95_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `yannis95`.
+ +## Predicted Entities + +`LOC`, `PER`, `ORG`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_yannis95_bert_finetuned_ner_en_5.2.0_3.0_1699296385812.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_yannis95_bert_finetuned_ner_en_5.2.0_3.0_1699296385812.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_yannis95_bert_finetuned_ner","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// assumes a SparkSession named `spark` is in scope (e.g. spark-shell)
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_yannis95_bert_finetuned_ner","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.conll.finetuned.by_yannis95").predict("""PUT YOUR STRING HERE""")
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_yannis95_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/yannis95/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From 45ceed934466d9520e0aa13ee82c18d738c06b12 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:34:43 +0700 Subject: [PATCH 230/667] Add model 2023-11-06-bert_ner_mbert_base_uncased_ner_swahili_macrolanguage_swa --- ...e_uncased_ner_swahili_macrolanguage_swa.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_uncased_ner_swahili_macrolanguage_swa.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_uncased_ner_swahili_macrolanguage_swa.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_uncased_ner_swahili_macrolanguage_swa.md new file mode 100644 index 00000000000000..e097864bb90048 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_uncased_ner_swahili_macrolanguage_swa.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Swahili (macrolanguage) bert_ner_mbert_base_uncased_ner_swahili_macrolanguage BertForTokenClassification from arnolfokam +author: John Snow Labs +name: bert_ner_mbert_base_uncased_ner_swahili_macrolanguage +date: 2023-11-06 +tags: [bert, swa, open_source, token_classification, onnx] +task: Named Entity Recognition +language: swa +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained
BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_mbert_base_uncased_ner_swahili_macrolanguage` is a Swahili (macrolanguage) model originally trained by arnolfokam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_mbert_base_uncased_ner_swahili_macrolanguage_swa_5.2.0_3.0_1699295348491.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_mbert_base_uncased_ner_swahili_macrolanguage_swa_5.2.0_3.0_1699295348491.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
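+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# the classifier consumes a TOKEN column, so a Tokenizer must run first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_mbert_base_uncased_ner_swahili_macrolanguage","swa") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// assumes a SparkSession named `spark` is in scope (e.g. spark-shell)
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_mbert_base_uncased_ner_swahili_macrolanguage", "swa")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+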
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_mbert_base_uncased_ner_swahili_macrolanguage| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|swa| +|Size:|665.1 MB| + +## References + +https://huggingface.co/arnolfokam/mbert-base-uncased-ner-swa \ No newline at end of file From fed5834172727b9d5f09fabeb6b81f313e16e2d8 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:35:43 +0700 Subject: [PATCH 231/667] Add model 2023-11-06-bert_sayula_popoluca_ssccvspantagger_en --- ...bert_sayula_popoluca_ssccvspantagger_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_ssccvspantagger_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_ssccvspantagger_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_ssccvspantagger_en.md new file mode 100644 index 00000000000000..d499b8840258bb --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_ssccvspantagger_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_ssccvspantagger BertForTokenClassification from RJ3vans +author: John Snow Labs +name: bert_sayula_popoluca_ssccvspantagger +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_ssccvspantagger` is an English model originally trained by RJ3vans.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_ssccvspantagger_en_5.2.0_3.0_1699300095426.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_ssccvspantagger_en_5.2.0_3.0_1699300095426.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
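+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# the classifier consumes a TOKEN column, so a Tokenizer must run first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_ssccvspantagger","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// assumes a SparkSession named `spark` is in scope (e.g. spark-shell)
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_ssccvspantagger", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+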
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_ssccvspantagger| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/RJ3vans/SSCCVspanTagger \ No newline at end of file From d75e73adfb5ea198c94d3c6948c52f68cb9448a9 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:36:43 +0700 Subject: [PATCH 232/667] Add model 2023-11-06-bert_token_classifier_datafun_zh --- ...-11-06-bert_token_classifier_datafun_zh.md | 104 ++++++++++++++++++ 1 file changed, 104 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_datafun_zh.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_datafun_zh.md b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_datafun_zh.md new file mode 100644 index 00000000000000..25f537355dca01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_datafun_zh.md @@ -0,0 +1,104 @@ +--- +layout: model +title: Chinese BertForTokenClassification Cased model (from canIjoin) +author: John Snow Labs +name: bert_token_classifier_datafun +date: 2023-11-06 +tags: [zh, open_source, bert, token_classification, ner, onnx] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `datafun` is a Chinese model originally trained by `canIjoin`. 
+ +## Predicted Entities + +`movie`, `no1`, `government`, `name1`, `position`, `book1`, `address`, `address1`, `game`, `organization`, `book`, `government1`, `company1`, `game1`, `position1`, `movie1`, `scene1`, `name`, `company`, `scene`, `organization1` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_datafun_zh_5.2.0_3.0_1699302097623.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_datafun_zh_5.2.0_3.0_1699302097623.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_datafun","zh") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// assumes a SparkSession named `spark` is in scope (e.g. spark-shell)
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_datafun","zh")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_datafun| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|380.9 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/canIjoin/datafun +- https://github.com/dbiir/UER-py/wiki/Modelzoo +- https://github.com/CLUEbenchmark/CLUENER2020 +- https://github.com/dbiir/UER-py/ +- https://cloud.tencent.com/ \ No newline at end of file From 03247f2fa0db6fe88a8e6a7187112761f6379232 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:37:43 +0700 Subject: [PATCH 233/667] Add model 2023-11-06-bert_ner_biored_dis_original_pubmedbert_256_13_en --- ...iored_dis_original_pubmedbert_256_13_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_biored_dis_original_pubmedbert_256_13_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biored_dis_original_pubmedbert_256_13_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biored_dis_original_pubmedbert_256_13_en.md new file mode 100644 index 00000000000000..81193a24201c5f --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biored_dis_original_pubmedbert_256_13_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_biored_dis_original_pubmedbert_256_13 BertForTokenClassification from ghadeermobasher +author: John Snow Labs +name: bert_ner_biored_dis_original_pubmedbert_256_13 +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained
BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_biored_dis_original_pubmedbert_256_13` is an English model originally trained by ghadeermobasher. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_biored_dis_original_pubmedbert_256_13_en_5.2.0_3.0_1699276868812.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_biored_dis_original_pubmedbert_256_13_en_5.2.0_3.0_1699276868812.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
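+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# the classifier consumes a TOKEN column, so a Tokenizer must run first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_biored_dis_original_pubmedbert_256_13","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// assumes a SparkSession named `spark` is in scope (e.g. spark-shell)
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_biored_dis_original_pubmedbert_256_13", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+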
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_biored_dis_original_pubmedbert_256_13| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.2 MB| + +## References + +https://huggingface.co/ghadeermobasher/BioRed-Dis-Original-PubMedBERT-256-13 \ No newline at end of file From aa080c9986bc63349f1770ad49437a684d166095 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:38:43 +0700 Subject: [PATCH 234/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_msa_ar --- ...ic_camelbert_msa_sayula_popoluca_msa_ar.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_msa_ar.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_msa_ar.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_msa_ar.md new file mode 100644 index 00000000000000..311899accf08e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_msa_ar.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Arabic bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_msa BertForTokenClassification from CAMeL-Lab +author: John Snow Labs +name: bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_msa +date: 2023-11-06 +tags: [bert, ar, open_source, token_classification, onnx] +task: Named Entity Recognition +language: ar +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained 
BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_msa` is an Arabic model originally trained by CAMeL-Lab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_msa_ar_5.2.0_3.0_1699302967020.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_msa_ar_5.2.0_3.0_1699302967020.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
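+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# the classifier consumes a TOKEN column, so a Tokenizer must run first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_msa","ar") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// assumes a SparkSession named `spark` is in scope (e.g. spark-shell)
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_msa", "ar")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+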
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_msa| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|ar| +|Size:|406.4 MB| + +## References + +https://huggingface.co/CAMeL-Lab/bert-base-arabic-camelbert-msa-pos-msa \ No newline at end of file From 711ec8c1c0352ba1fc34b9df923d82c1e7508507 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:39:43 +0700 Subject: [PATCH 235/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_cased_sayula_popoluca_en --- ...luca_bert_base_cased_sayula_popoluca_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_cased_sayula_popoluca_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_cased_sayula_popoluca_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_cased_sayula_popoluca_en.md new file mode 100644 index 00000000000000..ae10439e9a576b --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_cased_sayula_popoluca_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_bert_base_cased_sayula_popoluca BertForTokenClassification from QCRI +author: John Snow Labs +name: bert_sayula_popoluca_bert_base_cased_sayula_popoluca +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark 
NLP. `bert_sayula_popoluca_bert_base_cased_sayula_popoluca` is an English model originally trained by QCRI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_cased_sayula_popoluca_en_5.2.0_3.0_1699303128065.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_cased_sayula_popoluca_en_5.2.0_3.0_1699303128065.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
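+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# the classifier consumes a TOKEN column, so a Tokenizer must run first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_cased_sayula_popoluca","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// assumes a SparkSession named `spark` is in scope (e.g. spark-shell)
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_bert_base_cased_sayula_popoluca", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+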
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_base_cased_sayula_popoluca| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.8 MB| + +## References + +https://huggingface.co/QCRI/bert-base-cased-pos \ No newline at end of file From 984971e5b4898df30edfc59791addb18c568572e Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:40:44 +0700 Subject: [PATCH 236/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_japanese_upos_ja --- ...ula_popoluca_bert_base_japanese_upos_ja.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_japanese_upos_ja.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_japanese_upos_ja.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_japanese_upos_ja.md new file mode 100644 index 00000000000000..910c351d136927 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_japanese_upos_ja.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Japanese bert_sayula_popoluca_bert_base_japanese_upos BertForTokenClassification from KoichiYasuoka +author: John Snow Labs +name: bert_sayula_popoluca_bert_base_japanese_upos +date: 2023-11-06 +tags: [bert, ja, open_source, token_classification, onnx] +task: Named Entity Recognition +language: ja +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_base_japanese_upos` is a Japanese model originally trained by
KoichiYasuoka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_japanese_upos_ja_5.2.0_3.0_1699303064211.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_japanese_upos_ja_5.2.0_3.0_1699303064211.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_japanese_upos","ja") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols("documents") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bert_base_japanese_upos", "ja") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_base_japanese_upos| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|ja| +|Size:|338.2 MB| + +## References + +https://huggingface.co/KoichiYasuoka/bert-base-japanese-upos \ No newline at end of file From 96a9b25d0e1a92283a7978cca822ff03f2c11755 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:41:44 +0700 Subject: [PATCH 237/667] Add model 2023-11-06-bert_token_classifier_base_han_chinese_ws_zh --- ...token_classifier_base_han_chinese_ws_zh.md | 106 ++++++++++++++++++ 1 file changed, 106 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_han_chinese_ws_zh.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_han_chinese_ws_zh.md b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_han_chinese_ws_zh.md new file mode 100644 index 00000000000000..cc7e8c0ebc0bf1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_han_chinese_ws_zh.md @@ -0,0 +1,106 @@ +--- +layout: model +title: Chinese BertForTokenClassification Base Cased model (from ckiplab) +author: John Snow Labs +name: bert_token_classifier_base_han_chinese_ws +date: 2023-11-06 +tags: [zh, open_source, bert, token_classification, ner, onnx] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-base-han-chinese-ws` is a Chinese model originally trained by `ckiplab`. 
+ +## Predicted Entities + +`B`, `I` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_han_chinese_ws_zh_5.2.0_3.0_1699301530772.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_han_chinese_ws_zh_5.2.0_3.0_1699301530772.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_base_han_chinese_ws","zh") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_base_han_chinese_ws","zh") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_base_han_chinese_ws| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|395.4 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + + + +- https://huggingface.co/ckiplab/bert-base-han-chinese-ws +- https://github.com/ckiplab/han-transformers +- http://lingcorpus.iis.sinica.edu.tw/cgi-bin/kiwi/akiwi/kiwi.sh +- http://lingcorpus.iis.sinica.edu.tw/cgi-bin/kiwi/dkiwi/kiwi.sh +- http://lingcorpus.iis.sinica.edu.tw/cgi-bin/kiwi/pkiwi/kiwi.sh +- http://asbc.iis.sinica.edu.tw +- https://ckip.iis.sinica.edu.tw/ \ No newline at end of file From bee82d5a81cc0b4fcdf933b29f608b35e1ec18fb Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:42:44 +0700 Subject: [PATCH 238/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_egy_ar --- ...amelbert_catalan_sayula_popoluca_egy_ar.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_egy_ar.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_egy_ar.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_egy_ar.md new file mode 100644 index 00000000000000..5039ee4f306ad0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_egy_ar.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Arabic bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_egy BertForTokenClassification from CAMeL-Lab +author: John Snow Labs +name: 
bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_egy +date: 2023-11-06 +tags: [bert, ar, open_source, token_classification, onnx] +task: Named Entity Recognition +language: ar +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_egy` is an Arabic model originally trained by CAMeL-Lab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_egy_ar_5.2.0_3.0_1699300450055.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_egy_ar_5.2.0_3.0_1699300450055.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_egy","ar") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols("documents") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_egy", "ar") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_egy| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|ar| +|Size:|406.7 MB| + +## References + +https://huggingface.co/CAMeL-Lab/bert-base-arabic-camelbert-ca-pos-egy \ No newline at end of file From 213d8d61a0b93319485c9d4eec399c274683bc53 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:43:44 +0700 Subject: [PATCH 239/667] Add model 2023-11-06-bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_syntax_es --- ...sed_finetuned_sayula_popoluca_syntax_es.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_syntax_es.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_syntax_es.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_syntax_es.md new file mode 100644 index 00000000000000..d224753aafb3d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_syntax_es.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Castilian, Spanish bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_syntax BertForTokenClassification from mrm8488 +author: John Snow Labs +name: bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_syntax +date: 2023-11-06 +tags: [bert, es, open_source, token_classification, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## 
Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_syntax` is a Castilian, Spanish model originally trained by mrm8488. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_syntax_es_5.2.0_3.0_1699301495205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_syntax_es_5.2.0_3.0_1699301495205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_syntax","es") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols("documents") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_syntax", "es") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_syntax| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|409.5 MB| + +## References + +https://huggingface.co/mrm8488/bert-spanish-cased-finetuned-pos-syntax \ No newline at end of file From 63f4b0144684cfe584689f0322cf0dc881bab523 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:44:45 +0700 Subject: [PATCH 240/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_dutch_cased_finetuned_lassysmall_sayula_popoluca_nl --- ...finetuned_lassysmall_sayula_popoluca_nl.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_dutch_cased_finetuned_lassysmall_sayula_popoluca_nl.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_dutch_cased_finetuned_lassysmall_sayula_popoluca_nl.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_dutch_cased_finetuned_lassysmall_sayula_popoluca_nl.md new file mode 100644 index 00000000000000..c77135210fa725 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_dutch_cased_finetuned_lassysmall_sayula_popoluca_nl.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Dutch, Flemish bert_sayula_popoluca_bert_base_dutch_cased_finetuned_lassysmall_sayula_popoluca BertForTokenClassification from wietsedv +author: John Snow Labs +name: bert_sayula_popoluca_bert_base_dutch_cased_finetuned_lassysmall_sayula_popoluca +date: 2023-11-06 +tags: [bert, nl, open_source, token_classification, onnx] +task: Named Entity Recognition +language: nl +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover 
+use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_base_dutch_cased_finetuned_lassysmall_sayula_popoluca` is a Dutch, Flemish model originally trained by wietsedv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_dutch_cased_finetuned_lassysmall_sayula_popoluca_nl_5.2.0_3.0_1699303346752.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_dutch_cased_finetuned_lassysmall_sayula_popoluca_nl_5.2.0_3.0_1699303346752.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_dutch_cased_finetuned_lassysmall_sayula_popoluca","nl") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols("documents") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bert_base_dutch_cased_finetuned_lassysmall_sayula_popoluca", "nl") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_base_dutch_cased_finetuned_lassysmall_sayula_popoluca| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|nl| +|Size:|407.3 MB| + +## References + +https://huggingface.co/wietsedv/bert-base-dutch-cased-finetuned-lassysmall-pos \ No newline at end of file From ad234f4389b1cb022bb95dba021bd6788774823d Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:45:45 +0700 Subject: [PATCH 241/667] Add model 2023-11-06-bert_sayula_popoluca_signtagger_en --- ...1-06-bert_sayula_popoluca_signtagger_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_signtagger_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_signtagger_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_signtagger_en.md new file mode 100644 index 00000000000000..d9354a830dfee4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_signtagger_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_signtagger BertForTokenClassification from RJ3vans +author: John Snow Labs +name: bert_sayula_popoluca_signtagger +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_signtagger` is an English model originally trained by RJ3vans.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_signtagger_en_5.2.0_3.0_1699302420746.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_signtagger_en_5.2.0_3.0_1699302420746.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_signtagger","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols("documents") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_signtagger", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_signtagger| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/RJ3vans/SignTagger \ No newline at end of file From 747776235173d3643a13a0929082d7b27cdef111 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:46:45 +0700 Subject: [PATCH 242/667] Add model 2023-11-06-bert_ner_turkish_ner_tr --- .../2023-11-06-bert_ner_turkish_ner_tr.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_turkish_ner_tr.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_turkish_ner_tr.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_turkish_ner_tr.md new file mode 100644 index 00000000000000..4d782a681ccdb1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_turkish_ner_tr.md @@ -0,0 +1,114 @@ +--- +layout: model +title: Turkish BertForTokenClassification Cased model (from gurkan08) +author: John Snow Labs +name: bert_ner_turkish_ner +date: 2023-11-06 +tags: [bert, ner, open_source, tr, onnx] +task: Named Entity Recognition +language: tr +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `turkish-ner` is a Turkish model originally trained by `gurkan08`. 
+ +## Predicted Entities + +`ORG`, `LOC`, `PER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_turkish_ner_tr_5.2.0_3.0_1699301430110.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_turkish_ner_tr_5.2.0_3.0_1699301430110.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_turkish_ner","tr") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["Spark NLP'yi seviyorum"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_turkish_ner","tr") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("Spark NLP'yi seviyorum").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("tr.ner.bert.by_gurkan08").predict("""Spark NLP'yi seviyorum""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_turkish_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|tr| +|Size:|412.3 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + + + +- https://huggingface.co/gurkan08/turkish-ner \ No newline at end of file From 3b11ba6dd3936fb631d1123b031ef06941b9c15d Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:47:45 +0700 Subject: [PATCH 243/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_glf_ar --- ...camelbert_danish_sayula_popoluca_glf_ar.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_glf_ar.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_glf_ar.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_glf_ar.md new file mode 100644 index 00000000000000..433085215658ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_glf_ar.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Arabic bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_glf BertForTokenClassification from CAMeL-Lab +author: John Snow Labs +name: bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_glf +date: 2023-11-06 +tags: [bert, ar, open_source, token_classification, onnx] +task: Named Entity Recognition +language: ar +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + 
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_glf` is an Arabic model originally trained by CAMeL-Lab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_glf_ar_5.2.0_3.0_1699302497616.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_glf_ar_5.2.0_3.0_1699302497616.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_glf","ar") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols("documents") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_glf", "ar") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_base_arabic_camelbert_danish_sayula_popoluca_glf| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|ar| +|Size:|406.8 MB| + +## References + +https://huggingface.co/CAMeL-Lab/bert-base-arabic-camelbert-da-pos-glf \ No newline at end of file From f7bf6d87d52e613bbeda8dea15f3a975592c6a3e Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:48:45 +0700 Subject: [PATCH 244/667] Add model 2023-11-06-bert_token_classifier_navigation_chinese_zh --- ..._token_classifier_navigation_chinese_zh.md | 100 ++++++++++++++++++ 1 file changed, 100 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_navigation_chinese_zh.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_navigation_chinese_zh.md b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_navigation_chinese_zh.md new file mode 100644 index 00000000000000..220505a267824f --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_navigation_chinese_zh.md @@ -0,0 +1,100 @@ +--- +layout: model +title: Chinese BertForTokenClassification Cased model (from Kunologist) +author: John Snow Labs +name: bert_token_classifier_navigation_chinese +date: 2023-11-06 +tags: [zh, open_source, bert, token_classification, ner, onnx] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `navigation-chinese` is a Chinese model originally trained by `Kunologist`. 
+ +## Predicted Entities + +`IQ`, `X`, `IK`, `IO`, `IB`, `IM`, `IA`, `ID`, `DO`, `IH`, `II`, `IC`, `IG`, `IJ`, `DN`, `IN`, `IP` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_navigation_chinese_zh_5.2.0_3.0_1699302683861.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_navigation_chinese_zh_5.2.0_3.0_1699302683861.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_navigation_chinese","zh") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_navigation_chinese","zh") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_navigation_chinese| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|381.0 MB| +|Case sensitive:|true| +|Max sentence length:|128| +

## References + + + +- https://huggingface.co/Kunologist/navigation-chinese \ No newline at end of file From e163aed280f9ad108d492c7f9b12cc2b10c59cb6 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:49:46 +0700 Subject: [PATCH 245/667] Add model 2023-11-06-bert_ner_icelandic_ner_bert_is --- ...23-11-06-bert_ner_icelandic_ner_bert_is.md | 117 ++++++++++++++++++ 1 file changed, 117 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_icelandic_ner_bert_is.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_icelandic_ner_bert_is.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_icelandic_ner_bert_is.md new file mode 100644 index 00000000000000..aaf050c5316b1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_icelandic_ner_bert_is.md @@ -0,0 +1,117 @@ +--- +layout: model +title: Icelandic BertForTokenClassification Cased model (from m3hrdadfi) +author: John Snow Labs +name: bert_ner_icelandic_ner_bert +date: 2023-11-06 +tags: [bert, ner, open_source, is, onnx] +task: Named Entity Recognition +language: is +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `icelandic-ner-bert` is an Icelandic model originally trained by `m3hrdadfi`.
+ +## Predicted Entities + +`Organization`, `Time`, `Location`, `Miscellaneous`, `Person`, `Money`, `Percent`, `Date` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_icelandic_ner_bert_is_5.2.0_3.0_1699294925637.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_icelandic_ner_bert_is_5.2.0_3.0_1699294925637.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_icelandic_ner_bert","is") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["Ég elska neista NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_icelandic_ner_bert","is") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("Ég elska neista NLP").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("is.ner.bert").predict("""Ég elska neista NLP""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_icelandic_ner_bert| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|is| +|Size:|665.1 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/m3hrdadfi/icelandic-ner-bert +- https://github.com/m3hrdadfi/icelandic-ner/issues +- https://en.ru.is/ +- http://hdl.handle.net/20.500.12537/42 \ No newline at end of file From da879082c2e524212c2bd078fb47e0bf828048df Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:50:46 +0700 Subject: [PATCH 246/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_nl --- ...ca_bert_base_dutch_cased_upos_alpino_nl.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_nl.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_nl.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_nl.md new file mode 100644 index 00000000000000..06f4a6e47c2ecd --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_nl.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Dutch, Flemish bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino BertForTokenClassification from GroNLP +author: John Snow Labs +name: bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino +date: 2023-11-06 +tags: [bert, nl, open_source, token_classification, onnx] +task: Named Entity Recognition +language: nl +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained 
BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino` is a Dutch, Flemish model originally trained by GroNLP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_nl_5.2.0_3.0_1699302795836.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_nl_5.2.0_3.0_1699302795836.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino","nl") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineDF = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino", "nl") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineDF = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|nl| +|Size:|406.6 MB| + +## References + +https://huggingface.co/GroNLP/bert-base-dutch-cased-upos-alpino \ No newline at end of file From c185d8b88d998a27b3acb86c6508aac89862392e Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:51:46 +0700 Subject: [PATCH 247/667] Add model 2023-11-06-bert_ner_bc4chemd_imbalancedpubmedbert_en --- ...rt_ner_bc4chemd_imbalancedpubmedbert_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc4chemd_imbalancedpubmedbert_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc4chemd_imbalancedpubmedbert_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc4chemd_imbalancedpubmedbert_en.md new file mode 100644 index 00000000000000..6f1ed769646fba --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc4chemd_imbalancedpubmedbert_en.md +--- +layout: model +title: English bert_ner_bc4chemd_imbalancedpubmedbert BertForTokenClassification from ghadeermobasher +author: John Snow Labs +name: bert_ner_bc4chemd_imbalancedpubmedbert +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_bc4chemd_imbalancedpubmedbert` is an English model originally trained by ghadeermobasher.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bc4chemd_imbalancedpubmedbert_en_5.2.0_3.0_1699271488566.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bc4chemd_imbalancedpubmedbert_en_5.2.0_3.0_1699271488566.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bc4chemd_imbalancedpubmedbert","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineDF = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_bc4chemd_imbalancedpubmedbert", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineDF = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bc4chemd_imbalancedpubmedbert| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.1 MB| + +## References + +https://huggingface.co/ghadeermobasher/BC4CHEMD_ImbalancedPubMedBERT \ No newline at end of file From cd5223ed191d35c334e11cda4c3ff4333c10be4f Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:52:46 +0700 Subject: [PATCH 248/667] Add model 2023-11-06-bert_italian_cased_ner_it --- .../2023-11-06-bert_italian_cased_ner_it.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_italian_cased_ner_it.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_italian_cased_ner_it.md b/docs/_posts/ahmedlone127/2023-11-06-bert_italian_cased_ner_it.md new file mode 100644 index 00000000000000..824127f46052d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_italian_cased_ner_it.md +--- +layout: model +title: Italian bert_italian_cased_ner BertForTokenClassification from osiria +author: John Snow Labs +name: bert_italian_cased_ner +date: 2023-11-06 +tags: [bert, it, open_source, token_classification, onnx] +task: Named Entity Recognition +language: it +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_italian_cased_ner` is an Italian model originally trained by osiria.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_italian_cased_ner_it_5.2.0_3.0_1699303842218.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_italian_cased_ner_it_5.2.0_3.0_1699303842218.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_italian_cased_ner","it") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineDF = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_italian_cased_ner", "it") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineDF = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_italian_cased_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|it| +|Size:|409.0 MB| + +## References + +https://huggingface.co/osiria/bert-italian-cased-ner \ No newline at end of file From 2d12f31a0cf125638cec0f5865e2f28e1092bddb Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:53:46 +0700 Subject: [PATCH 249/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_glf_ar --- ...amelbert_catalan_sayula_popoluca_glf_ar.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_glf_ar.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_glf_ar.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_glf_ar.md new file mode 100644 index 00000000000000..bd543e117835fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_glf_ar.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Arabic bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_glf BertForTokenClassification from CAMeL-Lab +author: John Snow Labs +name: bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_glf +date: 2023-11-06 +tags: [bert, ar, open_source, token_classification, onnx] +task: Named Entity Recognition +language: ar +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted 
from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_glf` is an Arabic model originally trained by CAMeL-Lab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_glf_ar_5.2.0_3.0_1699302101136.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_glf_ar_5.2.0_3.0_1699302101136.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_glf","ar") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineDF = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_glf", "ar") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineDF = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_glf| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|ar| +|Size:|406.7 MB| + +## References + +https://huggingface.co/CAMeL-Lab/bert-base-arabic-camelbert-ca-pos-glf \ No newline at end of file From f321112d70641ab5bff6d69aa97e64ee22903e60 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:54:47 +0700 Subject: [PATCH 250/667] Add model 2023-11-06-bert_sayula_popoluca_bert_english_uncased_finetuned_sayula_popoluca_en --- ...sh_uncased_finetuned_sayula_popoluca_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_english_uncased_finetuned_sayula_popoluca_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_english_uncased_finetuned_sayula_popoluca_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_english_uncased_finetuned_sayula_popoluca_en.md new file mode 100644 index 00000000000000..39d268736721ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_english_uncased_finetuned_sayula_popoluca_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_bert_english_uncased_finetuned_sayula_popoluca BertForTokenClassification from vblagoje +author: John Snow Labs +name: bert_sayula_popoluca_bert_english_uncased_finetuned_sayula_popoluca +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification 
model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_english_uncased_finetuned_sayula_popoluca` is an English model originally trained by vblagoje. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_english_uncased_finetuned_sayula_popoluca_en_5.2.0_3.0_1699304061872.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_english_uncased_finetuned_sayula_popoluca_en_5.2.0_3.0_1699304061872.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_english_uncased_finetuned_sayula_popoluca","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineDF = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bert_english_uncased_finetuned_sayula_popoluca", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineDF = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_english_uncased_finetuned_sayula_popoluca| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/vblagoje/bert-english-uncased-finetuned-pos \ No newline at end of file From 9feb67fd0d96182d15f2386b642a56b5725fde38 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:55:47 +0700 Subject: [PATCH 251/667] Add model 2023-11-06-bert_sayula_popoluca_bert_sayula_popoluca_cased_deepfrog_nld_en --- ...t_sayula_popoluca_cased_deepfrog_nld_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_sayula_popoluca_cased_deepfrog_nld_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_sayula_popoluca_cased_deepfrog_nld_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_sayula_popoluca_cased_deepfrog_nld_en.md new file mode 100644 index 00000000000000..f64d131e3ac787 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_sayula_popoluca_cased_deepfrog_nld_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_bert_sayula_popoluca_cased_deepfrog_nld BertForTokenClassification from proycon +author: John Snow Labs +name: bert_sayula_popoluca_bert_sayula_popoluca_cased_deepfrog_nld +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide 
scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_sayula_popoluca_cased_deepfrog_nld` is an English model originally trained by proycon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_sayula_popoluca_cased_deepfrog_nld_en_5.2.0_3.0_1699304108856.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_sayula_popoluca_cased_deepfrog_nld_en_5.2.0_3.0_1699304108856.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_sayula_popoluca_cased_deepfrog_nld","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineDF = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bert_sayula_popoluca_cased_deepfrog_nld", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineDF = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_sayula_popoluca_cased_deepfrog_nld| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.4 MB| + +## References + +https://huggingface.co/proycon/bert-pos-cased-deepfrog-nld \ No newline at end of file From bd4729c6de511c99108b48af684534131fd66ae7 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:56:47 +0700 Subject: [PATCH 252/667] Add model 2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_5_en --- ...netuned_sayula_popoluca_accelerate_5_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_5_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_5_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_5_en.md new file mode 100644 index 00000000000000..219f42c99ab482 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_5_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_5 BertForTokenClassification from camilag +author: John Snow Labs +name: bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_5 +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted 
from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_5` is an English model originally trained by camilag. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_5_en_5.2.0_3.0_1699304172382.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_5_en_5.2.0_3.0_1699304172382.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_5","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineDF = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_5", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineDF = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_5| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|406.0 MB| + +## References + +https://huggingface.co/camilag/bertimbau-finetuned-pos-accelerate-5 \ No newline at end of file From aaf13e97d1190f45fe2a6e1fdec669f36b041e87 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:57:48 +0700 Subject: [PATCH 253/667] Add model 2023-11-06-bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_16_tags_es --- ...ed_finetuned_sayula_popoluca_16_tags_es.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_16_tags_es.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_16_tags_es.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_16_tags_es.md new file mode 100644 index 00000000000000..f632cb4bd81005 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_16_tags_es.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Castilian, Spanish bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_16_tags BertForTokenClassification from mrm8488 +author: John Snow Labs +name: bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_16_tags +date: 2023-11-06 +tags: [bert, es, open_source, token_classification, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description 
+ +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_16_tags` is a Castilian, Spanish model originally trained by mrm8488. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_16_tags_es_5.2.0_3.0_1699304252166.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_16_tags_es_5.2.0_3.0_1699304252166.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_16_tags","es") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineDF = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_16_tags", "es") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineDF = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_16_tags| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|409.5 MB| + +## References + +https://huggingface.co/mrm8488/bert-spanish-cased-finetuned-pos-16-tags \ No newline at end of file From 382c5c69b2b527ccfa59b892d3561d7a93cd9f60 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 03:58:48 +0700 Subject: [PATCH 254/667] Add model 2023-11-06-bert_sayula_popoluca_bert_punct_restoration_english_alvenir_en --- ...rt_punct_restoration_english_alvenir_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_punct_restoration_english_alvenir_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_punct_restoration_english_alvenir_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_punct_restoration_english_alvenir_en.md new file mode 100644 index 00000000000000..722f500a03fef8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_punct_restoration_english_alvenir_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_bert_punct_restoration_english_alvenir BertForTokenClassification from Alvenir +author: John Snow Labs +name: bert_sayula_popoluca_bert_punct_restoration_english_alvenir +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide 
scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_punct_restoration_english_alvenir` is an English model originally trained by Alvenir.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_punct_restoration_english_alvenir_en_5.2.0_3.0_1699304261004.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_punct_restoration_english_alvenir_en_5.2.0_3.0_1699304261004.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
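+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification needs a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_punct_restoration_english_alvenir","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+# `spark` is an active SparkSession, e.g. the one returned by sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// assumes an active SparkSession named `spark` (e.g. in spark-shell)
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification needs a "token" column, so tokenize the documents first
+val tokenizer = new Tokenizer()
+    .setInputCols("documents")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_bert_punct_restoration_english_alvenir", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+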
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_sayula_popoluca_bert_punct_restoration_english_alvenir|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|407.2 MB|
+
+## References
+
+https://huggingface.co/Alvenir/bert-punct-restoration-en
\ No newline at end of file
From f9f92cd6e0cbe5cc3d42251dadae0f4691689c3d Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 03:59:48 +0700
Subject: [PATCH 255/667] Add model 2023-11-06-bert_sayula_popoluca_tiny_focal_alpah75_en

---
 ...t_sayula_popoluca_tiny_focal_alpah75_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_focal_alpah75_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_focal_alpah75_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_focal_alpah75_en.md
new file mode 100644
index 00000000000000..7cfdb2751693fc
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_focal_alpah75_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_sayula_popoluca_tiny_focal_alpah75 BertForTokenClassification from kktoto
+author: John Snow Labs
+name: bert_sayula_popoluca_tiny_focal_alpah75
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_tiny_focal_alpah75` is an English model originally trained by kktoto.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_tiny_focal_alpah75_en_5.2.0_3.0_1699304352550.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_tiny_focal_alpah75_en_5.2.0_3.0_1699304352550.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
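+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification needs a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_tiny_focal_alpah75","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+# `spark` is an active SparkSession, e.g. the one returned by sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// assumes an active SparkSession named `spark` (e.g. in spark-shell)
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification needs a "token" column, so tokenize the documents first
+val tokenizer = new Tokenizer()
+    .setInputCols("documents")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_tiny_focal_alpah75", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+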
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_tiny_focal_alpah75| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|42.7 MB| + +## References + +https://huggingface.co/kktoto/tiny_focal_alpah75 \ No newline at end of file From f955e9d8c8acdc552d91c7381ca88fe37ecb193a Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:00:49 +0700 Subject: [PATCH 256/667] Add model 2023-11-06-bert_sayula_popoluca_bert_tiny_chinese_sayula_popoluca_zh --- ...ca_bert_tiny_chinese_sayula_popoluca_zh.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_tiny_chinese_sayula_popoluca_zh.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_tiny_chinese_sayula_popoluca_zh.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_tiny_chinese_sayula_popoluca_zh.md new file mode 100644 index 00000000000000..f91a40641be6a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_tiny_chinese_sayula_popoluca_zh.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Chinese bert_sayula_popoluca_bert_tiny_chinese_sayula_popoluca BertForTokenClassification from ckiplab +author: John Snow Labs +name: bert_sayula_popoluca_bert_tiny_chinese_sayula_popoluca +date: 2023-11-06 +tags: [bert, zh, open_source, token_classification, onnx] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark 
NLP. `bert_sayula_popoluca_bert_tiny_chinese_sayula_popoluca` is a Chinese model originally trained by ckiplab.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_tiny_chinese_sayula_popoluca_zh_5.2.0_3.0_1699304358725.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_tiny_chinese_sayula_popoluca_zh_5.2.0_3.0_1699304358725.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
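+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification needs a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_tiny_chinese_sayula_popoluca","zh") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+# `spark` is an active SparkSession, e.g. the one returned by sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// assumes an active SparkSession named `spark` (e.g. in spark-shell)
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification needs a "token" column, so tokenize the documents first
+val tokenizer = new Tokenizer()
+    .setInputCols("documents")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_bert_tiny_chinese_sayula_popoluca", "zh")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+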
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_tiny_chinese_sayula_popoluca| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|43.1 MB| + +## References + +https://huggingface.co/ckiplab/bert-tiny-chinese-pos \ No newline at end of file From b7f110aae2dcf600923b83f3753e5fda7d757bc3 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:01:48 +0700 Subject: [PATCH 257/667] Add model 2023-11-06-bert_token_classifier_berturk_uncased_keyword_extractor_tr --- ...er_berturk_uncased_keyword_extractor_tr.md | 100 ++++++++++++++++++ 1 file changed, 100 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_berturk_uncased_keyword_extractor_tr.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_berturk_uncased_keyword_extractor_tr.md b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_berturk_uncased_keyword_extractor_tr.md new file mode 100644 index 00000000000000..2adc1793966ff5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_berturk_uncased_keyword_extractor_tr.md @@ -0,0 +1,100 @@ +--- +layout: model +title: Turkish BertForTokenClassification Uncased model (from yanekyuk) +author: John Snow Labs +name: bert_token_classifier_berturk_uncased_keyword_extractor +date: 2023-11-06 +tags: [tr, open_source, bert, token_classification, ner, onnx] +task: Named Entity Recognition +language: tr +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. 
`berturk-uncased-keyword-extractor` is a Turkish model originally trained by `yanekyuk`. + +## Predicted Entities + +`KEY` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_berturk_uncased_keyword_extractor_tr_5.2.0_3.0_1699304433332.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_berturk_uncased_keyword_extractor_tr_5.2.0_3.0_1699304433332.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_berturk_uncased_keyword_extractor","tr") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+# `spark` is an active SparkSession, e.g. the one returned by sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// assumes an active SparkSession named `spark` (e.g. in spark-shell)
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_berturk_uncased_keyword_extractor","tr")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
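This card lists `KEY` as the only predicted entity. If the `ner` column carries IOB-style labels such as `B-KEY`, `I-KEY` and `O`, consecutive keyword tokens can be merged into phrases. That label scheme is an assumption, so check the model's actual labels first. Spark NLP's `NerConverter` annotator is the usual in-pipeline way to build such chunks. The function below is only a plain-Python sketch of the same grouping, and its name, `extract_keywords`, is hypothetical.

```python
def extract_keywords(tokens, tags):
    """Merge IOB-tagged tokens (assumed B-KEY / I-KEY / O) into keyword phrases.

    Hypothetical helper. An I- tag that does not continue an open phrase
    is treated as the start of a new phrase.
    """
    keywords, current = [], []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and not current):
            # A new phrase begins: flush the previous one first
            if current:
                keywords.append(" ".join(current))
            current = [token]
        elif tag.startswith("I-"):
            # Continue the currently open phrase
            current.append(token)
        else:
            # "O" (or any other tag) closes the open phrase
            if current:
                keywords.append(" ".join(current))
            current = []
    if current:
        keywords.append(" ".join(current))
    return keywords


tokens = ["derin", "öğrenme", "ile", "doğal", "dil", "işleme"]
tags = ["B-KEY", "I-KEY", "O", "B-KEY", "I-KEY", "I-KEY"]
print(extract_keywords(tokens, tags))
# prints: ['derin öğrenme', 'doğal dil işleme']
```

With the pipeline above, `row = result.select("token.result", "ner.result").first()` gets the arrays for the first input row. `extract_keywords(row[0], row[1])` then applies the helper to them.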
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_token_classifier_berturk_uncased_keyword_extractor|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|tr|
+|Size:|412.5 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+- https://huggingface.co/yanekyuk/berturk-uncased-keyword-extractor
\ No newline at end of file
From 54e50ed57282c5b946a898be5c16ca2e2363f4ca Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 04:02:49 +0700
Subject: [PATCH 258/667] Add model 2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate3_en

---
 ...inetuned_sayula_popoluca_accelerate3_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate3_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate3_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate3_en.md
new file mode 100644
index 00000000000000..48989754e1f9e2
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate3_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate3 BertForTokenClassification from Deborah
+author: John Snow Labs
+name: bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate3
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate3` is an English model originally trained by Deborah.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate3_en_5.2.0_3.0_1699304416492.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate3_en_5.2.0_3.0_1699304416492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
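+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification needs a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate3","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+# `spark` is an active SparkSession, e.g. the one returned by sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// assumes an active SparkSession named `spark` (e.g. in spark-shell)
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification needs a "token" column, so tokenize the documents first
+val tokenizer = new Tokenizer()
+    .setInputCols("documents")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate3", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+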
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate3| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|406.0 MB| + +## References + +https://huggingface.co/Deborah/bertimbau-finetuned-pos-accelerate3 \ No newline at end of file From b2c6eda4eb8e5dea91b406f1b96e626b978c2b71 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:03:49 +0700 Subject: [PATCH 259/667] Add model 2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_6_en --- ...netuned_sayula_popoluca_accelerate_6_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_6_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_6_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_6_en.md new file mode 100644 index 00000000000000..946ef05b474e9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_6_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_6 BertForTokenClassification from camilag +author: John Snow Labs +name: bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_6 +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification 
model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_6` is an English model originally trained by camilag.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_6_en_5.2.0_3.0_1699304601450.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_6_en_5.2.0_3.0_1699304601450.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
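+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification needs a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_6","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+# `spark` is an active SparkSession, e.g. the one returned by sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// assumes an active SparkSession named `spark` (e.g. in spark-shell)
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification needs a "token" column, so tokenize the documents first
+val tokenizer = new Tokenizer()
+    .setInputCols("documents")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_6", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+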
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_6| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|406.0 MB| + +## References + +https://huggingface.co/camilag/bertimbau-finetuned-pos-accelerate-6 \ No newline at end of file From e8d8c965660d71f4cabb8bfc554df52ef9597811 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:04:49 +0700 Subject: [PATCH 260/667] Add model 2023-11-06-bert_sayula_popoluca_bert_large_slavic_cyrillic_upos_uk --- ...luca_bert_large_slavic_cyrillic_upos_uk.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_large_slavic_cyrillic_upos_uk.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_large_slavic_cyrillic_upos_uk.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_large_slavic_cyrillic_upos_uk.md new file mode 100644 index 00000000000000..563777d250d6dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_large_slavic_cyrillic_upos_uk.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Ukrainian bert_sayula_popoluca_bert_large_slavic_cyrillic_upos BertForTokenClassification from KoichiYasuoka +author: John Snow Labs +name: bert_sayula_popoluca_bert_large_slavic_cyrillic_upos +date: 2023-11-06 +tags: [bert, uk, open_source, token_classification, onnx] +task: Named Entity Recognition +language: uk +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark 
NLP. `bert_sayula_popoluca_bert_large_slavic_cyrillic_upos` is a Ukrainian model originally trained by KoichiYasuoka.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_large_slavic_cyrillic_upos_uk_5.2.0_3.0_1699303518422.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_large_slavic_cyrillic_upos_uk_5.2.0_3.0_1699303518422.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
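+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification needs a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_large_slavic_cyrillic_upos","uk") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+# `spark` is an active SparkSession, e.g. the one returned by sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// assumes an active SparkSession named `spark` (e.g. in spark-shell)
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification needs a "token" column, so tokenize the documents first
+val tokenizer = new Tokenizer()
+    .setInputCols("documents")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_bert_large_slavic_cyrillic_upos", "uk")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+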
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_large_slavic_cyrillic_upos| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|uk| +|Size:|1.6 GB| + +## References + +https://huggingface.co/KoichiYasuoka/bert-large-slavic-cyrillic-upos \ No newline at end of file From be6b08ce12070b5239af2d56d44e1c9006f70374 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:05:50 +0700 Subject: [PATCH 261/667] Add model 2023-11-06-bert_sayula_popoluca_turkish_kongo_sayula_popoluca_conllu_bert_en --- ...sh_kongo_sayula_popoluca_conllu_bert_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_turkish_kongo_sayula_popoluca_conllu_bert_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_turkish_kongo_sayula_popoluca_conllu_bert_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_turkish_kongo_sayula_popoluca_conllu_bert_en.md new file mode 100644 index 00000000000000..73d84ad4db56d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_turkish_kongo_sayula_popoluca_conllu_bert_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_turkish_kongo_sayula_popoluca_conllu_bert BertForTokenClassification from mustafabaris +author: John Snow Labs +name: bert_sayula_popoluca_turkish_kongo_sayula_popoluca_conllu_bert +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide 
scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_turkish_kongo_sayula_popoluca_conllu_bert` is an English model originally trained by mustafabaris.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_turkish_kongo_sayula_popoluca_conllu_bert_en_5.2.0_3.0_1699304689691.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_turkish_kongo_sayula_popoluca_conllu_bert_en_5.2.0_3.0_1699304689691.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
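+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification needs a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_turkish_kongo_sayula_popoluca_conllu_bert","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+# `spark` is an active SparkSession, e.g. the one returned by sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// assumes an active SparkSession named `spark` (e.g. in spark-shell)
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification needs a "token" column, so tokenize the documents first
+val tokenizer = new Tokenizer()
+    .setInputCols("documents")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_turkish_kongo_sayula_popoluca_conllu_bert", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+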
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_turkish_kongo_sayula_popoluca_conllu_bert| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|689.0 MB| + +## References + +https://huggingface.co/mustafabaris/tr_kg_pos_conllu_bert \ No newline at end of file From 2ee6c10fc002146ec03b4ef8e8851db6125e3965 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:06:50 +0700 Subject: [PATCH 262/667] Add model 2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_7_en --- ...netuned_sayula_popoluca_accelerate_7_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_7_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_7_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_7_en.md new file mode 100644 index 00000000000000..51b7db2ac4010c --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_7_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_7 BertForTokenClassification from camilag +author: John Snow Labs +name: bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_7 +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted 
from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_7` is an English model originally trained by camilag.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_7_en_5.2.0_3.0_1699304708381.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_7_en_5.2.0_3.0_1699304708381.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
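+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification needs a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_7","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+# `spark` is an active SparkSession, e.g. the one returned by sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// assumes an active SparkSession named `spark` (e.g. in spark-shell)
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification needs a "token" column, so tokenize the documents first
+val tokenizer = new Tokenizer()
+    .setInputCols("documents")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_7", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+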
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_7| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|406.0 MB| + +## References + +https://huggingface.co/camilag/bertimbau-finetuned-pos-accelerate-7 \ No newline at end of file From eb4f00d3179497150e73dc2957be57aa3caa00cb Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:07:50 +0700 Subject: [PATCH 263/667] Add model 2023-11-06-bert_sayula_popoluca_chinese_bert_wwm_ext_upos_zh --- ...a_popoluca_chinese_bert_wwm_ext_upos_zh.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_chinese_bert_wwm_ext_upos_zh.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_chinese_bert_wwm_ext_upos_zh.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_chinese_bert_wwm_ext_upos_zh.md new file mode 100644 index 00000000000000..709064fce54c02 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_chinese_bert_wwm_ext_upos_zh.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Chinese bert_sayula_popoluca_chinese_bert_wwm_ext_upos BertForTokenClassification from KoichiYasuoka +author: John Snow Labs +name: bert_sayula_popoluca_chinese_bert_wwm_ext_upos +date: 2023-11-06 +tags: [bert, zh, open_source, token_classification, onnx] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark 
NLP. `bert_sayula_popoluca_chinese_bert_wwm_ext_upos` is a Chinese model originally trained by KoichiYasuoka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_chinese_bert_wwm_ext_upos_zh_5.2.0_3.0_1699304771456.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_chinese_bert_wwm_ext_upos_zh_5.2.0_3.0_1699304771456.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
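+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# The default Tokenizer splits on whitespace, so unsegmented Chinese text may need word segmentation first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_chinese_bert_wwm_ext_upos","zh") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// The default Tokenizer splits on whitespace, so unsegmented Chinese text may need word segmentation first +val tokenizer = new Tokenizer() + .setInputCols("documents") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_chinese_bert_wwm_ext_upos", "zh") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +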
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_chinese_bert_wwm_ext_upos| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|381.3 MB| + +## References + +https://huggingface.co/KoichiYasuoka/chinese-bert-wwm-ext-upos \ No newline at end of file From 6433e318c82c84aade9e19ebbb3bddb8540bac8f Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:08:50 +0700 Subject: [PATCH 264/667] Add model 2023-11-06-bert_token_classifier_est_ner_v2_et --- ...-06-bert_token_classifier_est_ner_v2_et.md | 102 ++++++++++++++++++ 1 file changed, 102 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_est_ner_v2_et.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_est_ner_v2_et.md b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_est_ner_v2_et.md new file mode 100644 index 00000000000000..cc563fe5247b65 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_est_ner_v2_et.md @@ -0,0 +1,102 @@ +--- +layout: model +title: Estonian BertForTokenClassification Cased model (from tartuNLP) +author: John Snow Labs +name: bert_token_classifier_est_ner_v2 +date: 2023-11-06 +tags: [et, open_source, bert, token_classification, ner, onnx] +task: Named Entity Recognition +language: et +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `EstBERT_NER_v2` is an Estonian model originally trained by `tartuNLP`.
+ +## Predicted Entities + +`TIME`, `ORG`, `MONEY`, `PER`, `GPE`, `DATE`, `PERCENT`, `TITLE`, `LOC`, `EVENT`, `PROD` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_est_ner_v2_et_5.2.0_3.0_1699304760907.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_est_ner_v2_et_5.2.0_3.0_1699304760907.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_est_ner_v2","et") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_est_ner_v2","et") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
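The `ner` column produced by the pipeline above holds one tag per token. This card does not say exactly how the tags are spelled. The sketch below assumes IOB-style labels (`B-PER`, `I-PER`, `O`) built from the entity types listed under Predicted Entities, and shows one way to merge them into entity spans in plain Python. The helper `group_iob` is hypothetical. Inside a Spark pipeline, Spark NLP's `NerConverter` annotator is the usual tool for this grouping.

```python
from typing import List, Optional, Tuple


def group_iob(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge token-level IOB tags into (entity_text, entity_type) chunks.

    Hypothetical helper: assumes tags look like "B-PER", "I-PER" or "O".
    An I- tag whose type differs from the open chunk starts a new chunk.
    """
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        starts_new = tag == "O" or prefix == "B" or label != current_type
        if starts_new and current_tokens:
            # Close the chunk that was being built before this token.
            chunks.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
        if tag == "O":
            continue  # "O" tokens are outside any entity
        if starts_new:
            current_tokens, current_type = [token], label
        else:
            current_tokens.append(token)  # I- tag continuing the same type

    if current_tokens:
        chunks.append((" ".join(current_tokens), current_type))
    return chunks


# Illustrative tags only, not real model output.
print(group_iob(["Mari", "Maasikas", "elab", "Tallinnas"], ["B-PER", "I-PER", "O", "B-GPE"]))
```

The input lists could come from `result.select("token.result", "ner.result")`, which returns the token strings and tag strings as parallel arrays.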
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_est_ner_v2| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|et| +|Size:|463.5 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + + + +- https://huggingface.co/tartuNLP/EstBERT_NER_v2 +- https://metashare.ut.ee/repository/browse/reannotated-estonian-ner-corpus/bd43f1f614a511eca6e4fa163e9d45477d086613d2894fd5af79bf13e3f13594/ +- https://metashare.ut.ee/repository/browse/new-estonian-ner-corpus/98b6706c963c11eba6e4fa163e9d45470bcd0533b6994c93ab8b8c628516ffed/ \ No newline at end of file From 429bf03de4569731a494a695a592517586bcf875 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:09:50 +0700 Subject: [PATCH 265/667] Add model 2023-11-06-bert_token_classifier_base_ner_atc_english_atco2_1h_en --- ...sifier_base_ner_atc_english_atco2_1h_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_ner_atc_english_atco2_1h_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_ner_atc_english_atco2_1h_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_ner_atc_english_atco2_1h_en.md new file mode 100644 index 00000000000000..39d3216af99a80 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_ner_atc_english_atco2_1h_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_token_classifier_base_ner_atc_english_atco2_1h BertForTokenClassification from Jzuluaga +author: John Snow Labs +name: bert_token_classifier_base_ner_atc_english_atco2_1h +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: 
BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_token_classifier_base_ner_atc_english_atco2_1h` is an English model originally trained by Jzuluaga. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_ner_atc_english_atco2_1h_en_5.2.0_3.0_1699303771342.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_ner_atc_english_atco2_1h_en_5.2.0_3.0_1699303771342.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_base_ner_atc_english_atco2_1h","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols("documents") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_token_classifier_base_ner_atc_english_atco2_1h", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_base_ner_atc_english_atco2_1h| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Jzuluaga/bert-base-ner-atc-en-atco2-1h \ No newline at end of file From 65dd243196d149918f514f27a4c38c5aea547307 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:10:51 +0700 Subject: [PATCH 266/667] Add model 2023-11-06-bert_ner_foo_en --- .../2023-11-06-bert_ner_foo_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_foo_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_foo_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_foo_en.md new file mode 100644 index 00000000000000..858eff9e8ce3b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_foo_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from leonweber) +author: John Snow Labs +name: bert_ner_foo +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `foo` is an English model originally trained by `leonweber`.
+ +## Predicted Entities + +`medmentions_full_ner:B-T085)`, `bionlp_st_2013_gro_ner:B-Ribosome)`, `chemdner_TEXT:MESH:D013830)`, `anat_em_ner:O)`, `cellfinder_ner:I-GeneProtein)`, `ncbi_disease_ner:B-CompositeMention)`, `bionlp_st_2013_gro_ner:B-Virus)`, `medmentions_full_ner:I-T129)`, `scai_disease_ner:B-DISEASE)`, `biorelex_ner:B-chemical)`, `chemdner_TEXT:MESH:D011166)`, `medmentions_st21pv_ner:I-T204)`, `chemdner_TEXT:MESH:D008345)`, `bionlp_st_2013_gro_NER:B-RegulationOfFunction)`, `mlee_ner:I-Cell)`, `bionlp_st_2013_gro_NER:I-RNABiosynthesis)`, `biorelex_ner:I-RNA-family)`, `bionlp_st_2013_gro_NER:B-ResponseToChemicalStimulus)`, `bionlp_st_2011_epi_NER:B-Dephosphorylation)`, `chemdner_TEXT:MESH:D003035)`, `chemdner_TEXT:MESH:D013440)`, `chemdner_TEXT:MESH:D037341)`, `chemdner_TEXT:MESH:D009532)`, `chemdner_TEXT:MESH:D019216)`, `chemdner_TEXT:MESH:D036701)`, `chemdner_TEXT:MESH:D011107)`, `bionlp_st_2013_cg_NER:B-Translation)`, `genia_term_corpus_ner:B-cell_component)`, `medmentions_full_ner:I-T065)`, `bionlp_st_2013_gro_ner:B-ProteinBindingSiteOfDNA)`, `anat_em_ner:I-Developing_anatomical_structure)`, `chemdner_TEXT:MESH:D000225)`, `genia_term_corpus_ner:I-ORDNA_domain_or_regionDNA_domain_or_region)`, `medmentions_full_ner:I-T015)`, `chemdner_TEXT:MESH:D008239)`, `bionlp_st_2013_cg_NER:I-Binding)`, `bionlp_st_2013_cg_NER:B-Amino_acid_catabolism)`, `cellfinder_ner:B-CellComponent)`, `bionlp_st_2013_gro_NER:I-MetabolicPathway)`, `bionlp_st_2013_gro_ner:B-ProteinIdentification)`, `bionlp_st_2011_ge_ner:O)`, `bionlp_st_2011_id_ner:B-Organism)`, `bionlp_st_2013_gro_ner:I-ForkheadWingedHelixTF)`, `mirna_ner:B-Relation_Trigger)`, `bionlp_st_2011_ge_NER:B-Regulation)`, `bionlp_st_2013_cg_NER:B-Gene_expression)`, `chemdner_TEXT:MESH:D008055)`, `chemdner_TEXT:MESH:D009944)`, `verspoor_2013_ner:I-gene)`, `bionlp_st_2013_ge_ner:O)`, `chemdner_TEXT:MESH:D003907)`, `mlee_ner:I-Developing_anatomical_structure)`, `chemdner_TEXT:MESH:D010569)`, `mlee_NER:I-Growth)`, 
`chemdner_TEXT:MESH:D036145)`, `medmentions_full_ner:I-T196)`, `ehr_rel_sts:1)`, `bionlp_st_2013_gro_NER:B-CellularComponentOrganizationAndBiogenesis)`, `chemdner_TEXT:MESH:D009285)`, `bionlp_st_2013_gro_NER:B-ProteinMetabolism)`, `chemdner_TEXT:MESH:D016718)`, `bionlp_st_2013_gro_NER:I-BindingOfTFToTFBindingSiteOfProtein)`, `medmentions_full_ner:I-T074)`, `chemdner_TEXT:MESH:D000432)`, `bionlp_st_2013_gro_NER:I-CellFateDetermination)`, `chia_ner:I-Reference_point)`, `bionlp_st_2013_gro_ner:B-Histone)`, `lll_RE:None)`, `scai_disease_ner:B-ADVERSE)`, `medmentions_full_ner:B-T130)`, `bionlp_st_2013_gro_NER:I-CellCyclePhaseTransition)`, `chemdner_TEXT:MESH:D000480)`, `chemdner_TEXT:MESH:D001556)`, `bionlp_st_2013_gro_ner:B-Nucleus)`, `bionlp_st_2013_gro_ner:B-AP2EREBPRelatedDomain)`, `chemdner_TEXT:MESH:D007854)`, `chemdner_TEXT:MESH:D009499)`, `genia_term_corpus_ner:B-polynucleotide)`, `bionlp_st_2013_gro_NER:I-Transcription)`, `chemdner_TEXT:MESH:D007213)`, `bionlp_st_2013_ge_NER:B-Regulation)`, `bionlp_st_2011_epi_NER:B-DNA_methylation)`, `medmentions_st21pv_ner:B-T031)`, `bionlp_st_2013_ge_NER:I-Gene_expression)`, `chemdner_TEXT:MESH:D007651)`, `bionlp_st_2013_gro_NER:B-OrganismalProcess)`, `bionlp_st_2011_epi_COREF:None)`, `medmentions_st21pv_ner:I-T062)`, `chemdner_TEXT:MESH:D002047)`, `chemdner_TEXT:MESH:D012822)`, `mantra_gsc_en_patents_ner:B-DEVI)`, `medmentions_full_ner:I-T071)`, `chemdner_TEXT:MESH:D013739)`, `bionlp_st_2013_gro_NER:B-NegativeRegulationOfGeneExpression)`, `genia_term_corpus_ner:B-other_name)`, `medmentions_full_ner:B-T018)`, `chemdner_TEXT:MESH:D015242)`, `bionlp_st_2013_cg_NER:O)`, `chemdner_TEXT:MESH:D019469)`, `ncbi_disease_ner:B-DiseaseClass)`, `ebm_pico_ner:B-Intervention_Surgical)`, `chemdner_TEXT:MESH:D011422)`, `chemdner_TEXT:MESH:D002112)`, `chemdner_TEXT:MESH:D005682)`, `anat_em_ner:B-Immaterial_anatomical_entity)`, `bionlp_st_2011_epi_ner:B-Entity)`, `medmentions_full_ner:I-T169)`, `mlee_ner:B-Immaterial_anatomical_entity)`, 
`verspoor_2013_ner:B-Physiology)`, `cellfinder_ner:I-CellType)`, `chemdner_TEXT:MESH:D011122)`, `chemdner_TEXT:MESH:D010622)`, `chemdner_TEXT:MESH:D017378)`, `bionlp_st_2011_ge_RE:Theme)`, `chemdner_TEXT:MESH:D000431)`, `medmentions_full_ner:I-T102)`, `medmentions_full_ner:B-T097)`, `chemdner_TEXT:MESH:D007529)`, `chemdner_TEXT:MESH:D045265)`, `chemdner_TEXT:MESH:D005971)`, `an_em_ner:I-Multi-tissue_structure)`, `genia_term_corpus_ner:I-ANDDNA_family_or_groupDNA_family_or_group)`, `medmentions_full_ner:I-T080)`, `chemdner_TEXT:MESH:D002207)`, `chia_ner:I-Qualifier)`, `bionlp_st_2013_gro_NER:B-NegativeRegulationOfTranscriptionByTranscriptionRepressor)`, `an_em_ner:I-Immaterial_anatomical_entity)`, `biosses_sts:5)`, `chemdner_TEXT:MESH:D000079963)`, `chemdner_TEXT:MESH:D013196)`, `ehr_rel_sts:2)`, `chemdner_TEXT:MESH:D006152)`, `bionlp_st_2013_gro_NER:B-RegulationOfProcess)`, `mlee_NER:I-Development)`, `medmentions_full_ner:B-T197)`, `bionlp_st_2013_gro_ner:B-NucleicAcid)`, `medmentions_st21pv_ner:I-T017)`, `medmentions_full_ner:I-T046)`, `medmentions_full_ner:B-T204)`, `bionlp_st_2013_gro_NER:B-CellularDevelopmentalProcess)`, `bionlp_st_2013_cg_ner:B-Immaterial_anatomical_entity)`, `chemdner_TEXT:MESH:D014212)`, `bionlp_st_2013_cg_NER:B-Protein_processing)`, `chemdner_TEXT:MESH:D008926)`, `chia_ner:B-Visit)`, `bionlp_st_2011_ge_NER:B-Negative_regulation)`, `mantra_gsc_en_medline_ner:I-OBJC)`, `mlee_RE:FromLoc)`, `bionlp_st_2013_gro_ner:I-RNAMolecule)`, `chemdner_TEXT:MESH:D014812)`, `linnaeus_filtered_ner:I-species)`, `chebi_nactem_fullpaper_ner:B-Chemical)`, `bionlp_st_2011_ge_NER:I-Positive_regulation)`, `bionlp_st_2013_gro_ner:B-MutantGene)`, `chemdner_TEXT:MESH:D014859)`, `bionlp_st_2019_bb_ner:B-Phenotype)`, `bionlp_st_2013_gro_NER:I-BindingOfTFToTFBindingSiteOfDNA)`, `diann_iber_eval_en_ner:I-Neg)`, `ddi_corpus_ner:B-DRUG_N)`, `bionlp_st_2013_cg_ner:B-Organ)`, `chemdner_TEXT:MESH:D009320)`, `bionlp_st_2013_cg_ner:I-Organism_subdivision)`, 
`bionlp_st_2013_cg_ner:B-Cellular_component)`, `chemdner_TEXT:MESH:D003188)`, `chemdner_TEXT:MESH:D001241)`, `chemdner_TEXT:MESH:D004811)`, `bioinfer_ner:I-GeneproteinRNA)`, `chemdner_TEXT:MESH:D002248)`, `bionlp_shared_task_2009_NER:B-Negative_regulation)`, `chemdner_TEXT:MESH:D000143)`, `chemdner_TEXT:MESH:D007099)`, `nlm_gene_ner:O)`, `chemdner_TEXT:MESH:D005485)`, `bionlp_st_2013_gro_ner:B-TranscriptionFactorBindingSiteOfDNA)`, `bionlp_st_2013_gro_ner:B-PhysicalContact)`, `medmentions_full_ner:B-T167)`, `medmentions_st21pv_ner:B-T091)`, `seth_corpus_ner:I-Gene)`, `bionlp_st_2011_ge_COREF:coref)`, `bionlp_st_2011_ge_NER:B-Gene_expression)`, `medmentions_full_ner:B-T031)`, `genia_relation_corpus_RE:None)`, `genia_term_corpus_ner:I-ANDDNA_domain_or_regionDNA_domain_or_region)`, `chemdner_TEXT:MESH:D014970)`, `bionlp_st_2013_gro_NER:B-Mutation)`, `bionlp_st_2013_gro_ner:I-TranscriptionActivator)`, `chemdner_TEXT:MESH:D002217)`, `chemdner_TEXT:MESH:D003367)`, `medmentions_full_ner:I-UnknownType)`, `chemdner_TEXT:MESH:D002998)`, `bionlp_st_2013_gro_ner:I-Phenotype)`, `genia_term_corpus_ner:B-ANDDNA_family_or_groupDNA_family_or_group)`, `hprd50_RE:PPI)`, `chemdner_TEXT:MESH:D002118)`, `scai_chemical_ner:B-IUPAC)`, `bionlp_st_2013_gro_ner:I-ProteinBindingSiteOfProtein)`, `verspoor_2013_ner:B-mutation)`, `chemdner_TEXT:MESH:D011719)`, `chemdner_TEXT:MESH:D013729)`, `bionlp_shared_task_2009_ner:O)`, `chemdner_TEXT:MESH:D005840)`, `chemdner_TEXT:MESH:D009287)`, `medmentions_full_ner:B-T029)`, `chemdner_TEXT:MESH:D037742)`, `medmentions_full_ner:I-T200)`, `chemdner_TEXT:MESH:D012503)`, `bionlp_st_2013_gro_ner:B-ComplexOfProteinAndRNA)`, `mirna_ner:I-Non-Specific_miRNAs)`, `bionlp_st_2013_gro_ner:B-ProteinBindingSiteOfProtein)`, `bionlp_st_2013_pc_NER:B-Deacetylation)`, `chemprot_RE:CPR:7)`, `chia_ner:I-Value)`, `medmentions_full_ner:I-T048)`, `chemprot_ner:B-GENE-Y)`, `bionlp_st_2013_cg_NER:B-Reproduction)`, `bionlp_st_2011_id_ner:I-Regulon-operon)`, 
`ebm_pico_ner:I-Outcome_Adverse-effects)`, `bioinfer_ner:B-DNA_family_or_group)`, `bionlp_st_2013_gro_ner:I-bZIPTF)`, `mirna_ner:I-GenesProteins)`, `biorelex_ner:I-process)`, `chemdner_TEXT:MESH:D001555)`, `genia_term_corpus_ner:B-DNA_domain_or_region)`, `cellfinder_ner:O)`, `bionlp_st_2013_gro_ner:I-MutatedProtein)`, `bionlp_st_2013_gro_NER:I-CellularComponentOrganizationAndBiogenesis)`, `spl_adr_200db_train_ner:O)`, `medmentions_full_ner:I-T026)`, `chemdner_TEXT:MESH:D013619)`, `bionlp_st_2013_gro_NER:I-BindingToRNA)`, `biorelex_ner:I-drug)`, `bionlp_st_2013_pc_NER:B-Translation)`, `mantra_gsc_en_emea_ner:B-LIVB)`, `mantra_gsc_en_patents_ner:B-PROC)`, `bionlp_st_2013_pc_NER:B-Binding)`, `bionlp_st_2013_gro_NER:B-ModificationOfMolecularEntity)`, `bionlp_st_2013_cg_NER:I-Cell_transformation)`, `scai_chemical_ner:B-TRIVIALVAR)`, `bionlp_st_2013_gro_ner:I-ZincCoordinatingDomain)`, `bionlp_st_2013_gro_NER:I-TranscriptionInitiation)`, `chemdner_TEXT:MESH:D010907)`, `bionlp_st_2013_gro_ner:B-InorganicChemical)`, `bionlp_st_2013_pc_RE:None)`, `chemdner_TEXT:MESH:D002922)`, `chemdner_TEXT:MESH:D010743)`, `bionlp_st_2019_bb_ner:O)`, `medmentions_full_ner:I-T001)`, `chemdner_TEXT:MESH:D001381)`, `bionlp_shared_task_2009_ner:I-Protein)`, `bionlp_st_2013_gro_ner:B-Spliceosome)`, `bionlp_st_2013_gro_ner:I-HMGTF)`, `minimayosrs_sts:3)`, `ddi_corpus_RE:ADVISE)`, `mlee_NER:B-Dissociation)`, `bionlp_st_2013_gro_ner:I-Holoenzyme)`, `chemdner_TEXT:MESH:D001552)`, `bionlp_st_2013_gro_ner:B-bHLH)`, `chemdner_TEXT:MESH:D000109)`, `chemdner_TEXT:MESH:D013449)`, `bionlp_st_2013_gro_ner:I-GeneRegion)`, `medmentions_full_ner:B-T019)`, `scai_chemical_ner:B-TRIVIAL)`, `mlee_ner:B-Gene_or_gene_product)`, `biosses_sts:3)`, `bionlp_st_2013_cg_NER:I-Pathway)`, `bionlp_st_2011_id_ner:I-Organism)`, `bionlp_st_2013_gro_ner:B-tRNA)`, `chemdner_TEXT:MESH:D013109)`, `mlee_ner:I-Immaterial_anatomical_entity)`, `medmentions_full_ner:B-T065)`, `ebm_pico_ner:I-Participant_Sample-size)`, `mlee_RE:AtLoc)`, 
`genia_term_corpus_ner:I-protein_family_or_group)`, `chemdner_TEXT:MESH:D002444)`, `chemdner_TEXT:MESH:D063388)`, `mlee_NER:B-Translation)`, `chemdner_TEXT:MESH:D007052)`, `bionlp_st_2013_gro_ner:B-Gene)`, `chia_ner:B-Scope)`, `bionlp_st_2013_ge_NER:I-Positive_regulation)`, `chemdner_TEXT:MESH:D007785)`, `medmentions_st21pv_ner:I-T097)`, `iepa_RE:None)`, `medmentions_full_ner:B-T001)`, `medmentions_full_ner:I-T194)`, `chemdner_TEXT:MESH:D047309)`, `bionlp_st_2013_gro_ner:B-Substrate)`, `chemdner_TEXT:MESH:D002186)`, `ebm_pico_ner:B-Outcome_Other)`, `bionlp_st_2013_gro_NER:I-OrganismalProcess)`, `bionlp_st_2013_gro_ner:B-Ion)`, `bionlp_st_2013_gro_NER:I-ProteinBiosynthesis)`, `chia_ner:B-Drug)`, `bionlp_st_2013_gro_ner:I-MolecularEntity)`, `anat_em_ner:B-Cellular_component)`, `bionlp_st_2013_cg_ner:B-Multi-tissue_structure)`, `medmentions_full_ner:I-T122)`, `an_em_ner:B-Cell)`, `chemdner_TEXT:MESH:D011564)`, `bionlp_st_2013_gro_NER:B-Splicing)`, `bionlp_st_2013_cg_NER:I-Metabolism)`, `bionlp_st_2013_pc_NER:B-Activation)`, `bionlp_st_2013_gro_ner:I-BindingSiteOfProtein)`, `bionlp_st_2011_id_ner:B-Chemical)`, `bionlp_st_2013_gro_ner:I-Ribosome)`, `nlmchem_ner:I-Chemical)`, `mirna_ner:I-Specific_miRNAs)`, `medmentions_full_ner:I-T012)`, `bionlp_st_2013_gro_NER:B-IntraCellularTransport)`, `mlee_RE:Instrument)`, `bionlp_st_2011_id_NER:I-Transcription)`, `mantra_gsc_en_patents_ner:I-ANAT)`, `an_em_ner:B-Immaterial_anatomical_entity)`, `scai_chemical_ner:I-IUPAC)`, `bionlp_st_2011_epi_NER:B-Deubiquitination)`, `chemdner_TEXT:MESH:D007295)`, `bionlp_st_2011_ge_NER:B-Binding)`, `bionlp_st_2013_pc_NER:B-Localization)`, `chia_ner:B-Procedure)`, `medmentions_full_ner:I-T109)`, `chemdner_TEXT:MESH:D002791)`, `mantra_gsc_en_medline_ner:I-CHEM)`, `chebi_nactem_fullpaper_ner:B-Biological_Activity)`, `ncbi_disease_ner:B-SpecificDisease)`, `medmentions_full_ner:B-T063)`, `chemdner_TEXT:MESH:D016595)`, `bionlp_st_2011_id_NER:B-Transcription)`, `bionlp_st_2013_gro_ner:B-DNAMolecule)`, 
`mlee_NER:B-Protein_processing)`, `biorelex_ner:B-protein-complex)`, `anat_em_ner:I-Cancer)`, `bionlp_st_2013_cg_RE:AtLoc)`, `medmentions_full_ner:I-T072)`, `bio_sim_verb_sts:2)`, `seth_corpus_ner:O)`, `medmentions_full_ner:B-T070)`, `biorelex_ner:I-experiment-tag)`, `chemdner_TEXT:MESH:D020126)`, `biorelex_ner:I-protein-RNA-complex)`, `bionlp_st_2013_pc_NER:I-Phosphorylation)`, `medmentions_st21pv_ner:I-T201)`, `genia_term_corpus_ner:B-protein_complex)`, `medmentions_full_ner:I-T125)`, `bionlp_st_2013_ge_ner:I-Entity)`, `chemdner_TEXT:MESH:D054659)`, `bionlp_st_2013_pc_RE:ToLoc)`, `medmentions_full_ner:B-T099)`, `bionlp_st_2013_gro_NER:B-Binding)`, `medmentions_full_ner:B-T114)`, `spl_adr_200db_train_ner:B-Factor)`, `mlee_RE:CSite)`, `bionlp_st_2013_gro_ner:B-HMG)`, `bionlp_st_2013_gro_ner:B-Operon)`, `bionlp_st_2013_ge_NER:I-Protein_catabolism)`, `ebm_pico_ner:I-Outcome_Pain)`, `bionlp_st_2013_ge_NER:B-Transcription)`, `chemdner_TEXT:MESH:D000880)`, `ebm_pico_ner:I-Outcome_Physical)`, `bionlp_st_2013_gro_ner:I-ProteinBindingSiteOfDNA)`, `chemdner_TEXT:MESH:D006160)`, `gnormplus_ner:B-DomainMotif)`, `medmentions_full_ner:I-T016)`, `pdr_ner:I-Disease)`, `bionlp_st_2013_gro_NER:B-BindingOfProteinToProteinBindingSiteOfProtein)`, `chemdner_TEXT:MESH:D002264)`, `genia_term_corpus_ner:I-protein_NA)`, `bionlp_shared_task_2009_NER:I-Negative_regulation)`, `medmentions_full_ner:I-T011)`, `bionlp_st_2013_gro_NER:I-CellularMetabolicProcess)`, `mqp_sts:1)`, `an_em_ner:I-Pathological_formation)`, `bionlp_st_2011_epi_NER:B-Deacetylation)`, `bionlp_st_2013_pc_RE:Theme)`, `medmentions_full_ner:I-T103)`, `bionlp_st_2011_epi_NER:B-Methylation)`, `ebm_pico_ner:B-Intervention_Psychological)`, `bionlp_st_2013_gro_ner:B-Stress)`, `genia_term_corpus_ner:B-multi_cell)`, `bionlp_st_2013_cg_NER:B-Positive_regulation)`, `anat_em_ner:I-Cellular_component)`, `spl_adr_200db_train_ner:I-Negation)`, `chemdner_TEXT:MESH:D000605)`, `mlee_RE:Cause)`, `bionlp_st_2013_gro_ner:B-RegulatoryDNARegion)`, 
`bionlp_st_2013_gro_ner:I-HomeoboxTF)`, `bionlp_st_2013_gro_NER:I-GeneSilencing)`, `ddi_corpus_ner:I-DRUG)`, `bionlp_st_2013_cg_NER:I-Growth)`, `mantra_gsc_en_medline_ner:B-OBJC)`, `mayosrs_sts:3)`, `bionlp_st_2013_gro_NER:B-RNAProcessing)`, `cellfinder_ner:B-CellType)`, `medmentions_full_ner:B-T007)`, `chemprot_ner:B-GENE-N)`, `biorelex_ner:B-brand)`, `ebm_pico_ner:B-Outcome_Mental)`, `bionlp_st_2013_gro_NER:B-RegulationOfTranscription)`, `bionlp_st_2013_gro_ner:B-EukaryoticCell)`, `genia_term_corpus_ner:I-ANDprotein_subunitprotein_subunit)`, `medmentions_full_ner:I-T184)`, `bionlp_st_2013_gro_NER:B-RegulatoryProcess)`, `bionlp_st_2011_id_NER:B-Negative_regulation)`, `bionlp_st_2013_cg_NER:I-Development)`, `cellfinder_ner:I-Anatomy)`, `chia_ner:B-Condition)`, `chemdner_TEXT:MESH:D003065)`, `medmentions_full_ner:B-T012)`, `bionlp_st_2011_id_NER:I-Positive_regulation)`, `bionlp_st_2013_gro_ner:I-TranscriptionFactorComplex)`, `bionlp_st_2013_cg_NER:I-Carcinogenesis)`, `medmentions_full_ner:B-T064)`, `medmentions_full_ner:B-T026)`, `nlmchem_ner:B-Chemical)`, `genia_term_corpus_ner:I-RNA_domain_or_region)`, `ebm_pico_ner:I-Intervention_Educational)`, `genia_term_corpus_ner:B-ANDcell_linecell_line)`, `genia_term_corpus_ner:B-protein_substructure)`, `bionlp_st_2013_gro_NER:I-ProteinTransport)`, `bionlp_st_2013_cg_NER:B-DNA_demethylation)`, `medmentions_full_ner:I-T058)`, `biorelex_ner:B-parameter)`, `chemdner_TEXT:MESH:D013006)`, `mirna_ner:I-Relation_Trigger)`, `bionlp_st_2013_gro_ner:B-PrimaryStructure)`, `bionlp_st_2013_gro_NER:I-Phosphorylation)`, `chemdner_TEXT:MESH:D003911)`, `pico_extraction_ner:I-participant)`, `chemdner_TEXT:MESH:D010938)`, `chia_ner:B-Person)`, `an_em_ner:B-Tissue)`, `medmentions_st21pv_ner:B-T170)`, `chemdner_TEXT:MESH:D013936)`, `chemdner_TEXT:MESH:D001080)`, `mlee_RE:None)`, `chemdner_TEXT:MESH:D013669)`, `chemdner_TEXT:MESH:D009943)`, `spl_adr_200db_train_ner:I-Factor)`, `chemdner_TEXT:MESH:D044004)`, `ebm_pico_ner:I-Participant_Sex)`, 
`chemdner_TEXT:MESH:D000409)`, `bionlp_st_2013_cg_NER:B-Cell_division)`, `medmentions_st21pv_ner:B-T033)`, `pcr_ner:I-Herb)`, `chemdner_TEXT:MESH:D020112)`, `bionlp_st_2013_pc_NER:B-Gene_expression)`, `bionlp_st_2011_rel_ner:O)`, `chemdner_TEXT:MESH:D008610)`, `bionlp_st_2013_gro_NER:B-BindingOfDNABindingDomainOfProteinToDNA)`, `bionlp_st_2013_gro_ner:I-Cell)`, `medmentions_full_ner:I-T055)`, `bionlp_st_2013_pc_NER:I-Negative_regulation)`, `chia_RE:Has_value)`, `tmvar_v1_ner:I-SNP)`, `biorelex_ner:I-experimental-construct)`, `genia_term_corpus_ner:B-)`, `chemdner_TEXT:MESH:D053978)`, `bionlp_st_2013_gro_ner:I-Stress)`, `mlee_ner:B-Pathological_formation)`, `bionlp_st_2013_cg_ner:O)`, `chemdner_TEXT:MESH:D007631)`, `chemdner_TEXT:MESH:D011084)`, `medmentions_full_ner:B-T080)`, `bionlp_st_2013_gro_NER:I-PositiveRegulationOfTranscription)`, `bionlp_st_2013_gro_ner:B-TranscriptionCorepressor)`, `ehr_rel_sts:4)`, `mlee_ner:I-Gene_or_gene_product)`, `chemdner_TEXT:MESH:D003474)`, `medmentions_full_ner:B-T098)`, `scicite_TEXT:method)`, `medmentions_full_ner:B-T100)`, `chemdner_TEXT:MESH:D011849)`, `medmentions_full_ner:I-T039)`, `anat_em_ner:B-Organism_subdivision)`, `bionlp_st_2013_gro_ner:I-Nucleus)`, `mlee_NER:I-Regulation)`, `bionlp_st_2013_gro_ner:I-NuclearReceptor)`, `bionlp_st_2013_ge_RE:None)`, `chemdner_TEXT:MESH:D019483)`, `bionlp_st_2013_cg_ner:B-Cell)`, `bionlp_st_2013_gro_ner:B-Holoenzyme)`, `bionlp_st_2011_epi_NER:I-Methylation)`, `bionlp_shared_task_2009_ner:B-Protein)`, `medmentions_st21pv_ner:I-T038)`, `bionlp_st_2013_gro_ner:I-DNARegion)`, `bionlp_st_2013_gro_NER:I-CellCyclePhase)`, `bionlp_st_2013_gro_ner:I-tRNA)`, `mlee_ner:I-Multi-tissue_structure)`, `chemprot_ner:O)`, `medmentions_full_ner:B-T094)`, `bionlp_st_2013_gro_RE:fromSpecies)`, `bionlp_st_2013_gro_NER:O)`, `bionlp_st_2013_gro_NER:B-Acetylation)`, `bioinfer_ner:I-Protein_family_or_group)`, `medmentions_st21pv_ner:I-T098)`, `pdr_ner:B-Disease)`, `chemdner_ner:I-Chemical)`, 
`bionlp_st_2013_cg_NER:B-Negative_regulation)`, `chebi_nactem_fullpaper_ner:B-Chemical_Structure)`, `bionlp_st_2011_ge_NER:I-Negative_regulation)`, `diann_iber_eval_en_ner:O)`, `bionlp_shared_task_2009_NER:I-Binding)`, `mlee_NER:I-Cell_proliferation)`, `chebi_nactem_fullpaper_ner:B-Protein)`, `bionlp_st_2013_gro_NER:B-Phosphorylation)`, `bionlp_st_2011_epi_COREF:coref)`, `medmentions_full_ner:B-T200)`, `bionlp_st_2013_cg_ner:B-Tissue)`, `chemdner_TEXT:MESH:D000082)`, `chemdner_TEXT:MESH:D037201)`, `bionlp_st_2013_gro_ner:B-ComplexMolecularEntity)`, `bionlp_st_2011_ge_RE:ToLoc)`, `diann_iber_eval_en_ner:B-Neg)`, `bionlp_st_2013_gro_ner:B-RibosomalRNA)`, `bionlp_shared_task_2009_NER:I-Protein_catabolism)`, `chemdner_TEXT:MESH:D016912)`, `medmentions_full_ner:B-T017)`, `bionlp_st_2013_gro_ner:B-CpGIsland)`, `mlee_ner:I-Organism_substance)`, `medmentions_full_ner:I-T075)`, `bionlp_st_2013_gro_ner:I-SecondMessenger)`, `bioinfer_ner:B-Protein_family_or_group)`, `bionlp_st_2013_cg_NER:I-Negative_regulation)`, `mantra_gsc_en_emea_ner:B-CHEM)`, `genia_term_corpus_ner:B-DNA_NA)`, `chemdner_TEXT:MESH:D057888)`, `chemdner_TEXT:MESH:D006495)`, `chemdner_TEXT:MESH:D006575)`, `geokhoj_v1_TEXT:0)`, `bionlp_st_2013_gro_RE:locatedIn)`, `genia_term_corpus_ner:B-virus)`, `bionlp_st_2013_gro_ner:B-RuntLikeDomain)`, `medmentions_full_ner:B-T131)`, `bionlp_st_2013_gro_ner:I-ProteinCodingRegion)`, `chemdner_TEXT:MESH:D015525)`, `genia_term_corpus_ner:I-mono_cell)`, `chemdner_TEXT:MESH:D007840)`, `medmentions_full_ner:I-T098)`, `chemdner_TEXT:MESH:D009930)`, `genia_term_corpus_ner:I-polynucleotide)`, `biorelex_ner:I-protein-region)`, `bionlp_st_2011_id_NER:I-Process)`, `bionlp_st_2013_gro_NER:I-CellularProcess)`, `medmentions_full_ner:B-T023)`, `chemdner_TEXT:MESH:D008942)`, `medmentions_full_ner:I-T070)`, `biorelex_ner:B-organelle)`, `bionlp_st_2013_gro_NER:I-Decrease)`, `verspoor_2013_ner:I-size)`, `chemdner_TEXT:MESH:D002945)`, `ebm_pico_ner:B-Intervention_Other)`, 
`bionlp_st_2013_cg_ner:I-Simple_chemical)`, `chemdner_TEXT:MESH:D008751)`, `chia_RE:AND)`, `medmentions_full_ner:I-T028)`, `ebm_pico_ner:I-Intervention_Other)`, `chemdner_TEXT:MESH:D005472)`, `chemdner_TEXT:MESH:D005070)`, `gnormplus_ner:B-Gene)`, `medmentions_full_ner:I-T190)`, `mlee_NER:B-Breakdown)`, `bioinfer_ner:B-GeneproteinRNA)`, `bioinfer_ner:B-Gene)`, `chemdner_TEXT:MESH:D006835)`, `chemdner_TEXT:MESH:D004298)`, `chemdner_TEXT:MESH:D002951)`, `chia_ner:I-Device)`, `bionlp_st_2013_pc_NER:B-Conversion)`, `bionlp_shared_task_2009_NER:I-Transcription)`, `mlee_NER:B-DNA_methylation)`, `pubmed_qa_labeled_fold0_CLF:no)`, `minimayosrs_sts:1)`, `chemdner_TEXT:MESH:D002166)`, `chemdner_TEXT:MESH:D005934)`, `bionlp_st_2013_gro_NER:B-CatabolicPathway)`, `tmvar_v1_ner:I-ProteinMutation)`, `verspoor_2013_ner:I-Phenomena)`, `medmentions_full_ner:B-T011)`, `chemdner_TEXT:MESH:D001218)`, `medmentions_full_ner:B-T185)`, `mantra_gsc_en_patents_ner:I-PROC)`, `medmentions_full_ner:I-T120)`, `chia_ner:I-Procedure)`, `genia_term_corpus_ner:I-ANDcell_typecell_type)`, `bionlp_st_2011_id_ner:I-Entity)`, `pcr_ner:B-Chemical)`, `bionlp_st_2013_gro_NER:B-PositiveRegulation)`, `mlee_RE:Theme)`, `bionlp_st_2011_epi_ner:B-Protein)`, `medmentions_full_ner:B-T055)`, `spl_adr_200db_train_ner:I-Severity)`, `bionlp_st_2013_gro_ner:I-Ion)`, `bionlp_st_2011_id_RE:Cause)`, `bc5cdr_ner:I-Disease)`, `bionlp_st_2013_gro_ner:I-bHLH)`, `chemdner_TEXT:MESH:D001058)`, `bionlp_st_2013_gro_ner:I-AminoAcid)`, `bionlp_st_2011_epi_NER:B-Phosphorylation)`, `medmentions_full_ner:B-T086)`, `chemdner_TEXT:MESH:D004441)`, `medmentions_st21pv_ner:I-T007)`, `biorelex_ner:B-drug)`, `mantra_gsc_en_patents_ner:I-DISO)`, `medmentions_full_ner:I-T197)`, `bionlp_st_2011_ge_RE:AtLoc)`, `bionlp_st_2013_gro_NER:B-MolecularProcess)`, `bionlp_st_2011_ge_NER:B-Phosphorylation)`, `bionlp_st_2013_gro_NER:B-FormationOfTranscriptionInitiationComplex)`, `bionlp_st_2011_ge_NER:I-Binding)`, `mirna_ner:B-GenesProteins)`, 
`mirna_ner:B-Diseases)`, `mantra_gsc_en_emea_ner:I-DISO)`, `anat_em_ner:I-Multi-tissue_structure)`, `bioinfer_ner:O)`, `chemdner_TEXT:MESH:D017673)`, `bionlp_st_2013_gro_NER:B-Methylation)`, `genia_term_corpus_ner:I-AND_NOTcell_typecell_type)`, `bionlp_st_2013_cg_NER:I-Positive_regulation)`, `bionlp_st_2013_cg_NER:B-Carcinogenesis)`, `chemdner_TEXT:MESH:D009543)`, `gnormplus_ner:I-Gene)`, `bionlp_st_2013_cg_RE:Participant)`, `chemdner_TEXT:MESH:D019804)`, `seth_corpus_RE:Equals)`, `medmentions_full_ner:I-T082)`, `hprd50_ner:O)`, `bionlp_st_2013_gro_ner:B-OxidativeStress)`, `chemdner_TEXT:MESH:D014227)`, `bio_sim_verb_sts:7)`, `bionlp_st_2011_ge_NER:I-Protein_catabolism)`, `bionlp_st_2011_ge_NER:B-Localization)`, `chemdner_TEXT:MESH:D001224)`, `chemdner_TEXT:MESH:D009842)`, `bionlp_st_2013_cg_ner:B-Amino_acid)`, `bionlp_st_2013_gro_NER:B-CellCyclePhase)`, `chemdner_TEXT:MESH:D002245)`, `bionlp_st_2013_ge_NER:I-Ubiquitination)`, `bionlp_st_2013_cg_NER:I-Cell_death)`, `pico_extraction_ner:O)`, `chemdner_TEXT:MESH:D000596)`, `chemdner_TEXT:MESH:D000638)`, `an_em_ner:B-Developing_anatomical_structure)`, `bionlp_st_2019_bb_ner:I-Phenotype)`, `bionlp_st_2013_gro_NER:I-CellDeath)`, `mantra_gsc_en_patents_ner:B-PHYS)`, `chemdner_TEXT:MESH:D009705)`, `genia_term_corpus_ner:B-protein_molecule)`, `mantra_gsc_en_medline_ner:B-PHEN)`, `bionlp_st_2013_gro_NER:I-PosttranslationalModification)`, `ddi_corpus_ner:B-BRAND)`, `mantra_gsc_en_medline_ner:B-DEVI)`, `mlee_NER:I-Planned_process)`, `tmvar_v1_ner:O)`, `bionlp_st_2011_ge_NER:I-Phosphorylation)`, `genia_term_corpus_ner:I-ANDprotein_substructureprotein_substructure)`, `medmentions_st21pv_ner:B-T007)`, `bionlp_st_2013_cg_ner:I-Pathological_formation)`, `bionlp_st_2013_gro_ner:B-Organism)`, `bionlp_st_2013_gro_ner:I-NucleicAcid)`, `medmentions_full_ner:I-T044)`, `chia_ner:I-Person)`, `chemdner_TEXT:MESH:D016572)`, `scai_disease_ner:O)`, `bionlp_st_2013_gro_ner:B-TranscriptionCofactor)`, `chemdner_TEXT:MESH:D002762)`, 
`chemdner_TEXT:MESH:D011685)`, `chemdner_TEXT:MESH:D005031)`, `scai_disease_ner:I-ADVERSE)`, `biorelex_ner:I-protein-isoform)`, `bionlp_shared_task_2009_COREF:None)`, `genia_term_corpus_ner:I-lipid)`, `biorelex_ner:B-RNA)`, `chemdner_TEXT:MESH:D018020)`, `scai_chemical_ner:B-FAMILY)`, `chemdner_TEXT:MESH:D017382)`, `chemdner_TEXT:MESH:D006027)`, `chemdner_TEXT:MESH:D018942)`, `medmentions_full_ner:I-T024)`, `chemdner_TEXT:MESH:D008050)`, `bionlp_st_2013_cg_NER:B-Glycosylation)`, `chemdner_TEXT:MESH:D019342)`, `chemdner_TEXT:MESH:D008774)`, `bionlp_st_2011_ge_RE:CSite)`, `bionlp_st_2013_gro_ner:B-HMGTF)`, `chemdner_ner:B-Chemical)`, `bioscope_papers_ner:B-negation)`, `biorelex_RE:bind)`, `bioinfer_ner:B-Protein_complex)`, `bionlp_st_2011_epi_NER:B-Ubiquitination)`, `bionlp_st_2013_gro_NER:I-RegulationOfTranscription)`, `chemdner_TEXT:MESH:D011134)`, `bionlp_st_2011_rel_ner:I-Entity)`, `mantra_gsc_en_medline_ner:I-PROC)`, `ncbi_disease_ner:I-DiseaseClass)`, `chemdner_TEXT:MESH:D014315)`, `bionlp_st_2013_gro_ner:I-Chromosome)`, `chemdner_TEXT:MESH:D000639)`, `chemdner_TEXT:MESH:D005740)`, `bionlp_st_2013_gro_ner:I-MolecularFunction)`, `verspoor_2013_ner:B-gene)`, `bionlp_st_2013_gro_ner:B-ZincCoordinatingDomainTF)`, `bionlp_st_2013_gro_ner:B-DNARegion)`, `ebm_pico_ner:B-Intervention_Educational)`, `medmentions_st21pv_ner:B-T005)`, `medmentions_full_ner:I-T022)`, `gnormplus_ner:B-FamilyName)`, `bionlp_st_2011_epi_RE:Contextgene)`, `bionlp_st_2013_pc_NER:B-Demethylation)`, `chia_ner:I-Observation)`, `medmentions_full_ner:I-T089)`, `bionlp_st_2013_gro_ner:I-ComplexMolecularEntity)`, `bionlp_st_2013_gro_ner:B-Lipid)`, `biorelex_ner:I-gene)`, `chemdner_TEXT:MESH:D003300)`, `chemdner_TEXT:MESH:D008903)`, `verspoor_2013_RE:relatedTo)`, `bionlp_st_2011_epi_NER:I-DNA_methylation)`, `genia_term_corpus_ner:I-cell_component)`, `bionlp_st_2011_ge_COREF:None)`, `ebm_pico_ner:B-Participant_Sample-size)`, `chemdner_TEXT:MESH:D043823)`, `chemdner_TEXT:MESH:D004958)`, 
`bionlp_st_2013_gro_ner:I-RNA)`, `chemdner_TEXT:MESH:D006150)`, `bionlp_st_2013_gro_ner:B-MolecularStructure)`, `chemdner_TEXT:MESH:D007457)`, `bionlp_st_2013_gro_ner:I-OxidativeStress)`, `scai_chemical_ner:B-PARTIUPAC)`, `mlee_NER:I-Blood_vessel_development)`, `bionlp_shared_task_2009_ner:B-Entity)`, `bionlp_st_2013_ge_RE:CSite)`, `medmentions_full_ner:B-T058)`, `chemdner_TEXT:MESH:D000628)`, `ebm_pico_ner:I-Intervention_Surgical)`, `an_em_ner:I-Organ)`, `bionlp_st_2013_gro_NER:B-Increase)`, `iepa_RE:PPI)`, `mlee_NER:B-Positive_regulation)`, `chemdner_TEXT:MESH:D014284)`, `chemdner_TEXT:MESH:D014260)`, `bionlp_st_2011_epi_NER:I-Glycosylation)`, `bionlp_st_2013_gro_NER:B-BindingToProtein)`, `bionlp_st_2013_gro_NER:B-BindingToRNA)`, `medmentions_full_ner:I-T047)`, `bionlp_st_2013_gro_NER:B-Localization)`, `bionlp_st_2013_gro_NER:I-NegativeRegulationOfGeneExpression)`, `medmentions_full_ner:I-T051)`, `bionlp_st_2011_id_COREF:None)`, `chemdner_TEXT:MESH:D011744)`, `bionlp_st_2013_gro_NER:B-BindingOfProteinToDNA)`, `bionlp_st_2013_gro_ner:B-CatalyticActivity)`, `chebi_nactem_abstr_ann1_ner:I-Biological_Activity)`, `bio_sim_verb_sts:1)`, `chemdner_TEXT:MESH:D012402)`, `bionlp_st_2013_gro_ner:B-bZIPTF)`, `chemdner_TEXT:MESH:D003913)`, `bionlp_shared_task_2009_RE:Site)`, `bionlp_st_2013_gro_ner:I-AntisenseRNA)`, `bionlp_st_2013_gro_NER:B-ProteinTargeting)`, `bionlp_st_2013_gro_NER:B-GeneExpression)`, `bionlp_st_2013_cg_NER:I-Blood_vessel_development)`, `mantra_gsc_en_patents_ner:I-CHEM)`, `mayosrs_sts:2)`, `chemdner_TEXT:MESH:D001645)`, `bionlp_st_2011_ge_NER:I-Transcription)`, `bionlp_st_2011_epi_NER:B-Acetylation)`, `medmentions_full_ner:B-T002)`, `verspoor_2013_ner:I-Concepts_Ideas)`, `hprd50_RE:None)`, `ddi_corpus_ner:O)`, `chemdner_TEXT:MESH:D014131)`, `ebm_pico_ner:B-Outcome_Physical)`, `medmentions_st21pv_ner:B-T103)`, `chemdner_TEXT:MESH:D016650)`, `mlee_NER:B-Cell_proliferation)`, `bionlp_st_2013_gro_ner:I-TranscriptionCoactivator)`, 
`chebi_nactem_fullpaper_ner:I-Chemical)`, `chemdner_TEXT:MESH:D013256)`, `biorelex_ner:I-protein-DNA-complex)`, `chemdner_TEXT:MESH:D008767)`, `bioinfer_RE:None)`, `nlm_gene_ner:B-Gene)`, `bionlp_st_2013_gro_ner:B-ReporterGene)`, `biosses_sts:1)`, `chemdner_TEXT:MESH:D000493)`, `chemdner_TEXT:MESH:D011374)`, `ebm_pico_ner:B-Intervention_Control)`, `bionlp_st_2013_pc_NER:I-Pathway)`, `chemprot_RE:CPR:3)`, `bionlp_st_2013_cg_ner:I-Amino_acid)`, `chemdner_TEXT:MESH:D005557)`, `bionlp_st_2011_ge_RE:Site)`, `bionlp_st_2013_pc_NER:B-Phosphorylation)`, `bionlp_st_2013_gro_NER:I-Elongation)`, `bionlp_st_2011_ge_NER:I-Localization)`, `spl_adr_200db_train_ner:B-Negation)`, `chemdner_TEXT:MESH:D010455)`, `nlm_gene_ner:B-GENERIF)`, `mlee_RE:Site)`, `bionlp_st_2013_gro_NER:B-BindingOfTFToTFBindingSiteOfDNA)`, `chemdner_TEXT:MESH:D017953)`, `bionlp_st_2013_gro_NER:B-NegativeRegulationOfTranscription)`, `osiris_ner:B-gene)`, `bionlp_st_2013_gro_ner:I-TranscriptionRepressor)`, `medmentions_full_ner:I-T131)`, `genia_term_corpus_ner:B-protein_family_or_group)`, `genia_term_corpus_ner:B-cell_type)`, `chemdner_TEXT:MESH:D013759)`, `chemdner_TEXT:MESH:D002247)`, `scai_chemical_ner:I-FAMILY)`, `chemdner_TEXT:MESH:D006020)`, `biorelex_ner:B-DNA)`, `chebi_nactem_abstr_ann1_ner:I-Spectral_Data)`, `mantra_gsc_en_medline_ner:B-DISO)`, `chemdner_TEXT:MESH:D019829)`, `ncbi_disease_ner:I-CompositeMention)`, `chemdner_TEXT:MESH:D013876)`, `chebi_nactem_fullpaper_ner:I-Spectral_Data)`, `biorelex_ner:I-DNA)`, `chemdner_TEXT:MESH:D005492)`, `chemdner_TEXT:MESH:D011810)`, `chemdner_TEXT:MESH:D008563)`, `chemdner_TEXT:MESH:D015735)`, `bionlp_st_2019_bb_ner:B-Microorganism)`, `ddi_corpus_RE:INT)`, `medmentions_st21pv_ner:B-T038)`, `bionlp_st_2013_gro_NER:B-CellCyclePhaseTransition)`, `cellfinder_ner:B-CellLine)`, `pdr_RE:Cause)`, `chemdner_TEXT:MESH:D011433)`, `chemdner_TEXT:MESH:D011720)`, `chemdner_TEXT:MESH:D020156)`, `ebm_pico_ner:O)`, `mlee_ner:B-Organ)`, `chemdner_TEXT:MESH:D012721)`, 
`chebi_nactem_fullpaper_ner:I-Biological_Activity)`, `bionlp_st_2013_cg_COREF:coref)`, `chemdner_TEXT:MESH:D006918)`, `medmentions_full_ner:B-T092)`, `genia_term_corpus_ner:B-protein_NA)`, `bionlp_st_2013_ge_ner:B-Entity)`, `an_em_ner:B-Multi-tissue_structure)`, `chia_ner:I-Measurement)`, `chia_RE:Has_temporal)`, `bionlp_st_2011_id_NER:B-Protein_catabolism)`, `bionlp_st_2013_gro_NER:B-CellAdhesion)`, `bionlp_st_2013_gro_ner:B-DNABindingSite)`, `biorelex_ner:B-organism)`, `scai_disease_ner:I-DISEASE)`, `bionlp_st_2013_gro_ner:I-DNABindingSite)`, `chemdner_TEXT:MESH:D016607)`, `chemdner_TEXT:MESH:D030421)`, `bionlp_st_2013_pc_NER:I-Binding)`, `medmentions_full_ner:I-T029)`, `chemdner_TEXT:MESH:D001569)`, `genia_term_corpus_ner:B-ANDcell_typecell_type)`, `scai_chemical_ner:B-SUM)`, `chemdner_TEXT:MESH:D007656)`, `medmentions_full_ner:B-T082)`, `chemdner_TEXT:MESH:D009525)`, `medmentions_full_ner:B-T079)`, `bionlp_st_2013_cg_NER:B-Synthesis)`, `biorelex_ner:B-process)`, `bionlp_st_2013_ge_RE:Theme)`, `chemdner_TEXT:MESH:D012825)`, `chemdner_TEXT:MESH:D005462)`, `bionlp_st_2013_cg_NER:B-Localization)`, `bionlp_st_2013_gro_NER:B-CellCycle)`, `cellfinder_ner:I-CellLine)`, `bionlp_st_2013_gro_ner:I-DNABindingDomainOfProtein)`, `medmentions_st21pv_ner:B-T168)`, `genia_term_corpus_ner:B-body_part)`, `genia_term_corpus_ner:B-ANDprotein_family_or_groupprotein_family_or_group)`, `mlee_ner:B-Tissue)`, `mlee_NER:I-Localization)`, `medmentions_full_ner:B-T125)`, `bionlp_st_2013_cg_NER:B-Infection)`, `chebi_nactem_abstr_ann1_ner:I-Protein)`, `chemdner_TEXT:MESH:D009570)`, `medmentions_full_ner:I-T045)`, `bionlp_st_2013_gro_ner:B-TranscriptionActivator)`, `verspoor_2013_ner:B-disease)`, `medmentions_full_ner:I-T056)`, `medmentions_full_ner:B-T050)`, `bionlp_st_2013_gro_ner:B-MolecularFunction)`, `medmentions_full_ner:B-T060)`, `bionlp_st_2013_gro_ner:B-Cell)`, `medmentions_full_ner:I-T060)`, `bionlp_st_2013_pc_NER:I-Gene_expression)`, `genia_term_corpus_ner:B-RNA_NA)`, 
`bionlp_st_2013_gro_ner:I-MessengerRNA)`, `medmentions_full_ner:I-T086)`, `an_em_RE:Part-of)`, `bionlp_st_2013_gro_NER:B-NegativeRegulationOfTranscriptionOfGene)`, `bionlp_st_2013_gro_NER:I-Splicing)`, `bioinfer_RE:PPI)`, `bioscope_papers_ner:I-speculation)`, `bionlp_st_2013_gro_ner:B-HomeoBox)`, `medmentions_full_ner:B-T004)`, `chia_ner:I-Drug)`, `bionlp_st_2013_gro_ner:B-FusionOfGeneWithReporterGene)`, `genia_term_corpus_ner:I-cell_line)`, `chebi_nactem_abstr_ann1_ner:I-Metabolite)`, `bionlp_st_2013_gro_ner:I-ExpressionProfiling)`, `chemdner_TEXT:MESH:D004390)`, `medmentions_full_ner:B-T016)`, `bionlp_st_2013_cg_NER:B-Growth)`, `medmentions_full_ner:I-T170)`, `medmentions_full_ner:B-T093)`, `genia_term_corpus_ner:I-inorganic)`, `mlee_NER:B-Planned_process)`, `bionlp_st_2013_gro_RE:hasPart)`, `bionlp_st_2013_gro_ner:B-BasicDomain)`, `chemdner_TEXT:MESH:D050091)`, `medmentions_st21pv_ner:B-T037)`, `chemdner_TEXT:MESH:D011522)`, `bionlp_st_2013_ge_NER:B-Deacetylation)`, `chemdner_TEXT:MESH:D004008)`, `chemdner_TEXT:MESH:D013972)`, `bionlp_st_2013_gro_NER:B-SignalingPathway)`, `bionlp_st_2013_gro_ner:B-Promoter)`, `chemdner_TEXT:MESH:D012701)`, `an_em_COREF:None)`, `bionlp_st_2019_bb_RE:None)`, `mlee_NER:I-Positive_regulation)`, `bionlp_st_2013_gro_NER:I-Translation)`, `chemdner_TEXT:MESH:D013453)`, `genia_term_corpus_ner:I-ANDprotein_moleculeprotein_molecule)`, `chemdner_TEXT:MESH:D002746)`, `chebi_nactem_abstr_ann1_ner:O)`, `bionlp_st_2013_pc_ner:O)`, `mayosrs_sts:7)`, `bionlp_st_2013_cg_NER:B-Pathway)`, `verspoor_2013_ner:I-age)`, `biorelex_ner:I-peptide)`, `medmentions_full_ner:I-T096)`, `chebi_nactem_fullpaper_ner:I-Chemical_Structure)`, `chemdner_TEXT:MESH:D007211)`, `medmentions_full_ner:I-T018)`, `medmentions_full_ner:B-T201)`, `bionlp_st_2013_gro_NER:B-BindingOfTFToTFBindingSiteOfProtein)`, `medmentions_full_ner:B-T054)`, `ebm_pico_ner:I-Intervention_Pharmacological)`, `chemdner_TEXT:MESH:D010672)`, `chemdner_TEXT:MESH:D004492)`, 
`chemdner_TEXT:MESH:D008094)`, `chemdner_TEXT:MESH:D002227)`, `chemdner_TEXT:MESH:D009553)`, `bionlp_st_2013_gro_NER:I-ResponseProcess)`, `chemdner_TEXT:MESH:D006046)`, `ebm_pico_ner:B-Participant_Condition)`, `nlm_gene_ner:I-Gene)`, `bionlp_st_2019_bb_ner:I-Habitat)`, `bionlp_shared_task_2009_COREF:coref)`, `chemdner_TEXT:MESH:D005640)`, `mantra_gsc_en_emea_ner:B-PHYS)`, `mantra_gsc_en_patents_ner:B-DISO)`, `bionlp_st_2013_gro_ner:B-Heterochromatin)`, `bionlp_st_2013_gro_NER:I-CellCycle)`, `bionlp_st_2013_cg_NER:I-Cell_proliferation)`, `bionlp_st_2013_cg_ner:B-Simple_chemical)`, `genia_term_corpus_ner:I-cell_type)`, `chemdner_TEXT:MESH:D003553)`, `bionlp_st_2013_ge_RE:Theme2)`, `tmvar_v1_ner:B-ProteinMutation)`, `chemdner_TEXT:MESH:D012717)`, `chemdner_TEXT:MESH:D026121)`, `chemdner_TEXT:MESH:D008687)`, `bionlp_st_2013_gro_NER:I-TranscriptionTermination)`, `medmentions_full_ner:B-T028)`, `biorelex_ner:B-assay)`, `genia_term_corpus_ner:B-tissue)`, `chemdner_TEXT:MESH:D009173)`, `bionlp_st_2013_gro_ner:B-TranscriptionCoactivator)`, `genia_term_corpus_ner:B-amino_acid_monomer)`, `mantra_gsc_en_emea_ner:B-DEVI)`, `bionlp_st_2013_gro_NER:B-Growth)`, `chemdner_TEXT:MESH:D017374)`, `genia_term_corpus_ner:B-other_artificial_source)`, `medmentions_full_ner:B-T072)`, `bionlp_st_2013_gro_NER:B-CellGrowth)`, `bionlp_st_2013_gro_ner:I-DoubleStrandDNA)`, `chemdner_ner:O)`, `bionlp_shared_task_2009_NER:I-Localization)`, `bionlp_st_2013_gro_NER:B-RegulationOfPathway)`, `genia_term_corpus_ner:I-amino_acid_monomer)`, `bionlp_st_2013_gro_NER:I-SPhase)`, `an_em_ner:B-Organism_substance)`, `medmentions_full_ner:B-T052)`, `genia_term_corpus_ner:B-ANDprotein_subunitprotein_subunit)`, `medmentions_full_ner:B-T096)`, `chemdner_TEXT:MESH:D056831)`, `chemdner_TEXT:MESH:D010755)`, `pdr_NER:I-Cause_of_disease)`, `mlee_NER:B-Phosphorylation)`, `medmentions_full_ner:I-T064)`, `chemdner_TEXT:MESH:D005978)`, `mantra_gsc_en_medline_ner:I-PHEN)`, `bionlp_st_2013_cg_ner:B-Pathological_formation)`, 
`bionlp_st_2013_gro_NER:B-Modification)`, `bionlp_st_2013_gro_ner:B-ProteinComplex)`, `bionlp_st_2013_gro_ner:B-DoubleStrandDNA)`, `medmentions_full_ner:B-T068)`, `medmentions_full_ner:I-T034)`, `bionlp_st_2011_epi_NER:B-Catalysis)`, `biosses_sts:0)`, `bionlp_st_2013_cg_ner:B-Organism_substance)`, `chemdner_TEXT:MESH:D055549)`, `bionlp_st_2013_cg_NER:B-Glycolysis)`, `chemdner_TEXT:MESH:D001761)`, `chemdner_TEXT:MESH:D011728)`, `bionlp_st_2013_gro_ner:B-Function)`, `medmentions_full_ner:I-T033)`, `bionlp_st_2013_gro_NER:I-PositiveRegulationOfTranscriptionOfGene)`, `medmentions_full_ner:I-T053)`, `bionlp_st_2013_gro_ner:B-Protein)`, `genia_term_corpus_ner:I-ANDprotein_family_or_groupprotein_family_or_group)`, `bionlp_st_2013_gro_NER:I-CatabolicPathway)`, `biorelex_ner:I-chemical)`, `chemdner_TEXT:MESH:D013185)`, `biorelex_ner:I-RNA)`, `chemdner_TEXT:MESH:D009838)`, `medmentions_full_ner:I-T008)`, `chemdner_TEXT:MESH:D002104)`, `bionlp_st_2013_gro_NER:B-RNABiosynthesis)`, `verspoor_2013_ner:I-ethnicity)`, `bionlp_st_2013_gro_ner:I-SmallInterferingRNA)`, `chemdner_TEXT:MESH:D026023)`, `mlee_ner:O)`, `bionlp_st_2013_gro_NER:I-CellHomeostasis)`, `bionlp_st_2013_pc_NER:B-Pathway)`, `gnormplus_ner:I-DomainMotif)`, `bionlp_st_2013_gro_ner:I-OpenReadingFrame)`, `bionlp_st_2013_gro_NER:I-RegulationOfGeneExpression)`, `muchmore_en_ner:O)`, `chemdner_TEXT:MESH:D000911)`, `bionlp_st_2011_epi_NER:B-DNA_demethylation)`, `bionlp_st_2013_gro_ner:I-RuntLikeDomain)`, `chemdner_TEXT:MESH:D010748)`, `medmentions_full_ner:B-T008)`, `biorelex_ner:B-protein-RNA-complex)`, `bionlp_st_2013_cg_NER:I-Planned_process)`, `chemdner_TEXT:MESH:D014867)`, `mantra_gsc_en_patents_ner:I-LIVB)`, `bionlp_st_2013_gro_NER:I-Silencing)`, `chemdner_TEXT:MESH:D015306)`, `chemdner_TEXT:MESH:D001679)`, `bionlp_shared_task_2009_NER:I-Positive_regulation)`, `linnaeus_filtered_ner:O)`, `chia_RE:Has_multiplier)`, `medmentions_full_ner:B-T116)`, `bionlp_shared_task_2009_NER:B-Positive_regulation)`, 
`anat_em_ner:B-Developing_anatomical_structure)`, `chemdner_TEXT:MESH:D011137)`, `chemdner_TEXT:MESH:D048271)`, `chemdner_TEXT:MESH:D003975)`, `bionlp_st_2013_gro_ner:B-TranscriptionRepressorActivity)`, `bionlp_st_2011_id_ner:B-Protein)`, `bionlp_st_2013_gro_NER:I-Mutation)`, `chemdner_TEXT:MESH:D001572)`, `mantra_gsc_en_patents_ner:B-CHEM)`, `mantra_gsc_en_medline_ner:I-DEVI)`, `bionlp_st_2013_gro_ner:B-Enzyme)`, `medmentions_full_ner:B-T056)`, `mantra_gsc_en_patents_ner:B-OBJC)`, `medmentions_full_ner:B-T073)`, `anat_em_ner:I-Tissue)`, `chemdner_TEXT:MESH:D047310)`, `chia_ner:I-Scope)`, `ncbi_disease_ner:B-Modifier)`, `medmentions_st21pv_ner:B-T082)`, `medmentions_full_ner:I-T054)`, `genia_term_corpus_ner:I-carbohydrate)`, `bionlp_st_2013_cg_RE:Theme)`, `chemdner_TEXT:MESH:D009538)`, `chemdner_TEXT:MESH:D008691)`, `genia_term_corpus_ner:B-ANDprotein_substructureprotein_substructure)`, `bionlp_st_2013_cg_ner:I-Tissue)`, `chia_ner:B-Device)`, `chemdner_TEXT:MESH:D002784)`, `medmentions_full_ner:I-T007)`, `bionlp_st_2013_gro_ner:I-DNAFragment)`, `mlee_RE:ToLoc)`, `spl_adr_200db_train_ner:I-AdverseReaction)`, `bionlp_st_2013_cg_NER:B-Catabolism)`, `chemdner_TEXT:MESH:D013779)`, `bionlp_st_2013_pc_NER:B-Regulation)`, `bionlp_st_2013_gro_NER:I-Disease)`, `chia_ner:I-Condition)`, `chemdner_TEXT:MESH:D012370)`, `bionlp_st_2013_ge_NER:O)`, `bionlp_st_2013_pc_NER:B-Deubiquitination)`, `bionlp_st_2013_pc_NER:I-Translation)`, `bionlp_st_2013_gro_NER:B-PositiveRegulationOfTranscriptionOfGene)`, `bionlp_st_2013_cg_NER:B-DNA_methylation)`, `bioscope_papers_ner:B-speculation)`, `chemdner_TEXT:MESH:D018130)`, `bionlp_st_2013_gro_ner:B-RNAPolymeraseII)`, `medmentions_st21pv_ner:B-T098)`, `bionlp_st_2013_gro_NER:B-Elongation)`, `bionlp_st_2013_pc_RE:Cause)`, `seth_corpus_ner:B-RS)`, `bionlp_st_2013_ge_RE:ToLoc)`, `chemdner_TEXT:MESH:D000538)`, `medmentions_full_ner:B-T192)`, `medmentions_full_ner:B-T061)`, `medmentions_full_ner:B-T032)`, `bionlp_st_2013_gro_NER:B-Transport)`, 
`medmentions_full_ner:I-T014)`, `chemdner_TEXT:MESH:D004137)`, `medmentions_full_ner:B-T101)`, `bionlp_st_2013_gro_NER:B-Transcription)`, `bionlp_st_2013_pc_NER:B-Transport)`, `medmentions_full_ner:I-T203)`, `ebm_pico_ner:I-Intervention_Control)`, `genia_term_corpus_ner:I-atom)`, `chemdner_TEXT:MESH:D014230)`, `osiris_ner:I-gene)`, `mantra_gsc_en_patents_ner:B-ANAT)`, `ncbi_disease_ner:I-SpecificDisease)`, `bionlp_st_2013_gro_NER:I-CellGrowth)`, `chemdner_TEXT:MESH:D001205)`, `chemdner_TEXT:MESH:D016627)`, `genia_term_corpus_ner:B-protein_subunit)`, `bionlp_st_2013_gro_ner:I-CellComponent)`, `medmentions_full_ner:B-T049)`, `scai_chemical_ner:O)`, `chemdner_TEXT:MESH:D010840)`, `chemdner_TEXT:MESH:D008694)`, `mantra_gsc_en_patents_ner:B-PHEN)`, `bionlp_st_2013_cg_RE:Cause)`, `chemdner_TEXT:MESH:D012293)`, `bionlp_st_2013_gro_NER:B-Homodimerization)`, `chemdner_TEXT:MESH:D008070)`, `chia_RE:OR)`, `bionlp_st_2013_cg_ner:I-Gene_or_gene_product)`, `verspoor_2013_ner:I-disease)`, `muchmore_en_ner:B-umlsterm)`, `chemdner_TEXT:MESH:D011794)`, `medmentions_full_ner:I-T002)`, `chemdner_TEXT:MESH:D007649)`, `genia_term_corpus_ner:B-AND_NOTcell_typecell_type)`, `medmentions_full_ner:I-T023)`, `chemprot_RE:CPR:1)`, `chemdner_TEXT:MESH:D001786)`, `bionlp_st_2013_gro_ner:B-HomeoboxTF)`, `bionlp_st_2013_cg_ner:I-Developing_anatomical_structure)`, `bionlp_st_2013_gro_ner:I-Attenuator)`, `bionlp_st_2019_bb_ner:B-Habitat)`, `chemdner_TEXT:MESH:D017931)`, `medmentions_full_ner:B-T047)`, `chemdner_TEXT:MESH:D006886)`, `genia_term_corpus_ner:I-)`, `medmentions_full_ner:B-T039)`, `chemdner_TEXT:MESH:D004220)`, `bionlp_st_2013_pc_RE:FromLoc)`, `nlm_gene_ner:I-GENERIF)`, `bionlp_st_2013_ge_NER:I-Protein_modification)`, `genia_term_corpus_ner:B-RNA_molecule)`, `chemdner_TEXT:MESH:D006854)`, `chemdner_TEXT:MESH:D006493)`, `chia_ner:B-Qualifier)`, `medmentions_full_ner:I-T013)`, `ehr_rel_sts:8)`, `an_em_RE:frag)`, `genia_term_corpus_ner:I-DNA_substructure)`, `chemdner_TEXT:MESH:D063065)`, 
`genia_term_corpus_ner:I-ANDprotein_complexprotein_complex)`, `bionlp_st_2013_pc_NER:I-Dissociation)`, `medmentions_full_ner:I-T004)`, `bionlp_st_2013_cg_ner:B-DNA_domain_or_region)`, `chemdner_TEXT:MESH:D010069)`, `bionlp_st_2013_gro_NER:I-Homodimerization)`, `chemdner_TEXT:MESH:D006147)`, `medmentions_full_ner:I-T041)`, `bionlp_st_2011_id_NER:B-Regulation)`, `bionlp_st_2013_gro_ner:O)`, `chemdner_TEXT:MESH:D008623)`, `bionlp_st_2013_ge_ner:I-Protein)`, `scai_chemical_ner:I-TRIVIAL)`, `an_em_ner:B-Organism_subdivision)`, `bionlp_st_2013_gro_ner:B-BindingAssay)`, `bionlp_st_2013_gro_ner:I-HMG)`, `anat_em_ner:I-Anatomical_system)`, `chemdner_TEXT:MESH:D015034)`, `mlee_NER:B-Catabolism)`, `mantra_gsc_en_medline_ner:B-LIVB)`, `ddi_corpus_ner:I-BRAND)`, `chia_ner:I-Multiplier)`, `bionlp_st_2013_gro_ner:I-SequenceHomologyAnalysis)`, `seth_corpus_RE:None)`, `bionlp_st_2013_cg_NER:B-Binding)`, `bioscope_papers_ner:I-negation)`, `chemdner_TEXT:MESH:D008741)`, `chemdner_TEXT:MESH:D052998)`, `chemdner_TEXT:MESH:D005227)`, `chemdner_TEXT:MESH:D009828)`, `spl_adr_200db_train_ner:B-Animal)`, `chemdner_TEXT:MESH:D010616)`, `bionlp_st_2013_gro_ner:I-ProteinComplex)`, `pico_extraction_ner:B-outcome)`, `mlee_NER:B-Negative_regulation)`, `chemdner_TEXT:MESH:D007093)`, `bionlp_st_2013_gro_NER:I-RNAProcessing)`, `bionlp_st_2013_gro_RE:hasAgent2)`, `biorelex_ner:I-reagent)`, `medmentions_st21pv_ner:I-T074)`, `bionlp_st_2013_gro_NER:B-BindingOfMolecularEntity)`, `chemdner_TEXT:MESH:D008911)`, `medmentions_full_ner:B-T033)`, `genia_term_corpus_ner:B-ANDprotein_complexprotein_complex)`, `medmentions_full_ner:I-T100)`, `chemdner_TEXT:MESH:D019259)`, `genia_term_corpus_ner:I-BUT_NOTother_nameother_name)`, `geokhoj_v1_TEXT:1)`, `bionlp_st_2013_cg_RE:Site)`, `medmentions_full_ner:B-T184)`, `bionlp_st_2013_gro_ner:B-ForkheadWingedHelixTF)`, `bionlp_st_2013_cg_ner:I-Protein_domain_or_region)`, `genia_term_corpus_ner:I-other_organic_compound)`, `chemdner_TEXT:MESH:D010793)`, 
`bionlp_st_2011_id_NER:B-Phosphorylation)`, `chemdner_TEXT:MESH:D002482)`, `bionlp_st_2013_cg_NER:B-Breakdown)`, `biorelex_ner:I-disease)`, `genia_term_corpus_ner:B-DNA_substructure)`, `bionlp_st_2013_gro_RE:hasPatient)`, `medmentions_full_ner:B-T127)`, `medmentions_full_ner:I-T185)`, `bionlp_shared_task_2009_RE:AtLoc)`, `medmentions_full_ner:I-T201)`, `chemdner_TEXT:MESH:D005290)`, `mlee_NER:I-Breakdown)`, `medmentions_full_ner:I-T063)`, `chemdner_TEXT:MESH:D017964)`, `an_em_ner:I-Tissue)`, `mlee_ner:I-Organism)`, `mantra_gsc_en_emea_ner:I-CHEM)`, `bionlp_st_2013_cg_ner:B-Anatomical_system)`, `genia_term_corpus_ner:B-ORDNA_domain_or_regionDNA_domain_or_region)`, `bionlp_st_2013_pc_NER:B-Degradation)`, `chemprot_RE:CPR:0)`, `genia_term_corpus_ner:B-inorganic)`, `chemdner_TEXT:MESH:D005466)`, `chia_ner:O)`, `medmentions_full_ner:B-T078)`, `mlee_NER:B-Growth)`, `mantra_gsc_en_emea_ner:B-PHEN)`, `chemdner_TEXT:MESH:D012545)`, `bionlp_st_2013_gro_NER:B-G1Phase)`, `chemdner_TEXT:MESH:D009841)`, `bionlp_st_2013_gro_ner:B-Chromatin)`, `bionlp_st_2011_epi_RE:Site)`, `medmentions_full_ner:B-T066)`, `genetaggold_ner:O)`, `bionlp_st_2013_cg_NER:I-Gene_expression)`, `medmentions_st21pv_ner:B-T092)`, `chemprot_RE:CPR:8)`, `bionlp_st_2013_cg_RE:Instrument)`, `nlm_gene_ner:I-Domain)`, `chemdner_TEXT:MESH:D006151)`, `bionlp_st_2011_id_ner:I-Protein)`, `mlee_NER:B-Synthesis)`, `bionlp_st_2013_gro_NER:B-CellMotility)`, `scai_chemical_ner:B-MODIFIER)`, `bionlp_st_2013_gro_NER:I-NegativeRegulationOfTranscription)`, `osiris_ner:O)`, `mlee_NER:B-Acetylation)`, `medmentions_st21pv_ner:B-T062)`, `chemdner_TEXT:MESH:D017705)`, `bionlp_st_2013_gro_NER:I-TranscriptionOfGene)`, `genia_term_corpus_ner:I-protein_complex)`, `chemprot_RE:CPR:10)`, `medmentions_full_ner:B-T102)`, `medmentions_full_ner:I-T171)`, `chia_ner:B-Reference_point)`, `medmentions_full_ner:B-T015)`, `bionlp_st_2013_gro_ner:I-RNAPolymerase)`, `chebi_nactem_abstr_ann1_ner:B-Metabolite)`, 
`bionlp_st_2013_gro_NER:I-CellDifferentiation)`, `chemdner_TEXT:MESH:D006861)`, `pubmed_qa_labeled_fold0_CLF:maybe)`, `bionlp_st_2013_gro_ner:I-Sequence)`, `mlee_NER:B-Transcription)`, `bc5cdr_ner:B-Chemical)`, `chemdner_TEXT:MESH:D000072317)`, `bionlp_st_2013_gro_NER:B-Producing)`, `genia_term_corpus_ner:B-ANDprotein_moleculeprotein_molecule)`, `bionlp_st_2011_id_NER:B-Localization)`, `bionlp_st_2013_gro_NER:B-MolecularInteraction)`, `chemdner_TEXT:MESH:D014639)`, `bionlp_st_2013_gro_NER:I-Increase)`, `mlee_NER:I-Translation)`, `medmentions_full_ner:B-T087)`, `bioscope_abstracts_ner:B-speculation)`, `ebm_pico_ner:B-Outcome_Adverse-effects)`, `mantra_gsc_en_medline_ner:B-PHYS)`, `bionlp_st_2013_gro_ner:I-Lipid)`, `bionlp_st_2011_ge_NER:B-Positive_regulation)`, `chemdner_TEXT:MESH:D005278)`, `bionlp_shared_task_2009_NER:B-Phosphorylation)`, `mlee_NER:I-Gene_expression)`, `bionlp_st_2011_epi_NER:I-Deacetylation)`, `chemdner_TEXT:MESH:D002110)`, `medmentions_full_ner:I-T121)`, `bionlp_st_2011_epi_ner:I-Entity)`, `bionlp_st_2019_bb_RE:Lives_In)`, `chemdner_TEXT:MESH:D001710)`, `anat_em_ner:B-Cancer)`, `bionlp_st_2013_gro_NER:B-RNASplicing)`, `mantra_gsc_en_medline_ner:I-ANAT)`, `chemdner_TEXT:MESH:D024508)`, `chemdner_TEXT:MESH:D000537)`, `mantra_gsc_en_medline_ner:I-DISO)`, `bionlp_st_2013_gro_ner:I-Prokaryote)`, `bionlp_st_2013_gro_ner:I-Chromatin)`, `bionlp_st_2013_gro_ner:B-Nucleotide)`, `linnaeus_ner:I-species)`, `verspoor_2013_ner:I-body-part)`, `bionlp_st_2013_gro_ner:B-DNAFragment)`, `bionlp_st_2013_gro_ner:B-PositiveTranscriptionRegulator)`, `medmentions_full_ner:I-T049)`, `bionlp_st_2011_ge_ner:B-Entity)`, `medmentions_full_ner:I-T017)`, `bionlp_st_2013_gro_NER:B-TranscriptionOfGene)`, `chemdner_TEXT:MESH:D009947)`, `mlee_NER:B-Dephosphorylation)`, `bionlp_st_2013_gro_NER:B-GeneSilencing)`, `pdr_RE:None)`, `scai_chemical_ner:I-TRIVIALVAR)`, `bionlp_st_2011_epi_NER:O)`, `bionlp_st_2013_cg_ner:I-Cell)`, `sciq_SEQ:None)`, `chemdner_TEXT:MESH:D019913)`, 
`mlee_RE:Participant)`, `chia_ner:I-Negation)`, `chemdner_TEXT:MESH:D014801)`, `chemdner_TEXT:MESH:D058846)`, `chemdner_TEXT:MESH:D011809)`, `bionlp_st_2011_epi_ner:O)`, `bionlp_st_2013_cg_NER:I-Metastasis)`, `chemdner_TEXT:MESH:D012643)`, `an_em_ner:I-Cell)`, `bionlp_st_2013_gro_ner:I-CatalyticActivity)`, `anat_em_ner:B-Anatomical_system)`, `mlee_ner:I-Pathological_formation)`, `bionlp_st_2013_gro_ner:I-ChromosomalDNA)`, `anat_em_ner:B-Cell)`, `chemdner_TEXT:MESH:D000242)`, `chemdner_TEXT:MESH:D017641)`, `bioscope_abstracts_ner:I-negation)`, `medmentions_st21pv_ner:B-T058)`, `chemdner_TEXT:MESH:D008744)`, `bionlp_st_2013_gro_ner:B-UpstreamRegulatorySequence)`, `chemdner_TEXT:MESH:D008012)`, `medmentions_full_ner:B-T013)`, `bionlp_st_2011_epi_NER:B-Glycosylation)`, `chemdner_TEXT:MESH:D052999)`, `chemdner_TEXT:MESH:D002329)`, `ebm_pico_ner:I-Intervention_Physical)`, `bionlp_st_2013_pc_ner:B-Complex)`, `medmentions_st21pv_ner:I-T005)`, `chemdner_TEXT:MESH:D064704)`, `bionlp_st_2013_gro_ner:I-ZincCoordinatingDomainTF)`, `bionlp_st_2013_pc_ner:I-Cellular_component)`, `genia_term_corpus_ner:B-ANDDNA_domain_or_regionDNA_domain_or_region)`, `bionlp_st_2013_gro_ner:B-Chromosome)`, `chemdner_TEXT:MESH:D007546)`, `bionlp_st_2013_gro_NER:I-PositiveRegulationOfGeneExpression)`, `medmentions_full_ner:I-T010)`, `pdr_NER:B-Treatment_of_disease)`, `medmentions_full_ner:B-T081)`, `bionlp_st_2011_epi_NER:B-Demethylation)`, `chemdner_TEXT:MESH:D013261)`, `bionlp_st_2013_gro_ner:I-RibosomalRNA)`, `verspoor_2013_ner:O)`, `bionlp_st_2013_gro_NER:B-DevelopmentalProcess)`, `chemdner_TEXT:MESH:D009270)`, `medmentions_full_ner:I-T130)`, `bionlp_st_2013_cg_ner:B-Organism)`, `medmentions_full_ner:B-T014)`, `chemdner_TEXT:MESH:D003374)`, `chemdner_TEXT:MESH:D011078)`, `cellfinder_ner:B-GeneProtein)`, `mayosrs_sts:6)`, `chemdner_TEXT:MESH:D005576)`, `bionlp_st_2013_ge_RE:Cause)`, `an_em_RE:None)`, `sciq_SEQ:answer)`, `bionlp_st_2013_cg_NER:B-Dissociation)`, `mlee_RE:frag)`, 
`bionlp_st_2013_pc_COREF:coref)`, `chemdner_TEXT:MESH:D008469)`, `ncbi_disease_ner:O)`, `bionlp_st_2011_epi_ner:I-Protein)`, `chemdner_TEXT:MESH:D011140)`, `chemdner_TEXT:MESH:D020001)`, `bionlp_st_2013_gro_ner:I-ThreeDimensionalMolecularStructure)`, `bionlp_st_2013_cg_ner:B-Cancer)`, `genia_term_corpus_ner:B-BUT_NOTother_nameother_name)`, `chemdner_TEXT:MESH:D006862)`, `medmentions_full_ner:B-T104)`, `bionlp_st_2011_epi_RE:Theme)`, `cellfinder_ner:B-Anatomy)`, `chemdner_TEXT:MESH:D010545)`, `biorelex_ner:B-RNA-family)`, `pico_extraction_ner:I-outcome)`, `mantra_gsc_en_patents_ner:I-PHYS)`, `bionlp_st_2013_pc_NER:I-Transcription)`, `bionlp_shared_task_2009_RE:Cause)`, `bionlp_st_2013_gro_ner:B-Vitamin)`, `bionlp_shared_task_2009_RE:CSite)`, `bionlp_st_2011_ge_ner:I-Protein)`, `mlee_COREF:coref)`, `bionlp_st_2013_gro_ner:I-ForkheadWingedHelix)`, `bioinfer_ner:I-Gene)`, `bionlp_st_2013_gro_ner:B-TranscriptionActivatorActivity)`, `chemdner_TEXT:MESH:D054439)`, `chemdner_TEXT:MESH:D011621)`, `ddi_corpus_ner:I-DRUG_N)`, `chemdner_TEXT:MESH:D019308)`, `bionlp_st_2013_gro_ner:I-Locus)`, `bionlp_shared_task_2009_RE:ToLoc)`, `bionlp_st_2013_cg_NER:B-Development)`, `bionlp_st_2013_gro_NER:I-CellularDevelopmentalProcess)`, `bionlp_st_2013_gro_ner:B-Eukaryote)`, `bionlp_st_2013_ge_NER:B-Negative_regulation)`, `seth_corpus_ner:I-SNP)`, `hprd50_ner:B-protein)`, `bionlp_st_2013_gro_NER:B-BindingOfProtein)`, `mlee_NER:I-Negative_regulation)`, `bionlp_st_2011_ge_NER:B-Protein_catabolism)`, `bionlp_st_2013_pc_ner:B-Cellular_component)`, `bionlp_st_2011_id_ner:I-Chemical)`, `chemdner_TEXT:MESH:D013831)`, `biorelex_COREF:None)`, `chemdner_TEXT:MESH:D005609)`, `bionlp_st_2013_gro_ner:B-TranscriptionFactor)`, `mlee_NER:B-Regulation)`, `chemdner_TEXT:MESH:D059808)`, `bionlp_st_2013_gro_ner:I-bHLHTF)`, `chemdner_TEXT:MESH:D010121)`, `chemdner_TEXT:MESH:D017608)`, `chemdner_TEXT:MESH:D007455)`, `mlee_NER:B-Blood_vessel_development)`, `bionlp_st_2013_gro_ner:B-TranscriptionFactorComplex)`, 
`biorelex_ner:B-disease)`, `bionlp_st_2013_cg_NER:B-Cell_differentiation)`, `medmentions_st21pv_ner:I-T092)`, `chemdner_TEXT:MESH:D007477)`, `medmentions_full_ner:B-T168)`, `pcr_ner:I-Chemical)`, `chemdner_TEXT:MESH:D009636)`, `chemdner_TEXT:MESH:D008051)`, `bionlp_shared_task_2009_NER:I-Gene_expression)`, `chemprot_ner:I-GENE-N)`, `biorelex_ner:B-reagent)`, `chemdner_TEXT:MESH:D020123)`, `nlmchem_ner:O)`, `ebm_pico_ner:I-Outcome_Mental)`, `chemdner_TEXT:MESH:D004040)`, `chemdner_TEXT:MESH:D000450)`, `chebi_nactem_fullpaper_ner:O)`, `biorelex_ner:B-protein-isoform)`, `chemdner_TEXT:MESH:D001564)`, `medmentions_full_ner:I-T095)`, `mlee_NER:I-Remodeling)`, `bionlp_st_2013_cg_RE:None)`, `biorelex_ner:O)`, `seth_corpus_RE:AssociatedTo)`, `bioscope_abstracts_ner:B-negation)`, `chebi_nactem_fullpaper_ner:I-Metabolite)`, `bionlp_st_2013_gro_ner:I-TranscriptionRepressorActivity)`, `bionlp_st_2013_cg_NER:B-Transcription)`, `bionlp_st_2011_ge_ner:B-Protein)`, `bionlp_st_2013_ge_ner:B-Protein)`, `bionlp_st_2013_gro_ner:I-Tissue)`, `chemdner_TEXT:MESH:D044005)`, `genia_term_corpus_ner:I-protein_substructure)`, `bionlp_st_2013_gro_ner:I-TranslationFactor)`, `minimayosrs_sts:5)`, `chemdner_TEXT:MESH:D012834)`, `ncbi_disease_ner:I-Modifier)`, `mlee_NER:B-Death)`, `medmentions_full_ner:B-T196)`, `bio_sim_verb_sts:4)`, `bionlp_st_2013_gro_NER:B-CellHomeostasis)`, `chemdner_TEXT:MESH:D006001)`, `bionlp_st_2013_gro_RE:encodes)`, `biorelex_ner:B-fusion-protein)`, `mlee_COREF:None)`, `chemdner_TEXT:MESH:D001623)`, `chemdner_TEXT:MESH:D000812)`, `medmentions_full_ner:B-T046)`, `bionlp_shared_task_2009_NER:O)`, `chemdner_TEXT:MESH:D000735)`, `gnormplus_ner:O)`, `chemdner_TEXT:MESH:D014635)`, `bionlp_st_2013_gro_NER:B-Mitosis)`, `chemdner_TEXT:MESH:D003847)`, `chemdner_TEXT:MESH:D002809)`, `medmentions_full_ner:I-T116)`, `chemdner_TEXT:MESH:D060406)`, `chemprot_ner:B-CHEMICAL)`, `chemdner_TEXT:MESH:D016642)`, `bionlp_st_2013_cg_NER:B-Phosphorylation)`, `an_em_ner:B-Organ)`, 
`chemdner_TEXT:MESH:D013431)`, `bionlp_shared_task_2009_RE:None)`, `medmentions_full_ner:B-T041)`, `mlee_ner:I-Tissue)`, `chemdner_TEXT:MESH:D023303)`, `ebm_pico_ner:I-Participant_Condition)`, `bionlp_st_2013_gro_ner:I-TATAbox)`, `bionlp_st_2013_gro_ner:I-bZIP)`, `bionlp_st_2011_epi_RE:Sidechain)`, `bionlp_st_2013_gro_ner:B-LivingEntity)`, `mantra_gsc_en_medline_ner:B-CHEM)`, `chemdner_TEXT:MESH:D007659)`, `medmentions_full_ner:I-T085)`, `bionlp_st_2013_cg_ner:I-Organism_substance)`, `medmentions_full_ner:B-T067)`, `chemdner_TEXT:MESH:D057846)`, `bionlp_st_2013_gro_NER:I-SignalingPathway)`, `bc5cdr_ner:I-Chemical)`, `nlm_gene_ner:I-STARGENE)`, `medmentions_full_ner:B-T090)`, `medmentions_full_ner:I-T037)`, `medmentions_full_ner:B-T037)`, `minimayosrs_sts:6)`, `medmentions_full_ner:I-T020)`, `chebi_nactem_fullpaper_ner:B-Species)`, `mirna_ner:O)`, `bionlp_st_2011_id_RE:Participant)`, `bionlp_st_2013_ge_NER:B-Binding)`, `ddi_corpus_ner:B-DRUG)`, `medmentions_full_ner:I-T078)`, `chemdner_TEXT:MESH:D012965)`, `bionlp_st_2013_cg_ner:I-Organ)`, `bionlp_st_2011_id_NER:B-Binding)`, `chemdner_TEXT:MESH:D006571)`, `mayosrs_sts:4)`, `chemdner_TEXT:MESH:D026422)`, `genia_term_corpus_ner:I-RNA_NA)`, `bionlp_st_2011_epi_RE:None)`, `chemdner_TEXT:MESH:D012265)`, `medmentions_full_ner:B-T195)`, `chemdner_TEXT:MESH:D014443)`, `bionlp_st_2013_gro_ner:I-OrganicChemical)`, `ebm_pico_ner:B-Participant_Age)`, `chemdner_TEXT:MESH:D009584)`, `chemdner_TEXT:MESH:D010862)`, `verspoor_2013_ner:B-Concepts_Ideas)`, `bionlp_st_2013_gro_NER:B-ActivationOfProcess)`, `chemdner_TEXT:MESH:D010118)`, `biorelex_COREF:coref)`, `bionlp_st_2013_gro_ner:I-Enzyme)`, `chemdner_TEXT:MESH:D012530)`, `chemdner_TEXT:MESH:D002351)`, `biorelex_ner:B-gene)`, `chemdner_TEXT:MESH:D013213)`, `medmentions_full_ner:B-T103)`, `chemdner_TEXT:MESH:D010091)`, `ebm_pico_ner:B-Participant_Sex)`, `bionlp_st_2013_gro_ner:B-ComplexOfProteinAndDNA)`, `bionlp_st_2013_gro_ner:B-Phenotype)`, `chemdner_TEXT:MESH:D019791)`, 
`chemdner_TEXT:MESH:D014280)`, `chemdner_TEXT:MESH:D011094)`, `chia_RE:None)`, `biorelex_RE:None)`, `chemdner_TEXT:MESH:D005230)`, `verspoor_2013_ner:B-cohort-patient)`, `chemdner_TEXT:MESH:D013645)`, `bionlp_st_2013_gro_ner:B-SecondMessenger)`, `mlee_ner:B-Cellular_component)`, `bionlp_shared_task_2009_NER:I-Phosphorylation)`, `mlee_ner:B-DNA_domain_or_region)`, `chemdner_TEXT:MESH:D017275)`, `chemdner_TEXT:MESH:D007053)`, `bionlp_st_2013_ge_RE:Site)`, `genia_term_corpus_ner:O)`, `chemprot_RE:CPR:6)`, `chemdner_TEXT:MESH:D006859)`, `genia_term_corpus_ner:I-other_name)`, `medmentions_full_ner:I-T042)`, `pdr_ner:O)`, `medmentions_full_ner:I-T057)`, `bionlp_st_2013_pc_RE:Product)`, `verspoor_2013_ner:B-size)`, `bionlp_st_2013_pc_NER:B-Acetylation)`, `medmentions_st21pv_ner:B-T017)`, `chia_ner:B-Temporal)`, `chemdner_TEXT:MESH:D003404)`, `bionlp_st_2013_gro_RE:None)`, `bionlp_shared_task_2009_NER:B-Gene_expression)`, `mqp_sts:3)`, `bionlp_st_2013_gro_ner:B-Chemical)`, `chemdner_TEXT:MESH:D013754)`, `mantra_gsc_en_medline_ner:B-GEOG)`, `mirna_ner:B-Specific_miRNAs)`, `chemdner_TEXT:MESH:D012492)`, `medmentions_full_ner:B-T190)`, `bionlp_st_2013_cg_ner:B-Developing_anatomical_structure)`, `bionlp_st_2013_gro_ner:B-RNA)`, `chemdner_TEXT:MESH:D011743)`, `chemdner_TEXT:MESH:D010795)`, `bionlp_st_2013_gro_NER:I-PositiveRegulation)`, `chemdner_TEXT:MESH:D002241)`, `medmentions_full_ner:B-T038)`, `bionlp_st_2013_gro_RE:hasAgent)`, `mlee_ner:B-Organism)`, `medmentions_full_ner:I-T168)`, `bioscope_abstracts_ner:O)`, `chemdner_TEXT:MESH:D002599)`, `bionlp_st_2013_pc_ner:I-Simple_chemical)`, `medmentions_full_ner:I-T066)`, `chemdner_TEXT:MESH:D019695)`, `bionlp_st_2013_ge_NER:I-Transcription)`, `mantra_gsc_en_emea_ner:B-DISO)`, `bionlp_st_2013_gro_NER:B-CellDeath)`, `medmentions_st21pv_ner:I-T031)`, `chemdner_TEXT:MESH:D004317)`, `bionlp_st_2013_gro_ner:B-TATAbox)`, `chemdner_TEXT:MESH:D052203)`, `bionlp_st_2013_gro_NER:B-CellFateDetermination)`, `medmentions_st21pv_ner:I-T022)`, 
`bionlp_st_2013_ge_NER:B-Protein_catabolism)`, `bionlp_st_2011_epi_NER:I-Catalysis)`, `verspoor_2013_ner:I-cohort-patient)`, `chemdner_TEXT:MESH:D010100)`, `an_em_ner:I-Developing_anatomical_structure)`, `chemdner_TEXT:MESH:D045162)`, `chia_RE:Has_qualifier)`, `verspoor_2013_RE:has)`, `chemdner_TEXT:MESH:D021382)`, `bionlp_st_2013_ge_NER:B-Acetylation)`, `medmentions_full_ner:I-T079)`, `bionlp_st_2013_gro_NER:B-Maintenance)`, `biorelex_ner:I-protein-domain)`, `chebi_nactem_abstr_ann1_ner:I-Chemical)`, `bioscope_papers_ner:O)`, `chia_RE:Has_scope)`, `bc5cdr_ner:B-Disease)`, `mlee_ner:I-Cellular_component)`, `medmentions_full_ner:I-T195)`, `spl_adr_200db_train_ner:B-AdverseReaction)`, `bionlp_st_2013_gro_ner:I-Promoter)`, `medmentions_full_ner:B-T040)`, `chemdner_TEXT:MESH:D005960)`, `chemdner_TEXT:MESH:D004164)`, `chemdner_TEXT:MESH:D015032)`, `chemdner_TEXT:MESH:D014255)`, `ebm_pico_ner:B-Outcome_Pain)`, `bionlp_st_2013_gro_ner:I-UpstreamRegulatorySequence)`, `bionlp_st_2013_pc_NER:I-Positive_regulation)`, `bionlp_st_2013_cg_NER:I-Regulation)`, `chemdner_TEXT:MESH:D001151)`, `medmentions_full_ner:I-T077)`, `chemdner_TEXT:MESH:D000081)`, `bionlp_st_2013_gro_NER:B-Stabilization)`, `mayosrs_sts:1)`, `biorelex_ner:B-mutation)`, `chemdner_TEXT:MESH:D000241)`, `chemdner_TEXT:MESH:D007930)`, `bionlp_st_2013_gro_NER:B-MetabolicPathway)`, `chemdner_TEXT:MESH:D013629)`, `chemdner_TEXT:MESH:D016202)`, `tmvar_v1_ner:I-DNAMutation)`, `chemdner_TEXT:MESH:D012502)`, `chemdner_TEXT:MESH:D044945)`, `bionlp_st_2013_cg_ner:I-Cellular_component)`, `mlee_ner:B-Developing_anatomical_structure)`, `bionlp_st_2013_gro_ner:I-AP2EREBPRelatedDomain)`, `chemdner_TEXT:MESH:D002338)`, `mayosrs_sts:5)`, `bionlp_st_2013_gro_ner:B-Intron)`, `genia_term_corpus_ner:I-DNA_domain_or_region)`, `anat_em_ner:I-Immaterial_anatomical_entity)`, `bionlp_st_2013_gro_ner:B-MutatedProtein)`, `ebm_pico_ner:I-Outcome_Mortality)`, `bionlp_st_2013_gro_ner:B-ProteinCodingRegion)`, `chemdner_TEXT:MESH:D005047)`, 
`chia_ner:B-Mood)`, `medmentions_st21pv_ner:O)`, `cellfinder_ner:I-Species)`, `bionlp_st_2013_gro_ner:I-InorganicChemical)`, `bionlp_st_2011_id_ner:B-Entity)`, `bionlp_st_2013_cg_NER:I-Catabolism)`, `an_em_ner:I-Cellular_component)`, `medmentions_full_ner:B-T021)`, `bionlp_st_2013_gro_NER:B-Heterodimerization)`, `chemdner_TEXT:MESH:D008315)`, `medmentions_st21pv_ner:I-T170)`, `chemdner_TEXT:MESH:D050112)`, `chia_RE:Subsumes)`, `medmentions_full_ner:I-T099)`, `bionlp_st_2013_gro_ner:I-Protein)`, `chemdner_TEXT:MESH:D047071)`, `bionlp_st_2013_gro_ner:B-TranscriptionFactorActivity)`, `mlee_ner:B-Organism_subdivision)`, `chemdner_TEXT:MESH:D016559)`, `medmentions_full_ner:B-T129)`, `genia_term_corpus_ner:I-protein_molecule)`, `mlee_ner:B-Drug_or_compound)`, `bionlp_st_2013_gro_NER:B-Silencing)`, `bionlp_st_2013_gro_ner:I-MolecularStructure)`, `genia_term_corpus_ner:B-nucleotide)`, `chemdner_TEXT:MESH:D003042)`, `mantra_gsc_en_emea_ner:B-ANAT)`, `chemdner_TEXT:MESH:D006690)`, `genia_term_corpus_ner:I-ANDcell_linecell_line)`, `chemdner_TEXT:MESH:D005473)`, `mantra_gsc_en_medline_ner:I-PHYS)`, `bionlp_st_2013_cg_NER:B-Blood_vessel_development)`, `bionlp_st_2013_gro_ner:B-BetaScaffoldDomain_WithMinorGrooveContacts)`, `chemdner_TEXT:MESH:D001549)`, `chia_ner:B-Measurement)`, `bionlp_st_2011_id_ner:B-Regulon-operon)`, `bionlp_st_2013_cg_NER:B-Acetylation)`, `pdr_ner:B-Plant)`, `mlee_NER:B-Development)`, `linnaeus_filtered_ner:B-species)`, `bionlp_st_2013_pc_RE:AtLoc)`, `medmentions_full_ner:I-T192)`, `bionlp_st_2013_gro_ner:B-BindingSiteOfProtein)`, `bionlp_st_2013_ge_NER:B-Ubiquitination)`, `bionlp_st_2013_gro_ner:I-ProteinCodingDNARegion)`, `chemdner_TEXT:MESH:D009647)`, `bionlp_st_2013_gro_ner:I-Ligand)`, `bionlp_st_2011_id_ner:O)`, `bionlp_st_2013_gro_NER:I-RNASplicing)`, `bionlp_st_2013_gro_ner:I-ComplexOfProteinAndRNA)`, `bionlp_st_2011_id_NER:B-Gene_expression)`, `chemdner_TEXT:MESH:D007501)`, `ehr_rel_sts:5)`, `bionlp_st_2013_gro_ner:B-TranscriptionRegulator)`, 
`medmentions_full_ner:B-T089)`, `bionlp_st_2011_epi_NER:I-DNA_demethylation)`, `mirna_ner:B-Species)`, `bionlp_st_2013_gro_ner:I-TranscriptionRegulator)`, `bionlp_st_2013_gro_NER:B-ProteinBiosynthesis)`, `scai_chemical_ner:B-ABBREVIATION)`, `bionlp_st_2013_gro_ner:I-Virus)`, `bionlp_st_2011_ge_NER:O)`, `medmentions_full_ner:B-T203)`, `bionlp_st_2013_cg_NER:I-Mutation)`, `bionlp_st_2013_gro_ner:B-ThreeDimensionalMolecularStructure)`, `genetaggold_ner:I-NEWGENE)`, `chemdner_TEXT:MESH:D010705)`, `chia_ner:I-Mood)`, `medmentions_full_ner:I-T068)`, `minimayosrs_sts:4)`, `medmentions_full_ner:I-T097)`, `bionlp_st_2013_gro_ner:I-BetaScaffoldDomain_WithMinorGrooveContacts)`, `mantra_gsc_en_emea_ner:I-PHYS)`, `medmentions_full_ner:I-T104)`, `bio_sim_verb_sts:5)`, `chebi_nactem_abstr_ann1_ner:B-Biological_Activity)`, `bionlp_st_2013_gro_NER:B-IntraCellularProcess)`, `mantra_gsc_en_emea_ner:I-PHEN)`, `mlee_ner:B-Cell)`, `chemdner_TEXT:MESH:D045784)`, `bionlp_st_2013_gro_ner:I-Vitamin)`, `chemdner_TEXT:MESH:D010416)`, `bionlp_st_2013_gro_ner:B-FusionGene)`, `bionlp_st_2013_gro_ner:I-FusionProtein)`, `mlee_NER:B-Remodeling)`, `minimayosrs_sts:8)`, `bionlp_st_2013_gro_ner:B-Enhancer)`, `mantra_gsc_en_emea_ner:O)`, `bionlp_st_2013_gro_ner:B-OpenReadingFrame)`, `bionlp_st_2013_pc_COREF:None)`, `medmentions_full_ner:I-T123)`, `bionlp_st_2013_gro_NER:I-RegulatoryProcess)`, `bionlp_st_2013_gro_NER:B-PositiveRegulationOfGeneExpression)`, `nlm_gene_ner:B-Domain)`, `bionlp_st_2013_pc_NER:B-Methylation)`, `medmentions_full_ner:B-T057)`, `chemdner_TEXT:MESH:D010226)`, `bionlp_st_2013_gro_ner:B-GeneProduct)`, `ebm_pico_ner:I-Outcome_Other)`, `chemdner_TEXT:MESH:D005223)`, `pdr_RE:Theme)`, `bionlp_shared_task_2009_NER:B-Protein_catabolism)`, `chemdner_TEXT:MESH:D019344)`, `gnormplus_ner:I-FamilyName)`, `verspoor_2013_ner:B-gender)`, `bionlp_st_2013_gro_NER:B-TranscriptionInitiation)`, `spl_adr_200db_train_ner:B-Severity)`, `medmentions_st21pv_ner:B-T097)`, 
`anat_em_ner:I-Organism_subdivision)`, `bionlp_st_2013_gro_NER:I-RNAMetabolism)`, `bioinfer_ner:I-Protein_complex)`, `anat_em_ner:I-Cell)`, `bionlp_st_2013_gro_ner:B-ProteinDomain)`, `bionlp_st_2013_gro_ner:I-PrimaryStructure)`, `genia_term_corpus_ner:I-other_artificial_source)`, `chemdner_TEXT:MESH:D010098)`, `bionlp_st_2013_gro_ner:I-Enhancer)`, `bionlp_st_2013_gro_ner:I-PositiveTranscriptionRegulator)`, `chemdner_TEXT:MESH:D004051)`, `chemdner_TEXT:MESH:D013853)`, `chebi_nactem_fullpaper_ner:B-Metabolite)`, `diann_iber_eval_en_ner:B-Disability)`, `biorelex_ner:B-peptide)`, `medmentions_full_ner:B-T048)`, `bionlp_st_2013_gro_ner:I-Function)`, `genia_term_corpus_ner:I-DNA_NA)`, `mlee_ner:I-Anatomical_system)`, `bioinfer_ner:B-Individual_protein)`, `verspoor_2013_ner:I-Physiology)`, `genia_term_corpus_ner:I-RNA_molecule)`, `chemdner_TEXT:MESH:D000255)`, `minimayosrs_sts:7)`, `mlee_NER:B-Localization)`, `bionlp_st_2013_gro_NER:B-ResponseProcess)`, `mantra_gsc_en_medline_ner:I-LIVB)`, `chemdner_TEXT:MESH:D010649)`, `seth_corpus_ner:B-Gene)`, `bionlp_st_2013_gro_ner:B-Attenuator)`, `chemdner_TEXT:MESH:D015363)`, `bionlp_st_2013_pc_NER:B-Inactivation)`, `medmentions_full_ner:I-T191)`, `mlee_ner:I-Organ)`, `chemdner_TEXT:MESH:D011765)`, `bionlp_shared_task_2009_NER:B-Binding)`, `an_em_ner:B-Cellular_component)`, `genia_term_corpus_ner:I-RNA_substructure)`, `medmentions_full_ner:B-T051)`, `anat_em_ner:I-Pathological_formation)`, `bionlp_st_2013_gro_RE:hasPatient3)`, `chemdner_TEXT:MESH:D013634)`, `chemdner_TEXT:MESH:D014414)`, `chia_RE:Has_index)`, `ddi_corpus_ner:B-GROUP)`, `bionlp_st_2013_gro_ner:B-MutantProtein)`, `bionlp_st_2013_ge_NER:I-Negative_regulation)`, `biorelex_ner:I-amino-acid)`, `chemdner_TEXT:MESH:D053279)`, `chemprot_RE:CPR:2)`, `bionlp_st_2013_gro_ner:B-bHLHTF)`, `bionlp_st_2013_cg_NER:I-Breakdown)`, `scai_chemical_ner:I-ABBREVIATION)`, `pdr_NER:B-Cause_of_disease)`, `chemdner_TEXT:MESH:D002219)`, `medmentions_full_ner:B-T044)`, 
`mirna_ner:B-Non-Specific_miRNAs)`, `chemdner_TEXT:MESH:D020748)`, `bionlp_shared_task_2009_RE:Theme)`, `chemdner_TEXT:MESH:D001647)`, `bionlp_st_2011_ge_NER:I-Regulation)`, `bionlp_st_2013_pc_ner:B-Gene_or_gene_product)`, `biorelex_ner:I-protein)`, `mantra_gsc_en_medline_ner:B-PROC)`, `medmentions_full_ner:I-T081)`, `medmentions_st21pv_ner:B-T022)`, `chia_ner:B-Multiplier)`, `bionlp_st_2013_gro_NER:B-GeneMutation)`, `chemdner_TEXT:MESH:D002232)`, `chemdner_TEXT:MESH:D010456)`, `biosses_sts:7)`, `medmentions_full_ner:B-T071)`, `chemdner_TEXT:MESH:D008628)`, `biorelex_ner:I-protein-complex)`, `chemdner_TEXT:MESH:D007328)`, `bionlp_st_2013_pc_NER:I-Activation)`, `bionlp_st_2013_cg_NER:B-Metabolism)`, `scai_chemical_ner:I-PARTIUPAC)`, `verspoor_2013_ner:B-age)`, `medmentions_full_ner:B-T122)`, `medmentions_full_ner:I-T050)`, `genia_term_corpus_ner:B-ANDother_nameother_name)`, `bionlp_st_2013_gro_NER:B-SPhase)`, `chemdner_TEXT:MESH:D012500)`, `mlee_NER:B-Metabolism)`, `bionlp_st_2011_id_NER:B-Positive_regulation)`, `chemdner_TEXT:MESH:D002794)`, `bionlp_st_2013_gro_NER:B-ProteinTransport)`, `chemdner_TEXT:MESH:D006028)`, `bionlp_st_2013_gro_RE:hasPatient2)`, `chemdner_TEXT:MESH:D009822)`, `bionlp_st_2013_cg_ner:I-Cancer)`, `bionlp_shared_task_2009_ner:I-Entity)`, `pcr_ner:B-Herb)`, `pubmed_qa_labeled_fold0_CLF:yes)`, `bionlp_st_2013_gro_NER:I-NegativeRegulation)`, `bionlp_st_2013_cg_NER:B-Dephosphorylation)`, `anat_em_ner:B-Multi-tissue_structure)`, `chemdner_TEXT:MESH:D008274)`, `medmentions_full_ner:B-T025)`, `chemprot_RE:CPR:9)`, `bionlp_st_2013_pc_RE:Participant)`, `bionlp_st_2013_pc_ner:B-Simple_chemical)`, `genia_term_corpus_ner:B-DNA_family_or_group)`, `bionlp_st_2013_gro_ner:B-bZIP)`, `bionlp_st_2013_gro_ner:I-Eukaryote)`, `bionlp_st_2013_pc_ner:I-Complex)`, `hprd50_ner:I-protein)`, `medmentions_full_ner:B-T020)`, `bionlp_st_2013_gro_ner:B-Agonist)`, `medmentions_full_ner:B-T030)`, `chemdner_TEXT:MESH:D009536)`, `medmentions_full_ner:B-T169)`, 
`genia_term_corpus_ner:I-nucleotide)`, `bionlp_st_2013_gro_NER:I-ProteinCatabolism)`, `bc5cdr_ner:O)`, `chemdner_TEXT:MESH:D003078)`, `medmentions_full_ner:I-T040)`, `chemdner_TEXT:MESH:D005963)`, `bionlp_st_2013_gro_ner:B-ExpressionProfiling)`, `mantra_gsc_en_emea_ner:I-DEVI)`, `mlee_NER:B-Cell_division)`, `ebm_pico_ner:B-Intervention_Pharmacological)`, `chemdner_TEXT:MESH:D008790)`, `mantra_gsc_en_emea_ner:I-ANAT)`, `mantra_gsc_en_medline_ner:B-ANAT)`, `chemdner_TEXT:MESH:D003545)`, `bionlp_st_2013_gro_NER:I-IntraCellularTransport)`, `bionlp_st_2013_gro_NER:I-CellDivision)`, `chemdner_TEXT:MESH:D013438)`, `bionlp_st_2011_id_NER:I-Negative_regulation)`, `bionlp_st_2013_gro_NER:I-DevelopmentalProcess)`, `mlee_ner:B-Protein_domain_or_region)`, `chemdner_TEXT:MESH:D014978)`, `bionlp_st_2011_id_NER:O)`, `bionlp_st_2013_gro_ner:I-ReporterGeneConstruction)`, `medmentions_full_ner:I-T025)`, `bionlp_st_2019_bb_RE:Exhibits)`, `ddi_corpus_ner:I-GROUP)`, `chemdner_TEXT:MESH:D011241)`, `chemdner_TEXT:MESH:D010446)`, `bionlp_st_2013_gro_ner:I-ExperimentalMethod)`, `anat_em_ner:B-Tissue)`, `chemdner_TEXT:MESH:D000470)`, `bionlp_st_2013_pc_NER:I-Inactivation)`, `bionlp_st_2013_gro_ner:I-Agonist)`, `medmentions_full_ner:B-T024)`, `mlee_NER:I-Transcription)`, `bionlp_st_2011_epi_NER:B-Deglycosylation)`, `bionlp_st_2013_cg_NER:B-Cell_death)`, `chemdner_TEXT:MESH:D000266)`, `chemdner_TEXT:MESH:D019833)`, `genia_term_corpus_ner:I-RNA_family_or_group)`, `biosses_sts:8)`, `lll_RE:genic_interaction)`, `bionlp_st_2013_gro_ner:B-OrganicChemical)`, `chemdner_TEXT:MESH:D013267)`, `bionlp_st_2013_gro_ner:I-TranscriptionCofactor)`, `biorelex_ner:B-protein-region)`, `chemdner_TEXT:MESH:D001565)`, `genia_term_corpus_ner:B-cell_line)`, `bionlp_st_2013_gro_NER:B-Cleavage)`, `ddi_corpus_RE:EFFECT)`, `bionlp_st_2013_cg_NER:B-Planned_process)`, `bionlp_st_2013_cg_ner:I-Immaterial_anatomical_entity)`, `chemdner_TEXT:MESH:D007660)`, `medmentions_full_ner:I-T090)`, 
`bionlp_st_2013_gro_ner:I-CpGIsland)`, `bionlp_st_2013_gro_ner:B-AminoAcid)`, `chemdner_TEXT:MESH:D001095)`, `mlee_NER:I-Death)`, `bionlp_st_2013_cg_ner:I-Anatomical_system)`, `bionlp_st_2013_gro_NER:B-Decrease)`, `bionlp_st_2013_pc_NER:B-Hydroxylation)`, `chemdner_TEXT:None)`, `bio_sim_verb_sts:3)`, `biorelex_ner:B-protein)`, `bionlp_st_2013_gro_ner:I-BasicDomain)`, `bionlp_st_2011_ge_ner:I-Entity)`, `bionlp_st_2013_gro_ner:B-PhysicalContinuant)`, `chemprot_RE:CPR:4)`, `chemdner_TEXT:MESH:D003345)`, `chemdner_TEXT:MESH:D010080)`, `mantra_gsc_en_patents_ner:O)`, `bionlp_st_2013_gro_ner:B-AntisenseRNA)`, `bionlp_st_2013_gro_ner:B-ProteinCodingDNARegion)`, `chemdner_TEXT:MESH:D010768)`, `chebi_nactem_fullpaper_ner:I-Protein)`, `genia_term_corpus_ner:I-multi_cell)`, `bionlp_st_2013_gro_ner:I-Gene)`, `medmentions_full_ner:B-T042)`, `chemdner_TEXT:MESH:D006034)`, `biorelex_ner:I-brand)`, `chebi_nactem_abstr_ann1_ner:I-Species)`, `chemdner_TEXT:MESH:D012236)`, `bionlp_st_2013_gro_ner:I-GeneProduct)`, `chemdner_TEXT:MESH:D005665)`, `chemdner_TEXT:MESH:D008715)`, `medmentions_st21pv_ner:I-T103)`, `ddi_corpus_RE:None)`, `medmentions_st21pv_ner:I-T091)`, `chemdner_TEXT:MESH:D019158)`, `chemdner_TEXT:MESH:D001280)`, `chemdner_TEXT:MESH:D009249)`, `medmentions_full_ner:I-T067)`, `medmentions_full_ner:B-T005)`, `bionlp_st_2013_cg_NER:I-Remodeling)`, `chemdner_TEXT:MESH:D000166)`, `osiris_ner:B-variant)`, `spl_adr_200db_train_ner:I-DrugClass)`, `mirna_ner:I-Species)`, `medmentions_st21pv_ner:I-T033)`, `ebm_pico_ner:I-Participant_Age)`, `medmentions_full_ner:B-T095)`, `bionlp_st_2013_gro_NER:B-RNAMetabolism)`, `chemdner_TEXT:MESH:D005231)`, `medmentions_full_ner:B-T062)`, `bionlp_st_2011_ge_NER:I-Gene_expression)`, `bionlp_st_2013_gro_ner:I-TranscriptionFactor)`, `genia_term_corpus_ner:B-protein_domain_or_region)`, `mantra_gsc_en_emea_ner:B-PROC)`, `mlee_NER:I-Pathway)`, `bionlp_st_2013_gro_NER:I-BindingOfProteinToProteinBindingSiteOfProtein)`, `bionlp_st_2011_id_COREF:coref)`, 
`biosses_sts:6)`, `biorelex_ner:I-organism)`, `chia_ner:B-Value)`, `verspoor_2013_ner:B-body-part)`, `chemdner_TEXT:MESH:D004974)`, `chia_RE:Has_mood)`, `medmentions_st21pv_ner:B-T074)`, `chemdner_TEXT:MESH:D000535)`, `verspoor_2013_ner:I-Disorder)`, `bionlp_st_2013_gro_NER:B-BindingToMolecularEntity)`, `bionlp_st_2013_gro_ner:I-ReporterGene)`, `mayosrs_sts:8)`, `bionlp_st_2013_cg_ner:I-DNA_domain_or_region)`, `bionlp_st_2013_gro_NER:I-Pathway)`, `medmentions_st21pv_ner:I-T168)`, `bionlp_st_2013_gro_NER:B-NegativeRegulation)`, `medmentions_full_ner:B-T123)`, `bionlp_st_2013_pc_NER:B-Positive_regulation)`, `bionlp_st_2013_gro_NER:I-FormationOfProteinDNAComplex)`, `chemdner_TEXT:MESH:D000577)`, `mlee_NER:B-Ubiquitination)`, `chemdner_TEXT:MESH:D003630)`, `bionlp_st_2013_gro_ner:B-Transcript)`, `bionlp_st_2013_cg_NER:I-Transcription)`, `anat_em_ner:B-Organ)`, `anat_em_ner:I-Organism_substance)`, `spl_adr_200db_train_ner:B-DrugClass)`, `bionlp_st_2013_gro_ner:I-ProteinSubunit)`, `biorelex_ner:B-protein-domain)`, `chemdner_TEXT:MESH:D006051)`, `bionlp_st_2011_id_NER:B-Process)`, `bionlp_st_2013_pc_NER:B-Ubiquitination)`, `bionlp_st_2013_pc_NER:B-Transcription)`, `chemdner_TEXT:MESH:D006838)`, `bionlp_st_2013_gro_RE:hasPatient5)`, `bionlp_st_2013_ge_NER:B-Localization)`, `chemdner_TEXT:MESH:D011759)`, `chemdner_TEXT:MESH:D053243)`, `biorelex_ner:I-mutation)`, `mantra_gsc_en_emea_ner:I-LIVB)`, `bionlp_st_2013_gro_NER:I-Transport)`, `bionlp_st_2011_id_RE:Site)`, `chemdner_TEXT:MESH:D015474)`, `bionlp_st_2013_gro_NER:B-Dimerization)`, `bionlp_st_2013_cg_NER:I-Localization)`, `medmentions_full_ner:I-T032)`, `chemdner_TEXT:MESH:D018036)`, `medmentions_full_ner:I-T167)`, `chemprot_RE:CPR:5)`, `minimayosrs_sts:2)`, `biorelex_ner:B-protein-DNA-complex)`, `cellfinder_ner:I-CellComponent)`, `nlm_gene_ner:B-Other)`, `medmentions_full_ner:I-T019)`, `chebi_nactem_abstr_ann1_ner:B-Spectral_Data)`, `bionlp_st_2013_cg_ner:I-Multi-tissue_structure)`, `medmentions_full_ner:B-T010)`, 
`mantra_gsc_en_medline_ner:I-GEOG)`, `chemprot_ner:I-GENE-Y)`, `mirna_ner:I-Diseases)`, `an_em_ner:O)`, `bionlp_st_2013_cg_NER:B-Remodeling)`, `medmentions_st21pv_ner:I-T058)`, `scicite_TEXT:background)`, `bionlp_st_2013_cg_NER:B-Mutation)`, `genia_term_corpus_ner:B-mono_cell)`, `bionlp_st_2013_gro_ner:B-DNA)`, `medmentions_full_ner:I-T114)`, `bionlp_st_2011_id_RE:Theme)`, `genetaggold_ner:B-NEWGENE)`, `mlee_ner:I-Organism_subdivision)`, `bionlp_shared_task_2009_NER:I-Regulation)`, `bionlp_st_2013_gro_ner:B-Microorganism)`, `chemdner_TEXT:MESH:D006108)`, `biorelex_ner:B-amino-acid)`, `bioinfer_ner:I-DNA_family_or_group)`, `bionlp_st_2013_gro_ner:I-Chemical)`, `mantra_gsc_en_patents_ner:I-DEVI)`, `mantra_gsc_en_medline_ner:O)`, `bionlp_st_2013_pc_NER:I-Regulation)`, `medmentions_full_ner:B-T043)`, `scicite_TEXT:result)`, `bionlp_st_2013_ge_NER:I-Binding)`, `chemdner_TEXT:MESH:D011441)`, `genia_term_corpus_ner:I-protein_domain_or_region)`, `bionlp_st_2011_epi_RE:Cause)`, `bionlp_st_2013_gro_ner:B-Nucleosome)`, `chemdner_TEXT:MESH:D011223)`, `chebi_nactem_abstr_ann1_ner:B-Protein)`, `bionlp_st_2013_gro_RE:hasFunction)`, `bionlp_st_2013_gro_ner:I-TranscriptionFactorActivity)`, `biorelex_ner:B-protein-family)`, `bionlp_st_2013_cg_ner:B-Gene_or_gene_product)`, `tmvar_v1_ner:B-SNP)`, `bionlp_st_2013_gro_ner:B-ExperimentalMethod)`, `bionlp_st_2013_gro_ner:B-ReporterGeneConstruction)`, `bionlp_st_2011_ge_NER:B-Transcription)`, `chemdner_TEXT:MESH:D004041)`, `chemdner_TEXT:MESH:D000631)`, `chebi_nactem_fullpaper_ner:I-Species)`, `medmentions_full_ner:B-T170)`, `bionlp_st_2013_gro_ner:B-ForkheadWingedHelix)`, `bionlp_st_2013_cg_ner:B-Organism_subdivision)`, `genia_term_corpus_ner:I-DNA_molecule)`, `bionlp_st_2013_cg_NER:I-Glycolysis)`, `an_em_ner:B-Pathological_formation)`, `bionlp_st_2013_gro_NER:B-TranscriptionTermination)`, `bionlp_st_2013_gro_NER:B-CellAging)`, `bionlp_st_2013_cg_ner:B-Protein_domain_or_region)`, `anat_em_ner:B-Organism_substance)`, 
`medmentions_full_ner:B-T053)`, `mlee_ner:B-Multi-tissue_structure)`, `biosses_sts:4)`, `bioscope_abstracts_ner:I-speculation)`, `chemdner_TEXT:MESH:D053644)`, `bionlp_st_2013_cg_NER:I-Translation)`, `tmvar_v1_ner:B-DNAMutation)`, `genia_term_corpus_ner:B-RNA_substructure)`, `an_em_ner:B-Anatomical_system)`, `bionlp_st_2013_gro_ner:B-Conformation)`, `bionlp_st_2013_gro_NER:I-NegativeRegulationOfTranscriptionOfGene)`, `medmentions_full_ner:I-T069)`, `chemdner_TEXT:MESH:D006820)`, `chemdner_TEXT:MESH:D015725)`, `chemdner_TEXT:MESH:D010281)`, `mlee_NER:B-Pathway)`, `bionlp_st_2011_id_NER:I-Regulation)`, `bionlp_st_2013_gro_NER:I-GeneExpression)`, `medmentions_full_ner:I-T073)`, `biosses_sts:2)`, `medmentions_full_ner:I-T043)`, `chemdner_TEXT:MESH:D001152)`, `bionlp_st_2013_gro_ner:I-DNAMolecule)`, `chemdner_TEXT:MESH:D015636)`, `chemdner_TEXT:MESH:D000666)`, `chemprot_RE:None)`, `bionlp_st_2013_gro_ner:B-Sequence)`, `chemdner_TEXT:MESH:D009151)`, `chia_ner:B-Observation)`, `an_em_COREF:coref)`, `medmentions_full_ner:B-T120)`, `bionlp_st_2013_gro_ner:B-Tissue)`, `bionlp_st_2013_gro_ner:B-MolecularEntity)`, `bionlp_st_2013_pc_NER:B-Dephosphorylation)`, `chemdner_TEXT:MESH:D044242)`, `bionlp_st_2013_gro_ner:B-FusionProtein)`, `biorelex_ner:B-cell)`, `bionlp_st_2013_gro_NER:B-Disease)`, `bionlp_st_2011_id_RE:None)`, `biorelex_ner:B-protein-motif)`, `bionlp_st_2013_pc_NER:I-Localization)`, `bionlp_st_2013_gro_ner:B-ZincCoordinatingDomain)`, `bionlp_st_2013_gro_ner:B-Locus)`, `genia_term_corpus_ner:B-other_organic_compound)`, `seth_corpus_ner:B-SNP)`, `pcr_ner:O)`, `genia_term_corpus_ner:I-virus)`, `bionlp_st_2013_gro_ner:I-Peptide)`, `chebi_nactem_abstr_ann1_ner:B-Chemical)`, `bionlp_st_2013_gro_ner:B-RNAMolecule)`, `bionlp_st_2013_gro_ner:B-SequenceHomologyAnalysis)`, `chemdner_TEXT:MESH:D005054)`, `bionlp_st_2013_ge_NER:B-Phosphorylation)`, `bionlp_st_2013_gro_NER:B-CellularProcess)`, `bionlp_st_2013_ge_RE:Site2)`, `verspoor_2013_ner:B-Phenomena)`, 
`chia_ner:I-Temporal)`, `bionlp_st_2013_gro_NER:I-Localization)`, `bionlp_st_2013_cg_NER:B-Ubiquitination)`, `chemdner_TEXT:MESH:D009020)`, `bionlp_st_2013_cg_RE:FromLoc)`, `mlee_ner:B-Organism_substance)`, `genia_term_corpus_ner:I-tissue)`, `medmentions_st21pv_ner:I-T082)`, `chemdner_TEXT:MESH:D054358)`, `medmentions_full_ner:I-T052)`, `chemdner_TEXT:MESH:D005459)`, `chemdner_TEXT:MESH:D047188)`, `medmentions_full_ner:I-T031)`, `chemdner_TEXT:MESH:D013890)`, `chemdner_TEXT:MESH:D004573)`, `genia_term_corpus_ner:B-peptide)`, `an_em_ner:I-Organism_subdivision)`, `bionlp_st_2013_gro_ner:B-MessengerRNA)`, `medmentions_full_ner:B-T171)`, `bionlp_st_2013_gro_NER:B-Affecting)`, `genia_term_corpus_ner:I-body_part)`, `bionlp_st_2013_gro_ner:B-Prokaryote)`, `chemdner_TEXT:MESH:D013844)`, `medmentions_full_ner:I-T061)`, `bionlp_st_2013_pc_NER:B-Negative_regulation)`, `bionlp_st_2013_gro_ner:I-EukaryoticCell)`, `pdr_ner:I-Plant)`, `chemdner_TEXT:MESH:D024341)`, `medmentions_full_ner:I-T092)`, `chemdner_TEXT:MESH:D020319)`, `bionlp_st_2013_cg_NER:B-Cell_transformation)`, `bionlp_st_2013_gro_NER:B-BindingOfTranscriptionFactorToDNA)`, `an_em_ner:I-Anatomical_system)`, `bionlp_st_2011_epi_NER:B-Hydroxylation)`, `bionlp_st_2013_gro_ner:I-Exon)`, `cellfinder_ner:B-Species)`, `bionlp_st_2013_gro_NER:B-Pathway)`, `bionlp_st_2013_ge_NER:B-Protein_modification)`, `bionlp_st_2013_gro_ner:I-FusionGene)`, `bionlp_st_2011_rel_ner:B-Entity)`, `bionlp_st_2011_id_RE:CSite)`, `bionlp_st_2013_ge_NER:B-Positive_regulation)`, `bionlp_st_2013_gro_ner:I-BindingAssay)`, `bionlp_st_2013_gro_NER:B-CellDivision)`, `bionlp_st_2019_bb_ner:I-Microorganism)`, `medmentions_full_ner:I-T059)`, `chemdner_TEXT:MESH:D011108)`, `bionlp_st_2013_gro_NER:B-PositiveRegulationOfTranscription)`, `bionlp_st_2013_gro_ner:B-GeneRegion)`, `bionlp_st_2013_cg_COREF:None)`, `chemdner_TEXT:MESH:D010261)`, `mlee_NER:B-Binding)`, `chemprot_ner:I-CHEMICAL)`, `bionlp_st_2011_id_RE:ToLoc)`, `biorelex_ner:I-organelle)`, 
`chemdner_TEXT:MESH:D004318)`, `genia_term_corpus_ner:I-DNA_family_or_group)`, `bionlp_st_2013_gro_ner:B-RNAPolymerase)`, `bionlp_st_2013_gro_ner:B-CellComponent)`, `bionlp_st_2013_gro_NER:B-RegulationOfGeneExpression)`, `bionlp_st_2013_gro_ner:B-Peptide)`, `bionlp_shared_task_2009_NER:B-Transcription)`, `biorelex_ner:B-tissue)`, `pico_extraction_ner:B-participant)`, `chia_ner:I-Visit)`, `chemdner_TEXT:MESH:D011807)`, `chemdner_TEXT:MESH:D014501)`, `bionlp_st_2013_gro_NER:I-IntraCellularProcess)`, `ehr_rel_sts:7)`, `pico_extraction_ner:I-intervention)`, `chemdner_TEXT:MESH:D001599)`, `bionlp_st_2013_gro_ner:I-RegulatoryDNARegion)`, `medmentions_st21pv_ner:I-T037)`, `chemdner_TEXT:MESH:D055768)`, `bionlp_st_2013_gro_ner:B-ChromosomalDNA)`, `chemdner_TEXT:MESH:D008550)`, `bionlp_st_2013_pc_RE:Site)`, `medmentions_full_ner:I-T087)`, `chemdner_TEXT:MESH:D001583)`, `bionlp_st_2011_epi_NER:B-Dehydroxylation)`, `ehr_rel_sts:3)`, `bionlp_st_2013_gro_ner:I-MutantProtein)`, `chemdner_TEXT:MESH:D011804)`, `medmentions_full_ner:B-T091)`, `bionlp_st_2013_cg_RE:CSite)`, `linnaeus_ner:O)`, `medmentions_st21pv_ner:B-T201)`, `verspoor_2013_ner:B-Disorder)`, `bionlp_st_2013_cg_NER:I-Death)`, `bioinfer_ner:I-Individual_protein)`, `medmentions_full_ner:B-T191)`, `verspoor_2013_ner:B-ethnicity)`, `chemdner_TEXT:MESH:D002083)`, `genia_term_corpus_ner:B-carbohydrate)`, `genia_term_corpus_ner:B-DNA_molecule)`, `medmentions_full_ner:B-T069)`, `pdr_NER:I-Treatment_of_disease)`, `mlee_ner:B-Anatomical_system)`, `chebi_nactem_fullpaper_ner:B-Spectral_Data)`, `chemdner_TEXT:MESH:D005419)`, `bionlp_st_2013_gro_ner:I-Nucleotide)`, `medmentions_full_ner:B-T194)`, `chemdner_TEXT:MESH:D005947)`, `chemdner_TEXT:MESH:D008627)`, `bionlp_st_2013_gro_NER:B-ExperimentalIntervention)`, `chemdner_TEXT:MESH:D011073)`, `chia_RE:Has_negation)`, `verspoor_2013_ner:I-mutation)`, `chemdner_TEXT:MESH:D004224)`, `chemdner_TEXT:MESH:D005663)`, `medmentions_full_ner:I-T094)`, `chemdner_TEXT:MESH:D006877)`, 
`ebm_pico_ner:B-Outcome_Mortality)`, `bionlp_st_2013_gro_ner:B-TranscriptionRepressor)`, `biorelex_ner:I-cell)`, `bionlp_st_2013_gro_NER:I-BindingOfProteinToDNA)`, `verspoor_2013_RE:None)`, `bionlp_st_2013_gro_NER:B-ProteinModification)`, `chemdner_TEXT:MESH:D047090)`, `medmentions_full_ner:I-T204)`, `chemdner_TEXT:MESH:D006843)`, `biorelex_ner:I-protein-family)`, `chemdner_TEXT:MESH:D012694)`, `bionlp_st_2013_gro_ner:B-TranslationFactor)`, `scai_chemical_ner:B-)`, `bionlp_st_2013_gro_ner:B-Exon)`, `medmentions_full_ner:I-T083)`, `bionlp_st_2013_gro_ner:I-TranscriptionActivatorActivity)`, `medmentions_full_ner:I-T101)`, `medmentions_full_ner:B-T034)`, `bionlp_st_2013_gro_ner:I-Histone)`, `ddi_corpus_RE:MECHANISM)`, `mantra_gsc_en_emea_ner:I-PROC)`, `genia_term_corpus_ner:I-peptide)`, `bionlp_st_2013_cg_NER:B-Cell_proliferation)`, `chemdner_TEXT:MESH:D004140)`, `medmentions_full_ner:B-T083)`, `diann_iber_eval_en_ner:I-Disability)`, `bionlp_st_2013_gro_NER:B-PosttranslationalModification)`, `biorelex_ner:I-fusion-protein)`, `chemdner_TEXT:MESH:D020910)`, `chemdner_TEXT:MESH:D014747)`, `bionlp_st_2013_ge_NER:B-Gene_expression)`, `biorelex_ner:I-tissue)`, `mantra_gsc_en_patents_ner:B-LIVB)`, `medmentions_full_ner:O)`, `medmentions_full_ner:B-T077)`, `bionlp_st_2013_gro_ner:I-Operon)`, `chemdner_TEXT:MESH:D002392)`, `chemdner_TEXT:MESH:D014498)`, `chemdner_TEXT:MESH:D002368)`, `chemdner_TEXT:MESH:D018817)`, `bionlp_st_2013_ge_NER:I-Regulation)`, `genia_term_corpus_ner:B-atom)`, `chemdner_TEXT:MESH:D011092)`, `chemdner_TEXT:MESH:D015283)`, `chemdner_TEXT:MESH:D018698)`, `chemdner_TEXT:MESH:D009569)`, `muchmore_en_ner:I-umlsterm)`, `bionlp_st_2013_cg_NER:B-Death)`, `nlm_gene_ner:I-Other)`, `medmentions_full_ner:B-T109)`, `osiris_ner:I-variant)`, `ehr_rel_sts:6)`, `chemdner_TEXT:MESH:D001120)`, `mlee_ner:I-Protein_domain_or_region)`, `bionlp_st_2013_pc_NER:B-Dissociation)`, `bionlp_st_2013_cg_NER:B-Metastasis)`, `chemdner_TEXT:MESH:D014204)`, `chemdner_TEXT:MESH:D005857)`, 
`medmentions_full_ner:I-T030)`, `chemdner_TEXT:MESH:D019256)`, `bionlp_st_2013_gro_ner:B-Polymerase)`, `chia_ner:B-Negation)`, `bionlp_st_2013_gro_NER:B-CellularMetabolicProcess)`, `bionlp_st_2013_gro_NER:B-CellDifferentiation)`, `biorelex_ner:I-protein-motif)`, `medmentions_full_ner:I-T093)`, `chemdner_TEXT:MESH:D019820)`, `anat_em_ner:B-Pathological_formation)`, `bionlp_shared_task_2009_NER:B-Localization)`, `genia_term_corpus_ner:B-RNA_domain_or_region)`, `chemdner_TEXT:MESH:D014668)`, `bionlp_st_2013_pc_ner:I-Gene_or_gene_product)`, `chemdner_TEXT:MESH:D019207)`, `bionlp_st_2013_gro_NER:B-BindingOfProteinToProteinBindingSiteOfDNA)`, `medmentions_full_ner:B-T059)`, `bionlp_st_2013_gro_ner:B-Ligand)`, `bio_sim_verb_sts:6)`, `biorelex_ner:B-experimental-construct)`, `bionlp_st_2013_gro_ner:I-DNA)`, `pdr_NER:O)`, `chemdner_TEXT:MESH:D008670)`, `bionlp_st_2011_ge_RE:Cause)`, `chemdner_TEXT:MESH:D015232)`, `bionlp_st_2013_pc_NER:O)`, `bionlp_st_2013_gro_NER:B-FormationOfProteinDNAComplex)`, `medmentions_full_ner:B-T121)`, `bionlp_shared_task_2009_NER:B-Regulation)`, `chemdner_TEXT:MESH:D009534)`, `chemdner_TEXT:MESH:D014451)`, `bionlp_st_2011_id_RE:AtLoc)`, `chemdner_TEXT:MESH:D011799)`, `medmentions_st21pv_ner:B-T204)`, `genia_term_corpus_ner:I-protein_subunit)`, `biorelex_ner:I-assay)`, `chemdner_TEXT:MESH:D005680)`, `an_em_ner:I-Organism_substance)`, `chemdner_TEXT:MESH:D010368)`, `chemdner_TEXT:MESH:D000872)`, `bionlp_st_2011_id_NER:I-Gene_expression)`, `bionlp_st_2013_cg_NER:B-Regulation)`, `mlee_ner:I-DNA_domain_or_region)`, `chemdner_TEXT:MESH:D001393)`, `medmentions_full_ner:I-T038)`, `chemdner_TEXT:MESH:D047311)`, `chemdner_TEXT:MESH:D011453)`, `chemdner_TEXT:MESH:D020106)`, `chemdner_TEXT:MESH:D019257)`, `bionlp_st_2013_gro_ner:B-NuclearReceptor)`, `chemdner_TEXT:MESH:D002117)`, `genia_term_corpus_ner:B-lipid)`, `bionlp_st_2013_gro_ner:B-SmallInterferingRNA)`, `chemdner_TEXT:MESH:D011205)`, `chemdner_TEXT:MESH:D002686)`, 
`bionlp_st_2013_gro_NER:B-Translation)`, `ebm_pico_ner:I-Intervention_Psychological)`, `mlee_ner:I-Drug_or_compound)`, `bionlp_st_2013_gro_ner:I-TranscriptionFactorBindingSiteOfDNA)`, `chemdner_TEXT:MESH:D000688)`, `bionlp_st_2011_ge_RE:None)`, `bionlp_st_2013_gro_ner:B-ProteinSubunit)`, `genia_term_corpus_ner:I-ANDother_nameother_name)`, `bionlp_st_2013_gro_NER:I-Heterodimerization)`, `pico_extraction_ner:B-intervention)`, `bionlp_st_2013_cg_ner:I-Organism)`, `bionlp_st_2013_gro_ner:I-ProteinDomain)`, `bionlp_st_2013_gro_NER:I-BindingToProtein)`, `scai_chemical_ner:I-)`, `biorelex_ner:B-experiment-tag)`, `ebm_pico_ner:B-Intervention_Physical)`, `bionlp_st_2013_cg_RE:ToLoc)`, `bionlp_st_2013_gro_NER:B-FormationOfTranscriptionFactorComplex)`, `linnaeus_ner:B-species)`, `medmentions_full_ner:I-T062)`, `chemdner_TEXT:MESH:D014640)`, `mlee_NER:B-Gene_expression)`, `chemdner_TEXT:MESH:D008701)`, `mlee_NER:O)`, `chemdner_TEXT:MESH:D014302)`, `genia_term_corpus_ner:B-RNA_family_or_group)`, `medmentions_full_ner:I-T091)`, `medmentions_full_ner:B-T022)`, `medmentions_full_ner:B-T074)`, `bionlp_st_2013_gro_NER:B-ProteinCatabolism)`, `bionlp_st_2013_gro_RE:hasPatient4)`, `chemdner_TEXT:MESH:D011388)`, `bionlp_st_2013_ge_NER:I-Phosphorylation)`, `bionlp_st_2013_gro_NER:I-CellAdhesion)`, `anat_em_ner:I-Organ)`, `medmentions_full_ner:B-T045)`, `chemdner_TEXT:MESH:D008727)`, `chebi_nactem_abstr_ann1_ner:B-Species)`, `bionlp_st_2013_gro_ner:I-RNAPolymeraseII)`, `nlm_gene_ner:B-STARGENE)`, `mantra_gsc_en_emea_ner:B-OBJC)`, `bionlp_st_2013_gro_ner:B-DNABindingDomainOfProtein)`, `chemdner_TEXT:MESH:D010636)`, `chemdner_TEXT:MESH:D004061)`, `mlee_NER:I-Binding)`, `medmentions_full_ner:B-T075)`, `medmentions_full_ner:B-UnknownType)`, `chemdner_TEXT:MESH:D019081)`, `bionlp_st_2013_gro_NER:I-Binding)`, `medmentions_full_ner:I-T005)`, `chemdner_TEXT:MESH:D009821)` + +{:.btn-box} + + 
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_foo_en_5.2.0_3.0_1699292612679.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_foo_en_5.2.0_3.0_1699292612679.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_foo","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_foo","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.foo.by_leonweber").predict("""PUT YOUR STRING HERE""") +``` +
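In the predicted-entity list for `bert_ner_foo`, each label joins a source corpus and a tag with a colon, for example `linnaeus_ner:B-species)` or `chemdner_TEXT:MESH:D000688)`. The plain-Python sketch below splits such labels so predictions can be filtered by corpus. It assumes the text before the first colon is always the corpus name and that the trailing `)` in the list belongs to the raw label string. `split_label` and `ParsedLabel` are hypothetical helpers and are not part of Spark NLP.

```python
from typing import NamedTuple, Optional


class ParsedLabel(NamedTuple):
    corpus: str            # source corpus prefix, e.g. "linnaeus_ner"
    tag: str               # everything after the first colon, e.g. "B-species"
    iob: Optional[str]     # "B", "I", "O", or None for non-IOB tags
    entity: Optional[str]  # entity type for B-/I- tags, e.g. "species"


def split_label(raw: str) -> ParsedLabel:
    """Split a combined label such as `linnaeus_ner:B-species)`.

    Assumption: the corpus name never contains a colon, so only the first
    colon separates it from the tag. Trailing ')' characters are stripped.
    """
    label = raw.strip().rstrip(")")
    corpus, _, tag = label.partition(":")
    if tag == "O":
        return ParsedLabel(corpus, tag, "O", None)
    if tag[:2] in ("B-", "I-"):
        # An empty type (e.g. "I-") is reported as None.
        return ParsedLabel(corpus, tag, tag[0], tag[2:] or None)
    # Non-IOB tags such as "MESH:D000688" or "None" are kept verbatim.
    return ParsedLabel(corpus, tag, None, None)


print(split_label("linnaeus_ner:B-species)"))
```

Because non-IOB tags such as MeSH identifiers are returned unchanged, they stay available for downstream filtering instead of being discarded.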
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_foo| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|420.4 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/leonweber/foo \ No newline at end of file From 69ce5c40ac26aa6a8693482c4acf79b40729ecb1 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:11:51 +0700 Subject: [PATCH 267/667] Add model 2023-11-06-bert_ner_dbmdz_bert_large_cased_finetuned_conll03_english_en --- ...arge_cased_finetuned_conll03_english_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_dbmdz_bert_large_cased_finetuned_conll03_english_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_dbmdz_bert_large_cased_finetuned_conll03_english_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_dbmdz_bert_large_cased_finetuned_conll03_english_en.md new file mode 100644 index 00000000000000..2d72049e13380e --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_dbmdz_bert_large_cased_finetuned_conll03_english_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Large Cased model (from dbmdz) +author: John Snow Labs +name: bert_ner_dbmdz_bert_large_cased_finetuned_conll03_english +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. 
`bert-large-cased-finetuned-conll03-english` is an English model originally trained by `dbmdz`. + +## Predicted Entities + +`PER`, `LOC`, `MISC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_dbmdz_bert_large_cased_finetuned_conll03_english_en_5.2.0_3.0_1699291043018.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_dbmdz_bert_large_cased_finetuned_conll03_english_en_5.2.0_3.0_1699291043018.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_dbmdz_bert_large_cased_finetuned_conll03_english","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_dbmdz_bert_large_cased_finetuned_conll03_english","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.cased_large_finetuned").predict("""PUT YOUR STRING HERE""") +``` +
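The CoNLL-03 model predicts `PER`, `LOC`, `MISC` and `ORG`, and its `ner` column contains one tag per token. Inside a pipeline, Spark NLP's `NerConverter` annotator is designed to merge those tags into entity chunks. The plain-Python sketch below only illustrates the grouping logic. It assumes tags of the form `B-PER`, `I-PER` and `O`. `iob_to_spans` is a hypothetical helper, and it joins tokens with single spaces rather than using character offsets.

```python
from typing import List, Optional, Tuple


def iob_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group parallel token/tag lists into (entity_type, text) spans.

    Assumes tags look like "B-PER", "I-PER" or "O". An I- tag that does not
    continue a span of the same type starts a new span, so IOB1-style output
    (where I- can open an entity) is also handled.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans: List[Tuple[str, List[str]]] = []
    current_type: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag == "O" or "-" not in tag:
            current_type = None  # outside any entity
            continue
        prefix, entity = tag.split("-", 1)
        if prefix == "B" or entity != current_type:
            spans.append((entity, [token]))  # open a new span
            current_type = entity
        else:
            spans[-1][1].append(token)  # continue the open span
    return [(entity, " ".join(words)) for entity, words in spans]


tokens = ["John", "Smith", "lives", "in", "New", "York", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "I-LOC", "O"]
print(iob_to_spans(tokens, tags))  # [('PER', 'John Smith'), ('LOC', 'New York')]
```

Exact surface text should come from the begin/end offsets that Spark NLP stores on each annotation. Space-joining is used here only for illustration.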
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_dbmdz_bert_large_cased_finetuned_conll03_english| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.2 GB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english \ No newline at end of file From 4af43769bfd6e8471617d4ad041b17814001519d Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:12:51 +0700 Subject: [PATCH 268/667] Add model 2023-11-06-bert_sayula_popoluca_tiny_norwegian_focal_v2_en --- ...ula_popoluca_tiny_norwegian_focal_v2_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_norwegian_focal_v2_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_norwegian_focal_v2_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_norwegian_focal_v2_en.md new file mode 100644 index 00000000000000..36e3acb1f64ca0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_norwegian_focal_v2_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_tiny_norwegian_focal_v2 BertForTokenClassification from kktoto +author: John Snow Labs +name: bert_sayula_popoluca_tiny_norwegian_focal_v2 +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark 
NLP. `bert_sayula_popoluca_tiny_norwegian_focal_v2` is an English model originally trained by kktoto. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_tiny_norwegian_focal_v2_en_5.2.0_3.0_1699300739842.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_tiny_norwegian_focal_v2_en_5.2.0_3.0_1699300739842.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_tiny_norwegian_focal_v2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_tiny_norwegian_focal_v2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
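Each annotator reads the columns named in `setInputCols` and writes the one named in `setOutputCol`. A stage can therefore run only if the input DataFrame or an earlier stage has already produced every column it reads. The sketch below checks this wiring in plain Python by modelling the stages as `(name, inputs, output)` tuples. `missing_inputs` is a hypothetical helper and is not a Spark NLP or Spark ML API.

```python
from typing import List, Sequence, Tuple

# (stage name, input columns, output column)
Stage = Tuple[str, Sequence[str], str]


def missing_inputs(stages: List[Stage], initial_columns: Sequence[str]) -> List[Tuple[str, str]]:
    """Return (stage, column) pairs where a stage reads a column that neither
    the input DataFrame nor any earlier stage has produced."""
    available = set(initial_columns)
    problems: List[Tuple[str, str]] = []
    for name, inputs, output in stages:
        for column in inputs:
            if column not in available:
                problems.append((name, column))
        available.add(output)  # later stages may read this column
    return problems


pipeline_stages: List[Stage] = [
    ("DocumentAssembler", ["text"], "documents"),
    ("Tokenizer", ["documents"], "token"),
    ("BertForTokenClassification", ["documents", "token"], "ner"),
]
print(missing_inputs(pipeline_stages, ["text"]))  # []
```

Dropping the `Tokenizer` stage makes the helper report `[("BertForTokenClassification", "token")]`. This is the kind of mismatch that makes a pipeline fail at fit time.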
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_tiny_norwegian_focal_v2| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|42.7 MB| + +## References + +https://huggingface.co/kktoto/tiny_no_focal_v2 \ No newline at end of file From 0eba855c276d4add349fbc5caf9f69aeff4c1f17 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:13:52 +0700 Subject: [PATCH 269/667] Add model 2023-11-06-bert_ner_importsmart_bert_tonga_tonga_islands_distilbert_ner_en --- ...t_tonga_tonga_islands_distilbert_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_importsmart_bert_tonga_tonga_islands_distilbert_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_importsmart_bert_tonga_tonga_islands_distilbert_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_importsmart_bert_tonga_tonga_islands_distilbert_ner_en.md new file mode 100644 index 00000000000000..df2ac6a6995563 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_importsmart_bert_tonga_tonga_islands_distilbert_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_importsmart_bert_tonga_tonga_islands_distilbert_ner BertForTokenClassification from importsmart +author: John Snow Labs +name: bert_ner_importsmart_bert_tonga_tonga_islands_distilbert_ner +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using 
Spark NLP. `bert_ner_importsmart_bert_tonga_tonga_islands_distilbert_ner` is an English model originally trained by importsmart. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_importsmart_bert_tonga_tonga_islands_distilbert_ner_en_5.2.0_3.0_1699292534714.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_importsmart_bert_tonga_tonga_islands_distilbert_ner_en_5.2.0_3.0_1699292534714.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_importsmart_bert_tonga_tonga_islands_distilbert_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_importsmart_bert_tonga_tonga_islands_distilbert_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_importsmart_bert_tonga_tonga_islands_distilbert_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|244.2 MB| + +## References + +https://huggingface.co/importsmart/bert-to-distilbert-NER \ No newline at end of file From f6024453cb6dee7f51a1ca79f2ebea99effe8d71 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:14:52 +0700 Subject: [PATCH 270/667] Add model 2023-11-06-bert_ner_ksaluja_bert_finetuned_ner_en --- ...-bert_ner_ksaluja_bert_finetuned_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_ksaluja_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ksaluja_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ksaluja_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..2241c34c50717b --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ksaluja_bert_finetuned_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_ksaluja_bert_finetuned_ner BertForTokenClassification from kSaluja +author: John Snow Labs +name: bert_ner_ksaluja_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_ksaluja_bert_finetuned_ner` is an English model originally trained by kSaluja.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_ksaluja_bert_finetuned_ner_en_5.2.0_3.0_1699293363364.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_ksaluja_bert_finetuned_ner_en_5.2.0_3.0_1699293363364.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_ksaluja_bert_finetuned_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_ksaluja_bert_finetuned_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_ksaluja_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/kSaluja/bert-finetuned-ner \ No newline at end of file From af33dee8da9cc287a9322a1afee468368dd765e1 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:15:52 +0700 Subject: [PATCH 271/667] Add model 2023-11-06-bert_ner_bert_split_title_org_en --- ...-11-06-bert_ner_bert_split_title_org_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_split_title_org_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_split_title_org_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_split_title_org_en.md new file mode 100644 index 00000000000000..7ae136c1447c91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_split_title_org_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from pkushiqiang) +author: John Snow Labs +name: bert_ner_bert_split_title_org +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-split-title-org` is an English model originally trained by `pkushiqiang`.
+ +## Predicted Entities + +`org`, `jbttl_extra`, `degree`, `major`, `job_title` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_split_title_org_en_5.2.0_3.0_1699290907094.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_split_title_org_en_5.2.0_3.0_1699290907094.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_split_title_org","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_split_title_org","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.split_title_org.by_pkushiqiang").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bert_split_title_org| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/pkushiqiang/bert-split-title-org \ No newline at end of file From d08cdb4c5cda238f8319a9b3d479fa412ae4bce1 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:16:53 +0700 Subject: [PATCH 272/667] Add model 2023-11-06-bent_pubmedbert_ner_gene_en --- .../2023-11-06-bent_pubmedbert_ner_gene_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bent_pubmedbert_ner_gene_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bent_pubmedbert_ner_gene_en.md b/docs/_posts/ahmedlone127/2023-11-06-bent_pubmedbert_ner_gene_en.md new file mode 100644 index 00000000000000..29a505f38376b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bent_pubmedbert_ner_gene_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bent_pubmedbert_ner_gene BertForTokenClassification from pruas +author: John Snow Labs +name: bent_pubmedbert_ner_gene +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bent_pubmedbert_ner_gene` is an English model originally trained by pruas.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bent_pubmedbert_ner_gene_en_5.2.0_3.0_1699304365196.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bent_pubmedbert_ner_gene_en_5.2.0_3.0_1699304365196.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bent_pubmedbert_ner_gene","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bent_pubmedbert_ner_gene", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bent_pubmedbert_ner_gene| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.0 MB| + +## References + +https://huggingface.co/pruas/BENT-PubMedBERT-NER-Gene \ No newline at end of file From 74f477e2af2d82f2ff5ab9301b65ec025f1819c7 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:17:53 +0700 Subject: [PATCH 273/667] Add model 2023-11-06-bert_ner_original_scibert_bc5cdr_chemical_t1_en --- ..._original_scibert_bc5cdr_chemical_t1_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_scibert_bc5cdr_chemical_t1_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_scibert_bc5cdr_chemical_t1_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_scibert_bc5cdr_chemical_t1_en.md new file mode 100644 index 00000000000000..ed30aff8929b8c --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_scibert_bc5cdr_chemical_t1_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_original_scibert_bc5cdr_chemical_t1 BertForTokenClassification from ghadeermobasher +author: John Snow Labs +name: bert_ner_original_scibert_bc5cdr_chemical_t1 +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_original_scibert_bc5cdr_chemical_t1` is an English model originally trained by ghadeermobasher.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_original_scibert_bc5cdr_chemical_t1_en_5.2.0_3.0_1699280367481.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_original_scibert_bc5cdr_chemical_t1_en_5.2.0_3.0_1699280367481.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_original_scibert_bc5cdr_chemical_t1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_original_scibert_bc5cdr_chemical_t1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_original_scibert_bc5cdr_chemical_t1| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/ghadeermobasher/Original-SciBERT-BC5CDR-Chemical-T1 \ No newline at end of file From c564ad8e0c9eca7d840a5a896408694c3551c7d2 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:18:53 +0700 Subject: [PATCH 274/667] Add model 2023-11-06-bert_ner_huggingface_course_bert_finetuned_ner_accelerate_en --- ...course_bert_finetuned_ner_accelerate_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_huggingface_course_bert_finetuned_ner_accelerate_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_huggingface_course_bert_finetuned_ner_accelerate_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_huggingface_course_bert_finetuned_ner_accelerate_en.md new file mode 100644 index 00000000000000..04e893efb7cc32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_huggingface_course_bert_finetuned_ner_accelerate_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from huggingface-course) +author: John Snow Labs +name: bert_ner_huggingface_course_bert_finetuned_ner_accelerate +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. 
`bert-finetuned-ner-accelerate` is an English model originally trained by `huggingface-course`. + +## Predicted Entities + +`LOC`, `PER`, `ORG`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_huggingface_course_bert_finetuned_ner_accelerate_en_5.2.0_3.0_1699292365020.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_huggingface_course_bert_finetuned_ner_accelerate_en_5.2.0_3.0_1699292365020.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_huggingface_course_bert_finetuned_ner_accelerate","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_huggingface_course_bert_finetuned_ner_accelerate","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.finetuned.by_huggingface_course").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_huggingface_course_bert_finetuned_ner_accelerate| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/huggingface-course/bert-finetuned-ner-accelerate \ No newline at end of file From f1eb2272d5323771e5d2f6a35fc1a082a786d92f Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:19:53 +0700 Subject: [PATCH 275/667] Add model 2023-11-06-bert_sayula_popoluca_tiny_ktoto_punctuator_en --- ...ayula_popoluca_tiny_ktoto_punctuator_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_ktoto_punctuator_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_ktoto_punctuator_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_ktoto_punctuator_en.md new file mode 100644 index 00000000000000..124137514ae93b --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_ktoto_punctuator_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_tiny_ktoto_punctuator BertForTokenClassification from kktoto +author: John Snow Labs +name: bert_sayula_popoluca_tiny_ktoto_punctuator +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark 
NLP. `bert_sayula_popoluca_tiny_ktoto_punctuator` is an English model originally trained by kktoto. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_tiny_ktoto_punctuator_en_5.2.0_3.0_1699304453526.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_tiny_ktoto_punctuator_en_5.2.0_3.0_1699304453526.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# The classifier reads a "token" column, so a Tokenizer stage must produce it +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_tiny_ktoto_punctuator","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// The classifier reads a "token" column, so a Tokenizer stage must produce it +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_tiny_ktoto_punctuator", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +```
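Spark NLP stages are connected by column name: every input column a stage reads must be the raw `text` column or the output of an earlier stage. A missing stage (for example, nothing writing `token`) or a renamed output column usually only surfaces when the pipeline is fit. The sketch below is a hypothetical, plain-Python pre-flight check; `check_wiring` and the `(name, inputs, output)` tuples are illustrative and are not a Spark NLP API.

```python
from typing import List, Sequence, Tuple

# Hypothetical stage description: (stage name, input columns, output column).
Stage = Tuple[str, Sequence[str], str]


def check_wiring(stages: List[Stage], source_cols: Sequence[str] = ("text",)) -> List[str]:
    """Return one message per input column that no earlier stage produces."""
    available = set(source_cols)  # columns present before the first stage runs
    problems = []
    for name, inputs, output in stages:
        for col in inputs:
            if col not in available:
                problems.append(f"{name}: input column '{col}' is not produced upstream")
        available.add(output)  # later stages may consume this output
    return problems


# DocumentAssembler feeding the classifier directly: nothing produces "token".
broken = [
    ("DocumentAssembler", ["text"], "documents"),
    ("BertForTokenClassification", ["documents", "token"], "ner"),
]

# A Tokenizer between the two stages supplies the missing column.
fixed = [
    ("DocumentAssembler", ["text"], "documents"),
    ("Tokenizer", ["documents"], "token"),
    ("BertForTokenClassification", ["documents", "token"], "ner"),
]

print(check_wiring(broken))
# ["BertForTokenClassification: input column 'token' is not produced upstream"]
print(check_wiring(fixed))
# []
```

This only catches column-name mistakes early; Spark NLP still performs its own check of input annotator types when the pipeline runs.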
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_tiny_ktoto_punctuator| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|42.7 MB| + +## References + +https://huggingface.co/kktoto/tiny_ktoto_punctuator \ No newline at end of file From aebcfec8e981e681e1087dabb230de439c1460c6 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:20:54 +0700 Subject: [PATCH 276/667] Add model 2023-11-06-bert_ner_ncduy_bert_finetuned_ner_en --- ...06-bert_ner_ncduy_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_ncduy_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ncduy_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ncduy_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..834d36167dd333 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ncduy_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from ncduy) +author: John Snow Labs +name: bert_ner_ncduy_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `ncduy`.
+ +## Predicted Entities + +`MISC`, `ORG`, `LOC`, `PER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_ncduy_bert_finetuned_ner_en_5.2.0_3.0_1699298376329.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_ncduy_bert_finetuned_ner_en_5.2.0_3.0_1699298376329.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_ncduy_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_ncduy_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned.by_ncduy").predict("""PUT YOUR STRING HERE""") +``` +
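The `ner` column holds one tag per token. For CoNLL-style labels such as `MISC`, `ORG`, `LOC` and `PER`, fine-tuned BERT NER models commonly emit IOB2 tags (`B-PER`, `I-PER`, `O`); that scheme is an assumption here, so check the model's actual label set. Inside Spark NLP the `NerConverter` annotator merges such tags into chunks. The plain-Python sketch below (`group_entities` is a hypothetical helper) shows the same idea on token and tag lists you have already collected, for example from `result.select("token.result", "ner.result")`.

```python
from typing import List, Optional, Tuple


def group_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge IOB2-tagged tokens (B-X / I-X / O) into (entity_text, label) pairs."""
    entities: List[Tuple[str, str]] = []
    span: List[str] = []          # tokens of the entity currently being built
    label: Optional[str] = None   # label of that entity

    for token, tag in zip(tokens, tags):
        if tag == "O":
            # Outside any entity: close the open span, if there is one.
            if span:
                entities.append((" ".join(span), label))
            span, label = [], None
            continue
        prefix, _, tag_label = tag.partition("-")
        # B- always starts a new entity; an I- whose label differs from the
        # open span is also treated as a new entity (lenient IOB2 handling).
        if prefix == "B" or tag_label != label:
            if span:
                entities.append((" ".join(span), label))
            span, label = [], tag_label
        span.append(token)

    if span:  # close a span that runs to the end of the sentence
        entities.append((" ".join(span), label))
    return entities


tokens = ["John", "Smith", "works", "at", "Acme", "Corp", "in", "Berlin", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG", "O", "B-LOC", "O"]
print(group_entities(tokens, tags))
# [('John Smith', 'PER'), ('Acme Corp', 'ORG'), ('Berlin', 'LOC')]
```

Joining tokens with a single space is a simplification; character offsets from the annotations would be needed to recover the exact original text.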
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_ncduy_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/ncduy/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From 2de2ecb02be185e9c3a13c2f72129353161491a1 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:21:54 +0700 Subject: [PATCH 277/667] Add model 2023-11-06-bert_italian_uncased_ner_it --- .../2023-11-06-bert_italian_uncased_ner_it.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_italian_uncased_ner_it.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_italian_uncased_ner_it.md b/docs/_posts/ahmedlone127/2023-11-06-bert_italian_uncased_ner_it.md new file mode 100644 index 00000000000000..4445da7b9773b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_italian_uncased_ner_it.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Italian bert_italian_uncased_ner BertForTokenClassification from osiria +author: John Snow Labs +name: bert_italian_uncased_ner +date: 2023-11-06 +tags: [bert, it, open_source, token_classification, onnx] +task: Named Entity Recognition +language: it +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_italian_uncased_ner` is an Italian model originally trained by osiria.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_italian_uncased_ner_it_5.2.0_3.0_1699304734543.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_italian_uncased_ner_it_5.2.0_3.0_1699304734543.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# The classifier reads a "token" column, so a Tokenizer stage must produce it +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_italian_uncased_ner","it") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// The classifier reads a "token" column, so a Tokenizer stage must produce it +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_italian_uncased_ner", "it") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +```
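The Download and Copy S3 URI links on this card point to an archive whose name appears to follow `<model name>_<language>_<Spark NLP version>_<Spark version>_<epoch milliseconds>.zip`. That layout is inferred from the URLs on this page, not from a documented contract. A minimal sketch (`parse_archive_name` and `ArchiveInfo` are hypothetical) that splits such a name from the right, because the model name itself contains underscores:

```python
from datetime import datetime, timezone
from typing import NamedTuple


class ArchiveInfo(NamedTuple):
    model: str
    lang: str
    sparknlp_version: str
    spark_version: str
    built_at: datetime


def parse_archive_name(url: str) -> ArchiveInfo:
    """Split '<model>_<lang>_<sparknlp>_<spark>_<millis>.zip' (assumed layout)."""
    filename = url.rsplit("/", 1)[-1]
    if not filename.endswith(".zip"):
        raise ValueError(f"not a .zip archive: {filename}")
    stem = filename[: -len(".zip")]
    # Split from the right: the model name may itself contain underscores,
    # while the version fields use dots.
    model, lang, sparknlp_version, spark_version, millis = stem.rsplit("_", 4)
    built_at = datetime.fromtimestamp(int(millis) / 1000, tz=timezone.utc)
    return ArchiveInfo(model, lang, sparknlp_version, spark_version, built_at)


info = parse_archive_name(
    "s3://auxdata.johnsnowlabs.com/public/models/"
    "bert_italian_uncased_ner_it_5.2.0_3.0_1699304734543.zip"
)
print(info.model, info.lang, info.sparknlp_version, info.spark_version)
# bert_italian_uncased_ner it 5.2.0 3.0
print(info.built_at.date())
# 2023-11-06
```

Splitting from the right keeps underscore-heavy names such as `bert_ner_keyword_tag_model` intact; a name that does not match the assumed layout raises `ValueError` during unpacking.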
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_italian_uncased_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|it| +|Size:|407.1 MB| + +## References + +https://huggingface.co/osiria/bert-italian-uncased-ner \ No newline at end of file From 48287654808ee666453251da09f6fb1cf6c982d5 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:22:54 +0700 Subject: [PATCH 278/667] Add model 2023-11-06-bert_ner_keyword_tag_model_en --- ...023-11-06-bert_ner_keyword_tag_model_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_en.md new file mode 100644 index 00000000000000..322c6b86b6e13c --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from Media1129) +author: John Snow Labs +name: bert_ner_keyword_tag_model +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `keyword-tag-model` is an English model originally trained by `Media1129`.
+ +## Predicted Entities + +`occasion`, `cuisines`, `mealcourse`, `ingredient` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_keyword_tag_model_en_5.2.0_3.0_1699292413018.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_keyword_tag_model_en_5.2.0_3.0_1699292413018.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_keyword_tag_model","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_keyword_tag_model","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.by_media1129").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_keyword_tag_model| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/Media1129/keyword-tag-model \ No newline at end of file From 2532ca445c304f6749d59ceea32f893f64337378 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:23:55 +0700 Subject: [PATCH 279/667] Add model 2023-11-06-bert_ner_ner_news_portuguese_pt --- ...3-11-06-bert_ner_ner_news_portuguese_pt.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_news_portuguese_pt.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_news_portuguese_pt.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_news_portuguese_pt.md new file mode 100644 index 00000000000000..92b94c1bc0250b --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_news_portuguese_pt.md @@ -0,0 +1,115 @@ +--- +layout: model +title: Portuguese Named Entity Recognition (from monilouise) +author: John Snow Labs +name: bert_ner_ner_news_portuguese +date: 2023-11-06 +tags: [bert, ner, token_classification, pt, open_source, onnx] +task: Named Entity Recognition +language: pt +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained Named Entity Recognition model, uploaded to Hugging Face, adapted and imported into Spark NLP. `ner_news_portuguese` is a Portuguese model originally trained by `monilouise`.
+ +## Predicted Entities + +`PUB`, `PESSOA`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_ner_news_portuguese_pt_5.2.0_3.0_1699296116605.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_ner_news_portuguese_pt_5.2.0_3.0_1699296116605.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_ner_news_portuguese","pt") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["Eu amo Spark NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_ner_news_portuguese","pt") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("Eu amo Spark NLP").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("pt.ner.bert.news.").predict("""Eu amo Spark NLP""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_ner_news_portuguese| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|pt| +|Size:|406.0 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/monilouise/ner_news_portuguese +- https://github.com/neuralmind-ai/portuguese-bert/blob/master/README.md \ No newline at end of file From 1b8648182afb0412cf20fb96554851e3dbf4d376 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:24:55 +0700 Subject: [PATCH 280/667] Add model 2023-11-06-bert_ner_buehlpa_bert_finetuned_ner_en --- ...-bert_ner_buehlpa_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_buehlpa_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_buehlpa_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_buehlpa_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..17fe5474470cd3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_buehlpa_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from buehlpa) +author: John Snow Labs +name: bert_ner_buehlpa_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `buehlpa`.
+ +## Predicted Entities + +`ORG`, `LOC`, `MISC`, `PER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_buehlpa_bert_finetuned_ner_en_5.2.0_3.0_1699291833509.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_buehlpa_bert_finetuned_ner_en_5.2.0_3.0_1699291833509.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_buehlpa_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_buehlpa_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned.by_buehlpa").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_buehlpa_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/buehlpa/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From 6bb0a4d02d381ff1b0a10023e66a0ea90defc928 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:25:55 +0700 Subject: [PATCH 281/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_german_upos_de --- ...ayula_popoluca_bert_base_german_upos_de.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_german_upos_de.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_german_upos_de.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_german_upos_de.md new file mode 100644 index 00000000000000..bb1c45383b66f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_german_upos_de.md @@ -0,0 +1,93 @@ +--- +layout: model +title: German bert_sayula_popoluca_bert_base_german_upos BertForTokenClassification from KoichiYasuoka +author: John Snow Labs +name: bert_sayula_popoluca_bert_base_german_upos +date: 2023-11-06 +tags: [bert, de, open_source, token_classification, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and
production-readiness using Spark NLP. `bert_sayula_popoluca_bert_base_german_upos` is a German model originally trained by KoichiYasuoka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_german_upos_de_5.2.0_3.0_1699300997630.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_german_upos_de_5.2.0_3.0_1699300997630.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# The classifier reads a "token" column, so a Tokenizer stage must produce it +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_german_upos","de") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// The classifier reads a "token" column, so a Tokenizer stage must produce it +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bert_base_german_upos", "de") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_base_german_upos| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|409.9 MB| + +## References + +https://huggingface.co/KoichiYasuoka/bert-base-german-upos \ No newline at end of file From a9497fc058f048140b27b058f7c1094eb82a6770 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:26:56 +0700 Subject: [PATCH 282/667] Add model 2023-11-06-bert_sayula_popoluca_chinese_roberta_base_upos_zh --- ...a_popoluca_chinese_roberta_base_upos_zh.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_chinese_roberta_base_upos_zh.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_chinese_roberta_base_upos_zh.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_chinese_roberta_base_upos_zh.md new file mode 100644 index 00000000000000..7108f987d2c3fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_chinese_roberta_base_upos_zh.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Chinese bert_sayula_popoluca_chinese_roberta_base_upos BertForTokenClassification from KoichiYasuoka +author: John Snow Labs +name: bert_sayula_popoluca_chinese_roberta_base_upos +date: 2023-11-06 +tags: [bert, zh, open_source, token_classification, onnx] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_chinese_roberta_base_upos` is a Chinese model originally
trained by KoichiYasuoka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_chinese_roberta_base_upos_zh_5.2.0_3.0_1699305982964.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_chinese_roberta_base_upos_zh_5.2.0_3.0_1699305982964.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
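+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# The classifier reads a "token" column, so a Tokenizer stage must produce it. +# Tokenizer splits on whitespace, so unsegmented Chinese text may need a word segmenter instead +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_chinese_roberta_base_upos","zh") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// The classifier reads a "token" column, so a Tokenizer stage must produce it. +// Tokenizer splits on whitespace, so unsegmented Chinese text may need a word segmenter instead +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_chinese_roberta_base_upos", "zh") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +```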
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_chinese_roberta_base_upos| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|381.1 MB| + +## References + +https://huggingface.co/KoichiYasuoka/chinese-roberta-base-upos \ No newline at end of file From 28f13443531f3fdeb6599056a03d8af96e94ae93 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:27:56 +0700 Subject: [PATCH 283/667] Add model 2023-11-06-bert_ner_keyword_tag_model_6000_9_16_more_ingredient_en --- ..._tag_model_6000_9_16_more_ingredient_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_6000_9_16_more_ingredient_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_6000_9_16_more_ingredient_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_6000_9_16_more_ingredient_en.md new file mode 100644 index 00000000000000..859c8ed2627113 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_6000_9_16_more_ingredient_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from Media1129) +author: John Snow Labs +name: bert_ner_keyword_tag_model_6000_9_16_more_ingredient +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. 
`keyword-tag-model-6000-9-16_more_ingredient` is an English model originally trained by `Media1129`. + +## Predicted Entities + +`negingredient`, `occasion`, `mealcourse`, `cuisines`, `ingredient` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_keyword_tag_model_6000_9_16_more_ingredient_en_5.2.0_3.0_1699294967428.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_keyword_tag_model_6000_9_16_more_ingredient_en_5.2.0_3.0_1699294967428.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_keyword_tag_model_6000_9_16_more_ingredient","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_keyword_tag_model_6000_9_16_more_ingredient","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.ingredient.6000_9_16.by_media1129").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_keyword_tag_model_6000_9_16_more_ingredient| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/Media1129/keyword-tag-model-6000-9-16_more_ingredient \ No newline at end of file From 622226b4decaffd0ae7739696e1d68e3ebf9487e Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:28:56 +0700 Subject: [PATCH 284/667] Add model 2023-11-06-bert_ner_epiextract4gard_en --- .../2023-11-06-bert_ner_epiextract4gard_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_epiextract4gard_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_epiextract4gard_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_epiextract4gard_en.md new file mode 100644 index 00000000000000..2223b23af8b311 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_epiextract4gard_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_epiextract4gard BertForTokenClassification from wzkariampuzha +author: John Snow Labs +name: bert_ner_epiextract4gard +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_epiextract4gard` is an English model originally trained by wzkariampuzha.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_epiextract4gard_en_5.2.0_3.0_1699278256014.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_epiextract4gard_en_5.2.0_3.0_1699278256014.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
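+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads a "token" column, so a Tokenizer stage is required +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_epiextract4gard","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +// assumes an active SparkSession named `spark` (as in spark-shell) +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification reads a "token" column, so a Tokenizer stage is required +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_epiextract4gard", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +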
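Each annotator reads the columns named in `setInputCols` and writes the column named in `setOutputCol`, so a stage can only run when every column it reads is produced by an earlier stage or already exists in the input DataFrame. The sketch below checks that wiring in plain Python. It works on a hypothetical `(stage name, input columns, output column)` tuple description of the stages; it does not call Spark NLP and is not part of its API.

```python
from typing import List, Sequence, Tuple

# Hypothetical, library-independent description of one stage:
# (stage name, input columns it reads, output column it writes)
Stage = Tuple[str, Sequence[str], str]


def missing_inputs(stages: Sequence[Stage], source_cols: Sequence[str]) -> List[Tuple[str, str]]:
    """Return (stage, column) pairs where a stage reads a column that
    neither the input DataFrame nor any earlier stage provides."""
    available = set(source_cols)
    problems: List[Tuple[str, str]] = []
    for name, inputs, output in stages:
        for col in inputs:
            if col not in available:
                problems.append((name, col))
        # The stage's output column becomes visible to every later stage
        available.add(output)
    return problems


# No tokenizer, and the assembler writes a column nobody reads
broken = [
    ("DocumentAssembler", ["text"], "embeddings"),
    ("BertForTokenClassification", ["documents", "token"], "ner"),
]

# Every input column is produced upstream
fixed = [
    ("DocumentAssembler", ["text"], "documents"),
    ("Tokenizer", ["documents"], "token"),
    ("BertForTokenClassification", ["documents", "token"], "ner"),
]

print(missing_inputs(broken, ["text"]))
# → [('BertForTokenClassification', 'documents'), ('BertForTokenClassification', 'token')]
print(missing_inputs(fixed, ["text"]))
# → []
```

Because the check walks the stages in order, it also catches a stage placed before the stage that produces its input, which is the same ordering constraint a Spark ML `Pipeline` imposes.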
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_epiextract4gard| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.1 MB| + +## References + +https://huggingface.co/wzkariampuzha/EpiExtract4GARD \ No newline at end of file From a14983f8334c532fbc9e0037a89897af31cda372 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:29:56 +0700 Subject: [PATCH 285/667] Add model 2023-11-06-bert_ner_amir36_bert_finetuned_ner_en --- ...6-bert_ner_amir36_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_amir36_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_amir36_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_amir36_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..140145a2306c73 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_amir36_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from amir36) +author: John Snow Labs +name: bert_ner_amir36_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `amir36`.
+ +## Predicted Entities + +`ORG`, `LOC`, `MISC`, `PER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_amir36_bert_finetuned_ner_en_5.2.0_3.0_1699284193765.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_amir36_bert_finetuned_ner_en_5.2.0_3.0_1699284193765.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_amir36_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +// assumes an active SparkSession named `spark` (as in spark-shell) +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_amir36_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned.by_amir36").predict("""PUT YOUR STRING HERE""") +``` +
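The `ner` column holds one tag per token rather than whole entities. This model's predicted entities (`ORG`, `LOC`, `MISC`, `PER`) follow the CoNLL-2003 label set, whose token tags are typically IOB-prefixed (`B-PER`, `I-PER`, `O`); treat that tag format as an assumption and check the actual values in your `ner` column. In a Spark NLP pipeline the `NerConverter` annotator merges such tags into entity chunks. The plain-Python sketch below only illustrates the grouping logic on hypothetical token and tag lists; it is not the library's implementation.

```python
from typing import List, Optional, Sequence, Tuple


def iob_to_chunks(tokens: Sequence[str], tags: Sequence[str]) -> List[Tuple[str, str]]:
    """Merge IOB-tagged tokens into (entity_text, label) chunks.

    Assumes tags look like "B-PER", "I-PER" or "O". An "I-" tag whose
    label differs from the open chunk starts a new chunk.
    """
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Close the open chunk, if any, and reset the state
        nonlocal current_tokens, current_label
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, _, label = tag.partition("-")
        if prefix == "B" or label != current_label:
            flush()
            current_label = label
        current_tokens.append(token)
    flush()
    return chunks


# Hypothetical tokens and tags, not real model output
tokens = ["John", "Smith", "works", "at", "Acme", "Corp", "in", "Berlin", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG", "O", "B-LOC", "O"]
print(iob_to_chunks(tokens, tags))
# → [('John Smith', 'PER'), ('Acme Corp', 'ORG'), ('Berlin', 'LOC')]
```

Joining tokens with single spaces is a simplification; `NerConverter` works on the annotations' character offsets, so its chunk text preserves the original spacing.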
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_amir36_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/amir36/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From 79f3acf302f03a6b94e8d16b48349f1056586f9c Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:30:57 +0700 Subject: [PATCH 286/667] Add model 2023-11-06-bert_ner_mldev_bert_finetuned_ner_en --- ...06-bert_ner_mldev_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_mldev_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mldev_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mldev_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..2fd333419f9a5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mldev_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from mldev) +author: John Snow Labs +name: bert_ner_mldev_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `mldev`.
+ +## Predicted Entities + +`MISC`, `ORG`, `LOC`, `PER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_mldev_bert_finetuned_ner_en_5.2.0_3.0_1699295583692.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_mldev_bert_finetuned_ner_en_5.2.0_3.0_1699295583692.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_mldev_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +// assumes an active SparkSession named `spark` (as in spark-shell) +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_mldev_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned.by_mldev").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_mldev_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/mldev/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From 27de354614117116f9f46a7566b809cfeec6dfa6 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:31:57 +0700 Subject: [PATCH 287/667] Add model 2023-11-06-bert_base_multilingual_cased_masakhaner_xx --- ...t_base_multilingual_cased_masakhaner_xx.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_base_multilingual_cased_masakhaner_xx.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_base_multilingual_cased_masakhaner_xx.md b/docs/_posts/ahmedlone127/2023-11-06-bert_base_multilingual_cased_masakhaner_xx.md new file mode 100644 index 00000000000000..4dd59671af4a31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_base_multilingual_cased_masakhaner_xx.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_masakhaner BertForTokenClassification from Davlan +author: John Snow Labs +name: bert_base_multilingual_cased_masakhaner +date: 2023-11-06 +tags: [bert, xx, open_source, token_classification, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark
NLP. `bert_base_multilingual_cased_masakhaner` is a multilingual model originally trained by Davlan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_masakhaner_xx_5.2.0_3.0_1699306245905.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_masakhaner_xx_5.2.0_3.0_1699306245905.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
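+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads a "token" column, so a Tokenizer stage is required +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_multilingual_cased_masakhaner","xx") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +// assumes an active SparkSession named `spark` (as in spark-shell) +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification reads a "token" column, so a Tokenizer stage is required +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_base_multilingual_cased_masakhaner", "xx") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +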
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_masakhaner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/Davlan/bert-base-multilingual-cased-masakhaner \ No newline at end of file From 90500e7ddf1e09717a24ecc68eda52f830185d5f Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:32:57 +0700 Subject: [PATCH 288/667] Add model 2023-11-06-bert_sayula_popoluca_parsbert_finetuned_sayula_popoluca_en --- ...a_parsbert_finetuned_sayula_popoluca_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_parsbert_finetuned_sayula_popoluca_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_parsbert_finetuned_sayula_popoluca_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_parsbert_finetuned_sayula_popoluca_en.md new file mode 100644 index 00000000000000..8786036f1d7c02 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_parsbert_finetuned_sayula_popoluca_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_parsbert_finetuned_sayula_popoluca BertForTokenClassification from sepidmnorozy +author: John Snow Labs +name: bert_sayula_popoluca_parsbert_finetuned_sayula_popoluca +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark 
NLP. `bert_sayula_popoluca_parsbert_finetuned_sayula_popoluca` is an English model originally trained by sepidmnorozy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_parsbert_finetuned_sayula_popoluca_en_5.2.0_3.0_1699306247944.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_parsbert_finetuned_sayula_popoluca_en_5.2.0_3.0_1699306247944.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
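+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads a "token" column, so a Tokenizer stage is required +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_parsbert_finetuned_sayula_popoluca","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +// assumes an active SparkSession named `spark` (as in spark-shell) +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification reads a "token" column, so a Tokenizer stage is required +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_parsbert_finetuned_sayula_popoluca", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +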
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_parsbert_finetuned_sayula_popoluca| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|606.5 MB| + +## References + +https://huggingface.co/sepidmnorozy/parsbert-finetuned-pos \ No newline at end of file From 23f838ccd861606fc1b150ad67aee24deffb11ab Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:33:58 +0700 Subject: [PATCH 289/667] Add model 2023-11-06-bert_sayula_popoluca_tiny_bb_wd_en --- ...1-06-bert_sayula_popoluca_tiny_bb_wd_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_bb_wd_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_bb_wd_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_bb_wd_en.md new file mode 100644 index 00000000000000..b39d0d724fb42c --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_bb_wd_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_tiny_bb_wd BertForTokenClassification from kktoto +author: John Snow Labs +name: bert_sayula_popoluca_tiny_bb_wd +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_tiny_bb_wd` is an English model originally trained by kktoto.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_tiny_bb_wd_en_5.2.0_3.0_1699300563685.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_tiny_bb_wd_en_5.2.0_3.0_1699300563685.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
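+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads a "token" column, so a Tokenizer stage is required +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_tiny_bb_wd","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +// assumes an active SparkSession named `spark` (as in spark-shell) +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification reads a "token" column, so a Tokenizer stage is required +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_tiny_bb_wd", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +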
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_tiny_bb_wd| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|42.7 MB| + +## References + +https://huggingface.co/kktoto/tiny_bb_wd \ No newline at end of file From 897e49e8d8d56a24cf82864d95a15672436d52d9 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:34:58 +0700 Subject: [PATCH 290/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_egy_ar --- ...ic_camelbert_msa_sayula_popoluca_egy_ar.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_egy_ar.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_egy_ar.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_egy_ar.md new file mode 100644 index 00000000000000..ca0a1c0f6bb757 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_egy_ar.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Arabic bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_egy BertForTokenClassification from CAMeL-Lab +author: John Snow Labs +name: bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_egy +date: 2023-11-06 +tags: [bert, ar, open_source, token_classification, onnx] +task: Named Entity Recognition +language: ar +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to 
provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_egy` is an Arabic model originally trained by CAMeL-Lab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_egy_ar_5.2.0_3.0_1699302624301.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_egy_ar_5.2.0_3.0_1699302624301.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
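+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads a "token" column, so a Tokenizer stage is required +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_egy","ar") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +// assumes an active SparkSession named `spark` (as in spark-shell) +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification reads a "token" column, so a Tokenizer stage is required +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_egy", "ar") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +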
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_egy| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|ar| +|Size:|406.4 MB| + +## References + +https://huggingface.co/CAMeL-Lab/bert-base-arabic-camelbert-msa-pos-egy \ No newline at end of file From bb6d76424af61e575ed178a976c0c65bd18abbe4 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:35:58 +0700 Subject: [PATCH 291/667] Add model 2023-11-06-bert_sayula_popoluca_amharicwicpostag10tags_en --- ...yula_popoluca_amharicwicpostag10tags_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_amharicwicpostag10tags_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_amharicwicpostag10tags_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_amharicwicpostag10tags_en.md new file mode 100644 index 00000000000000..e74343b3bdcfeb --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_amharicwicpostag10tags_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_amharicwicpostag10tags BertForTokenClassification from mitiku +author: John Snow Labs +name: bert_sayula_popoluca_amharicwicpostag10tags +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_amharicwicpostag10tags` is an English model
originally trained by mitiku. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_amharicwicpostag10tags_en_5.2.0_3.0_1699299256671.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_amharicwicpostag10tags_en_5.2.0_3.0_1699299256671.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
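+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads a "token" column, so a Tokenizer stage is required +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_amharicwicpostag10tags","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +// assumes an active SparkSession named `spark` (as in spark-shell) +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification reads a "token" column, so a Tokenizer stage is required +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_amharicwicpostag10tags", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +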
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_amharicwicpostag10tags| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|406.7 MB| + +## References + +https://huggingface.co/mitiku/AmharicWICPostag10Tags \ No newline at end of file From f927702e7dfd01ff2515251846e3be393d1a066d Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:36:59 +0700 Subject: [PATCH 292/667] Add model 2023-11-06-bert_ner_keyword_tag_model_4000_9_16_more_ingredient_en --- ..._tag_model_4000_9_16_more_ingredient_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_4000_9_16_more_ingredient_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_4000_9_16_more_ingredient_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_4000_9_16_more_ingredient_en.md new file mode 100644 index 00000000000000..1f18e3bce57736 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_4000_9_16_more_ingredient_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from Media1129) +author: John Snow Labs +name: bert_ner_keyword_tag_model_4000_9_16_more_ingredient +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `keyword-tag-model-4000-9-16_more_ingredient` is an English model originally trained by `Media1129`.
+ +## Predicted Entities + +`negingredient`, `occasion`, `mealcourse`, `cuisines`, `ingredient` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_keyword_tag_model_4000_9_16_more_ingredient_en_5.2.0_3.0_1699293952171.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_keyword_tag_model_4000_9_16_more_ingredient_en_5.2.0_3.0_1699293952171.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_keyword_tag_model_4000_9_16_more_ingredient","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +// assumes an active SparkSession named `spark` (as in spark-shell) +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_keyword_tag_model_4000_9_16_more_ingredient","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.ingredient.4000_9_16.by_media1129").predict("""PUT YOUR STRING HERE""") +``` +
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_keyword_tag_model_4000_9_16_more_ingredient|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|407.2 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+- https://huggingface.co/Media1129/keyword-tag-model-4000-9-16_more_ingredient
\ No newline at end of file

From d3d33c46b88c60907a5d2131990a29621c359a7e Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 04:37:59 +0700
Subject: [PATCH 293/667] Add model 2023-11-06-bert_ner_nepal_bhasa_test_model_en

---
 ...1-06-bert_ner_nepal_bhasa_test_model_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_nepal_bhasa_test_model_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_nepal_bhasa_test_model_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_nepal_bhasa_test_model_en.md
new file mode 100644
index 00000000000000..87429cfee33e87
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_nepal_bhasa_test_model_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_ner_nepal_bhasa_test_model BertForTokenClassification from kSaluja
+author: John Snow Labs
+name: bert_ner_nepal_bhasa_test_model
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_nepal_bhasa_test_model` is an English model originally trained by kSaluja.
+
+{:.btn-box}
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_nepal_bhasa_test_model_en_5.2.0_3.0_1699298365359.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_nepal_bhasa_test_model_en_5.2.0_3.0_1699298365359.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
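+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification reads a "token" column, so a Tokenizer stage must run first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_nepal_bhasa_test_model","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes an active SparkSession named `spark`, as in spark-shell
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+
+// BertForTokenClassification reads a "token" column, so a Tokenizer stage must run first
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+  .pretrained("bert_ner_nepal_bhasa_test_model", "en")
+  .setInputCols(Array("documents","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```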
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_nepal_bhasa_test_model|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|1.3 GB|
+
+## References
+
+https://huggingface.co/kSaluja/new-test-model
\ No newline at end of file

From 129dfee18d91a41e0fe0d7d1dd69afbaa6a5143e Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 04:38:59 +0700
Subject: [PATCH 294/667] Add model 2023-11-06-bert_sayula_popoluca_estbert_upos_128_en

---
 ...ert_sayula_popoluca_estbert_upos_128_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_estbert_upos_128_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_estbert_upos_128_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_estbert_upos_128_en.md
new file mode 100644
index 00000000000000..ace88f84084e04
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_estbert_upos_128_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_sayula_popoluca_estbert_upos_128 BertForTokenClassification from tartuNLP
+author: John Snow Labs
+name: bert_sayula_popoluca_estbert_upos_128
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_estbert_upos_128` is an English model originally trained by tartuNLP.
+
+{:.btn-box}
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_estbert_upos_128_en_5.2.0_3.0_1699299768492.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_estbert_upos_128_en_5.2.0_3.0_1699299768492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
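+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification reads a "token" column, so a Tokenizer stage must run first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_estbert_upos_128","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes an active SparkSession named `spark`, as in spark-shell
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+
+// BertForTokenClassification reads a "token" column, so a Tokenizer stage must run first
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+  .pretrained("bert_sayula_popoluca_estbert_upos_128", "en")
+  .setInputCols(Array("documents","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```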
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_sayula_popoluca_estbert_upos_128|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|463.5 MB|
+
+## References
+
+https://huggingface.co/tartuNLP/EstBERT_UPOS_128
\ No newline at end of file

From 337b1712cde6d3eb0d66aa5e7415e04890b516d6 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 04:39:59 +0700
Subject: [PATCH 295/667] Add model 2023-11-06-bert_ner_bc5cdr_chem_modified_bluebert_512_en

---
 ...er_bc5cdr_chem_modified_bluebert_512_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc5cdr_chem_modified_bluebert_512_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc5cdr_chem_modified_bluebert_512_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc5cdr_chem_modified_bluebert_512_en.md
new file mode 100644
index 00000000000000..54c47852e024bd
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bc5cdr_chem_modified_bluebert_512_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_ner_bc5cdr_chem_modified_bluebert_512 BertForTokenClassification from ghadeermobasher
+author: John Snow Labs
+name: bert_ner_bc5cdr_chem_modified_bluebert_512
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_bc5cdr_chem_modified_bluebert_512` is an English model originally trained by ghadeermobasher.
+
+{:.btn-box}
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bc5cdr_chem_modified_bluebert_512_en_5.2.0_3.0_1699272391082.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bc5cdr_chem_modified_bluebert_512_en_5.2.0_3.0_1699272391082.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
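+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification reads a "token" column, so a Tokenizer stage must run first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bc5cdr_chem_modified_bluebert_512","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes an active SparkSession named `spark`, as in spark-shell
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+
+// BertForTokenClassification reads a "token" column, so a Tokenizer stage must run first
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+  .pretrained("bert_ner_bc5cdr_chem_modified_bluebert_512", "en")
+  .setInputCols(Array("documents","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```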
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_bc5cdr_chem_modified_bluebert_512|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|408.2 MB|
+
+## References
+
+https://huggingface.co/ghadeermobasher/BC5CDR-Chem-Modified-BlueBERT-512
\ No newline at end of file

From 526c7d9a60dc543487283271d7127d9624e041f2 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 04:41:00 +0700
Subject: [PATCH 296/667] Add model 2023-11-06-bert_ner_swedish_ner_sv

---
 .../2023-11-06-bert_ner_swedish_ner_sv.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_swedish_ner_sv.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_swedish_ner_sv.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_swedish_ner_sv.md
new file mode 100644
index 00000000000000..1c6c6b23e4b3fe
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_swedish_ner_sv.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Swedish bert_ner_swedish_ner BertForTokenClassification from RecordedFuture
+author: John Snow Labs
+name: bert_ner_swedish_ner
+date: 2023-11-06
+tags: [bert, sv, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: sv
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_swedish_ner` is a Swedish model originally trained by RecordedFuture.
+
+{:.btn-box}
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_swedish_ner_sv_5.2.0_3.0_1699283270683.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_swedish_ner_sv_5.2.0_3.0_1699283270683.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
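+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification reads a "token" column, so a Tokenizer stage must run first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_swedish_ner","sv") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes an active SparkSession named `spark`, as in spark-shell
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+
+// BertForTokenClassification reads a "token" column, so a Tokenizer stage must run first
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+  .pretrained("bert_ner_swedish_ner", "sv")
+  .setInputCols(Array("documents","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```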
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_swedish_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|sv|
+|Size:|465.3 MB|
+
+## References
+
+https://huggingface.co/RecordedFuture/Swedish-NER
\ No newline at end of file

From d406f4b55e702ea04602b3cfef4b103e37b9befd Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 04:42:00 +0700
Subject: [PATCH 297/667] Add model 2023-11-06-bert_ner_bunsen_base_best_en

---
 ...2023-11-06-bert_ner_bunsen_base_best_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bunsen_base_best_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bunsen_base_best_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bunsen_base_best_en.md
new file mode 100644
index 00000000000000..181356b9084ff2
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bunsen_base_best_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: English BertForTokenClassification Base Cased model (from leonweber)
+author: John Snow Labs
+name: bert_ner_bunsen_base_best
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bunsen_base_best` is an English model originally trained by `leonweber`.
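Each label in this card's Predicted Entities list appears to combine a source dataset or task name with a tag, separated by the first colon (for example `medmentions_full_ner:B-T085)`), and every listed label ends in `)`. The sketch below splits such strings, which is useful for keeping only the labels of one task. `parse_label` and `BunsenLabel` are hypothetical names, and treating the trailing `)` as formatting residue is an assumption, not documented behavior of the model.

```python
from typing import NamedTuple


class BunsenLabel(NamedTuple):
    task: str    # e.g. "medmentions_full_ner"
    prefix: str  # "B" or "I" for IOB tags, "" otherwise
    tag: str     # e.g. "T085", "MESH:D013830", "1"


def parse_label(raw: str) -> BunsenLabel:
    """Split a label such as 'medmentions_full_ner:B-T085)' into its parts.

    Assumes the task name is everything before the first ':' and strips the
    trailing ')' that each label in the published list carries.
    """
    task, _, rest = raw.partition(":")
    rest = rest.rstrip(")")
    prefix, sep, tag = rest.partition("-")
    if sep and prefix in ("B", "I"):
        return BunsenLabel(task, prefix, tag)
    # Non-IOB tags (e.g. "O", "MESH:D013830", "1") are kept whole
    return BunsenLabel(task, "", rest)


labels = ["bc5cdr_ner:I-Disease)", "chemdner_TEXT:MESH:D013830)", "mlee_ner:I-Cell)"]
# Keep only the tags that belong to one task
print([parse_label(lab).tag for lab in labels if parse_label(lab).task == "mlee_ner"])  # ['Cell']
```

Splitting on the first colon, rather than the last, keeps MeSH-style tags such as `MESH:D013830` intact.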
+ +## Predicted Entities + +`medmentions_full_ner:B-T085)`, `bionlp_st_2013_gro_ner:B-Ribosome)`, `chemdner_TEXT:MESH:D013830)`, `anat_em_ner:O)`, `cellfinder_ner:I-GeneProtein)`, `ncbi_disease_ner:B-CompositeMention)`, `bionlp_st_2013_gro_ner:B-Virus)`, `medmentions_full_ner:I-T129)`, `scai_disease_ner:B-DISEASE)`, `biorelex_ner:B-chemical)`, `chemdner_TEXT:MESH:D011166)`, `medmentions_st21pv_ner:I-T204)`, `chemdner_TEXT:MESH:D008345)`, `bionlp_st_2013_gro_NER:B-RegulationOfFunction)`, `mlee_ner:I-Cell)`, `bionlp_st_2013_gro_NER:I-RNABiosynthesis)`, `biorelex_ner:I-RNA-family)`, `bionlp_st_2013_gro_NER:B-ResponseToChemicalStimulus)`, `bionlp_st_2011_epi_NER:B-Dephosphorylation)`, `chemdner_TEXT:MESH:D003035)`, `chemdner_TEXT:MESH:D013440)`, `chemdner_TEXT:MESH:D037341)`, `chemdner_TEXT:MESH:D009532)`, `chemdner_TEXT:MESH:D019216)`, `chemdner_TEXT:MESH:D036701)`, `chemdner_TEXT:MESH:D011107)`, `bionlp_st_2013_cg_NER:B-Translation)`, `genia_term_corpus_ner:B-cell_component)`, `medmentions_full_ner:I-T065)`, `bionlp_st_2013_gro_ner:B-ProteinBindingSiteOfDNA)`, `anat_em_ner:I-Developing_anatomical_structure)`, `chemdner_TEXT:MESH:D000225)`, `genia_term_corpus_ner:I-ORDNA_domain_or_regionDNA_domain_or_region)`, `medmentions_full_ner:I-T015)`, `chemdner_TEXT:MESH:D008239)`, `bionlp_st_2013_cg_NER:I-Binding)`, `bionlp_st_2013_cg_NER:B-Amino_acid_catabolism)`, `cellfinder_ner:B-CellComponent)`, `bionlp_st_2013_gro_NER:I-MetabolicPathway)`, `bionlp_st_2013_gro_ner:B-ProteinIdentification)`, `bionlp_st_2011_ge_ner:O)`, `bionlp_st_2011_id_ner:B-Organism)`, `bionlp_st_2013_gro_ner:I-ForkheadWingedHelixTF)`, `mirna_ner:B-Relation_Trigger)`, `bionlp_st_2011_ge_NER:B-Regulation)`, `bionlp_st_2013_cg_NER:B-Gene_expression)`, `chemdner_TEXT:MESH:D008055)`, `chemdner_TEXT:MESH:D009944)`, `verspoor_2013_ner:I-gene)`, `bionlp_st_2013_ge_ner:O)`, `meddocan_ner:B-SEXO_SUJETO_ASISTENCIA)`, `chemdner_TEXT:MESH:D003907)`, `mlee_ner:I-Developing_anatomical_structure)`, 
`chemdner_TEXT:MESH:D010569)`, `mlee_NER:I-Growth)`, `meddocan_ner:B-NUMERO_TELEFONO)`, `chemdner_TEXT:MESH:D036145)`, `medmentions_full_ner:I-T196)`, `ehr_rel_sts:1)`, `bionlp_st_2013_gro_NER:B-CellularComponentOrganizationAndBiogenesis)`, `chemdner_TEXT:MESH:D009285)`, `bionlp_st_2013_gro_NER:B-ProteinMetabolism)`, `chemdner_TEXT:MESH:D016718)`, `bionlp_st_2013_gro_NER:I-BindingOfTFToTFBindingSiteOfProtein)`, `medmentions_full_ner:I-T074)`, `chemdner_TEXT:MESH:D000432)`, `bionlp_st_2013_gro_NER:I-CellFateDetermination)`, `chia_ner:I-Reference_point)`, `bionlp_st_2013_gro_ner:B-Histone)`, `lll_RE:None)`, `scai_disease_ner:B-ADVERSE)`, `medmentions_full_ner:B-T130)`, `bionlp_st_2013_gro_NER:I-CellCyclePhaseTransition)`, `chemdner_TEXT:MESH:D000480)`, `chemdner_TEXT:MESH:D001556)`, `bionlp_st_2013_gro_ner:B-Nucleus)`, `bionlp_st_2013_gro_ner:B-AP2EREBPRelatedDomain)`, `chemdner_TEXT:MESH:D007854)`, `chemdner_TEXT:MESH:D009499)`, `genia_term_corpus_ner:B-polynucleotide)`, `bionlp_st_2013_gro_NER:I-Transcription)`, `chemdner_TEXT:MESH:D007213)`, `bionlp_st_2013_ge_NER:B-Regulation)`, `bionlp_st_2011_epi_NER:B-DNA_methylation)`, `medmentions_st21pv_ner:B-T031)`, `bionlp_st_2013_ge_NER:I-Gene_expression)`, `chemdner_TEXT:MESH:D007651)`, `bionlp_st_2013_gro_NER:B-OrganismalProcess)`, `bionlp_st_2011_epi_COREF:None)`, `medmentions_st21pv_ner:I-T062)`, `chemdner_TEXT:MESH:D002047)`, `chemdner_TEXT:MESH:D012822)`, `mantra_gsc_en_patents_ner:B-DEVI)`, `medmentions_full_ner:I-T071)`, `chemdner_TEXT:MESH:D013739)`, `bionlp_st_2013_gro_NER:B-NegativeRegulationOfGeneExpression)`, `genia_term_corpus_ner:B-other_name)`, `medmentions_full_ner:B-T018)`, `chemdner_TEXT:MESH:D015242)`, `bionlp_st_2013_cg_NER:O)`, `chemdner_TEXT:MESH:D019469)`, `ncbi_disease_ner:B-DiseaseClass)`, `ebm_pico_ner:B-Intervention_Surgical)`, `chemdner_TEXT:MESH:D011422)`, `chemdner_TEXT:MESH:D002112)`, `chemdner_TEXT:MESH:D005682)`, `anat_em_ner:B-Immaterial_anatomical_entity)`, 
`bionlp_st_2011_epi_ner:B-Entity)`, `medmentions_full_ner:I-T169)`, `mlee_ner:B-Immaterial_anatomical_entity)`, `verspoor_2013_ner:B-Physiology)`, `cellfinder_ner:I-CellType)`, `chemdner_TEXT:MESH:D011122)`, `chemdner_TEXT:MESH:D010622)`, `chemdner_TEXT:MESH:D017378)`, `bionlp_st_2011_ge_RE:Theme)`, `chemdner_TEXT:MESH:D000431)`, `medmentions_full_ner:I-T102)`, `medmentions_full_ner:B-T097)`, `chemdner_TEXT:MESH:D007529)`, `chemdner_TEXT:MESH:D045265)`, `chemdner_TEXT:MESH:D005971)`, `an_em_ner:I-Multi-tissue_structure)`, `genia_term_corpus_ner:I-ANDDNA_family_or_groupDNA_family_or_group)`, `medmentions_full_ner:I-T080)`, `chemdner_TEXT:MESH:D002207)`, `chia_ner:I-Qualifier)`, `bionlp_st_2013_gro_NER:B-NegativeRegulationOfTranscriptionByTranscriptionRepressor)`, `an_em_ner:I-Immaterial_anatomical_entity)`, `biosses_sts:5)`, `chemdner_TEXT:MESH:D000079963)`, `chemdner_TEXT:MESH:D013196)`, `ehr_rel_sts:2)`, `chemdner_TEXT:MESH:D006152)`, `bionlp_st_2013_gro_NER:B-RegulationOfProcess)`, `mlee_NER:I-Development)`, `medmentions_full_ner:B-T197)`, `bionlp_st_2013_gro_ner:B-NucleicAcid)`, `medmentions_st21pv_ner:I-T017)`, `medmentions_full_ner:I-T046)`, `medmentions_full_ner:B-T204)`, `bionlp_st_2013_gro_NER:B-CellularDevelopmentalProcess)`, `bionlp_st_2013_cg_ner:B-Immaterial_anatomical_entity)`, `chemdner_TEXT:MESH:D014212)`, `bionlp_st_2013_cg_NER:B-Protein_processing)`, `chemdner_TEXT:MESH:D008926)`, `chia_ner:B-Visit)`, `bionlp_st_2011_ge_NER:B-Negative_regulation)`, `mantra_gsc_en_medline_ner:I-OBJC)`, `bionlp_st_2013_gro_ner:I-RNAMolecule)`, `chemdner_TEXT:MESH:D014812)`, `linnaeus_filtered_ner:I-species)`, `chebi_nactem_fullpaper_ner:B-Chemical)`, `bionlp_st_2011_ge_NER:I-Positive_regulation)`, `bionlp_st_2013_gro_ner:B-MutantGene)`, `chemdner_TEXT:MESH:D014859)`, `bionlp_st_2019_bb_ner:B-Phenotype)`, `bionlp_st_2013_gro_NER:I-BindingOfTFToTFBindingSiteOfDNA)`, `diann_iber_eval_en_ner:I-Neg)`, `ddi_corpus_ner:B-DRUG_N)`, 
`meddocan_ner:B-ID_TITULACION_PERSONAL_SANITARIO)`, `bionlp_st_2013_cg_ner:B-Organ)`, `chemdner_TEXT:MESH:D009320)`, `bionlp_st_2013_cg_ner:I-Organism_subdivision)`, `bionlp_st_2013_cg_ner:B-Cellular_component)`, `chemdner_TEXT:MESH:D003188)`, `chemdner_TEXT:MESH:D001241)`, `chemdner_TEXT:MESH:D004811)`, `bioinfer_ner:I-GeneproteinRNA)`, `chemdner_TEXT:MESH:D002248)`, `bionlp_shared_task_2009_NER:B-Negative_regulation)`, `chemdner_TEXT:MESH:D000143)`, `chemdner_TEXT:MESH:D007099)`, `nlm_gene_ner:O)`, `chemdner_TEXT:MESH:D005485)`, `bionlp_st_2013_gro_ner:B-TranscriptionFactorBindingSiteOfDNA)`, `bionlp_st_2013_gro_ner:B-PhysicalContact)`, `medmentions_full_ner:B-T167)`, `medmentions_st21pv_ner:B-T091)`, `seth_corpus_ner:I-Gene)`, `bionlp_st_2011_ge_COREF:coref)`, `bionlp_st_2011_ge_NER:B-Gene_expression)`, `medmentions_full_ner:B-T031)`, `genia_relation_corpus_RE:None)`, `genia_term_corpus_ner:I-ANDDNA_domain_or_regionDNA_domain_or_region)`, `chemdner_TEXT:MESH:D014970)`, `bionlp_st_2013_gro_NER:B-Mutation)`, `bionlp_st_2013_gro_ner:I-TranscriptionActivator)`, `chemdner_TEXT:MESH:D002217)`, `chemdner_TEXT:MESH:D003367)`, `medmentions_full_ner:I-UnknownType)`, `chemdner_TEXT:MESH:D002998)`, `bionlp_st_2013_gro_ner:I-Phenotype)`, `genia_term_corpus_ner:B-ANDDNA_family_or_groupDNA_family_or_group)`, `hprd50_RE:PPI)`, `chemdner_TEXT:MESH:D002118)`, `scai_chemical_ner:B-IUPAC)`, `bionlp_st_2013_gro_ner:I-ProteinBindingSiteOfProtein)`, `verspoor_2013_ner:B-mutation)`, `chemdner_TEXT:MESH:D011719)`, `chemdner_TEXT:MESH:D013729)`, `bionlp_shared_task_2009_ner:O)`, `chemdner_TEXT:MESH:D005840)`, `chemdner_TEXT:MESH:D009287)`, `medmentions_full_ner:B-T029)`, `chemdner_TEXT:MESH:D037742)`, `medmentions_full_ner:I-T200)`, `chemdner_TEXT:MESH:D012503)`, `bionlp_st_2013_gro_ner:B-ComplexOfProteinAndRNA)`, `mirna_ner:I-Non-Specific_miRNAs)`, `bionlp_st_2013_gro_ner:B-ProteinBindingSiteOfProtein)`, `bionlp_st_2013_pc_NER:B-Deacetylation)`, 
`meddocan_ner:B-NOMBRE_PERSONAL_SANITARIO)`, `chemprot_RE:CPR:7)`, `chia_ner:I-Value)`, `medmentions_full_ner:I-T048)`, `chemprot_ner:B-GENE-Y)`, `bionlp_st_2013_cg_NER:B-Reproduction)`, `pharmaconer_ner:B-UNCLEAR)`, `bionlp_st_2011_id_ner:I-Regulon-operon)`, `ebm_pico_ner:I-Outcome_Adverse-effects)`, `bioinfer_ner:B-DNA_family_or_group)`, `bionlp_st_2013_gro_ner:I-bZIPTF)`, `mirna_ner:I-GenesProteins)`, `biorelex_ner:I-process)`, `chemdner_TEXT:MESH:D001555)`, `genia_term_corpus_ner:B-DNA_domain_or_region)`, `cellfinder_ner:O)`, `bionlp_st_2013_gro_ner:I-MutatedProtein)`, `bionlp_st_2013_gro_NER:I-CellularComponentOrganizationAndBiogenesis)`, `spl_adr_200db_train_ner:O)`, `medmentions_full_ner:I-T026)`, `chemdner_TEXT:MESH:D013619)`, `bionlp_st_2013_gro_NER:I-BindingToRNA)`, `biorelex_ner:I-drug)`, `bionlp_st_2013_pc_NER:B-Translation)`, `mantra_gsc_en_emea_ner:B-LIVB)`, `mantra_gsc_en_patents_ner:B-PROC)`, `bionlp_st_2013_pc_NER:B-Binding)`, `bionlp_st_2013_gro_NER:B-ModificationOfMolecularEntity)`, `bionlp_st_2013_cg_NER:I-Cell_transformation)`, `scai_chemical_ner:B-TRIVIALVAR)`, `bionlp_st_2013_gro_ner:I-ZincCoordinatingDomain)`, `bionlp_st_2013_gro_NER:I-TranscriptionInitiation)`, `chemdner_TEXT:MESH:D010907)`, `bionlp_st_2013_gro_ner:B-InorganicChemical)`, `bionlp_st_2013_pc_RE:None)`, `chemdner_TEXT:MESH:D002922)`, `chemdner_TEXT:MESH:D010743)`, `bionlp_st_2019_bb_ner:O)`, `medmentions_full_ner:I-T001)`, `chemdner_TEXT:MESH:D001381)`, `bionlp_shared_task_2009_ner:I-Protein)`, `bionlp_st_2013_gro_ner:B-Spliceosome)`, `bionlp_st_2013_gro_ner:I-HMGTF)`, `minimayosrs_sts:3)`, `ddi_corpus_RE:ADVISE)`, `mlee_NER:B-Dissociation)`, `bionlp_st_2013_gro_ner:I-Holoenzyme)`, `chemdner_TEXT:MESH:D001552)`, `bionlp_st_2013_gro_ner:B-bHLH)`, `chemdner_TEXT:MESH:D000109)`, `chemdner_TEXT:MESH:D013449)`, `bionlp_st_2013_gro_ner:I-GeneRegion)`, `medmentions_full_ner:B-T019)`, `scai_chemical_ner:B-TRIVIAL)`, `mlee_ner:B-Gene_or_gene_product)`, `biosses_sts:3)`, 
`bionlp_st_2013_cg_NER:I-Pathway)`, `bionlp_st_2011_id_ner:I-Organism)`, `bionlp_st_2013_gro_ner:B-tRNA)`, `chemdner_TEXT:MESH:D013109)`, `mlee_ner:I-Immaterial_anatomical_entity)`, `medmentions_full_ner:B-T065)`, `ebm_pico_ner:I-Participant_Sample-size)`, `genia_term_corpus_ner:I-protein_family_or_group)`, `chemdner_TEXT:MESH:D002444)`, `chemdner_TEXT:MESH:D063388)`, `mlee_NER:B-Translation)`, `chemdner_TEXT:MESH:D007052)`, `bionlp_st_2013_gro_ner:B-Gene)`, `chia_ner:B-Scope)`, `bionlp_st_2013_ge_NER:I-Positive_regulation)`, `chemdner_TEXT:MESH:D007785)`, `medmentions_st21pv_ner:I-T097)`, `iepa_RE:None)`, `medmentions_full_ner:B-T001)`, `medmentions_full_ner:I-T194)`, `chemdner_TEXT:MESH:D047309)`, `bionlp_st_2013_gro_ner:B-Substrate)`, `chemdner_TEXT:MESH:D002186)`, `ebm_pico_ner:B-Outcome_Other)`, `bionlp_st_2013_gro_NER:I-OrganismalProcess)`, `bionlp_st_2013_gro_ner:B-Ion)`, `bionlp_st_2013_gro_NER:I-ProteinBiosynthesis)`, `chia_ner:B-Drug)`, `bionlp_st_2013_gro_ner:I-MolecularEntity)`, `cadec_ner:I-Symptom)`, `anat_em_ner:B-Cellular_component)`, `bionlp_st_2013_cg_ner:B-Multi-tissue_structure)`, `medmentions_full_ner:I-T122)`, `an_em_ner:B-Cell)`, `chemdner_TEXT:MESH:D011564)`, `bionlp_st_2013_gro_NER:B-Splicing)`, `bionlp_st_2013_cg_NER:I-Metabolism)`, `bionlp_st_2013_pc_NER:B-Activation)`, `bionlp_st_2013_gro_ner:I-BindingSiteOfProtein)`, `bionlp_st_2011_id_ner:B-Chemical)`, `bionlp_st_2013_gro_ner:I-Ribosome)`, `nlmchem_ner:I-Chemical)`, `mirna_ner:I-Specific_miRNAs)`, `medmentions_full_ner:I-T012)`, `bionlp_st_2013_gro_NER:B-IntraCellularTransport)`, `bionlp_st_2011_id_NER:I-Transcription)`, `mantra_gsc_en_patents_ner:I-ANAT)`, `an_em_ner:B-Immaterial_anatomical_entity)`, `scai_chemical_ner:I-IUPAC)`, `distemist_ner:B-ENFERMEDAD)`, `bionlp_st_2011_epi_NER:B-Deubiquitination)`, `chemdner_TEXT:MESH:D007295)`, `meddocan_ner:I-NOMBRE_SUJETO_ASISTENCIA)`, `bionlp_st_2011_ge_NER:B-Binding)`, `bionlp_st_2013_pc_NER:B-Localization)`, `chia_ner:B-Procedure)`, 
`medmentions_full_ner:I-T109)`, `chemdner_TEXT:MESH:D002791)`, `mantra_gsc_en_medline_ner:I-CHEM)`, `chebi_nactem_fullpaper_ner:B-Biological_Activity)`, `ncbi_disease_ner:B-SpecificDisease)`, `medmentions_full_ner:B-T063)`, `chemdner_TEXT:MESH:D016595)`, `bionlp_st_2011_id_NER:B-Transcription)`, `bionlp_st_2013_gro_ner:B-DNAMolecule)`, `mlee_NER:B-Protein_processing)`, `biorelex_ner:B-protein-complex)`, `anat_em_ner:I-Cancer)`, `bionlp_st_2013_cg_RE:AtLoc)`, `medmentions_full_ner:I-T072)`, `bio_sim_verb_sts:2)`, `seth_corpus_ner:O)`, `medmentions_full_ner:B-T070)`, `biorelex_ner:I-experiment-tag)`, `chemdner_TEXT:MESH:D020126)`, `biorelex_ner:I-protein-RNA-complex)`, `bionlp_st_2013_pc_NER:I-Phosphorylation)`, `medmentions_st21pv_ner:I-T201)`, `genia_term_corpus_ner:B-protein_complex)`, `medmentions_full_ner:I-T125)`, `bionlp_st_2013_ge_ner:I-Entity)`, `chemdner_TEXT:MESH:D054659)`, `bionlp_st_2013_pc_RE:ToLoc)`, `medmentions_full_ner:B-T099)`, `bionlp_st_2013_gro_NER:B-Binding)`, `medmentions_full_ner:B-T114)`, `spl_adr_200db_train_ner:B-Factor)`, `bionlp_st_2013_gro_ner:B-HMG)`, `bionlp_st_2013_gro_ner:B-Operon)`, `bionlp_st_2013_ge_NER:I-Protein_catabolism)`, `ebm_pico_ner:I-Outcome_Pain)`, `bionlp_st_2013_ge_NER:B-Transcription)`, `chemdner_TEXT:MESH:D000880)`, `ebm_pico_ner:I-Outcome_Physical)`, `bionlp_st_2013_gro_ner:I-ProteinBindingSiteOfDNA)`, `chemdner_TEXT:MESH:D006160)`, `gnormplus_ner:B-DomainMotif)`, `medmentions_full_ner:I-T016)`, `pharmaconer_ner:O)`, `pdr_ner:I-Disease)`, `bionlp_st_2013_gro_NER:B-BindingOfProteinToProteinBindingSiteOfProtein)`, `chemdner_TEXT:MESH:D002264)`, `genia_term_corpus_ner:I-protein_NA)`, `bionlp_shared_task_2009_NER:I-Negative_regulation)`, `medmentions_full_ner:I-T011)`, `bionlp_st_2013_gro_NER:I-CellularMetabolicProcess)`, `mqp_sts:1)`, `an_em_ner:I-Pathological_formation)`, `bionlp_st_2011_epi_NER:B-Deacetylation)`, `bionlp_st_2013_pc_RE:Theme)`, `medmentions_full_ner:I-T103)`, `bionlp_st_2011_epi_NER:B-Methylation)`, 
`ebm_pico_ner:B-Intervention_Psychological)`, `bionlp_st_2013_gro_ner:B-Stress)`, `genia_term_corpus_ner:B-multi_cell)`, `bionlp_st_2013_cg_NER:B-Positive_regulation)`, `anat_em_ner:I-Cellular_component)`, `spl_adr_200db_train_ner:I-Negation)`, `chemdner_TEXT:MESH:D000605)`, `bionlp_st_2013_gro_ner:B-RegulatoryDNARegion)`, `bionlp_st_2013_gro_ner:I-HomeoboxTF)`, `bionlp_st_2013_gro_NER:I-GeneSilencing)`, `ddi_corpus_ner:I-DRUG)`, `bionlp_st_2013_cg_NER:I-Growth)`, `mantra_gsc_en_medline_ner:B-OBJC)`, `mayosrs_sts:3)`, `bionlp_st_2013_gro_NER:B-RNAProcessing)`, `cellfinder_ner:B-CellType)`, `medmentions_full_ner:B-T007)`, `chemprot_ner:B-GENE-N)`, `biorelex_ner:B-brand)`, `ebm_pico_ner:B-Outcome_Mental)`, `bionlp_st_2013_gro_NER:B-RegulationOfTranscription)`, `bionlp_st_2013_gro_ner:B-EukaryoticCell)`, `genia_term_corpus_ner:I-ANDprotein_subunitprotein_subunit)`, `medmentions_full_ner:I-T184)`, `bionlp_st_2013_gro_NER:B-RegulatoryProcess)`, `bionlp_st_2011_id_NER:B-Negative_regulation)`, `bionlp_st_2013_cg_NER:I-Development)`, `cellfinder_ner:I-Anatomy)`, `chia_ner:B-Condition)`, `chemdner_TEXT:MESH:D003065)`, `medmentions_full_ner:B-T012)`, `bionlp_st_2011_id_NER:I-Positive_regulation)`, `bionlp_st_2013_gro_ner:I-TranscriptionFactorComplex)`, `bionlp_st_2013_cg_NER:I-Carcinogenesis)`, `medmentions_full_ner:B-T064)`, `medmentions_full_ner:B-T026)`, `nlmchem_ner:B-Chemical)`, `genia_term_corpus_ner:I-RNA_domain_or_region)`, `ebm_pico_ner:I-Intervention_Educational)`, `genia_term_corpus_ner:B-ANDcell_linecell_line)`, `distemist_ner:I-ENFERMEDAD)`, `genia_term_corpus_ner:B-protein_substructure)`, `bionlp_st_2013_gro_NER:I-ProteinTransport)`, `bionlp_st_2013_cg_NER:B-DNA_demethylation)`, `medmentions_full_ner:I-T058)`, `biorelex_ner:B-parameter)`, `chemdner_TEXT:MESH:D013006)`, `mirna_ner:I-Relation_Trigger)`, `bionlp_st_2013_gro_ner:B-PrimaryStructure)`, `bionlp_st_2013_gro_NER:I-Phosphorylation)`, `chemdner_TEXT:MESH:D003911)`, `pico_extraction_ner:I-participant)`, 
`chemdner_TEXT:MESH:D010938)`, `chia_ner:B-Person)`, `an_em_ner:B-Tissue)`, `medmentions_st21pv_ner:B-T170)`, `chemdner_TEXT:MESH:D013936)`, `chemdner_TEXT:MESH:D001080)`, `mlee_RE:None)`, `chemdner_TEXT:MESH:D013669)`, `chemdner_TEXT:MESH:D009943)`, `spl_adr_200db_train_ner:I-Factor)`, `chemdner_TEXT:MESH:D044004)`, `ebm_pico_ner:I-Participant_Sex)`, `chemdner_TEXT:MESH:D000409)`, `bionlp_st_2013_cg_NER:B-Cell_division)`, `medmentions_st21pv_ner:B-T033)`, `pcr_ner:I-Herb)`, `chemdner_TEXT:MESH:D020112)`, `bionlp_st_2013_pc_NER:B-Gene_expression)`, `bionlp_st_2011_rel_ner:O)`, `chemdner_TEXT:MESH:D008610)`, `bionlp_st_2013_gro_NER:B-BindingOfDNABindingDomainOfProteinToDNA)`, `bionlp_st_2013_gro_ner:I-Cell)`, `medmentions_full_ner:I-T055)`, `bionlp_st_2013_pc_NER:I-Negative_regulation)`, `chia_RE:Has_value)`, `tmvar_v1_ner:I-SNP)`, `biorelex_ner:I-experimental-construct)`, `genia_term_corpus_ner:B-)`, `chemdner_TEXT:MESH:D053978)`, `bionlp_st_2013_gro_ner:I-Stress)`, `mlee_ner:B-Pathological_formation)`, `bionlp_st_2013_cg_ner:O)`, `chemdner_TEXT:MESH:D007631)`, `chemdner_TEXT:MESH:D011084)`, `medmentions_full_ner:B-T080)`, `bionlp_st_2013_gro_NER:I-PositiveRegulationOfTranscription)`, `bionlp_st_2013_gro_ner:B-TranscriptionCorepressor)`, `ehr_rel_sts:4)`, `mlee_ner:I-Gene_or_gene_product)`, `chemdner_TEXT:MESH:D003474)`, `medmentions_full_ner:B-T098)`, `scicite_TEXT:method)`, `medmentions_full_ner:B-T100)`, `chemdner_TEXT:MESH:D011849)`, `medmentions_full_ner:I-T039)`, `anat_em_ner:B-Organism_subdivision)`, `bionlp_st_2013_gro_ner:I-Nucleus)`, `mlee_NER:I-Regulation)`, `bionlp_st_2013_gro_ner:I-NuclearReceptor)`, `bionlp_st_2013_ge_RE:None)`, `chemdner_TEXT:MESH:D019483)`, `bionlp_st_2013_cg_ner:B-Cell)`, `bionlp_st_2013_gro_ner:B-Holoenzyme)`, `bionlp_st_2011_epi_NER:I-Methylation)`, `bionlp_shared_task_2009_ner:B-Protein)`, `medmentions_st21pv_ner:I-T038)`, `bionlp_st_2013_gro_ner:I-DNARegion)`, `bionlp_st_2013_gro_NER:I-CellCyclePhase)`, 
`bionlp_st_2013_gro_ner:I-tRNA)`, `mlee_ner:I-Multi-tissue_structure)`, `chemprot_ner:O)`, `medmentions_full_ner:B-T094)`, `bionlp_st_2013_gro_RE:fromSpecies)`, `bionlp_st_2013_gro_NER:O)`, `bionlp_st_2013_gro_NER:B-Acetylation)`, `bioinfer_ner:I-Protein_family_or_group)`, `medmentions_st21pv_ner:I-T098)`, `pdr_ner:B-Disease)`, `chemdner_ner:I-Chemical)`, `bionlp_st_2013_cg_NER:B-Negative_regulation)`, `chebi_nactem_fullpaper_ner:B-Chemical_Structure)`, `bionlp_st_2011_ge_NER:I-Negative_regulation)`, `sciq_CLF:no)`, `diann_iber_eval_en_ner:O)`, `bionlp_shared_task_2009_NER:I-Binding)`, `mlee_NER:I-Cell_proliferation)`, `chebi_nactem_fullpaper_ner:B-Protein)`, `bionlp_st_2013_gro_NER:B-Phosphorylation)`, `bionlp_st_2011_epi_COREF:coref)`, `medmentions_full_ner:B-T200)`, `bionlp_st_2013_cg_ner:B-Tissue)`, `chemdner_TEXT:MESH:D000082)`, `chemdner_TEXT:MESH:D037201)`, `bionlp_st_2013_gro_ner:B-ComplexMolecularEntity)`, `bionlp_st_2011_ge_RE:ToLoc)`, `diann_iber_eval_en_ner:B-Neg)`, `bionlp_st_2013_gro_ner:B-RibosomalRNA)`, `bionlp_shared_task_2009_NER:I-Protein_catabolism)`, `chemdner_TEXT:MESH:D016912)`, `medmentions_full_ner:B-T017)`, `bionlp_st_2013_gro_ner:B-CpGIsland)`, `mlee_ner:I-Organism_substance)`, `medmentions_full_ner:I-T075)`, `bionlp_st_2013_gro_ner:I-SecondMessenger)`, `bioinfer_ner:B-Protein_family_or_group)`, `bionlp_st_2013_cg_NER:I-Negative_regulation)`, `mantra_gsc_en_emea_ner:B-CHEM)`, `genia_term_corpus_ner:B-DNA_NA)`, `chemdner_TEXT:MESH:D057888)`, `chemdner_TEXT:MESH:D006495)`, `chemdner_TEXT:MESH:D006575)`, `geokhoj_v1_TEXT:0)`, `bionlp_st_2013_gro_RE:locatedIn)`, `genia_term_corpus_ner:B-virus)`, `bionlp_st_2013_gro_ner:B-RuntLikeDomain)`, `medmentions_full_ner:B-T131)`, `bionlp_st_2013_gro_ner:I-ProteinCodingRegion)`, `chemdner_TEXT:MESH:D015525)`, `genia_term_corpus_ner:I-mono_cell)`, `chemdner_TEXT:MESH:D007840)`, `medmentions_full_ner:I-T098)`, `meddocan_ner:I-ID_SUJETO_ASISTENCIA)`, `chemdner_TEXT:MESH:D009930)`, 
`genia_term_corpus_ner:I-polynucleotide)`, `biorelex_ner:I-protein-region)`, `bionlp_st_2011_id_NER:I-Process)`, `bionlp_st_2013_gro_NER:I-CellularProcess)`, `medmentions_full_ner:B-T023)`, `chemdner_TEXT:MESH:D008942)`, `medmentions_full_ner:I-T070)`, `biorelex_ner:B-organelle)`, `bionlp_st_2013_gro_NER:I-Decrease)`, `verspoor_2013_ner:I-size)`, `chemdner_TEXT:MESH:D002945)`, `ebm_pico_ner:B-Intervention_Other)`, `bionlp_st_2013_cg_ner:I-Simple_chemical)`, `chemdner_TEXT:MESH:D008751)`, `chia_RE:AND)`, `medmentions_full_ner:I-T028)`, `ebm_pico_ner:I-Intervention_Other)`, `chemdner_TEXT:MESH:D005472)`, `chemdner_TEXT:MESH:D005070)`, `gnormplus_ner:B-Gene)`, `medmentions_full_ner:I-T190)`, `mlee_NER:B-Breakdown)`, `bioinfer_ner:B-GeneproteinRNA)`, `bioinfer_ner:B-Gene)`, `chemdner_TEXT:MESH:D006835)`, `chemdner_TEXT:MESH:D004298)`, `chemdner_TEXT:MESH:D002951)`, `chia_ner:I-Device)`, `bionlp_st_2013_pc_NER:B-Conversion)`, `bionlp_shared_task_2009_NER:I-Transcription)`, `mlee_NER:B-DNA_methylation)`, `pubmed_qa_labeled_fold0_CLF:no)`, `minimayosrs_sts:1)`, `chemdner_TEXT:MESH:D002166)`, `chemdner_TEXT:MESH:D005934)`, `bionlp_st_2013_gro_NER:B-CatabolicPathway)`, `tmvar_v1_ner:I-ProteinMutation)`, `verspoor_2013_ner:I-Phenomena)`, `medmentions_full_ner:B-T011)`, `chemdner_TEXT:MESH:D001218)`, `medmentions_full_ner:B-T185)`, `mantra_gsc_en_patents_ner:I-PROC)`, `medmentions_full_ner:I-T120)`, `chia_ner:I-Procedure)`, `genia_term_corpus_ner:I-ANDcell_typecell_type)`, `bionlp_st_2011_id_ner:I-Entity)`, `pcr_ner:B-Chemical)`, `bionlp_st_2013_gro_NER:B-PositiveRegulation)`, `bionlp_st_2011_epi_ner:B-Protein)`, `medmentions_full_ner:B-T055)`, `spl_adr_200db_train_ner:I-Severity)`, `bionlp_st_2013_gro_ner:I-Ion)`, `bionlp_st_2011_id_RE:Cause)`, `bc5cdr_ner:I-Disease)`, `bionlp_st_2013_gro_ner:I-bHLH)`, `chemdner_TEXT:MESH:D001058)`, `bionlp_st_2013_gro_ner:I-AminoAcid)`, `bionlp_st_2011_epi_NER:B-Phosphorylation)`, `medmentions_full_ner:B-T086)`, 
`chemdner_TEXT:MESH:D004441)`, `medmentions_st21pv_ner:I-T007)`, `biorelex_ner:B-drug)`, `mantra_gsc_en_patents_ner:I-DISO)`, `medmentions_full_ner:I-T197)`, `meddocan_ner:I-FAMILIARES_SUJETO_ASISTENCIA)`, `bionlp_st_2011_ge_RE:AtLoc)`, `bionlp_st_2013_gro_NER:B-MolecularProcess)`, `bionlp_st_2011_ge_NER:B-Phosphorylation)`, `bionlp_st_2013_gro_NER:B-FormationOfTranscriptionInitiationComplex)`, `bionlp_st_2011_ge_NER:I-Binding)`, `mirna_ner:B-GenesProteins)`, `mirna_ner:B-Diseases)`, `mantra_gsc_en_emea_ner:I-DISO)`, `anat_em_ner:I-Multi-tissue_structure)`, `bioinfer_ner:O)`, `chemdner_TEXT:MESH:D017673)`, `bionlp_st_2013_gro_NER:B-Methylation)`, `genia_term_corpus_ner:I-AND_NOTcell_typecell_type)`, `bionlp_st_2013_cg_NER:I-Positive_regulation)`, `bionlp_st_2013_cg_NER:B-Carcinogenesis)`, `chemdner_TEXT:MESH:D009543)`, `gnormplus_ner:I-Gene)`, `bionlp_st_2013_cg_RE:Participant)`, `chemdner_TEXT:MESH:D019804)`, `seth_corpus_RE:Equals)`, `medmentions_full_ner:I-T082)`, `hprd50_ner:O)`, `bionlp_st_2013_gro_ner:B-OxidativeStress)`, `chemdner_TEXT:MESH:D014227)`, `bio_sim_verb_sts:7)`, `bionlp_st_2011_ge_NER:I-Protein_catabolism)`, `bionlp_st_2011_ge_NER:B-Localization)`, `chemdner_TEXT:MESH:D001224)`, `chemdner_TEXT:MESH:D009842)`, `bionlp_st_2013_cg_ner:B-Amino_acid)`, `bionlp_st_2013_gro_NER:B-CellCyclePhase)`, `chemdner_TEXT:MESH:D002245)`, `bionlp_st_2013_ge_NER:I-Ubiquitination)`, `bionlp_st_2013_cg_NER:I-Cell_death)`, `pico_extraction_ner:O)`, `chemdner_TEXT:MESH:D000596)`, `chemdner_TEXT:MESH:D000638)`, `an_em_ner:B-Developing_anatomical_structure)`, `bionlp_st_2019_bb_ner:I-Phenotype)`, `bionlp_st_2013_gro_NER:I-CellDeath)`, `mantra_gsc_en_patents_ner:B-PHYS)`, `chemdner_TEXT:MESH:D009705)`, `genia_term_corpus_ner:B-protein_molecule)`, `mantra_gsc_en_medline_ner:B-PHEN)`, `bionlp_st_2013_gro_NER:I-PosttranslationalModification)`, `ddi_corpus_ner:B-BRAND)`, `mantra_gsc_en_medline_ner:B-DEVI)`, `mlee_NER:I-Planned_process)`, `tmvar_v1_ner:O)`, 
`bionlp_st_2011_ge_NER:I-Phosphorylation)`, `genia_term_corpus_ner:I-ANDprotein_substructureprotein_substructure)`, `medmentions_st21pv_ner:B-T007)`, `bionlp_st_2013_cg_ner:I-Pathological_formation)`, `bionlp_st_2013_gro_ner:B-Organism)`, `bionlp_st_2013_gro_ner:I-NucleicAcid)`, `medmentions_full_ner:I-T044)`, `chia_ner:I-Person)`, `chemdner_TEXT:MESH:D016572)`, `scai_disease_ner:O)`, `bionlp_st_2013_gro_ner:B-TranscriptionCofactor)`, `chemdner_TEXT:MESH:D002762)`, `chemdner_TEXT:MESH:D011685)`, `chemdner_TEXT:MESH:D005031)`, `scai_disease_ner:I-ADVERSE)`, `biorelex_ner:I-protein-isoform)`, `bionlp_shared_task_2009_COREF:None)`, `meddocan_ner:B-EDAD_SUJETO_ASISTENCIA)`, `genia_term_corpus_ner:I-lipid)`, `biorelex_ner:B-RNA)`, `chemdner_TEXT:MESH:D018020)`, `scai_chemical_ner:B-FAMILY)`, `meddocan_ner:B-ID_SUJETO_ASISTENCIA)`, `chemdner_TEXT:MESH:D017382)`, `chemdner_TEXT:MESH:D006027)`, `chemdner_TEXT:MESH:D018942)`, `medmentions_full_ner:I-T024)`, `chemdner_TEXT:MESH:D008050)`, `bionlp_st_2013_cg_NER:B-Glycosylation)`, `chemdner_TEXT:MESH:D019342)`, `chemdner_TEXT:MESH:D008774)`, `bionlp_st_2011_ge_RE:CSite)`, `bionlp_st_2013_gro_ner:B-HMGTF)`, `chemdner_ner:B-Chemical)`, `bioscope_papers_ner:B-negation)`, `biorelex_RE:bind)`, `bioinfer_ner:B-Protein_complex)`, `bionlp_st_2011_epi_NER:B-Ubiquitination)`, `bionlp_st_2013_gro_NER:I-RegulationOfTranscription)`, `chemdner_TEXT:MESH:D011134)`, `bionlp_st_2011_rel_ner:I-Entity)`, `mantra_gsc_en_medline_ner:I-PROC)`, `ncbi_disease_ner:I-DiseaseClass)`, `chemdner_TEXT:MESH:D014315)`, `bionlp_st_2013_gro_ner:I-Chromosome)`, `chemdner_TEXT:MESH:D000639)`, `chemdner_TEXT:MESH:D005740)`, `bionlp_st_2013_gro_ner:I-MolecularFunction)`, `verspoor_2013_ner:B-gene)`, `bionlp_st_2013_gro_ner:B-ZincCoordinatingDomainTF)`, `bionlp_st_2013_gro_ner:B-DNARegion)`, `meddocan_ner:I-NUMERO_FAX)`, `ebm_pico_ner:B-Intervention_Educational)`, `medmentions_st21pv_ner:B-T005)`, `medmentions_full_ner:I-T022)`, `gnormplus_ner:B-FamilyName)`, 
`bionlp_st_2011_epi_RE:Contextgene)`, `bionlp_st_2013_pc_NER:B-Demethylation)`, `chia_ner:I-Observation)`, `medmentions_full_ner:I-T089)`, `bionlp_st_2013_gro_ner:I-ComplexMolecularEntity)`, `bionlp_st_2013_gro_ner:B-Lipid)`, `biorelex_ner:I-gene)`, `chemdner_TEXT:MESH:D003300)`, `chemdner_TEXT:MESH:D008903)`, `verspoor_2013_RE:relatedTo)`, `bionlp_st_2011_epi_NER:I-DNA_methylation)`, `genia_term_corpus_ner:I-cell_component)`, `bionlp_st_2011_ge_COREF:None)`, `ebm_pico_ner:B-Participant_Sample-size)`, `chemdner_TEXT:MESH:D043823)`, `chemdner_TEXT:MESH:D004958)`, `bionlp_st_2013_gro_ner:I-RNA)`, `chemdner_TEXT:MESH:D006150)`, `bionlp_st_2013_gro_ner:B-MolecularStructure)`, `meddocan_ner:B-OTROS_SUJETO_ASISTENCIA)`, `chemdner_TEXT:MESH:D007457)`, `bionlp_st_2013_gro_ner:I-OxidativeStress)`, `scai_chemical_ner:B-PARTIUPAC)`, `mlee_NER:I-Blood_vessel_development)`, `bionlp_shared_task_2009_ner:B-Entity)`, `bionlp_st_2013_ge_RE:CSite)`, `medmentions_full_ner:B-T058)`, `chemdner_TEXT:MESH:D000628)`, `ebm_pico_ner:I-Intervention_Surgical)`, `an_em_ner:I-Organ)`, `bionlp_st_2013_gro_NER:B-Increase)`, `iepa_RE:PPI)`, `mlee_NER:B-Positive_regulation)`, `chemdner_TEXT:MESH:D014284)`, `chemdner_TEXT:MESH:D014260)`, `bionlp_st_2011_epi_NER:I-Glycosylation)`, `bionlp_st_2013_gro_NER:B-BindingToProtein)`, `bionlp_st_2013_gro_NER:B-BindingToRNA)`, `medmentions_full_ner:I-T047)`, `bionlp_st_2013_gro_NER:B-Localization)`, `bionlp_st_2013_gro_NER:I-NegativeRegulationOfGeneExpression)`, `medmentions_full_ner:I-T051)`, `bionlp_st_2011_id_COREF:None)`, `chemdner_TEXT:MESH:D011744)`, `bionlp_st_2013_gro_NER:B-BindingOfProteinToDNA)`, `bionlp_st_2013_gro_ner:B-CatalyticActivity)`, `chebi_nactem_abstr_ann1_ner:I-Biological_Activity)`, `cadec_ner:B-Symptom)`, `bio_sim_verb_sts:1)`, `chemdner_TEXT:MESH:D012402)`, `bionlp_st_2013_gro_ner:B-bZIPTF)`, `chemdner_TEXT:MESH:D003913)`, `bionlp_shared_task_2009_RE:Site)`, `bionlp_st_2013_gro_ner:I-AntisenseRNA)`, 
`bionlp_st_2013_gro_NER:B-ProteinTargeting)`, `bionlp_st_2013_gro_NER:B-GeneExpression)`, `bionlp_st_2013_cg_NER:I-Blood_vessel_development)`, `mantra_gsc_en_patents_ner:I-CHEM)`, `mayosrs_sts:2)`, `chemdner_TEXT:MESH:D001645)`, `bionlp_st_2011_ge_NER:I-Transcription)`, `bionlp_st_2011_epi_NER:B-Acetylation)`, `medmentions_full_ner:B-T002)`, `verspoor_2013_ner:I-Concepts_Ideas)`, `hprd50_RE:None)`, `ddi_corpus_ner:O)`, `chemdner_TEXT:MESH:D014131)`, `ebm_pico_ner:B-Outcome_Physical)`, `medmentions_st21pv_ner:B-T103)`, `chemdner_TEXT:MESH:D016650)`, `mlee_NER:B-Cell_proliferation)`, `bionlp_st_2013_gro_ner:I-TranscriptionCoactivator)`, `chebi_nactem_fullpaper_ner:I-Chemical)`, `chemdner_TEXT:MESH:D013256)`, `biorelex_ner:I-protein-DNA-complex)`, `chemdner_TEXT:MESH:D008767)`, `bioinfer_RE:None)`, `nlm_gene_ner:B-Gene)`, `bionlp_st_2013_gro_ner:B-ReporterGene)`, `biosses_sts:1)`, `chemdner_TEXT:MESH:D000493)`, `chemdner_TEXT:MESH:D011374)`, `cadec_ner:I-Drug)`, `ebm_pico_ner:B-Intervention_Control)`, `bionlp_st_2013_pc_NER:I-Pathway)`, `chemprot_RE:CPR:3)`, `bionlp_st_2013_cg_ner:I-Amino_acid)`, `chemdner_TEXT:MESH:D005557)`, `bionlp_st_2011_ge_RE:Site)`, `bionlp_st_2013_pc_NER:B-Phosphorylation)`, `bionlp_st_2013_gro_NER:I-Elongation)`, `bionlp_st_2011_ge_NER:I-Localization)`, `spl_adr_200db_train_ner:B-Negation)`, `chemdner_TEXT:MESH:D010455)`, `nlm_gene_ner:B-GENERIF)`, `bionlp_st_2013_gro_NER:B-BindingOfTFToTFBindingSiteOfDNA)`, `chemdner_TEXT:MESH:D017953)`, `bionlp_st_2013_gro_NER:B-NegativeRegulationOfTranscription)`, `osiris_ner:B-gene)`, `bionlp_st_2013_gro_ner:I-TranscriptionRepressor)`, `medmentions_full_ner:I-T131)`, `genia_term_corpus_ner:B-protein_family_or_group)`, `genia_term_corpus_ner:B-cell_type)`, `chemdner_TEXT:MESH:D013759)`, `chemdner_TEXT:MESH:D002247)`, `meddocan_ner:I-NOMBRE_PERSONAL_SANITARIO)`, `scai_chemical_ner:I-FAMILY)`, `chemdner_TEXT:MESH:D006020)`, `biorelex_ner:B-DNA)`, `chebi_nactem_abstr_ann1_ner:I-Spectral_Data)`, 
`mantra_gsc_en_medline_ner:B-DISO)`, `pharmaconer_ner:B-NORMALIZABLES)`, `chemdner_TEXT:MESH:D019829)`, `ncbi_disease_ner:I-CompositeMention)`, `chemdner_TEXT:MESH:D013876)`, `chebi_nactem_fullpaper_ner:I-Spectral_Data)`, `biorelex_ner:I-DNA)`, `chemdner_TEXT:MESH:D005492)`, `chemdner_TEXT:MESH:D011810)`, `chemdner_TEXT:MESH:D008563)`, `chemdner_TEXT:MESH:D015735)`, `bionlp_st_2019_bb_ner:B-Microorganism)`, `ddi_corpus_RE:INT)`, `medmentions_st21pv_ner:B-T038)`, `bionlp_st_2013_gro_NER:B-CellCyclePhaseTransition)`, `cellfinder_ner:B-CellLine)`, `pdr_RE:Cause)`, `meddocan_ner:B-PAIS)`, `chemdner_TEXT:MESH:D011433)`, `chemdner_TEXT:MESH:D011720)`, `chemdner_TEXT:MESH:D020156)`, `ebm_pico_ner:O)`, `mlee_ner:B-Organ)`, `chemdner_TEXT:MESH:D012721)`, `chebi_nactem_fullpaper_ner:I-Biological_Activity)`, `bionlp_st_2013_cg_COREF:coref)`, `chemdner_TEXT:MESH:D006918)`, `medmentions_full_ner:B-T092)`, `genia_term_corpus_ner:B-protein_NA)`, `bionlp_st_2013_ge_ner:B-Entity)`, `an_em_ner:B-Multi-tissue_structure)`, `chia_ner:I-Measurement)`, `chia_RE:Has_temporal)`, `bionlp_st_2011_id_NER:B-Protein_catabolism)`, `bionlp_st_2013_gro_NER:B-CellAdhesion)`, `bionlp_st_2013_gro_ner:B-DNABindingSite)`, `biorelex_ner:B-organism)`, `scai_disease_ner:I-DISEASE)`, `bionlp_st_2013_gro_ner:I-DNABindingSite)`, `chemdner_TEXT:MESH:D016607)`, `chemdner_TEXT:MESH:D030421)`, `bionlp_st_2013_pc_NER:I-Binding)`, `medmentions_full_ner:I-T029)`, `chemdner_TEXT:MESH:D001569)`, `genia_term_corpus_ner:B-ANDcell_typecell_type)`, `scai_chemical_ner:B-SUM)`, `chemdner_TEXT:MESH:D007656)`, `medmentions_full_ner:B-T082)`, `chemdner_TEXT:MESH:D009525)`, `medmentions_full_ner:B-T079)`, `bionlp_st_2013_cg_NER:B-Synthesis)`, `biorelex_ner:B-process)`, `bionlp_st_2013_ge_RE:Theme)`, `chemdner_TEXT:MESH:D012825)`, `chemdner_TEXT:MESH:D005462)`, `bionlp_st_2013_cg_NER:B-Localization)`, `bionlp_st_2013_gro_NER:B-CellCycle)`, `cellfinder_ner:I-CellLine)`, `bionlp_st_2013_gro_ner:I-DNABindingDomainOfProtein)`, 
`medmentions_st21pv_ner:B-T168)`, `genia_term_corpus_ner:B-body_part)`, `genia_term_corpus_ner:B-ANDprotein_family_or_groupprotein_family_or_group)`, `mlee_ner:B-Tissue)`, `meddocan_ner:B-ID_ASEGURAMIENTO)`, `mlee_NER:I-Localization)`, `medmentions_full_ner:B-T125)`, `meddocan_ner:I-CENTRO_SALUD)`, `bionlp_st_2013_cg_NER:B-Infection)`, `chebi_nactem_abstr_ann1_ner:I-Protein)`, `chemdner_TEXT:MESH:D009570)`, `medmentions_full_ner:I-T045)`, `bionlp_st_2013_gro_ner:B-TranscriptionActivator)`, `verspoor_2013_ner:B-disease)`, `medmentions_full_ner:I-T056)`, `medmentions_full_ner:B-T050)`, `bionlp_st_2013_gro_ner:B-MolecularFunction)`, `medmentions_full_ner:B-T060)`, `bionlp_st_2013_gro_ner:B-Cell)`, `medmentions_full_ner:I-T060)`, `bionlp_st_2013_pc_NER:I-Gene_expression)`, `genia_term_corpus_ner:B-RNA_NA)`, `bionlp_st_2013_gro_ner:I-MessengerRNA)`, `medmentions_full_ner:I-T086)`, `an_em_RE:Part-of)`, `bionlp_st_2013_gro_NER:B-NegativeRegulationOfTranscriptionOfGene)`, `bionlp_st_2013_gro_NER:I-Splicing)`, `bioinfer_RE:PPI)`, `bioscope_papers_ner:I-speculation)`, `bionlp_st_2013_gro_ner:B-HomeoBox)`, `medmentions_full_ner:B-T004)`, `chia_ner:I-Drug)`, `bionlp_st_2013_gro_ner:B-FusionOfGeneWithReporterGene)`, `genia_term_corpus_ner:I-cell_line)`, `chebi_nactem_abstr_ann1_ner:I-Metabolite)`, `bionlp_st_2013_gro_ner:I-ExpressionProfiling)`, `chemdner_TEXT:MESH:D004390)`, `medmentions_full_ner:B-T016)`, `bionlp_st_2013_cg_NER:B-Growth)`, `medmentions_full_ner:I-T170)`, `medmentions_full_ner:B-T093)`, `genia_term_corpus_ner:I-inorganic)`, `mlee_NER:B-Planned_process)`, `bionlp_st_2013_gro_RE:hasPart)`, `bionlp_st_2013_gro_ner:B-BasicDomain)`, `chemdner_TEXT:MESH:D050091)`, `medmentions_st21pv_ner:B-T037)`, `chemdner_TEXT:MESH:D011522)`, `bionlp_st_2013_ge_NER:B-Deacetylation)`, `chemdner_TEXT:MESH:D004008)`, `chemdner_TEXT:MESH:D013972)`, `bionlp_st_2013_gro_NER:B-SignalingPathway)`, `bionlp_st_2013_gro_ner:B-Promoter)`, `chemdner_TEXT:MESH:D012701)`, `an_em_COREF:None)`, 
`bionlp_st_2019_bb_RE:None)`, `mlee_NER:I-Positive_regulation)`, `bionlp_st_2013_gro_NER:I-Translation)`, `chemdner_TEXT:MESH:D013453)`, `genia_term_corpus_ner:I-ANDprotein_moleculeprotein_molecule)`, `chemdner_TEXT:MESH:D002746)`, `chebi_nactem_abstr_ann1_ner:O)`, `bionlp_st_2013_pc_ner:O)`, `mayosrs_sts:7)`, `bionlp_st_2013_cg_NER:B-Pathway)`, `verspoor_2013_ner:I-age)`, `biorelex_ner:I-peptide)`, `medmentions_full_ner:I-T096)`, `chebi_nactem_fullpaper_ner:I-Chemical_Structure)`, `chemdner_TEXT:MESH:D007211)`, `medmentions_full_ner:I-T018)`, `medmentions_full_ner:B-T201)`, `bionlp_st_2013_gro_NER:B-BindingOfTFToTFBindingSiteOfProtein)`, `medmentions_full_ner:B-T054)`, `ebm_pico_ner:I-Intervention_Pharmacological)`, `chemdner_TEXT:MESH:D010672)`, `chemdner_TEXT:MESH:D004492)`, `chemdner_TEXT:MESH:D008094)`, `chemdner_TEXT:MESH:D002227)`, `chemdner_TEXT:MESH:D009553)`, `bionlp_st_2013_gro_NER:I-ResponseProcess)`, `chemdner_TEXT:MESH:D006046)`, `ebm_pico_ner:B-Participant_Condition)`, `nlm_gene_ner:I-Gene)`, `bionlp_st_2019_bb_ner:I-Habitat)`, `bionlp_shared_task_2009_COREF:coref)`, `chemdner_TEXT:MESH:D005640)`, `mantra_gsc_en_emea_ner:B-PHYS)`, `mantra_gsc_en_patents_ner:B-DISO)`, `bionlp_st_2013_gro_ner:B-Heterochromatin)`, `bionlp_st_2013_gro_NER:I-CellCycle)`, `bionlp_st_2013_cg_NER:I-Cell_proliferation)`, `bionlp_st_2013_cg_ner:B-Simple_chemical)`, `genia_term_corpus_ner:I-cell_type)`, `chemdner_TEXT:MESH:D003553)`, `bionlp_st_2013_ge_RE:Theme2)`, `tmvar_v1_ner:B-ProteinMutation)`, `chemdner_TEXT:MESH:D012717)`, `chemdner_TEXT:MESH:D026121)`, `chemdner_TEXT:MESH:D008687)`, `bionlp_st_2013_gro_NER:I-TranscriptionTermination)`, `medmentions_full_ner:B-T028)`, `biorelex_ner:B-assay)`, `genia_term_corpus_ner:B-tissue)`, `chemdner_TEXT:MESH:D009173)`, `bionlp_st_2013_gro_ner:B-TranscriptionCoactivator)`, `genia_term_corpus_ner:B-amino_acid_monomer)`, `mantra_gsc_en_emea_ner:B-DEVI)`, `bionlp_st_2013_gro_NER:B-Growth)`, `chemdner_TEXT:MESH:D017374)`, 
`genia_term_corpus_ner:B-other_artificial_source)`, `medmentions_full_ner:B-T072)`, `bionlp_st_2013_gro_NER:B-CellGrowth)`, `bionlp_st_2013_gro_ner:I-DoubleStrandDNA)`, `chemdner_ner:O)`, `bionlp_shared_task_2009_NER:I-Localization)`, `bionlp_st_2013_gro_NER:B-RegulationOfPathway)`, `genia_term_corpus_ner:I-amino_acid_monomer)`, `bionlp_st_2013_gro_NER:I-SPhase)`, `an_em_ner:B-Organism_substance)`, `medmentions_full_ner:B-T052)`, `meddocan_ner:B-TERRITORIO)`, `genia_term_corpus_ner:B-ANDprotein_subunitprotein_subunit)`, `medmentions_full_ner:B-T096)`, `chemdner_TEXT:MESH:D056831)`, `chemdner_TEXT:MESH:D010755)`, `pdr_NER:I-Cause_of_disease)`, `mlee_NER:B-Phosphorylation)`, `medmentions_full_ner:I-T064)`, `chemdner_TEXT:MESH:D005978)`, `mantra_gsc_en_medline_ner:I-PHEN)`, `bionlp_st_2013_cg_ner:B-Pathological_formation)`, `bionlp_st_2013_gro_NER:B-Modification)`, `bionlp_st_2013_gro_ner:B-ProteinComplex)`, `bionlp_st_2013_gro_ner:B-DoubleStrandDNA)`, `medmentions_full_ner:B-T068)`, `medmentions_full_ner:I-T034)`, `bionlp_st_2011_epi_NER:B-Catalysis)`, `biosses_sts:0)`, `bionlp_st_2013_cg_ner:B-Organism_substance)`, `chemdner_TEXT:MESH:D055549)`, `bionlp_st_2013_cg_NER:B-Glycolysis)`, `chemdner_TEXT:MESH:D001761)`, `chemdner_TEXT:MESH:D011728)`, `bionlp_st_2013_gro_ner:B-Function)`, `medmentions_full_ner:I-T033)`, `bionlp_st_2013_gro_NER:I-PositiveRegulationOfTranscriptionOfGene)`, `medmentions_full_ner:I-T053)`, `bionlp_st_2013_gro_ner:B-Protein)`, `genia_term_corpus_ner:I-ANDprotein_family_or_groupprotein_family_or_group)`, `bionlp_st_2013_gro_NER:I-CatabolicPathway)`, `biorelex_ner:I-chemical)`, `chemdner_TEXT:MESH:D013185)`, `biorelex_ner:I-RNA)`, `chemdner_TEXT:MESH:D009838)`, `medmentions_full_ner:I-T008)`, `meddocan_ner:B-INSTITUCION)`, `chemdner_TEXT:MESH:D002104)`, `bionlp_st_2013_gro_NER:B-RNABiosynthesis)`, `verspoor_2013_ner:I-ethnicity)`, `bionlp_st_2013_gro_ner:I-SmallInterferingRNA)`, `chemdner_TEXT:MESH:D026023)`, `mlee_ner:O)`, 
`bionlp_st_2013_gro_NER:I-CellHomeostasis)`, `bionlp_st_2013_pc_NER:B-Pathway)`, `gnormplus_ner:I-DomainMotif)`, `bionlp_st_2013_gro_ner:I-OpenReadingFrame)`, `bionlp_st_2013_gro_NER:I-RegulationOfGeneExpression)`, `muchmore_en_ner:O)`, `chemdner_TEXT:MESH:D000911)`, `bionlp_st_2011_epi_NER:B-DNA_demethylation)`, `meddocan_ner:B-CENTRO_SALUD)`, `bionlp_st_2013_gro_ner:I-RuntLikeDomain)`, `chemdner_TEXT:MESH:D010748)`, `medmentions_full_ner:B-T008)`, `biorelex_ner:B-protein-RNA-complex)`, `bionlp_st_2013_cg_NER:I-Planned_process)`, `chemdner_TEXT:MESH:D014867)`, `mantra_gsc_en_patents_ner:I-LIVB)`, `bionlp_st_2013_gro_NER:I-Silencing)`, `chemdner_TEXT:MESH:D015306)`, `chemdner_TEXT:MESH:D001679)`, `bionlp_shared_task_2009_NER:I-Positive_regulation)`, `linnaeus_filtered_ner:O)`, `chia_RE:Has_multiplier)`, `medmentions_full_ner:B-T116)`, `bionlp_shared_task_2009_NER:B-Positive_regulation)`, `anat_em_ner:B-Developing_anatomical_structure)`, `chemdner_TEXT:MESH:D011137)`, `chemdner_TEXT:MESH:D048271)`, `chemdner_TEXT:MESH:D003975)`, `bionlp_st_2013_gro_ner:B-TranscriptionRepressorActivity)`, `bionlp_st_2011_id_ner:B-Protein)`, `bionlp_st_2013_gro_NER:I-Mutation)`, `chemdner_TEXT:MESH:D001572)`, `mantra_gsc_en_patents_ner:B-CHEM)`, `mantra_gsc_en_medline_ner:I-DEVI)`, `bionlp_st_2013_gro_ner:B-Enzyme)`, `medmentions_full_ner:B-T056)`, `meddocan_ner:I-TERRITORIO)`, `mantra_gsc_en_patents_ner:B-OBJC)`, `medmentions_full_ner:B-T073)`, `anat_em_ner:I-Tissue)`, `chemdner_TEXT:MESH:D047310)`, `chia_ner:I-Scope)`, `ncbi_disease_ner:B-Modifier)`, `medmentions_st21pv_ner:B-T082)`, `medmentions_full_ner:I-T054)`, `genia_term_corpus_ner:I-carbohydrate)`, `bionlp_st_2013_cg_RE:Theme)`, `chemdner_TEXT:MESH:D009538)`, `chemdner_TEXT:MESH:D008691)`, `genia_term_corpus_ner:B-ANDprotein_substructureprotein_substructure)`, `bionlp_st_2013_cg_ner:I-Tissue)`, `chia_ner:B-Device)`, `chemdner_TEXT:MESH:D002784)`, `medmentions_full_ner:I-T007)`, `bionlp_st_2013_gro_ner:I-DNAFragment)`, 
`spl_adr_200db_train_ner:I-AdverseReaction)`, `bionlp_st_2013_cg_NER:B-Catabolism)`, `chemdner_TEXT:MESH:D013779)`, `bionlp_st_2013_pc_NER:B-Regulation)`, `bionlp_st_2013_gro_NER:I-Disease)`, `chia_ner:I-Condition)`, `chemdner_TEXT:MESH:D012370)`, `bionlp_st_2013_ge_NER:O)`, `bionlp_st_2013_pc_NER:B-Deubiquitination)`, `bionlp_st_2013_pc_NER:I-Translation)`, `bionlp_st_2013_gro_NER:B-PositiveRegulationOfTranscriptionOfGene)`, `bionlp_st_2013_cg_NER:B-DNA_methylation)`, `bioscope_papers_ner:B-speculation)`, `chemdner_TEXT:MESH:D018130)`, `bionlp_st_2013_gro_ner:B-RNAPolymeraseII)`, `medmentions_st21pv_ner:B-T098)`, `bionlp_st_2013_gro_NER:B-Elongation)`, `bionlp_st_2013_pc_RE:Cause)`, `seth_corpus_ner:B-RS)`, `bionlp_st_2013_ge_RE:ToLoc)`, `chemdner_TEXT:MESH:D000538)`, `medmentions_full_ner:B-T192)`, `medmentions_full_ner:B-T061)`, `medmentions_full_ner:B-T032)`, `bionlp_st_2013_gro_NER:B-Transport)`, `medmentions_full_ner:I-T014)`, `chemdner_TEXT:MESH:D004137)`, `medmentions_full_ner:B-T101)`, `bionlp_st_2013_gro_NER:B-Transcription)`, `bionlp_st_2013_pc_NER:B-Transport)`, `medmentions_full_ner:I-T203)`, `ebm_pico_ner:I-Intervention_Control)`, `genia_term_corpus_ner:I-atom)`, `chemdner_TEXT:MESH:D014230)`, `cadec_ner:B-Drug)`, `osiris_ner:I-gene)`, `mantra_gsc_en_patents_ner:B-ANAT)`, `ncbi_disease_ner:I-SpecificDisease)`, `bionlp_st_2013_gro_NER:I-CellGrowth)`, `chemdner_TEXT:MESH:D001205)`, `chemdner_TEXT:MESH:D016627)`, `meddocan_ner:B-FAMILIARES_SUJETO_ASISTENCIA)`, `genia_term_corpus_ner:B-protein_subunit)`, `bionlp_st_2013_gro_ner:I-CellComponent)`, `medmentions_full_ner:B-T049)`, `scai_chemical_ner:O)`, `chemdner_TEXT:MESH:D010840)`, `chemdner_TEXT:MESH:D008694)`, `mantra_gsc_en_patents_ner:B-PHEN)`, `bionlp_st_2013_cg_RE:Cause)`, `chemdner_TEXT:MESH:D012293)`, `bionlp_st_2013_gro_NER:B-Homodimerization)`, `chemdner_TEXT:MESH:D008070)`, `chia_RE:OR)`, `bionlp_st_2013_cg_ner:I-Gene_or_gene_product)`, `verspoor_2013_ner:I-disease)`, 
`muchmore_en_ner:B-umlsterm)`, `chemdner_TEXT:MESH:D011794)`, `medmentions_full_ner:I-T002)`, `chemdner_TEXT:MESH:D007649)`, `genia_term_corpus_ner:B-AND_NOTcell_typecell_type)`, `medmentions_full_ner:I-T023)`, `chemprot_RE:CPR:1)`, `chemdner_TEXT:MESH:D001786)`, `bionlp_st_2013_gro_ner:B-HomeoboxTF)`, `bionlp_st_2013_cg_ner:I-Developing_anatomical_structure)`, `bionlp_st_2013_gro_ner:I-Attenuator)`, `bionlp_st_2019_bb_ner:B-Habitat)`, `chemdner_TEXT:MESH:D017931)`, `medmentions_full_ner:B-T047)`, `chemdner_TEXT:MESH:D006886)`, `genia_term_corpus_ner:I-)`, `medmentions_full_ner:B-T039)`, `chemdner_TEXT:MESH:D004220)`, `bionlp_st_2013_pc_RE:FromLoc)`, `nlm_gene_ner:I-GENERIF)`, `bionlp_st_2013_ge_NER:I-Protein_modification)`, `genia_term_corpus_ner:B-RNA_molecule)`, `chemdner_TEXT:MESH:D006854)`, `chemdner_TEXT:MESH:D006493)`, `chia_ner:B-Qualifier)`, `medmentions_full_ner:I-T013)`, `ehr_rel_sts:8)`, `an_em_RE:frag)`, `genia_term_corpus_ner:I-DNA_substructure)`, `chemdner_TEXT:MESH:D063065)`, `genia_term_corpus_ner:I-ANDprotein_complexprotein_complex)`, `pharmaconer_ner:I-NORMALIZABLES)`, `bionlp_st_2013_pc_NER:I-Dissociation)`, `medmentions_full_ner:I-T004)`, `bionlp_st_2013_cg_ner:B-DNA_domain_or_region)`, `chemdner_TEXT:MESH:D010069)`, `bionlp_st_2013_gro_NER:I-Homodimerization)`, `chemdner_TEXT:MESH:D006147)`, `medmentions_full_ner:I-T041)`, `distemist_ner:O)`, `bionlp_st_2011_id_NER:B-Regulation)`, `bionlp_st_2013_gro_ner:O)`, `chemdner_TEXT:MESH:D008623)`, `bionlp_st_2013_ge_ner:I-Protein)`, `scai_chemical_ner:I-TRIVIAL)`, `an_em_ner:B-Organism_subdivision)`, `bionlp_st_2013_gro_ner:B-BindingAssay)`, `bionlp_st_2013_gro_ner:I-HMG)`, `anat_em_ner:I-Anatomical_system)`, `chemdner_TEXT:MESH:D015034)`, `mlee_NER:B-Catabolism)`, `mantra_gsc_en_medline_ner:B-LIVB)`, `meddocan_ner:B-HOSPITAL)`, `ddi_corpus_ner:I-BRAND)`, `chia_ner:I-Multiplier)`, `bionlp_st_2013_gro_ner:I-SequenceHomologyAnalysis)`, `seth_corpus_RE:None)`, `bionlp_st_2013_cg_NER:B-Binding)`, 
`bioscope_papers_ner:I-negation)`, `cadec_ner:B-Finding)`, `chemdner_TEXT:MESH:D008741)`, `chemdner_TEXT:MESH:D052998)`, `chemdner_TEXT:MESH:D005227)`, `meddocan_ner:I-ID_TITULACION_PERSONAL_SANITARIO)`, `chemdner_TEXT:MESH:D009828)`, `spl_adr_200db_train_ner:B-Animal)`, `chemdner_TEXT:MESH:D010616)`, `bionlp_st_2013_gro_ner:I-ProteinComplex)`, `pico_extraction_ner:B-outcome)`, `mlee_NER:B-Negative_regulation)`, `chemdner_TEXT:MESH:D007093)`, `bionlp_st_2013_gro_NER:I-RNAProcessing)`, `biorelex_ner:I-reagent)`, `medmentions_st21pv_ner:I-T074)`, `bionlp_st_2013_gro_NER:B-BindingOfMolecularEntity)`, `chemdner_TEXT:MESH:D008911)`, `medmentions_full_ner:B-T033)`, `genia_term_corpus_ner:B-ANDprotein_complexprotein_complex)`, `medmentions_full_ner:I-T100)`, `chemdner_TEXT:MESH:D019259)`, `genia_term_corpus_ner:I-BUT_NOTother_nameother_name)`, `geokhoj_v1_TEXT:1)`, `bionlp_st_2013_cg_RE:Site)`, `medmentions_full_ner:B-T184)`, `bionlp_st_2013_gro_ner:B-ForkheadWingedHelixTF)`, `bionlp_st_2013_cg_ner:I-Protein_domain_or_region)`, `genia_term_corpus_ner:I-other_organic_compound)`, `chemdner_TEXT:MESH:D010793)`, `bionlp_st_2011_id_NER:B-Phosphorylation)`, `chemdner_TEXT:MESH:D002482)`, `bionlp_st_2013_cg_NER:B-Breakdown)`, `biorelex_ner:I-disease)`, `genia_term_corpus_ner:B-DNA_substructure)`, `medmentions_full_ner:B-T127)`, `medmentions_full_ner:I-T185)`, `bionlp_shared_task_2009_RE:AtLoc)`, `medmentions_full_ner:I-T201)`, `chemdner_TEXT:MESH:D005290)`, `mlee_NER:I-Breakdown)`, `medmentions_full_ner:I-T063)`, `chemdner_TEXT:MESH:D017964)`, `an_em_ner:I-Tissue)`, `mlee_ner:I-Organism)`, `mantra_gsc_en_emea_ner:I-CHEM)`, `bionlp_st_2013_cg_ner:B-Anatomical_system)`, `genia_term_corpus_ner:B-ORDNA_domain_or_regionDNA_domain_or_region)`, `bionlp_st_2013_pc_NER:B-Degradation)`, `chemprot_RE:CPR:0)`, `genia_term_corpus_ner:B-inorganic)`, `chemdner_TEXT:MESH:D005466)`, `chia_ner:O)`, `medmentions_full_ner:B-T078)`, `mlee_NER:B-Growth)`, `mantra_gsc_en_emea_ner:B-PHEN)`, 
`chemdner_TEXT:MESH:D012545)`, `bionlp_st_2013_gro_NER:B-G1Phase)`, `chemdner_TEXT:MESH:D009841)`, `bionlp_st_2013_gro_ner:B-Chromatin)`, `bionlp_st_2011_epi_RE:Site)`, `medmentions_full_ner:B-T066)`, `genetaggold_ner:O)`, `bionlp_st_2013_cg_NER:I-Gene_expression)`, `medmentions_st21pv_ner:B-T092)`, `chemprot_RE:CPR:8)`, `bionlp_st_2013_cg_RE:Instrument)`, `nlm_gene_ner:I-Domain)`, `chemdner_TEXT:MESH:D006151)`, `bionlp_st_2011_id_ner:I-Protein)`, `meddocan_ner:I-FECHAS)`, `mlee_NER:B-Synthesis)`, `bionlp_st_2013_gro_NER:B-CellMotility)`, `scai_chemical_ner:B-MODIFIER)`, `pharmaconer_ner:B-PROTEINAS)`, `bionlp_st_2013_gro_NER:I-NegativeRegulationOfTranscription)`, `osiris_ner:O)`, `mlee_NER:B-Acetylation)`, `medmentions_st21pv_ner:B-T062)`, `chemdner_TEXT:MESH:D017705)`, `bionlp_st_2013_gro_NER:I-TranscriptionOfGene)`, `genia_term_corpus_ner:I-protein_complex)`, `chemprot_RE:CPR:10)`, `medmentions_full_ner:B-T102)`, `medmentions_full_ner:I-T171)`, `chia_ner:B-Reference_point)`, `medmentions_full_ner:B-T015)`, `bionlp_st_2013_gro_ner:I-RNAPolymerase)`, `chebi_nactem_abstr_ann1_ner:B-Metabolite)`, `bionlp_st_2013_gro_NER:I-CellDifferentiation)`, `chemdner_TEXT:MESH:D006861)`, `pubmed_qa_labeled_fold0_CLF:maybe)`, `bionlp_st_2013_gro_ner:I-Sequence)`, `mlee_NER:B-Transcription)`, `bc5cdr_ner:B-Chemical)`, `chemdner_TEXT:MESH:D000072317)`, `bionlp_st_2013_gro_NER:B-Producing)`, `genia_term_corpus_ner:B-ANDprotein_moleculeprotein_molecule)`, `bionlp_st_2011_id_NER:B-Localization)`, `bionlp_st_2013_gro_NER:B-MolecularInteraction)`, `chemdner_TEXT:MESH:D014639)`, `bionlp_st_2013_gro_NER:I-Increase)`, `mlee_NER:I-Translation)`, `medmentions_full_ner:B-T087)`, `bioscope_abstracts_ner:B-speculation)`, `ebm_pico_ner:B-Outcome_Adverse-effects)`, `mantra_gsc_en_medline_ner:B-PHYS)`, `bionlp_st_2013_gro_ner:I-Lipid)`, `bionlp_st_2011_ge_NER:B-Positive_regulation)`, `chemdner_TEXT:MESH:D005278)`, `bionlp_shared_task_2009_NER:B-Phosphorylation)`, `mlee_NER:I-Gene_expression)`, 
`bionlp_st_2011_epi_NER:I-Deacetylation)`, `chemdner_TEXT:MESH:D002110)`, `medmentions_full_ner:I-T121)`, `bionlp_st_2011_epi_ner:I-Entity)`, `bionlp_st_2019_bb_RE:Lives_In)`, `chemdner_TEXT:MESH:D001710)`, `anat_em_ner:B-Cancer)`, `bionlp_st_2013_gro_NER:B-RNASplicing)`, `mantra_gsc_en_medline_ner:I-ANAT)`, `chemdner_TEXT:MESH:D024508)`, `chemdner_TEXT:MESH:D000537)`, `mantra_gsc_en_medline_ner:I-DISO)`, `bionlp_st_2013_gro_ner:I-Prokaryote)`, `bionlp_st_2013_gro_ner:I-Chromatin)`, `meddocan_ner:B-NUMERO_FAX)`, `bionlp_st_2013_gro_ner:B-Nucleotide)`, `linnaeus_ner:I-species)`, `verspoor_2013_ner:I-body-part)`, `bionlp_st_2013_gro_ner:B-DNAFragment)`, `bionlp_st_2013_gro_ner:B-PositiveTranscriptionRegulator)`, `medmentions_full_ner:I-T049)`, `bionlp_st_2011_ge_ner:B-Entity)`, `medmentions_full_ner:I-T017)`, `bionlp_st_2013_gro_NER:B-TranscriptionOfGene)`, `chemdner_TEXT:MESH:D009947)`, `mlee_NER:B-Dephosphorylation)`, `bionlp_st_2013_gro_NER:B-GeneSilencing)`, `pdr_RE:None)`, `scai_chemical_ner:I-TRIVIALVAR)`, `bionlp_st_2011_epi_NER:O)`, `bionlp_st_2013_cg_ner:I-Cell)`, `sciq_SEQ:None)`, `chemdner_TEXT:MESH:D019913)`, `chia_ner:I-Negation)`, `chemdner_TEXT:MESH:D014801)`, `chemdner_TEXT:MESH:D058846)`, `chemdner_TEXT:MESH:D011809)`, `bionlp_st_2011_epi_ner:O)`, `bionlp_st_2013_cg_NER:I-Metastasis)`, `chemdner_TEXT:MESH:D012643)`, `an_em_ner:I-Cell)`, `bionlp_st_2013_gro_ner:I-CatalyticActivity)`, `anat_em_ner:B-Anatomical_system)`, `mlee_ner:I-Pathological_formation)`, `bionlp_st_2013_gro_ner:I-ChromosomalDNA)`, `anat_em_ner:B-Cell)`, `chemdner_TEXT:MESH:D000242)`, `chemdner_TEXT:MESH:D017641)`, `bioscope_abstracts_ner:I-negation)`, `medmentions_st21pv_ner:B-T058)`, `chemdner_TEXT:MESH:D008744)`, `bionlp_st_2013_gro_ner:B-UpstreamRegulatorySequence)`, `chemdner_TEXT:MESH:D008012)`, `medmentions_full_ner:B-T013)`, `bionlp_st_2011_epi_NER:B-Glycosylation)`, `chemdner_TEXT:MESH:D052999)`, `chemdner_TEXT:MESH:D002329)`, `ebm_pico_ner:I-Intervention_Physical)`, 
`bionlp_st_2013_pc_ner:B-Complex)`, `medmentions_st21pv_ner:I-T005)`, `chemdner_TEXT:MESH:D064704)`, `bionlp_st_2013_gro_ner:I-ZincCoordinatingDomainTF)`, `bionlp_st_2013_pc_ner:I-Cellular_component)`, `genia_term_corpus_ner:B-ANDDNA_domain_or_regionDNA_domain_or_region)`, `bionlp_st_2013_gro_ner:B-Chromosome)`, `chemdner_TEXT:MESH:D007546)`, `bionlp_st_2013_gro_NER:I-PositiveRegulationOfGeneExpression)`, `medmentions_full_ner:I-T010)`, `pdr_NER:B-Treatment_of_disease)`, `medmentions_full_ner:B-T081)`, `bionlp_st_2011_epi_NER:B-Demethylation)`, `chemdner_TEXT:MESH:D013261)`, `bionlp_st_2013_gro_ner:I-RibosomalRNA)`, `verspoor_2013_ner:O)`, `bionlp_st_2013_gro_NER:B-DevelopmentalProcess)`, `chemdner_TEXT:MESH:D009270)`, `medmentions_full_ner:I-T130)`, `bionlp_st_2013_cg_ner:B-Organism)`, `medmentions_full_ner:B-T014)`, `chemdner_TEXT:MESH:D003374)`, `chemdner_TEXT:MESH:D011078)`, `cellfinder_ner:B-GeneProtein)`, `mayosrs_sts:6)`, `chemdner_TEXT:MESH:D005576)`, `bionlp_st_2013_ge_RE:Cause)`, `an_em_RE:None)`, `sciq_SEQ:answer)`, `bionlp_st_2013_cg_NER:B-Dissociation)`, `mlee_RE:frag)`, `bionlp_st_2013_pc_COREF:coref)`, `meddocan_ner:B-NOMBRE_SUJETO_ASISTENCIA)`, `chemdner_TEXT:MESH:D008469)`, `ncbi_disease_ner:O)`, `bionlp_st_2011_epi_ner:I-Protein)`, `chemdner_TEXT:MESH:D011140)`, `chemdner_TEXT:MESH:D020001)`, `bionlp_st_2013_gro_ner:I-ThreeDimensionalMolecularStructure)`, `bionlp_st_2013_cg_ner:B-Cancer)`, `genia_term_corpus_ner:B-BUT_NOTother_nameother_name)`, `chemdner_TEXT:MESH:D006862)`, `medmentions_full_ner:B-T104)`, `bionlp_st_2011_epi_RE:Theme)`, `cellfinder_ner:B-Anatomy)`, `chemdner_TEXT:MESH:D010545)`, `biorelex_ner:B-RNA-family)`, `pico_extraction_ner:I-outcome)`, `mantra_gsc_en_patents_ner:I-PHYS)`, `bionlp_st_2013_pc_NER:I-Transcription)`, `bionlp_shared_task_2009_RE:Cause)`, `bionlp_st_2013_gro_ner:B-Vitamin)`, `bionlp_shared_task_2009_RE:CSite)`, `bionlp_st_2011_ge_ner:I-Protein)`, `mlee_COREF:coref)`, 
`bionlp_st_2013_gro_ner:I-ForkheadWingedHelix)`, `bioinfer_ner:I-Gene)`, `bionlp_st_2013_gro_ner:B-TranscriptionActivatorActivity)`, `chemdner_TEXT:MESH:D054439)`, `chemdner_TEXT:MESH:D011621)`, `ddi_corpus_ner:I-DRUG_N)`, `chemdner_TEXT:MESH:D019308)`, `bionlp_st_2013_gro_ner:I-Locus)`, `bionlp_shared_task_2009_RE:ToLoc)`, `bionlp_st_2013_cg_NER:B-Development)`, `bionlp_st_2013_gro_NER:I-CellularDevelopmentalProcess)`, `bionlp_st_2013_gro_ner:B-Eukaryote)`, `bionlp_st_2013_ge_NER:B-Negative_regulation)`, `seth_corpus_ner:I-SNP)`, `hprd50_ner:B-protein)`, `bionlp_st_2013_gro_NER:B-BindingOfProtein)`, `mlee_NER:I-Negative_regulation)`, `bionlp_st_2011_ge_NER:B-Protein_catabolism)`, `bionlp_st_2013_pc_ner:B-Cellular_component)`, `bionlp_st_2011_id_ner:I-Chemical)`, `chemdner_TEXT:MESH:D013831)`, `biorelex_COREF:None)`, `chemdner_TEXT:MESH:D005609)`, `bionlp_st_2013_gro_ner:B-TranscriptionFactor)`, `mlee_NER:B-Regulation)`, `chemdner_TEXT:MESH:D059808)`, `bionlp_st_2013_gro_ner:I-bHLHTF)`, `chemdner_TEXT:MESH:D010121)`, `chemdner_TEXT:MESH:D017608)`, `chemdner_TEXT:MESH:D007455)`, `mlee_NER:B-Blood_vessel_development)`, `bionlp_st_2013_gro_ner:B-TranscriptionFactorComplex)`, `biorelex_ner:B-disease)`, `bionlp_st_2013_cg_NER:B-Cell_differentiation)`, `medmentions_st21pv_ner:I-T092)`, `chemdner_TEXT:MESH:D007477)`, `medmentions_full_ner:B-T168)`, `pcr_ner:I-Chemical)`, `chemdner_TEXT:MESH:D009636)`, `chemdner_TEXT:MESH:D008051)`, `pharmaconer_ner:I-UNCLEAR)`, `bionlp_shared_task_2009_NER:I-Gene_expression)`, `chemprot_ner:I-GENE-N)`, `biorelex_ner:B-reagent)`, `chemdner_TEXT:MESH:D020123)`, `nlmchem_ner:O)`, `ebm_pico_ner:I-Outcome_Mental)`, `chemdner_TEXT:MESH:D004040)`, `chemdner_TEXT:MESH:D000450)`, `chebi_nactem_fullpaper_ner:O)`, `biorelex_ner:B-protein-isoform)`, `chemdner_TEXT:MESH:D001564)`, `medmentions_full_ner:I-T095)`, `mlee_NER:I-Remodeling)`, `bionlp_st_2013_cg_RE:None)`, `biorelex_ner:O)`, `seth_corpus_RE:AssociatedTo)`, 
`bioscope_abstracts_ner:B-negation)`, `chebi_nactem_fullpaper_ner:I-Metabolite)`, `bionlp_st_2013_gro_ner:I-TranscriptionRepressorActivity)`, `bionlp_st_2013_cg_NER:B-Transcription)`, `bionlp_st_2011_ge_ner:B-Protein)`, `bionlp_st_2013_ge_ner:B-Protein)`, `bionlp_st_2013_gro_ner:I-Tissue)`, `chemdner_TEXT:MESH:D044005)`, `genia_term_corpus_ner:I-protein_substructure)`, `bionlp_st_2013_gro_ner:I-TranslationFactor)`, `minimayosrs_sts:5)`, `chemdner_TEXT:MESH:D012834)`, `ncbi_disease_ner:I-Modifier)`, `mlee_NER:B-Death)`, `medmentions_full_ner:B-T196)`, `bio_sim_verb_sts:4)`, `bionlp_st_2013_gro_NER:B-CellHomeostasis)`, `chemdner_TEXT:MESH:D006001)`, `bionlp_st_2013_gro_RE:encodes)`, `biorelex_ner:B-fusion-protein)`, `mlee_COREF:None)`, `chemdner_TEXT:MESH:D001623)`, `chemdner_TEXT:MESH:D000812)`, `medmentions_full_ner:B-T046)`, `bionlp_shared_task_2009_NER:O)`, `chemdner_TEXT:MESH:D000735)`, `gnormplus_ner:O)`, `chemdner_TEXT:MESH:D014635)`, `bionlp_st_2013_gro_NER:B-Mitosis)`, `chemdner_TEXT:MESH:D003847)`, `chemdner_TEXT:MESH:D002809)`, `medmentions_full_ner:I-T116)`, `chemdner_TEXT:MESH:D060406)`, `chemprot_ner:B-CHEMICAL)`, `chemdner_TEXT:MESH:D016642)`, `bionlp_st_2013_cg_NER:B-Phosphorylation)`, `an_em_ner:B-Organ)`, `chemdner_TEXT:MESH:D013431)`, `bionlp_shared_task_2009_RE:None)`, `medmentions_full_ner:B-T041)`, `mlee_ner:I-Tissue)`, `chemdner_TEXT:MESH:D023303)`, `ebm_pico_ner:I-Participant_Condition)`, `bionlp_st_2013_gro_ner:I-TATAbox)`, `bionlp_st_2013_gro_ner:I-bZIP)`, `bionlp_st_2011_epi_RE:Sidechain)`, `bionlp_st_2013_gro_ner:B-LivingEntity)`, `mantra_gsc_en_medline_ner:B-CHEM)`, `chemdner_TEXT:MESH:D007659)`, `medmentions_full_ner:I-T085)`, `bionlp_st_2013_cg_ner:I-Organism_substance)`, `medmentions_full_ner:B-T067)`, `chemdner_TEXT:MESH:D057846)`, `bionlp_st_2013_gro_NER:I-SignalingPathway)`, `bc5cdr_ner:I-Chemical)`, `nlm_gene_ner:I-STARGENE)`, `medmentions_full_ner:B-T090)`, `medmentions_full_ner:I-T037)`, `medmentions_full_ner:B-T037)`, 
`minimayosrs_sts:6)`, `medmentions_full_ner:I-T020)`, `chebi_nactem_fullpaper_ner:B-Species)`, `mirna_ner:O)`, `bionlp_st_2011_id_RE:Participant)`, `bionlp_st_2013_ge_NER:B-Binding)`, `ddi_corpus_ner:B-DRUG)`, `medmentions_full_ner:I-T078)`, `chemdner_TEXT:MESH:D012965)`, `bionlp_st_2013_cg_ner:I-Organ)`, `bionlp_st_2011_id_NER:B-Binding)`, `chemdner_TEXT:MESH:D006571)`, `mayosrs_sts:4)`, `chemdner_TEXT:MESH:D026422)`, `genia_term_corpus_ner:I-RNA_NA)`, `bionlp_st_2011_epi_RE:None)`, `chemdner_TEXT:MESH:D012265)`, `medmentions_full_ner:B-T195)`, `chemdner_TEXT:MESH:D014443)`, `bionlp_st_2013_gro_ner:I-OrganicChemical)`, `ebm_pico_ner:B-Participant_Age)`, `chemdner_TEXT:MESH:D009584)`, `chemdner_TEXT:MESH:D010862)`, `verspoor_2013_ner:B-Concepts_Ideas)`, `bionlp_st_2013_gro_NER:B-ActivationOfProcess)`, `chemdner_TEXT:MESH:D010118)`, `pharmaconer_ner:I-PROTEINAS)`, `biorelex_COREF:coref)`, `bionlp_st_2013_gro_ner:I-Enzyme)`, `chemdner_TEXT:MESH:D012530)`, `chemdner_TEXT:MESH:D002351)`, `biorelex_ner:B-gene)`, `chemdner_TEXT:MESH:D013213)`, `medmentions_full_ner:B-T103)`, `chemdner_TEXT:MESH:D010091)`, `ebm_pico_ner:B-Participant_Sex)`, `bionlp_st_2013_gro_ner:B-ComplexOfProteinAndDNA)`, `bionlp_st_2013_gro_ner:B-Phenotype)`, `chemdner_TEXT:MESH:D019791)`, `chemdner_TEXT:MESH:D014280)`, `chemdner_TEXT:MESH:D011094)`, `chia_RE:None)`, `biorelex_RE:None)`, `chemdner_TEXT:MESH:D005230)`, `verspoor_2013_ner:B-cohort-patient)`, `chemdner_TEXT:MESH:D013645)`, `bionlp_st_2013_gro_ner:B-SecondMessenger)`, `mlee_ner:B-Cellular_component)`, `bionlp_shared_task_2009_NER:I-Phosphorylation)`, `mlee_ner:B-DNA_domain_or_region)`, `chemdner_TEXT:MESH:D017275)`, `chemdner_TEXT:MESH:D007053)`, `bionlp_st_2013_ge_RE:Site)`, `genia_term_corpus_ner:O)`, `chemprot_RE:CPR:6)`, `chemdner_TEXT:MESH:D006859)`, `genia_term_corpus_ner:I-other_name)`, `medmentions_full_ner:I-T042)`, `pdr_ner:O)`, `medmentions_full_ner:I-T057)`, `bionlp_st_2013_pc_RE:Product)`, `verspoor_2013_ner:B-size)`, 
`bionlp_st_2013_pc_NER:B-Acetylation)`, `medmentions_st21pv_ner:B-T017)`, `chia_ner:B-Temporal)`, `chemdner_TEXT:MESH:D003404)`, `bionlp_st_2013_gro_RE:None)`, `bionlp_shared_task_2009_NER:B-Gene_expression)`, `mqp_sts:3)`, `bionlp_st_2013_gro_ner:B-Chemical)`, `chemdner_TEXT:MESH:D013754)`, `mantra_gsc_en_medline_ner:B-GEOG)`, `mirna_ner:B-Specific_miRNAs)`, `chemdner_TEXT:MESH:D012492)`, `medmentions_full_ner:B-T190)`, `bionlp_st_2013_cg_ner:B-Developing_anatomical_structure)`, `bionlp_st_2013_gro_ner:B-RNA)`, `chemdner_TEXT:MESH:D011743)`, `chemdner_TEXT:MESH:D010795)`, `bionlp_st_2013_gro_NER:I-PositiveRegulation)`, `chemdner_TEXT:MESH:D002241)`, `medmentions_full_ner:B-T038)`, `mlee_ner:B-Organism)`, `medmentions_full_ner:I-T168)`, `bioscope_abstracts_ner:O)`, `chemdner_TEXT:MESH:D002599)`, `bionlp_st_2013_pc_ner:I-Simple_chemical)`, `medmentions_full_ner:I-T066)`, `chemdner_TEXT:MESH:D019695)`, `bionlp_st_2013_ge_NER:I-Transcription)`, `pharmaconer_ner:I-NO_NORMALIZABLES)`, `mantra_gsc_en_emea_ner:B-DISO)`, `bionlp_st_2013_gro_NER:B-CellDeath)`, `medmentions_st21pv_ner:I-T031)`, `chemdner_TEXT:MESH:D004317)`, `bionlp_st_2013_gro_ner:B-TATAbox)`, `chemdner_TEXT:MESH:D052203)`, `bionlp_st_2013_gro_NER:B-CellFateDetermination)`, `medmentions_st21pv_ner:I-T022)`, `bionlp_st_2013_ge_NER:B-Protein_catabolism)`, `bionlp_st_2011_epi_NER:I-Catalysis)`, `verspoor_2013_ner:I-cohort-patient)`, `chemdner_TEXT:MESH:D010100)`, `an_em_ner:I-Developing_anatomical_structure)`, `chemdner_TEXT:MESH:D045162)`, `chia_RE:Has_qualifier)`, `verspoor_2013_RE:has)`, `chemdner_TEXT:MESH:D021382)`, `bionlp_st_2013_ge_NER:B-Acetylation)`, `medmentions_full_ner:I-T079)`, `bionlp_st_2013_gro_NER:B-Maintenance)`, `biorelex_ner:I-protein-domain)`, `chebi_nactem_abstr_ann1_ner:I-Chemical)`, `bioscope_papers_ner:O)`, `chia_RE:Has_scope)`, `bc5cdr_ner:B-Disease)`, `mlee_ner:I-Cellular_component)`, `medmentions_full_ner:I-T195)`, `spl_adr_200db_train_ner:B-AdverseReaction)`, 
`bionlp_st_2013_gro_ner:I-Promoter)`, `medmentions_full_ner:B-T040)`, `chemdner_TEXT:MESH:D005960)`, `chemdner_TEXT:MESH:D004164)`, `chemdner_TEXT:MESH:D015032)`, `chemdner_TEXT:MESH:D014255)`, `ebm_pico_ner:B-Outcome_Pain)`, `bionlp_st_2013_gro_ner:I-UpstreamRegulatorySequence)`, `meddocan_ner:I-CALLE)`, `bionlp_st_2013_pc_NER:I-Positive_regulation)`, `bionlp_st_2013_cg_NER:I-Regulation)`, `chemdner_TEXT:MESH:D001151)`, `medmentions_full_ner:I-T077)`, `chemdner_TEXT:MESH:D000081)`, `bionlp_st_2013_gro_NER:B-Stabilization)`, `mayosrs_sts:1)`, `biorelex_ner:B-mutation)`, `chemdner_TEXT:MESH:D000241)`, `chemdner_TEXT:MESH:D007930)`, `bionlp_st_2013_gro_NER:B-MetabolicPathway)`, `chemdner_TEXT:MESH:D013629)`, `chemdner_TEXT:MESH:D016202)`, `tmvar_v1_ner:I-DNAMutation)`, `chemdner_TEXT:MESH:D012502)`, `chemdner_TEXT:MESH:D044945)`, `bionlp_st_2013_cg_ner:I-Cellular_component)`, `mlee_ner:B-Developing_anatomical_structure)`, `bionlp_st_2013_gro_ner:I-AP2EREBPRelatedDomain)`, `chemdner_TEXT:MESH:D002338)`, `mayosrs_sts:5)`, `bionlp_st_2013_gro_ner:B-Intron)`, `genia_term_corpus_ner:I-DNA_domain_or_region)`, `anat_em_ner:I-Immaterial_anatomical_entity)`, `bionlp_st_2013_gro_ner:B-MutatedProtein)`, `ebm_pico_ner:I-Outcome_Mortality)`, `bionlp_st_2013_gro_ner:B-ProteinCodingRegion)`, `chemdner_TEXT:MESH:D005047)`, `chia_ner:B-Mood)`, `medmentions_st21pv_ner:O)`, `cellfinder_ner:I-Species)`, `bionlp_st_2013_gro_ner:I-InorganicChemical)`, `bionlp_st_2011_id_ner:B-Entity)`, `bionlp_st_2013_cg_NER:I-Catabolism)`, `an_em_ner:I-Cellular_component)`, `medmentions_full_ner:B-T021)`, `bionlp_st_2013_gro_NER:B-Heterodimerization)`, `chemdner_TEXT:MESH:D008315)`, `medmentions_st21pv_ner:I-T170)`, `chemdner_TEXT:MESH:D050112)`, `meddocan_ner:I-ID_ASEGURAMIENTO)`, `chia_RE:Subsumes)`, `medmentions_full_ner:I-T099)`, `bionlp_st_2013_gro_ner:I-Protein)`, `chemdner_TEXT:MESH:D047071)`, `bionlp_st_2013_gro_ner:B-TranscriptionFactorActivity)`, `mlee_ner:B-Organism_subdivision)`, 
`chemdner_TEXT:MESH:D016559)`, `medmentions_full_ner:B-T129)`, `genia_term_corpus_ner:I-protein_molecule)`, `mlee_ner:B-Drug_or_compound)`, `bionlp_st_2013_gro_NER:B-Silencing)`, `bionlp_st_2013_gro_ner:I-MolecularStructure)`, `genia_term_corpus_ner:B-nucleotide)`, `chemdner_TEXT:MESH:D003042)`, `mantra_gsc_en_emea_ner:B-ANAT)`, `meddocan_ner:I-SEXO_SUJETO_ASISTENCIA)`, `chemdner_TEXT:MESH:D006690)`, `genia_term_corpus_ner:I-ANDcell_linecell_line)`, `meddocan_ner:I-OTROS_SUJETO_ASISTENCIA)`, `chemdner_TEXT:MESH:D005473)`, `mantra_gsc_en_medline_ner:I-PHYS)`, `bionlp_st_2013_cg_NER:B-Blood_vessel_development)`, `bionlp_st_2013_gro_ner:B-BetaScaffoldDomain_WithMinorGrooveContacts)`, `chemdner_TEXT:MESH:D001549)`, `chia_ner:B-Measurement)`, `bionlp_st_2011_id_ner:B-Regulon-operon)`, `bionlp_st_2013_cg_NER:B-Acetylation)`, `pdr_ner:B-Plant)`, `mlee_NER:B-Development)`, `linnaeus_filtered_ner:B-species)`, `bionlp_st_2013_pc_RE:AtLoc)`, `medmentions_full_ner:I-T192)`, `bionlp_st_2013_gro_ner:B-BindingSiteOfProtein)`, `bionlp_st_2013_ge_NER:B-Ubiquitination)`, `bionlp_st_2013_gro_ner:I-ProteinCodingDNARegion)`, `chemdner_TEXT:MESH:D009647)`, `bionlp_st_2013_gro_ner:I-Ligand)`, `bionlp_st_2011_id_ner:O)`, `bionlp_st_2013_gro_NER:I-RNASplicing)`, `bionlp_st_2013_gro_ner:I-ComplexOfProteinAndRNA)`, `bionlp_st_2011_id_NER:B-Gene_expression)`, `meddocan_ner:I-HOSPITAL)`, `chemdner_TEXT:MESH:D007501)`, `ehr_rel_sts:5)`, `bionlp_st_2013_gro_ner:B-TranscriptionRegulator)`, `medmentions_full_ner:B-T089)`, `bionlp_st_2011_epi_NER:I-DNA_demethylation)`, `mirna_ner:B-Species)`, `bionlp_st_2013_gro_ner:I-TranscriptionRegulator)`, `bionlp_st_2013_gro_NER:B-ProteinBiosynthesis)`, `scai_chemical_ner:B-ABBREVIATION)`, `bionlp_st_2013_gro_ner:I-Virus)`, `bionlp_st_2011_ge_NER:O)`, `medmentions_full_ner:B-T203)`, `bionlp_st_2013_cg_NER:I-Mutation)`, `bionlp_st_2013_gro_ner:B-ThreeDimensionalMolecularStructure)`, `genetaggold_ner:I-NEWGENE)`, `chemdner_TEXT:MESH:D010705)`, 
`chia_ner:I-Mood)`, `medmentions_full_ner:I-T068)`, `minimayosrs_sts:4)`, `medmentions_full_ner:I-T097)`, `bionlp_st_2013_gro_ner:I-BetaScaffoldDomain_WithMinorGrooveContacts)`, `mantra_gsc_en_emea_ner:I-PHYS)`, `medmentions_full_ner:I-T104)`, `bio_sim_verb_sts:5)`, `chebi_nactem_abstr_ann1_ner:B-Biological_Activity)`, `bionlp_st_2013_gro_NER:B-IntraCellularProcess)`, `mantra_gsc_en_emea_ner:I-PHEN)`, `mlee_ner:B-Cell)`, `chemdner_TEXT:MESH:D045784)`, `bionlp_st_2013_gro_ner:I-Vitamin)`, `chemdner_TEXT:MESH:D010416)`, `bionlp_st_2013_gro_ner:B-FusionGene)`, `bionlp_st_2013_gro_ner:I-FusionProtein)`, `mlee_NER:B-Remodeling)`, `minimayosrs_sts:8)`, `bionlp_st_2013_gro_ner:B-Enhancer)`, `mantra_gsc_en_emea_ner:O)`, `bionlp_st_2013_gro_ner:B-OpenReadingFrame)`, `bionlp_st_2013_pc_COREF:None)`, `medmentions_full_ner:I-T123)`, `bionlp_st_2013_gro_NER:I-RegulatoryProcess)`, `bionlp_st_2013_gro_NER:B-PositiveRegulationOfGeneExpression)`, `nlm_gene_ner:B-Domain)`, `bionlp_st_2013_pc_NER:B-Methylation)`, `medmentions_full_ner:B-T057)`, `chemdner_TEXT:MESH:D010226)`, `bionlp_st_2013_gro_ner:B-GeneProduct)`, `ebm_pico_ner:I-Outcome_Other)`, `chemdner_TEXT:MESH:D005223)`, `pdr_RE:Theme)`, `bionlp_shared_task_2009_NER:B-Protein_catabolism)`, `chemdner_TEXT:MESH:D019344)`, `gnormplus_ner:I-FamilyName)`, `verspoor_2013_ner:B-gender)`, `bionlp_st_2013_gro_NER:B-TranscriptionInitiation)`, `spl_adr_200db_train_ner:B-Severity)`, `medmentions_st21pv_ner:B-T097)`, `anat_em_ner:I-Organism_subdivision)`, `bionlp_st_2013_gro_NER:I-RNAMetabolism)`, `bioinfer_ner:I-Protein_complex)`, `anat_em_ner:I-Cell)`, `bionlp_st_2013_gro_ner:B-ProteinDomain)`, `bionlp_st_2013_gro_ner:I-PrimaryStructure)`, `genia_term_corpus_ner:I-other_artificial_source)`, `chemdner_TEXT:MESH:D010098)`, `bionlp_st_2013_gro_ner:I-Enhancer)`, `bionlp_st_2013_gro_ner:I-PositiveTranscriptionRegulator)`, `chemdner_TEXT:MESH:D004051)`, `chemdner_TEXT:MESH:D013853)`, `chebi_nactem_fullpaper_ner:B-Metabolite)`, 
`diann_iber_eval_en_ner:B-Disability)`, `biorelex_ner:B-peptide)`, `medmentions_full_ner:B-T048)`, `bionlp_st_2013_gro_ner:I-Function)`, `genia_term_corpus_ner:I-DNA_NA)`, `mlee_ner:I-Anatomical_system)`, `bioinfer_ner:B-Individual_protein)`, `verspoor_2013_ner:I-Physiology)`, `genia_term_corpus_ner:I-RNA_molecule)`, `chemdner_TEXT:MESH:D000255)`, `minimayosrs_sts:7)`, `mlee_NER:B-Localization)`, `bionlp_st_2013_gro_NER:B-ResponseProcess)`, `mantra_gsc_en_medline_ner:I-LIVB)`, `chemdner_TEXT:MESH:D010649)`, `seth_corpus_ner:B-Gene)`, `bionlp_st_2013_gro_ner:B-Attenuator)`, `chemdner_TEXT:MESH:D015363)`, `bionlp_st_2013_pc_NER:B-Inactivation)`, `medmentions_full_ner:I-T191)`, `mlee_ner:I-Organ)`, `chemdner_TEXT:MESH:D011765)`, `bionlp_shared_task_2009_NER:B-Binding)`, `an_em_ner:B-Cellular_component)`, `genia_term_corpus_ner:I-RNA_substructure)`, `medmentions_full_ner:B-T051)`, `anat_em_ner:I-Pathological_formation)`, `chemdner_TEXT:MESH:D013634)`, `chemdner_TEXT:MESH:D014414)`, `chia_RE:Has_index)`, `ddi_corpus_ner:B-GROUP)`, `bionlp_st_2013_gro_ner:B-MutantProtein)`, `bionlp_st_2013_ge_NER:I-Negative_regulation)`, `biorelex_ner:I-amino-acid)`, `chemdner_TEXT:MESH:D053279)`, `chemprot_RE:CPR:2)`, `bionlp_st_2013_gro_ner:B-bHLHTF)`, `bionlp_st_2013_cg_NER:I-Breakdown)`, `scai_chemical_ner:I-ABBREVIATION)`, `pdr_NER:B-Cause_of_disease)`, `chemdner_TEXT:MESH:D002219)`, `medmentions_full_ner:B-T044)`, `mirna_ner:B-Non-Specific_miRNAs)`, `chemdner_TEXT:MESH:D020748)`, `bionlp_shared_task_2009_RE:Theme)`, `chemdner_TEXT:MESH:D001647)`, `bionlp_st_2011_ge_NER:I-Regulation)`, `bionlp_st_2013_pc_ner:B-Gene_or_gene_product)`, `biorelex_ner:I-protein)`, `mantra_gsc_en_medline_ner:B-PROC)`, `medmentions_full_ner:I-T081)`, `medmentions_st21pv_ner:B-T022)`, `chia_ner:B-Multiplier)`, `bionlp_st_2013_gro_NER:B-GeneMutation)`, `chemdner_TEXT:MESH:D002232)`, `chemdner_TEXT:MESH:D010456)`, `biosses_sts:7)`, `medmentions_full_ner:B-T071)`, `chemdner_TEXT:MESH:D008628)`, 
`cadec_ner:O)`, `biorelex_ner:I-protein-complex)`, `chemdner_TEXT:MESH:D007328)`, `bionlp_st_2013_pc_NER:I-Activation)`, `bionlp_st_2013_cg_NER:B-Metabolism)`, `scai_chemical_ner:I-PARTIUPAC)`, `verspoor_2013_ner:B-age)`, `medmentions_full_ner:B-T122)`, `medmentions_full_ner:I-T050)`, `genia_term_corpus_ner:B-ANDother_nameother_name)`, `bionlp_st_2013_gro_NER:B-SPhase)`, `chemdner_TEXT:MESH:D012500)`, `mlee_NER:B-Metabolism)`, `bionlp_st_2011_id_NER:B-Positive_regulation)`, `chemdner_TEXT:MESH:D002794)`, `bionlp_st_2013_gro_NER:B-ProteinTransport)`, `chemdner_TEXT:MESH:D006028)`, `chemdner_TEXT:MESH:D009822)`, `bionlp_st_2013_cg_ner:I-Cancer)`, `bionlp_shared_task_2009_ner:I-Entity)`, `pcr_ner:B-Herb)`, `pubmed_qa_labeled_fold0_CLF:yes)`, `bionlp_st_2013_gro_NER:I-NegativeRegulation)`, `bionlp_st_2013_cg_NER:B-Dephosphorylation)`, `anat_em_ner:B-Multi-tissue_structure)`, `chemdner_TEXT:MESH:D008274)`, `medmentions_full_ner:B-T025)`, `chemprot_RE:CPR:9)`, `bionlp_st_2013_pc_RE:Participant)`, `bionlp_st_2013_pc_ner:B-Simple_chemical)`, `genia_term_corpus_ner:B-DNA_family_or_group)`, `bionlp_st_2013_gro_ner:B-bZIP)`, `bionlp_st_2013_gro_ner:I-Eukaryote)`, `bionlp_st_2013_pc_ner:I-Complex)`, `hprd50_ner:I-protein)`, `medmentions_full_ner:B-T020)`, `bionlp_st_2013_gro_ner:B-Agonist)`, `medmentions_full_ner:B-T030)`, `chemdner_TEXT:MESH:D009536)`, `medmentions_full_ner:B-T169)`, `genia_term_corpus_ner:I-nucleotide)`, `bionlp_st_2013_gro_NER:I-ProteinCatabolism)`, `bc5cdr_ner:O)`, `chemdner_TEXT:MESH:D003078)`, `medmentions_full_ner:I-T040)`, `chemdner_TEXT:MESH:D005963)`, `bionlp_st_2013_gro_ner:B-ExpressionProfiling)`, `mantra_gsc_en_emea_ner:I-DEVI)`, `mlee_NER:B-Cell_division)`, `ebm_pico_ner:B-Intervention_Pharmacological)`, `chemdner_TEXT:MESH:D008790)`, `mantra_gsc_en_emea_ner:I-ANAT)`, `mantra_gsc_en_medline_ner:B-ANAT)`, `chemdner_TEXT:MESH:D003545)`, `bionlp_st_2013_gro_NER:I-IntraCellularTransport)`, `bionlp_st_2013_gro_NER:I-CellDivision)`, 
`chemdner_TEXT:MESH:D013438)`, `bionlp_st_2011_id_NER:I-Negative_regulation)`, `bionlp_st_2013_gro_NER:I-DevelopmentalProcess)`, `mlee_ner:B-Protein_domain_or_region)`, `chemdner_TEXT:MESH:D014978)`, `bionlp_st_2011_id_NER:O)`, `bionlp_st_2013_gro_ner:I-ReporterGeneConstruction)`, `medmentions_full_ner:I-T025)`, `bionlp_st_2019_bb_RE:Exhibits)`, `ddi_corpus_ner:I-GROUP)`, `chemdner_TEXT:MESH:D011241)`, `chemdner_TEXT:MESH:D010446)`, `bionlp_st_2013_gro_ner:I-ExperimentalMethod)`, `anat_em_ner:B-Tissue)`, `chemdner_TEXT:MESH:D000470)`, `bionlp_st_2013_pc_NER:I-Inactivation)`, `bionlp_st_2013_gro_ner:I-Agonist)`, `medmentions_full_ner:B-T024)`, `mlee_NER:I-Transcription)`, `bionlp_st_2011_epi_NER:B-Deglycosylation)`, `bionlp_st_2013_cg_NER:B-Cell_death)`, `chemdner_TEXT:MESH:D000266)`, `chemdner_TEXT:MESH:D019833)`, `genia_term_corpus_ner:I-RNA_family_or_group)`, `biosses_sts:8)`, `lll_RE:genic_interaction)`, `bionlp_st_2013_gro_ner:B-OrganicChemical)`, `chemdner_TEXT:MESH:D013267)`, `bionlp_st_2013_gro_ner:I-TranscriptionCofactor)`, `biorelex_ner:B-protein-region)`, `chemdner_TEXT:MESH:D001565)`, `genia_term_corpus_ner:B-cell_line)`, `bionlp_st_2013_gro_NER:B-Cleavage)`, `ddi_corpus_RE:EFFECT)`, `bionlp_st_2013_cg_NER:B-Planned_process)`, `bionlp_st_2013_cg_ner:I-Immaterial_anatomical_entity)`, `chemdner_TEXT:MESH:D007660)`, `medmentions_full_ner:I-T090)`, `bionlp_st_2013_gro_ner:I-CpGIsland)`, `bionlp_st_2013_gro_ner:B-AminoAcid)`, `chemdner_TEXT:MESH:D001095)`, `mlee_NER:I-Death)`, `meddocan_ner:I-EDAD_SUJETO_ASISTENCIA)`, `bionlp_st_2013_cg_ner:I-Anatomical_system)`, `bionlp_st_2013_gro_NER:B-Decrease)`, `bionlp_st_2013_pc_NER:B-Hydroxylation)`, `chemdner_TEXT:None)`, `bio_sim_verb_sts:3)`, `biorelex_ner:B-protein)`, `bionlp_st_2013_gro_ner:I-BasicDomain)`, `bionlp_st_2011_ge_ner:I-Entity)`, `bionlp_st_2013_gro_ner:B-PhysicalContinuant)`, `chemprot_RE:CPR:4)`, `chemdner_TEXT:MESH:D003345)`, `chemdner_TEXT:MESH:D010080)`, `mantra_gsc_en_patents_ner:O)`, 
`bionlp_st_2013_gro_ner:B-AntisenseRNA)`, `bionlp_st_2013_gro_ner:B-ProteinCodingDNARegion)`, `chemdner_TEXT:MESH:D010768)`, `chebi_nactem_fullpaper_ner:I-Protein)`, `genia_term_corpus_ner:I-multi_cell)`, `bionlp_st_2013_gro_ner:I-Gene)`, `medmentions_full_ner:B-T042)`, `chemdner_TEXT:MESH:D006034)`, `biorelex_ner:I-brand)`, `chebi_nactem_abstr_ann1_ner:I-Species)`, `chemdner_TEXT:MESH:D012236)`, `bionlp_st_2013_gro_ner:I-GeneProduct)`, `chemdner_TEXT:MESH:D005665)`, `chemdner_TEXT:MESH:D008715)`, `medmentions_st21pv_ner:I-T103)`, `ddi_corpus_RE:None)`, `medmentions_st21pv_ner:I-T091)`, `chemdner_TEXT:MESH:D019158)`, `chemdner_TEXT:MESH:D001280)`, `chemdner_TEXT:MESH:D009249)`, `medmentions_full_ner:I-T067)`, `medmentions_full_ner:B-T005)`, `meddocan_ner:O)`, `bionlp_st_2013_cg_NER:I-Remodeling)`, `meddocan_ner:B-ID_EMPLEO_PERSONAL_SANITARIO)`, `chemdner_TEXT:MESH:D000166)`, `osiris_ner:B-variant)`, `spl_adr_200db_train_ner:I-DrugClass)`, `mirna_ner:I-Species)`, `medmentions_st21pv_ner:I-T033)`, `ebm_pico_ner:I-Participant_Age)`, `medmentions_full_ner:B-T095)`, `bionlp_st_2013_gro_NER:B-RNAMetabolism)`, `chemdner_TEXT:MESH:D005231)`, `medmentions_full_ner:B-T062)`, `bionlp_st_2011_ge_NER:I-Gene_expression)`, `bionlp_st_2013_gro_ner:I-TranscriptionFactor)`, `genia_term_corpus_ner:B-protein_domain_or_region)`, `mantra_gsc_en_emea_ner:B-PROC)`, `mlee_NER:I-Pathway)`, `bionlp_st_2013_gro_NER:I-BindingOfProteinToProteinBindingSiteOfProtein)`, `bionlp_st_2011_id_COREF:coref)`, `biosses_sts:6)`, `biorelex_ner:I-organism)`, `chia_ner:B-Value)`, `verspoor_2013_ner:B-body-part)`, `chemdner_TEXT:MESH:D004974)`, `chia_RE:Has_mood)`, `medmentions_st21pv_ner:B-T074)`, `chemdner_TEXT:MESH:D000535)`, `verspoor_2013_ner:I-Disorder)`, `bionlp_st_2013_gro_NER:B-BindingToMolecularEntity)`, `bionlp_st_2013_gro_ner:I-ReporterGene)`, `mayosrs_sts:8)`, `bionlp_st_2013_cg_ner:I-DNA_domain_or_region)`, `bionlp_st_2013_gro_NER:I-Pathway)`, `medmentions_st21pv_ner:I-T168)`, 
`bionlp_st_2013_gro_NER:B-NegativeRegulation)`, `medmentions_full_ner:B-T123)`, `bionlp_st_2013_pc_NER:B-Positive_regulation)`, `bionlp_st_2013_gro_NER:I-FormationOfProteinDNAComplex)`, `chemdner_TEXT:MESH:D000577)`, `mlee_NER:B-Ubiquitination)`, `chemdner_TEXT:MESH:D003630)`, `bionlp_st_2013_gro_ner:B-Transcript)`, `bionlp_st_2013_cg_NER:I-Transcription)`, `anat_em_ner:B-Organ)`, `anat_em_ner:I-Organism_substance)`, `spl_adr_200db_train_ner:B-DrugClass)`, `bionlp_st_2013_gro_ner:I-ProteinSubunit)`, `biorelex_ner:B-protein-domain)`, `chemdner_TEXT:MESH:D006051)`, `bionlp_st_2011_id_NER:B-Process)`, `bionlp_st_2013_pc_NER:B-Ubiquitination)`, `bionlp_st_2013_pc_NER:B-Transcription)`, `chemdner_TEXT:MESH:D006838)`, `cadec_ner:I-Disease)`, `bionlp_st_2013_ge_NER:B-Localization)`, `pharmaconer_ner:B-NO_NORMALIZABLES)`, `chemdner_TEXT:MESH:D011759)`, `chemdner_TEXT:MESH:D053243)`, `biorelex_ner:I-mutation)`, `mantra_gsc_en_emea_ner:I-LIVB)`, `bionlp_st_2013_gro_NER:I-Transport)`, `bionlp_st_2011_id_RE:Site)`, `chemdner_TEXT:MESH:D015474)`, `bionlp_st_2013_gro_NER:B-Dimerization)`, `bionlp_st_2013_cg_NER:I-Localization)`, `medmentions_full_ner:I-T032)`, `chemdner_TEXT:MESH:D018036)`, `meddocan_ner:B-FECHAS)`, `medmentions_full_ner:I-T167)`, `chemprot_RE:CPR:5)`, `minimayosrs_sts:2)`, `biorelex_ner:B-protein-DNA-complex)`, `cellfinder_ner:I-CellComponent)`, `nlm_gene_ner:B-Other)`, `medmentions_full_ner:I-T019)`, `chebi_nactem_abstr_ann1_ner:B-Spectral_Data)`, `bionlp_st_2013_cg_ner:I-Multi-tissue_structure)`, `medmentions_full_ner:B-T010)`, `mantra_gsc_en_medline_ner:I-GEOG)`, `chemprot_ner:I-GENE-Y)`, `mirna_ner:I-Diseases)`, `an_em_ner:O)`, `bionlp_st_2013_cg_NER:B-Remodeling)`, `medmentions_st21pv_ner:I-T058)`, `scicite_TEXT:background)`, `bionlp_st_2013_cg_NER:B-Mutation)`, `genia_term_corpus_ner:B-mono_cell)`, `bionlp_st_2013_gro_ner:B-DNA)`, `medmentions_full_ner:I-T114)`, `bionlp_st_2011_id_RE:Theme)`, `genetaggold_ner:B-NEWGENE)`, 
`mlee_ner:I-Organism_subdivision)`, `sciq_CLF:yes)`, `bionlp_shared_task_2009_NER:I-Regulation)`, `bionlp_st_2013_gro_ner:B-Microorganism)`, `chemdner_TEXT:MESH:D006108)`, `biorelex_ner:B-amino-acid)`, `bioinfer_ner:I-DNA_family_or_group)`, `bionlp_st_2013_gro_ner:I-Chemical)`, `mantra_gsc_en_patents_ner:I-DEVI)`, `mantra_gsc_en_medline_ner:O)`, `bionlp_st_2013_pc_NER:I-Regulation)`, `medmentions_full_ner:B-T043)`, `scicite_TEXT:result)`, `bionlp_st_2013_ge_NER:I-Binding)`, `meddocan_ner:I-INSTITUCION)`, `chemdner_TEXT:MESH:D011441)`, `genia_term_corpus_ner:I-protein_domain_or_region)`, `bionlp_st_2011_epi_RE:Cause)`, `bionlp_st_2013_gro_ner:B-Nucleosome)`, `chemdner_TEXT:MESH:D011223)`, `chebi_nactem_abstr_ann1_ner:B-Protein)`, `bionlp_st_2013_gro_RE:hasFunction)`, `bionlp_st_2013_gro_ner:I-TranscriptionFactorActivity)`, `biorelex_ner:B-protein-family)`, `bionlp_st_2013_cg_ner:B-Gene_or_gene_product)`, `tmvar_v1_ner:B-SNP)`, `bionlp_st_2013_gro_ner:B-ExperimentalMethod)`, `bionlp_st_2013_gro_ner:B-ReporterGeneConstruction)`, `bionlp_st_2011_ge_NER:B-Transcription)`, `chemdner_TEXT:MESH:D004041)`, `chemdner_TEXT:MESH:D000631)`, `meddocan_ner:I-ID_EMPLEO_PERSONAL_SANITARIO)`, `chebi_nactem_fullpaper_ner:I-Species)`, `medmentions_full_ner:B-T170)`, `bionlp_st_2013_gro_ner:B-ForkheadWingedHelix)`, `bionlp_st_2013_cg_ner:B-Organism_subdivision)`, `genia_term_corpus_ner:I-DNA_molecule)`, `bionlp_st_2013_cg_NER:I-Glycolysis)`, `an_em_ner:B-Pathological_formation)`, `bionlp_st_2013_gro_NER:B-TranscriptionTermination)`, `bionlp_st_2013_gro_NER:B-CellAging)`, `bionlp_st_2013_cg_ner:B-Protein_domain_or_region)`, `anat_em_ner:B-Organism_substance)`, `medmentions_full_ner:B-T053)`, `mlee_ner:B-Multi-tissue_structure)`, `biosses_sts:4)`, `bioscope_abstracts_ner:I-speculation)`, `chemdner_TEXT:MESH:D053644)`, `bionlp_st_2013_cg_NER:I-Translation)`, `tmvar_v1_ner:B-DNAMutation)`, `genia_term_corpus_ner:B-RNA_substructure)`, `an_em_ner:B-Anatomical_system)`, 
`bionlp_st_2013_gro_ner:B-Conformation)`, `bionlp_st_2013_gro_NER:I-NegativeRegulationOfTranscriptionOfGene)`, `medmentions_full_ner:I-T069)`, `chemdner_TEXT:MESH:D006820)`, `chemdner_TEXT:MESH:D015725)`, `chemdner_TEXT:MESH:D010281)`, `mlee_NER:B-Pathway)`, `bionlp_st_2011_id_NER:I-Regulation)`, `bionlp_st_2013_gro_NER:I-GeneExpression)`, `medmentions_full_ner:I-T073)`, `biosses_sts:2)`, `medmentions_full_ner:I-T043)`, `chemdner_TEXT:MESH:D001152)`, `bionlp_st_2013_gro_ner:I-DNAMolecule)`, `chemdner_TEXT:MESH:D015636)`, `chemdner_TEXT:MESH:D000666)`, `chemprot_RE:None)`, `bionlp_st_2013_gro_ner:B-Sequence)`, `chemdner_TEXT:MESH:D009151)`, `chia_ner:B-Observation)`, `an_em_COREF:coref)`, `medmentions_full_ner:B-T120)`, `bionlp_st_2013_gro_ner:B-Tissue)`, `bionlp_st_2013_gro_ner:B-MolecularEntity)`, `bionlp_st_2013_pc_NER:B-Dephosphorylation)`, `chemdner_TEXT:MESH:D044242)`, `bionlp_st_2013_gro_ner:B-FusionProtein)`, `biorelex_ner:B-cell)`, `bionlp_st_2013_gro_NER:B-Disease)`, `bionlp_st_2011_id_RE:None)`, `biorelex_ner:B-protein-motif)`, `bionlp_st_2013_pc_NER:I-Localization)`, `bionlp_st_2013_gro_ner:B-ZincCoordinatingDomain)`, `bionlp_st_2013_gro_ner:B-Locus)`, `genia_term_corpus_ner:B-other_organic_compound)`, `seth_corpus_ner:B-SNP)`, `pcr_ner:O)`, `genia_term_corpus_ner:I-virus)`, `bionlp_st_2013_gro_ner:I-Peptide)`, `chebi_nactem_abstr_ann1_ner:B-Chemical)`, `bionlp_st_2013_gro_ner:B-RNAMolecule)`, `bionlp_st_2013_gro_ner:B-SequenceHomologyAnalysis)`, `chemdner_TEXT:MESH:D005054)`, `bionlp_st_2013_ge_NER:B-Phosphorylation)`, `bionlp_st_2013_gro_NER:B-CellularProcess)`, `bionlp_st_2013_ge_RE:Site2)`, `verspoor_2013_ner:B-Phenomena)`, `chia_ner:I-Temporal)`, `bionlp_st_2013_gro_NER:I-Localization)`, `bionlp_st_2013_cg_NER:B-Ubiquitination)`, `chemdner_TEXT:MESH:D009020)`, `bionlp_st_2013_cg_RE:FromLoc)`, `mlee_ner:B-Organism_substance)`, `genia_term_corpus_ner:I-tissue)`, `medmentions_st21pv_ner:I-T082)`, `chemdner_TEXT:MESH:D054358)`, 
`medmentions_full_ner:I-T052)`, `chemdner_TEXT:MESH:D005459)`, `chemdner_TEXT:MESH:D047188)`, `medmentions_full_ner:I-T031)`, `chemdner_TEXT:MESH:D013890)`, `chemdner_TEXT:MESH:D004573)`, `genia_term_corpus_ner:B-peptide)`, `an_em_ner:I-Organism_subdivision)`, `bionlp_st_2013_gro_ner:B-MessengerRNA)`, `medmentions_full_ner:B-T171)`, `bionlp_st_2013_gro_NER:B-Affecting)`, `genia_term_corpus_ner:I-body_part)`, `bionlp_st_2013_gro_ner:B-Prokaryote)`, `chemdner_TEXT:MESH:D013844)`, `medmentions_full_ner:I-T061)`, `bionlp_st_2013_pc_NER:B-Negative_regulation)`, `bionlp_st_2013_gro_ner:I-EukaryoticCell)`, `pdr_ner:I-Plant)`, `cadec_ner:I-ADR)`, `chemdner_TEXT:MESH:D024341)`, `medmentions_full_ner:I-T092)`, `chemdner_TEXT:MESH:D020319)`, `bionlp_st_2013_cg_NER:B-Cell_transformation)`, `bionlp_st_2013_gro_NER:B-BindingOfTranscriptionFactorToDNA)`, `an_em_ner:I-Anatomical_system)`, `bionlp_st_2011_epi_NER:B-Hydroxylation)`, `bionlp_st_2013_gro_ner:I-Exon)`, `cellfinder_ner:B-Species)`, `bionlp_st_2013_gro_NER:B-Pathway)`, `bionlp_st_2013_ge_NER:B-Protein_modification)`, `bionlp_st_2013_gro_ner:I-FusionGene)`, `bionlp_st_2011_rel_ner:B-Entity)`, `bionlp_st_2011_id_RE:CSite)`, `bionlp_st_2013_ge_NER:B-Positive_regulation)`, `bionlp_st_2013_gro_ner:I-BindingAssay)`, `bionlp_st_2013_gro_NER:B-CellDivision)`, `bionlp_st_2019_bb_ner:I-Microorganism)`, `medmentions_full_ner:I-T059)`, `chemdner_TEXT:MESH:D011108)`, `bionlp_st_2013_gro_NER:B-PositiveRegulationOfTranscription)`, `bionlp_st_2013_gro_ner:B-GeneRegion)`, `bionlp_st_2013_cg_COREF:None)`, `chemdner_TEXT:MESH:D010261)`, `mlee_NER:B-Binding)`, `chemprot_ner:I-CHEMICAL)`, `bionlp_st_2011_id_RE:ToLoc)`, `biorelex_ner:I-organelle)`, `chemdner_TEXT:MESH:D004318)`, `genia_term_corpus_ner:I-DNA_family_or_group)`, `bionlp_st_2013_gro_ner:B-RNAPolymerase)`, `bionlp_st_2013_gro_ner:B-CellComponent)`, `bionlp_st_2013_gro_NER:B-RegulationOfGeneExpression)`, `bionlp_st_2013_gro_ner:B-Peptide)`, 
`bionlp_shared_task_2009_NER:B-Transcription)`, `biorelex_ner:B-tissue)`, `pico_extraction_ner:B-participant)`, `chia_ner:I-Visit)`, `chemdner_TEXT:MESH:D011807)`, `chemdner_TEXT:MESH:D014501)`, `bionlp_st_2013_gro_NER:I-IntraCellularProcess)`, `ehr_rel_sts:7)`, `pico_extraction_ner:I-intervention)`, `chemdner_TEXT:MESH:D001599)`, `bionlp_st_2013_gro_ner:I-RegulatoryDNARegion)`, `medmentions_st21pv_ner:I-T037)`, `chemdner_TEXT:MESH:D055768)`, `bionlp_st_2013_gro_ner:B-ChromosomalDNA)`, `chemdner_TEXT:MESH:D008550)`, `bionlp_st_2013_pc_RE:Site)`, `cadec_ner:B-ADR)`, `medmentions_full_ner:I-T087)`, `chemdner_TEXT:MESH:D001583)`, `bionlp_st_2011_epi_NER:B-Dehydroxylation)`, `ehr_rel_sts:3)`, `bionlp_st_2013_gro_ner:I-MutantProtein)`, `chemdner_TEXT:MESH:D011804)`, `medmentions_full_ner:B-T091)`, `bionlp_st_2013_cg_RE:CSite)`, `linnaeus_ner:O)`, `medmentions_st21pv_ner:B-T201)`, `verspoor_2013_ner:B-Disorder)`, `bionlp_st_2013_cg_NER:I-Death)`, `bioinfer_ner:I-Individual_protein)`, `medmentions_full_ner:B-T191)`, `verspoor_2013_ner:B-ethnicity)`, `chemdner_TEXT:MESH:D002083)`, `genia_term_corpus_ner:B-carbohydrate)`, `genia_term_corpus_ner:B-DNA_molecule)`, `medmentions_full_ner:B-T069)`, `pdr_NER:I-Treatment_of_disease)`, `mlee_ner:B-Anatomical_system)`, `chebi_nactem_fullpaper_ner:B-Spectral_Data)`, `cadec_ner:B-Disease)`, `chemdner_TEXT:MESH:D005419)`, `bionlp_st_2013_gro_ner:I-Nucleotide)`, `medmentions_full_ner:B-T194)`, `chemdner_TEXT:MESH:D005947)`, `chemdner_TEXT:MESH:D008627)`, `bionlp_st_2013_gro_NER:B-ExperimentalIntervention)`, `chemdner_TEXT:MESH:D011073)`, `chia_RE:Has_negation)`, `verspoor_2013_ner:I-mutation)`, `chemdner_TEXT:MESH:D004224)`, `chemdner_TEXT:MESH:D005663)`, `medmentions_full_ner:I-T094)`, `chemdner_TEXT:MESH:D006877)`, `ebm_pico_ner:B-Outcome_Mortality)`, `bionlp_st_2013_gro_ner:B-TranscriptionRepressor)`, `biorelex_ner:I-cell)`, `bionlp_st_2013_gro_NER:I-BindingOfProteinToDNA)`, `verspoor_2013_RE:None)`, 
`bionlp_st_2013_gro_NER:B-ProteinModification)`, `chemdner_TEXT:MESH:D047090)`, `medmentions_full_ner:I-T204)`, `chemdner_TEXT:MESH:D006843)`, `biorelex_ner:I-protein-family)`, `chemdner_TEXT:MESH:D012694)`, `bionlp_st_2013_gro_ner:B-TranslationFactor)`, `scai_chemical_ner:B-)`, `bionlp_st_2013_gro_ner:B-Exon)`, `medmentions_full_ner:I-T083)`, `bionlp_st_2013_gro_ner:I-TranscriptionActivatorActivity)`, `meddocan_ner:I-NUMERO_TELEFONO)`, `medmentions_full_ner:I-T101)`, `medmentions_full_ner:B-T034)`, `bionlp_st_2013_gro_ner:I-Histone)`, `ddi_corpus_RE:MECHANISM)`, `mantra_gsc_en_emea_ner:I-PROC)`, `genia_term_corpus_ner:I-peptide)`, `bionlp_st_2013_cg_NER:B-Cell_proliferation)`, `meddocan_ner:I-PAIS)`, `chemdner_TEXT:MESH:D004140)`, `medmentions_full_ner:B-T083)`, `diann_iber_eval_en_ner:I-Disability)`, `bionlp_st_2013_gro_NER:B-PosttranslationalModification)`, `biorelex_ner:I-fusion-protein)`, `chemdner_TEXT:MESH:D020910)`, `chemdner_TEXT:MESH:D014747)`, `bionlp_st_2013_ge_NER:B-Gene_expression)`, `biorelex_ner:I-tissue)`, `mantra_gsc_en_patents_ner:B-LIVB)`, `medmentions_full_ner:O)`, `medmentions_full_ner:B-T077)`, `bionlp_st_2013_gro_ner:I-Operon)`, `chemdner_TEXT:MESH:D002392)`, `chemdner_TEXT:MESH:D014498)`, `chemdner_TEXT:MESH:D002368)`, `chemdner_TEXT:MESH:D018817)`, `bionlp_st_2013_ge_NER:I-Regulation)`, `genia_term_corpus_ner:B-atom)`, `chemdner_TEXT:MESH:D011092)`, `chemdner_TEXT:MESH:D015283)`, `chemdner_TEXT:MESH:D018698)`, `cadec_ner:I-Finding)`, `chemdner_TEXT:MESH:D009569)`, `muchmore_en_ner:I-umlsterm)`, `bionlp_st_2013_cg_NER:B-Death)`, `nlm_gene_ner:I-Other)`, `medmentions_full_ner:B-T109)`, `osiris_ner:I-variant)`, `ehr_rel_sts:6)`, `chemdner_TEXT:MESH:D001120)`, `mlee_ner:I-Protein_domain_or_region)`, `bionlp_st_2013_pc_NER:B-Dissociation)`, `bionlp_st_2013_cg_NER:B-Metastasis)`, `chemdner_TEXT:MESH:D014204)`, `chemdner_TEXT:MESH:D005857)`, `medmentions_full_ner:I-T030)`, `chemdner_TEXT:MESH:D019256)`, `bionlp_st_2013_gro_ner:B-Polymerase)`, 
`chia_ner:B-Negation)`, `bionlp_st_2013_gro_NER:B-CellularMetabolicProcess)`, `bionlp_st_2013_gro_NER:B-CellDifferentiation)`, `biorelex_ner:I-protein-motif)`, `medmentions_full_ner:I-T093)`, `chemdner_TEXT:MESH:D019820)`, `anat_em_ner:B-Pathological_formation)`, `meddocan_ner:I-PROFESION)`, `bionlp_shared_task_2009_NER:B-Localization)`, `genia_term_corpus_ner:B-RNA_domain_or_region)`, `chemdner_TEXT:MESH:D014668)`, `bionlp_st_2013_pc_ner:I-Gene_or_gene_product)`, `chemdner_TEXT:MESH:D019207)`, `bionlp_st_2013_gro_NER:B-BindingOfProteinToProteinBindingSiteOfDNA)`, `medmentions_full_ner:B-T059)`, `bionlp_st_2013_gro_ner:B-Ligand)`, `bio_sim_verb_sts:6)`, `biorelex_ner:B-experimental-construct)`, `bionlp_st_2013_gro_ner:I-DNA)`, `pdr_NER:O)`, `chemdner_TEXT:MESH:D008670)`, `bionlp_st_2011_ge_RE:Cause)`, `meddocan_ner:B-CALLE)`, `chemdner_TEXT:MESH:D015232)`, `bionlp_st_2013_pc_NER:O)`, `bionlp_st_2013_gro_NER:B-FormationOfProteinDNAComplex)`, `medmentions_full_ner:B-T121)`, `bionlp_shared_task_2009_NER:B-Regulation)`, `chemdner_TEXT:MESH:D009534)`, `chemdner_TEXT:MESH:D014451)`, `bionlp_st_2011_id_RE:AtLoc)`, `chemdner_TEXT:MESH:D011799)`, `medmentions_st21pv_ner:B-T204)`, `genia_term_corpus_ner:I-protein_subunit)`, `biorelex_ner:I-assay)`, `chemdner_TEXT:MESH:D005680)`, `an_em_ner:I-Organism_substance)`, `chemdner_TEXT:MESH:D010368)`, `chemdner_TEXT:MESH:D000872)`, `bionlp_st_2011_id_NER:I-Gene_expression)`, `bionlp_st_2013_cg_NER:B-Regulation)`, `mlee_ner:I-DNA_domain_or_region)`, `chemdner_TEXT:MESH:D001393)`, `medmentions_full_ner:I-T038)`, `chemdner_TEXT:MESH:D047311)`, `chemdner_TEXT:MESH:D011453)`, `chemdner_TEXT:MESH:D020106)`, `chemdner_TEXT:MESH:D019257)`, `bionlp_st_2013_gro_ner:B-NuclearReceptor)`, `chemdner_TEXT:MESH:D002117)`, `genia_term_corpus_ner:B-lipid)`, `bionlp_st_2013_gro_ner:B-SmallInterferingRNA)`, `chemdner_TEXT:MESH:D011205)`, `chemdner_TEXT:MESH:D002686)`, `bionlp_st_2013_gro_NER:B-Translation)`, 
`ebm_pico_ner:I-Intervention_Psychological)`, `mlee_ner:I-Drug_or_compound)`, `bionlp_st_2013_gro_ner:I-TranscriptionFactorBindingSiteOfDNA)`, `chemdner_TEXT:MESH:D000688)`, `bionlp_st_2011_ge_RE:None)`, `bionlp_st_2013_gro_ner:B-ProteinSubunit)`, `genia_term_corpus_ner:I-ANDother_nameother_name)`, `bionlp_st_2013_gro_NER:I-Heterodimerization)`, `pico_extraction_ner:B-intervention)`, `bionlp_st_2013_cg_ner:I-Organism)`, `bionlp_st_2013_gro_ner:I-ProteinDomain)`, `bionlp_st_2013_gro_NER:I-BindingToProtein)`, `scai_chemical_ner:I-)`, `biorelex_ner:B-experiment-tag)`, `ebm_pico_ner:B-Intervention_Physical)`, `bionlp_st_2013_cg_RE:ToLoc)`, `bionlp_st_2013_gro_NER:B-FormationOfTranscriptionFactorComplex)`, `linnaeus_ner:B-species)`, `medmentions_full_ner:I-T062)`, `chemdner_TEXT:MESH:D014640)`, `mlee_NER:B-Gene_expression)`, `chemdner_TEXT:MESH:D008701)`, `mlee_NER:O)`, `chemdner_TEXT:MESH:D014302)`, `genia_term_corpus_ner:B-RNA_family_or_group)`, `medmentions_full_ner:I-T091)`, `medmentions_full_ner:B-T022)`, `medmentions_full_ner:B-T074)`, `bionlp_st_2013_gro_NER:B-ProteinCatabolism)`, `chemdner_TEXT:MESH:D011388)`, `bionlp_st_2013_ge_NER:I-Phosphorylation)`, `bionlp_st_2013_gro_NER:I-CellAdhesion)`, `anat_em_ner:I-Organ)`, `medmentions_full_ner:B-T045)`, `chemdner_TEXT:MESH:D008727)`, `chebi_nactem_abstr_ann1_ner:B-Species)`, `bionlp_st_2013_gro_ner:I-RNAPolymeraseII)`, `nlm_gene_ner:B-STARGENE)`, `mantra_gsc_en_emea_ner:B-OBJC)`, `meddocan_ner:B-PROFESION)`, `bionlp_st_2013_gro_ner:B-DNABindingDomainOfProtein)`, `chemdner_TEXT:MESH:D010636)`, `chemdner_TEXT:MESH:D004061)`, `mlee_NER:I-Binding)`, `medmentions_full_ner:B-T075)`, `medmentions_full_ner:B-UnknownType)`, `chemdner_TEXT:MESH:D019081)`, `bionlp_st_2013_gro_NER:I-Binding)`, `medmentions_full_ner:I-T005)`, `chemdner_TEXT:MESH:D009821)` + +{:.btn-box} + + 
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bunsen_base_best_en_5.2.0_3.0_1699290578555.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bunsen_base_best_en_5.2.0_3.0_1699290578555.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bunsen_base_best","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bunsen_base_best","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.base.by_leonweber").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bunsen_base_best| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|420.5 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/leonweber/bunsen_base_best \ No newline at end of file From 3857f0a94f7e2a2565c36ba16ddda375b24ed49f Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:43:00 +0700 Subject: [PATCH 298/667] Add model 2023-11-06-bert_ner_bert_mention_german_vera_pro_de --- ...ert_ner_bert_mention_german_vera_pro_de.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_mention_german_vera_pro_de.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_mention_german_vera_pro_de.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_mention_german_vera_pro_de.md new file mode 100644 index 00000000000000..b613a87d73492d --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_mention_german_vera_pro_de.md @@ -0,0 +1,93 @@ +--- +layout: model +title: German bert_ner_bert_mention_german_vera_pro BertForTokenClassification from vera-pro +author: John Snow Labs +name: bert_ner_bert_mention_german_vera_pro +date: 2023-11-06 +tags: [bert, de, open_source, token_classification, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_bert_mention_german_vera_pro` is a German model originally trained by vera-pro.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_mention_german_vera_pro_de_5.2.0_3.0_1699288386584.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_mention_german_vera_pro_de_5.2.0_3.0_1699288386584.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
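+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# The classifier reads both a document and a token column, so tokenize first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_mention_german_vera_pro","de") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_bert_mention_german_vera_pro", "de") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// toDF needs spark.implicits._ in scope (imported automatically in spark-shell) +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +</div>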
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bert_mention_german_vera_pro| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|665.1 MB| + +## References + +https://huggingface.co/vera-pro/bert-mention-de \ No newline at end of file From 82f286266c3ea682a9251b9307507756846433ff Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:44:00 +0700 Subject: [PATCH 299/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_multilingual_cased_chunking_english_en --- ..._multilingual_cased_chunking_english_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_multilingual_cased_chunking_english_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_multilingual_cased_chunking_english_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_multilingual_cased_chunking_english_en.md new file mode 100644 index 00000000000000..ae9b2fc97208c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_multilingual_cased_chunking_english_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_bert_base_multilingual_cased_chunking_english BertForTokenClassification from QCRI +author: John Snow Labs +name: bert_sayula_popoluca_bert_base_multilingual_cased_chunking_english +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and 
production-readiness using Spark NLP. `bert_sayula_popoluca_bert_base_multilingual_cased_chunking_english` is an English model originally trained by QCRI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_multilingual_cased_chunking_english_en_5.2.0_3.0_1699298384330.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_multilingual_cased_chunking_english_en_5.2.0_3.0_1699298384330.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +<div class="tabs-box" markdown="1">
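+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# The classifier reads both a document and a token column, so tokenize first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_multilingual_cased_chunking_english","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bert_base_multilingual_cased_chunking_english", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// toDF needs spark.implicits._ in scope (imported automatically in spark-shell) +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +</div>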
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_base_multilingual_cased_chunking_english| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|665.1 MB| + +## References + +https://huggingface.co/QCRI/bert-base-multilingual-cased-chunking-english \ No newline at end of file From 281045088b57a6317c5fd95caff8a9dbf196bab0 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:45:01 +0700 Subject: [PATCH 300/667] Add model 2023-11-06-bert_sayula_popoluca_chinese_roberta_large_upos_zh --- ..._popoluca_chinese_roberta_large_upos_zh.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_chinese_roberta_large_upos_zh.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_chinese_roberta_large_upos_zh.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_chinese_roberta_large_upos_zh.md new file mode 100644 index 00000000000000..6e3e82892c0352 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_chinese_roberta_large_upos_zh.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Chinese bert_sayula_popoluca_chinese_roberta_large_upos BertForTokenClassification from KoichiYasuoka +author: John Snow Labs +name: bert_sayula_popoluca_chinese_roberta_large_upos +date: 2023-11-06 +tags: [bert, zh, open_source, token_classification, onnx] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark 
NLP. `bert_sayula_popoluca_chinese_roberta_large_upos` is a Chinese model originally trained by KoichiYasuoka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_chinese_roberta_large_upos_zh_5.2.0_3.0_1699301806106.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_chinese_roberta_large_upos_zh_5.2.0_3.0_1699301806106.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +<div class="tabs-box" markdown="1">
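+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# The classifier reads both a document and a token column, so tokenize first. +# Note: the default Tokenizer splits on whitespace, so unsegmented Chinese text may need a word segmenter instead +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_chinese_roberta_large_upos","zh") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_chinese_roberta_large_upos", "zh") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// toDF needs spark.implicits._ in scope (imported automatically in spark-shell) +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +</div>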
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_chinese_roberta_large_upos| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|1.2 GB| + +## References + +https://huggingface.co/KoichiYasuoka/chinese-roberta-large-upos \ No newline at end of file From c52bc9663b424364e386d451c44137698300420d Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:46:01 +0700 Subject: [PATCH 301/667] Add model 2023-11-06-bert_ner_nominalization_candidate_classifier_en --- ..._nominalization_candidate_classifier_en.md | 116 ++++++++++++++++++ 1 file changed, 116 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_nominalization_candidate_classifier_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_nominalization_candidate_classifier_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_nominalization_candidate_classifier_en.md new file mode 100644 index 00000000000000..62f1fcca94af30 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_nominalization_candidate_classifier_en.md @@ -0,0 +1,116 @@ +--- +layout: model +title: English Named Entity Recognition (from kleinay) +author: John Snow Labs +name: bert_ner_nominalization_candidate_classifier +date: 2023-11-06 +tags: [bert, ner, token_classification, en, open_source, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained Named Entity Recognition model, uploaded to Hugging Face, adapted and imported into Spark NLP. `nominalization-candidate-classifier` is an English model originally trained by `kleinay`.
+ +## Predicted Entities + +`False`, `True` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_nominalization_candidate_classifier_en_5.2.0_3.0_1699298930263.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_nominalization_candidate_classifier_en_5.2.0_3.0_1699298930263.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_nominalization_candidate_classifier","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["I love Spark NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_nominalization_candidate_classifier","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("I love Spark NLP").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.by_kleinay").predict("""I love Spark NLP""") +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_nominalization_candidate_classifier| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.6 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/kleinay/nominalization-candidate-classifier +- https://www.aclweb.org/anthology/2020.coling-main.274/ +- https://github.com/kleinay/QANom \ No newline at end of file From 49a47c7be06800496953d19896f5e93e27ef9a33 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:47:02 +0700 Subject: [PATCH 302/667] Add model 2023-11-06-bert_sayula_popoluca_tiny_kt_punctuator_en --- ...t_sayula_popoluca_tiny_kt_punctuator_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_kt_punctuator_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_kt_punctuator_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_kt_punctuator_en.md new file mode 100644 index 00000000000000..bf1e9cb9a5ede7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_kt_punctuator_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_tiny_kt_punctuator BertForTokenClassification from kktoto +author: John Snow Labs +name: bert_sayula_popoluca_tiny_kt_punctuator +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and 
production-readiness using Spark NLP. `bert_sayula_popoluca_tiny_kt_punctuator` is an English model originally trained by kktoto. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_tiny_kt_punctuator_en_5.2.0_3.0_1699307175877.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_tiny_kt_punctuator_en_5.2.0_3.0_1699307175877.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +<div class="tabs-box" markdown="1">
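+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# The classifier reads both a document and a token column, so tokenize first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_tiny_kt_punctuator","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_tiny_kt_punctuator", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// toDF needs spark.implicits._ in scope (imported automatically in spark-shell) +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +</div>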
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_tiny_kt_punctuator| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|42.7 MB| + +## References + +https://huggingface.co/kktoto/tiny_kt_punctuator \ No newline at end of file From a04182790ca7be1ecd1b72f7b5af17213c1f33e1 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:48:02 +0700 Subject: [PATCH 303/667] Add model 2023-11-06-bert_ner_neulvo_bert_finetuned_ner_en --- ...6-bert_ner_neulvo_bert_finetuned_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_neulvo_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_neulvo_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_neulvo_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..482f2869509600 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_neulvo_bert_finetuned_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_neulvo_bert_finetuned_ner BertForTokenClassification from Neulvo +author: John Snow Labs +name: bert_ner_neulvo_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_neulvo_bert_finetuned_ner` is an English model originally trained by Neulvo.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_neulvo_bert_finetuned_ner_en_5.2.0_3.0_1699282015146.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_neulvo_bert_finetuned_ner_en_5.2.0_3.0_1699282015146.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
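+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# The classifier reads both a document and a token column, so tokenize first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_neulvo_bert_finetuned_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_neulvo_bert_finetuned_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// toDF needs spark.implicits._ in scope (imported automatically in spark-shell) +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +</div>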
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_neulvo_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Neulvo/bert-finetuned-ner \ No newline at end of file From 0f02190948289cc057ac16763b037ed258bfea65 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:49:02 +0700 Subject: [PATCH 304/667] Add model 2023-11-06-bert_ner_ner_bert_base_cased_portuguese_lenerbr_pt --- ...r_bert_base_cased_portuguese_lenerbr_pt.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_bert_base_cased_portuguese_lenerbr_pt.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_bert_base_cased_portuguese_lenerbr_pt.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_bert_base_cased_portuguese_lenerbr_pt.md new file mode 100644 index 00000000000000..3960ba53685b93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_bert_base_cased_portuguese_lenerbr_pt.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Portuguese bert_ner_ner_bert_base_cased_portuguese_lenerbr BertForTokenClassification from mateusqc +author: John Snow Labs +name: bert_ner_ner_bert_base_cased_portuguese_lenerbr +date: 2023-11-06 +tags: [bert, pt, open_source, token_classification, onnx] +task: Named Entity Recognition +language: pt +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_ner_bert_base_cased_portuguese_lenerbr` is a Portuguese model originally trained 
by mateusqc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_ner_bert_base_cased_portuguese_lenerbr_pt_5.2.0_3.0_1699297714905.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_ner_bert_base_cased_portuguese_lenerbr_pt_5.2.0_3.0_1699297714905.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
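+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# The classifier reads both a document and a token column, so tokenize first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_ner_bert_base_cased_portuguese_lenerbr","pt") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_ner_bert_base_cased_portuguese_lenerbr", "pt") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// toDF needs spark.implicits._ in scope (imported automatically in spark-shell) +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +</div>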
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_ner_bert_base_cased_portuguese_lenerbr| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|pt| +|Size:|406.0 MB| + +## References + +https://huggingface.co/mateusqc/ner-bert-base-cased-pt-lenerbr \ No newline at end of file From bc5dd2df9fa6953a4e94bbd1395fe990afcc8d46 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:50:02 +0700 Subject: [PATCH 305/667] Add model 2023-11-06-bert_ner_krimo11_bert_finetuned_ner_en --- ...-bert_ner_krimo11_bert_finetuned_ner_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_krimo11_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_krimo11_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_krimo11_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..2ff034985e4174 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_krimo11_bert_finetuned_ner_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from krimo11) +author: John Snow Labs +name: bert_ner_krimo11_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `krimo11`.
+
+## Predicted Entities
+
+`MISC`, `ORG`, `LOC`, `PER`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_krimo11_bert_finetuned_ner_en_5.2.0_3.0_1699292976154.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_krimo11_bert_finetuned_ner_en_5.2.0_3.0_1699292976154.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_krimo11_bert_finetuned_ner", "en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes an existing SparkSession named `spark`, as in spark-shell
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_krimo11_bert_finetuned_ner", "en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.finetuned.by_krimo11").predict("""PUT YOUR STRING HERE""")
+```
+
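The `ner` output column holds one tag per token. Assuming this model emits IOB2 tags (`B-PER`, `I-PER`, ..., `O`) for the entity types listed above, adjacent tags can be grouped into entity chunks. Spark NLP's `NerConverter` annotator does this inside a pipeline; the pure-Python `merge_iob` function below is only a hypothetical illustration of the grouping logic, fed with hand-written tags rather than real model output:

```python
def merge_iob(tokens, tags):
    """Group per-token IOB2 tags into (chunk_text, label) pairs.

    Hypothetical helper. Tags are expected to look like "B-PER", "I-PER" or
    "O". An "I-" tag that does not continue a chunk of the same label starts
    a new chunk. Tokens are joined with single spaces, so original spacing
    and character offsets are not preserved.
    """
    chunks = []
    current_tokens, current_label = [], None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            prefix, label = "O", None
        else:
            prefix, label = tag.split("-", 1)
        continues = prefix == "I" and label == current_label
        # Close the open chunk unless this tag extends it.
        if current_label is not None and not continues:
            chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None
        if label is not None:
            current_tokens.append(token)
            current_label = label
    if current_label is not None:
        chunks.append((" ".join(current_tokens), current_label))
    return chunks


# Hand-written example tags, not actual model output.
tokens = ["John", "Smith", "works", "at", "Acme", "Corp", "in", "Paris", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG", "O", "B-LOC", "O"]
print(merge_iob(tokens, tags))
# [('John Smith', 'PER'), ('Acme Corp', 'ORG'), ('Paris', 'LOC')]
```

In practice the token and tag lists would come from the `result` fields of the `token` and `ner` annotation columns, for example `result.select("token.result", "ner.result")`.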
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_krimo11_bert_finetuned_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+- https://huggingface.co/krimo11/bert-finetuned-ner
\ No newline at end of file

From 8b35064995200989f0a615f0e6e07dc546f72aea Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 04:51:03 +0700
Subject: [PATCH 306/667] Add model 2023-11-06-bert_ner_biomuppet_en

---
 .../2023-11-06-bert_ner_biomuppet_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_biomuppet_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biomuppet_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biomuppet_en.md
new file mode 100644
index 00000000000000..e964339ab28253
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biomuppet_en.md
+---
+layout: model
+title: English BertForTokenClassification Cased model (from leonweber)
+author: John Snow Labs
+name: bert_ner_biomuppet
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `biomuppet` is an English model originally trained by `leonweber`.
+ +## Predicted Entities + +`medmentions_full_ner:B-T085)`, `bionlp_st_2013_gro_ner:B-Ribosome)`, `chemdner_TEXT:MESH:D013830)`, `anat_em_ner:O)`, `cellfinder_ner:I-GeneProtein)`, `ncbi_disease_ner:B-CompositeMention)`, `bionlp_st_2013_gro_ner:B-Virus)`, `medmentions_full_ner:I-T129)`, `scai_disease_ner:B-DISEASE)`, `biorelex_ner:B-chemical)`, `chemdner_TEXT:MESH:D011166)`, `medmentions_st21pv_ner:I-T204)`, `chemdner_TEXT:MESH:D008345)`, `bionlp_st_2013_gro_NER:B-RegulationOfFunction)`, `mlee_ner:I-Cell)`, `bionlp_st_2013_gro_NER:I-RNABiosynthesis)`, `biorelex_ner:I-RNA-family)`, `bionlp_st_2013_gro_NER:B-ResponseToChemicalStimulus)`, `bionlp_st_2011_epi_NER:B-Dephosphorylation)`, `chemdner_TEXT:MESH:D003035)`, `chemdner_TEXT:MESH:D013440)`, `chemdner_TEXT:MESH:D037341)`, `chemdner_TEXT:MESH:D009532)`, `chemdner_TEXT:MESH:D019216)`, `chemdner_TEXT:MESH:D036701)`, `chemdner_TEXT:MESH:D011107)`, `bionlp_st_2013_cg_NER:B-Translation)`, `genia_term_corpus_ner:B-cell_component)`, `medmentions_full_ner:I-T065)`, `bionlp_st_2013_gro_ner:B-ProteinBindingSiteOfDNA)`, `anat_em_ner:I-Developing_anatomical_structure)`, `chemdner_TEXT:MESH:D000225)`, `genia_term_corpus_ner:I-ORDNA_domain_or_regionDNA_domain_or_region)`, `medmentions_full_ner:I-T015)`, `chemdner_TEXT:MESH:D008239)`, `bionlp_st_2013_cg_NER:I-Binding)`, `bionlp_st_2013_cg_NER:B-Amino_acid_catabolism)`, `cellfinder_ner:B-CellComponent)`, `bionlp_st_2013_gro_NER:I-MetabolicPathway)`, `bionlp_st_2013_gro_ner:B-ProteinIdentification)`, `bionlp_st_2011_ge_ner:O)`, `bionlp_st_2011_id_ner:B-Organism)`, `bionlp_st_2013_gro_ner:I-ForkheadWingedHelixTF)`, `mirna_ner:B-Relation_Trigger)`, `bionlp_st_2011_ge_NER:B-Regulation)`, `bionlp_st_2013_cg_NER:B-Gene_expression)`, `chemdner_TEXT:MESH:D008055)`, `chemdner_TEXT:MESH:D009944)`, `verspoor_2013_ner:I-gene)`, `bionlp_st_2013_ge_ner:O)`, `chemdner_TEXT:MESH:D003907)`, `mlee_ner:I-Developing_anatomical_structure)`, `chemdner_TEXT:MESH:D010569)`, `mlee_NER:I-Growth)`, 
`chemdner_TEXT:MESH:D036145)`, `medmentions_full_ner:I-T196)`, `ehr_rel_sts:1)`, `bionlp_st_2013_gro_NER:B-CellularComponentOrganizationAndBiogenesis)`, `chemdner_TEXT:MESH:D009285)`, `bionlp_st_2013_gro_NER:B-ProteinMetabolism)`, `chemdner_TEXT:MESH:D016718)`, `bionlp_st_2013_gro_NER:I-BindingOfTFToTFBindingSiteOfProtein)`, `medmentions_full_ner:I-T074)`, `chemdner_TEXT:MESH:D000432)`, `bionlp_st_2013_gro_NER:I-CellFateDetermination)`, `chia_ner:I-Reference_point)`, `bionlp_st_2013_gro_ner:B-Histone)`, `lll_RE:None)`, `scai_disease_ner:B-ADVERSE)`, `medmentions_full_ner:B-T130)`, `bionlp_st_2013_gro_NER:I-CellCyclePhaseTransition)`, `chemdner_TEXT:MESH:D000480)`, `chemdner_TEXT:MESH:D001556)`, `bionlp_st_2013_gro_ner:B-Nucleus)`, `bionlp_st_2013_gro_ner:B-AP2EREBPRelatedDomain)`, `chemdner_TEXT:MESH:D007854)`, `chemdner_TEXT:MESH:D009499)`, `genia_term_corpus_ner:B-polynucleotide)`, `bionlp_st_2013_gro_NER:I-Transcription)`, `chemdner_TEXT:MESH:D007213)`, `bionlp_st_2013_ge_NER:B-Regulation)`, `bionlp_st_2011_epi_NER:B-DNA_methylation)`, `medmentions_st21pv_ner:B-T031)`, `bionlp_st_2013_ge_NER:I-Gene_expression)`, `chemdner_TEXT:MESH:D007651)`, `bionlp_st_2013_gro_NER:B-OrganismalProcess)`, `bionlp_st_2011_epi_COREF:None)`, `medmentions_st21pv_ner:I-T062)`, `chemdner_TEXT:MESH:D002047)`, `chemdner_TEXT:MESH:D012822)`, `mantra_gsc_en_patents_ner:B-DEVI)`, `medmentions_full_ner:I-T071)`, `chemdner_TEXT:MESH:D013739)`, `bionlp_st_2013_gro_NER:B-NegativeRegulationOfGeneExpression)`, `genia_term_corpus_ner:B-other_name)`, `medmentions_full_ner:B-T018)`, `chemdner_TEXT:MESH:D015242)`, `bionlp_st_2013_cg_NER:O)`, `chemdner_TEXT:MESH:D019469)`, `ncbi_disease_ner:B-DiseaseClass)`, `ebm_pico_ner:B-Intervention_Surgical)`, `chemdner_TEXT:MESH:D011422)`, `chemdner_TEXT:MESH:D002112)`, `chemdner_TEXT:MESH:D005682)`, `anat_em_ner:B-Immaterial_anatomical_entity)`, `bionlp_st_2011_epi_ner:B-Entity)`, `medmentions_full_ner:I-T169)`, `mlee_ner:B-Immaterial_anatomical_entity)`, 
`verspoor_2013_ner:B-Physiology)`, `cellfinder_ner:I-CellType)`, `chemdner_TEXT:MESH:D011122)`, `chemdner_TEXT:MESH:D010622)`, `chemdner_TEXT:MESH:D017378)`, `bionlp_st_2011_ge_RE:Theme)`, `chemdner_TEXT:MESH:D000431)`, `medmentions_full_ner:I-T102)`, `medmentions_full_ner:B-T097)`, `chemdner_TEXT:MESH:D007529)`, `chemdner_TEXT:MESH:D045265)`, `chemdner_TEXT:MESH:D005971)`, `an_em_ner:I-Multi-tissue_structure)`, `genia_term_corpus_ner:I-ANDDNA_family_or_groupDNA_family_or_group)`, `medmentions_full_ner:I-T080)`, `chemdner_TEXT:MESH:D002207)`, `chia_ner:I-Qualifier)`, `bionlp_st_2013_gro_NER:B-NegativeRegulationOfTranscriptionByTranscriptionRepressor)`, `an_em_ner:I-Immaterial_anatomical_entity)`, `biosses_sts:5)`, `chemdner_TEXT:MESH:D000079963)`, `chemdner_TEXT:MESH:D013196)`, `ehr_rel_sts:2)`, `chemdner_TEXT:MESH:D006152)`, `bionlp_st_2013_gro_NER:B-RegulationOfProcess)`, `mlee_NER:I-Development)`, `medmentions_full_ner:B-T197)`, `bionlp_st_2013_gro_ner:B-NucleicAcid)`, `medmentions_st21pv_ner:I-T017)`, `medmentions_full_ner:I-T046)`, `medmentions_full_ner:B-T204)`, `bionlp_st_2013_gro_NER:B-CellularDevelopmentalProcess)`, `bionlp_st_2013_cg_ner:B-Immaterial_anatomical_entity)`, `chemdner_TEXT:MESH:D014212)`, `bionlp_st_2013_cg_NER:B-Protein_processing)`, `chemdner_TEXT:MESH:D008926)`, `chia_ner:B-Visit)`, `bionlp_st_2011_ge_NER:B-Negative_regulation)`, `mantra_gsc_en_medline_ner:I-OBJC)`, `mlee_RE:FromLoc)`, `bionlp_st_2013_gro_ner:I-RNAMolecule)`, `chemdner_TEXT:MESH:D014812)`, `linnaeus_filtered_ner:I-species)`, `chebi_nactem_fullpaper_ner:B-Chemical)`, `bionlp_st_2011_ge_NER:I-Positive_regulation)`, `bionlp_st_2013_gro_ner:B-MutantGene)`, `chemdner_TEXT:MESH:D014859)`, `bionlp_st_2019_bb_ner:B-Phenotype)`, `bionlp_st_2013_gro_NER:I-BindingOfTFToTFBindingSiteOfDNA)`, `diann_iber_eval_en_ner:I-Neg)`, `ddi_corpus_ner:B-DRUG_N)`, `bionlp_st_2013_cg_ner:B-Organ)`, `chemdner_TEXT:MESH:D009320)`, `bionlp_st_2013_cg_ner:I-Organism_subdivision)`, 
`bionlp_st_2013_cg_ner:B-Cellular_component)`, `chemdner_TEXT:MESH:D003188)`, `chemdner_TEXT:MESH:D001241)`, `chemdner_TEXT:MESH:D004811)`, `bioinfer_ner:I-GeneproteinRNA)`, `chemdner_TEXT:MESH:D002248)`, `bionlp_shared_task_2009_NER:B-Negative_regulation)`, `chemdner_TEXT:MESH:D000143)`, `chemdner_TEXT:MESH:D007099)`, `nlm_gene_ner:O)`, `chemdner_TEXT:MESH:D005485)`, `bionlp_st_2013_gro_ner:B-TranscriptionFactorBindingSiteOfDNA)`, `bionlp_st_2013_gro_ner:B-PhysicalContact)`, `medmentions_full_ner:B-T167)`, `medmentions_st21pv_ner:B-T091)`, `seth_corpus_ner:I-Gene)`, `bionlp_st_2011_ge_COREF:coref)`, `bionlp_st_2011_ge_NER:B-Gene_expression)`, `medmentions_full_ner:B-T031)`, `genia_relation_corpus_RE:None)`, `genia_term_corpus_ner:I-ANDDNA_domain_or_regionDNA_domain_or_region)`, `chemdner_TEXT:MESH:D014970)`, `bionlp_st_2013_gro_NER:B-Mutation)`, `bionlp_st_2013_gro_ner:I-TranscriptionActivator)`, `chemdner_TEXT:MESH:D002217)`, `chemdner_TEXT:MESH:D003367)`, `medmentions_full_ner:I-UnknownType)`, `chemdner_TEXT:MESH:D002998)`, `bionlp_st_2013_gro_ner:I-Phenotype)`, `genia_term_corpus_ner:B-ANDDNA_family_or_groupDNA_family_or_group)`, `hprd50_RE:PPI)`, `chemdner_TEXT:MESH:D002118)`, `scai_chemical_ner:B-IUPAC)`, `bionlp_st_2013_gro_ner:I-ProteinBindingSiteOfProtein)`, `verspoor_2013_ner:B-mutation)`, `chemdner_TEXT:MESH:D011719)`, `chemdner_TEXT:MESH:D013729)`, `bionlp_shared_task_2009_ner:O)`, `chemdner_TEXT:MESH:D005840)`, `chemdner_TEXT:MESH:D009287)`, `medmentions_full_ner:B-T029)`, `chemdner_TEXT:MESH:D037742)`, `medmentions_full_ner:I-T200)`, `chemdner_TEXT:MESH:D012503)`, `bionlp_st_2013_gro_ner:B-ComplexOfProteinAndRNA)`, `mirna_ner:I-Non-Specific_miRNAs)`, `bionlp_st_2013_gro_ner:B-ProteinBindingSiteOfProtein)`, `bionlp_st_2013_pc_NER:B-Deacetylation)`, `chemprot_RE:CPR:7)`, `chia_ner:I-Value)`, `medmentions_full_ner:I-T048)`, `chemprot_ner:B-GENE-Y)`, `bionlp_st_2013_cg_NER:B-Reproduction)`, `bionlp_st_2011_id_ner:I-Regulon-operon)`, 
`ebm_pico_ner:I-Outcome_Adverse-effects)`, `bioinfer_ner:B-DNA_family_or_group)`, `bionlp_st_2013_gro_ner:I-bZIPTF)`, `mirna_ner:I-GenesProteins)`, `biorelex_ner:I-process)`, `chemdner_TEXT:MESH:D001555)`, `genia_term_corpus_ner:B-DNA_domain_or_region)`, `cellfinder_ner:O)`, `bionlp_st_2013_gro_ner:I-MutatedProtein)`, `bionlp_st_2013_gro_NER:I-CellularComponentOrganizationAndBiogenesis)`, `spl_adr_200db_train_ner:O)`, `medmentions_full_ner:I-T026)`, `chemdner_TEXT:MESH:D013619)`, `bionlp_st_2013_gro_NER:I-BindingToRNA)`, `biorelex_ner:I-drug)`, `bionlp_st_2013_pc_NER:B-Translation)`, `mantra_gsc_en_emea_ner:B-LIVB)`, `mantra_gsc_en_patents_ner:B-PROC)`, `bionlp_st_2013_pc_NER:B-Binding)`, `bionlp_st_2013_gro_NER:B-ModificationOfMolecularEntity)`, `bionlp_st_2013_cg_NER:I-Cell_transformation)`, `scai_chemical_ner:B-TRIVIALVAR)`, `bionlp_st_2013_gro_ner:I-ZincCoordinatingDomain)`, `bionlp_st_2013_gro_NER:I-TranscriptionInitiation)`, `chemdner_TEXT:MESH:D010907)`, `bionlp_st_2013_gro_ner:B-InorganicChemical)`, `bionlp_st_2013_pc_RE:None)`, `chemdner_TEXT:MESH:D002922)`, `chemdner_TEXT:MESH:D010743)`, `bionlp_st_2019_bb_ner:O)`, `medmentions_full_ner:I-T001)`, `chemdner_TEXT:MESH:D001381)`, `bionlp_shared_task_2009_ner:I-Protein)`, `bionlp_st_2013_gro_ner:B-Spliceosome)`, `bionlp_st_2013_gro_ner:I-HMGTF)`, `minimayosrs_sts:3)`, `ddi_corpus_RE:ADVISE)`, `mlee_NER:B-Dissociation)`, `bionlp_st_2013_gro_ner:I-Holoenzyme)`, `chemdner_TEXT:MESH:D001552)`, `bionlp_st_2013_gro_ner:B-bHLH)`, `chemdner_TEXT:MESH:D000109)`, `chemdner_TEXT:MESH:D013449)`, `bionlp_st_2013_gro_ner:I-GeneRegion)`, `medmentions_full_ner:B-T019)`, `scai_chemical_ner:B-TRIVIAL)`, `mlee_ner:B-Gene_or_gene_product)`, `biosses_sts:3)`, `bionlp_st_2013_cg_NER:I-Pathway)`, `bionlp_st_2011_id_ner:I-Organism)`, `bionlp_st_2013_gro_ner:B-tRNA)`, `chemdner_TEXT:MESH:D013109)`, `mlee_ner:I-Immaterial_anatomical_entity)`, `medmentions_full_ner:B-T065)`, `ebm_pico_ner:I-Participant_Sample-size)`, `mlee_RE:AtLoc)`, 
`genia_term_corpus_ner:I-protein_family_or_group)`, `chemdner_TEXT:MESH:D002444)`, `chemdner_TEXT:MESH:D063388)`, `mlee_NER:B-Translation)`, `chemdner_TEXT:MESH:D007052)`, `bionlp_st_2013_gro_ner:B-Gene)`, `chia_ner:B-Scope)`, `bionlp_st_2013_ge_NER:I-Positive_regulation)`, `chemdner_TEXT:MESH:D007785)`, `medmentions_st21pv_ner:I-T097)`, `iepa_RE:None)`, `medmentions_full_ner:B-T001)`, `medmentions_full_ner:I-T194)`, `chemdner_TEXT:MESH:D047309)`, `bionlp_st_2013_gro_ner:B-Substrate)`, `chemdner_TEXT:MESH:D002186)`, `ebm_pico_ner:B-Outcome_Other)`, `bionlp_st_2013_gro_NER:I-OrganismalProcess)`, `bionlp_st_2013_gro_ner:B-Ion)`, `bionlp_st_2013_gro_NER:I-ProteinBiosynthesis)`, `chia_ner:B-Drug)`, `bionlp_st_2013_gro_ner:I-MolecularEntity)`, `anat_em_ner:B-Cellular_component)`, `bionlp_st_2013_cg_ner:B-Multi-tissue_structure)`, `medmentions_full_ner:I-T122)`, `an_em_ner:B-Cell)`, `chemdner_TEXT:MESH:D011564)`, `bionlp_st_2013_gro_NER:B-Splicing)`, `bionlp_st_2013_cg_NER:I-Metabolism)`, `bionlp_st_2013_pc_NER:B-Activation)`, `bionlp_st_2013_gro_ner:I-BindingSiteOfProtein)`, `bionlp_st_2011_id_ner:B-Chemical)`, `bionlp_st_2013_gro_ner:I-Ribosome)`, `nlmchem_ner:I-Chemical)`, `mirna_ner:I-Specific_miRNAs)`, `medmentions_full_ner:I-T012)`, `bionlp_st_2013_gro_NER:B-IntraCellularTransport)`, `mlee_RE:Instrument)`, `bionlp_st_2011_id_NER:I-Transcription)`, `mantra_gsc_en_patents_ner:I-ANAT)`, `an_em_ner:B-Immaterial_anatomical_entity)`, `scai_chemical_ner:I-IUPAC)`, `bionlp_st_2011_epi_NER:B-Deubiquitination)`, `chemdner_TEXT:MESH:D007295)`, `bionlp_st_2011_ge_NER:B-Binding)`, `bionlp_st_2013_pc_NER:B-Localization)`, `chia_ner:B-Procedure)`, `medmentions_full_ner:I-T109)`, `chemdner_TEXT:MESH:D002791)`, `mantra_gsc_en_medline_ner:I-CHEM)`, `chebi_nactem_fullpaper_ner:B-Biological_Activity)`, `ncbi_disease_ner:B-SpecificDisease)`, `medmentions_full_ner:B-T063)`, `chemdner_TEXT:MESH:D016595)`, `bionlp_st_2011_id_NER:B-Transcription)`, `bionlp_st_2013_gro_ner:B-DNAMolecule)`, 
`mlee_NER:B-Protein_processing)`, `biorelex_ner:B-protein-complex)`, `anat_em_ner:I-Cancer)`, `bionlp_st_2013_cg_RE:AtLoc)`, `medmentions_full_ner:I-T072)`, `bio_sim_verb_sts:2)`, `seth_corpus_ner:O)`, `medmentions_full_ner:B-T070)`, `biorelex_ner:I-experiment-tag)`, `chemdner_TEXT:MESH:D020126)`, `biorelex_ner:I-protein-RNA-complex)`, `bionlp_st_2013_pc_NER:I-Phosphorylation)`, `medmentions_st21pv_ner:I-T201)`, `genia_term_corpus_ner:B-protein_complex)`, `medmentions_full_ner:I-T125)`, `bionlp_st_2013_ge_ner:I-Entity)`, `chemdner_TEXT:MESH:D054659)`, `bionlp_st_2013_pc_RE:ToLoc)`, `medmentions_full_ner:B-T099)`, `bionlp_st_2013_gro_NER:B-Binding)`, `medmentions_full_ner:B-T114)`, `spl_adr_200db_train_ner:B-Factor)`, `mlee_RE:CSite)`, `bionlp_st_2013_gro_ner:B-HMG)`, `bionlp_st_2013_gro_ner:B-Operon)`, `bionlp_st_2013_ge_NER:I-Protein_catabolism)`, `ebm_pico_ner:I-Outcome_Pain)`, `bionlp_st_2013_ge_NER:B-Transcription)`, `chemdner_TEXT:MESH:D000880)`, `ebm_pico_ner:I-Outcome_Physical)`, `bionlp_st_2013_gro_ner:I-ProteinBindingSiteOfDNA)`, `chemdner_TEXT:MESH:D006160)`, `gnormplus_ner:B-DomainMotif)`, `medmentions_full_ner:I-T016)`, `pdr_ner:I-Disease)`, `bionlp_st_2013_gro_NER:B-BindingOfProteinToProteinBindingSiteOfProtein)`, `chemdner_TEXT:MESH:D002264)`, `genia_term_corpus_ner:I-protein_NA)`, `bionlp_shared_task_2009_NER:I-Negative_regulation)`, `medmentions_full_ner:I-T011)`, `bionlp_st_2013_gro_NER:I-CellularMetabolicProcess)`, `mqp_sts:1)`, `an_em_ner:I-Pathological_formation)`, `bionlp_st_2011_epi_NER:B-Deacetylation)`, `bionlp_st_2013_pc_RE:Theme)`, `medmentions_full_ner:I-T103)`, `bionlp_st_2011_epi_NER:B-Methylation)`, `ebm_pico_ner:B-Intervention_Psychological)`, `bionlp_st_2013_gro_ner:B-Stress)`, `genia_term_corpus_ner:B-multi_cell)`, `bionlp_st_2013_cg_NER:B-Positive_regulation)`, `anat_em_ner:I-Cellular_component)`, `spl_adr_200db_train_ner:I-Negation)`, `chemdner_TEXT:MESH:D000605)`, `mlee_RE:Cause)`, `bionlp_st_2013_gro_ner:B-RegulatoryDNARegion)`, 
`bionlp_st_2013_gro_ner:I-HomeoboxTF)`, `bionlp_st_2013_gro_NER:I-GeneSilencing)`, `ddi_corpus_ner:I-DRUG)`, `bionlp_st_2013_cg_NER:I-Growth)`, `mantra_gsc_en_medline_ner:B-OBJC)`, `mayosrs_sts:3)`, `bionlp_st_2013_gro_NER:B-RNAProcessing)`, `cellfinder_ner:B-CellType)`, `medmentions_full_ner:B-T007)`, `chemprot_ner:B-GENE-N)`, `biorelex_ner:B-brand)`, `ebm_pico_ner:B-Outcome_Mental)`, `bionlp_st_2013_gro_NER:B-RegulationOfTranscription)`, `bionlp_st_2013_gro_ner:B-EukaryoticCell)`, `genia_term_corpus_ner:I-ANDprotein_subunitprotein_subunit)`, `medmentions_full_ner:I-T184)`, `bionlp_st_2013_gro_NER:B-RegulatoryProcess)`, `bionlp_st_2011_id_NER:B-Negative_regulation)`, `bionlp_st_2013_cg_NER:I-Development)`, `cellfinder_ner:I-Anatomy)`, `chia_ner:B-Condition)`, `chemdner_TEXT:MESH:D003065)`, `medmentions_full_ner:B-T012)`, `bionlp_st_2011_id_NER:I-Positive_regulation)`, `bionlp_st_2013_gro_ner:I-TranscriptionFactorComplex)`, `bionlp_st_2013_cg_NER:I-Carcinogenesis)`, `medmentions_full_ner:B-T064)`, `medmentions_full_ner:B-T026)`, `nlmchem_ner:B-Chemical)`, `genia_term_corpus_ner:I-RNA_domain_or_region)`, `ebm_pico_ner:I-Intervention_Educational)`, `genia_term_corpus_ner:B-ANDcell_linecell_line)`, `genia_term_corpus_ner:B-protein_substructure)`, `bionlp_st_2013_gro_NER:I-ProteinTransport)`, `bionlp_st_2013_cg_NER:B-DNA_demethylation)`, `medmentions_full_ner:I-T058)`, `biorelex_ner:B-parameter)`, `chemdner_TEXT:MESH:D013006)`, `mirna_ner:I-Relation_Trigger)`, `bionlp_st_2013_gro_ner:B-PrimaryStructure)`, `bionlp_st_2013_gro_NER:I-Phosphorylation)`, `chemdner_TEXT:MESH:D003911)`, `pico_extraction_ner:I-participant)`, `chemdner_TEXT:MESH:D010938)`, `chia_ner:B-Person)`, `an_em_ner:B-Tissue)`, `medmentions_st21pv_ner:B-T170)`, `chemdner_TEXT:MESH:D013936)`, `chemdner_TEXT:MESH:D001080)`, `mlee_RE:None)`, `chemdner_TEXT:MESH:D013669)`, `chemdner_TEXT:MESH:D009943)`, `spl_adr_200db_train_ner:I-Factor)`, `chemdner_TEXT:MESH:D044004)`, `ebm_pico_ner:I-Participant_Sex)`, 
`chemdner_TEXT:MESH:D000409)`, `bionlp_st_2013_cg_NER:B-Cell_division)`, `medmentions_st21pv_ner:B-T033)`, `pcr_ner:I-Herb)`, `chemdner_TEXT:MESH:D020112)`, `bionlp_st_2013_pc_NER:B-Gene_expression)`, `bionlp_st_2011_rel_ner:O)`, `chemdner_TEXT:MESH:D008610)`, `bionlp_st_2013_gro_NER:B-BindingOfDNABindingDomainOfProteinToDNA)`, `bionlp_st_2013_gro_ner:I-Cell)`, `medmentions_full_ner:I-T055)`, `bionlp_st_2013_pc_NER:I-Negative_regulation)`, `chia_RE:Has_value)`, `tmvar_v1_ner:I-SNP)`, `biorelex_ner:I-experimental-construct)`, `genia_term_corpus_ner:B-)`, `chemdner_TEXT:MESH:D053978)`, `bionlp_st_2013_gro_ner:I-Stress)`, `mlee_ner:B-Pathological_formation)`, `bionlp_st_2013_cg_ner:O)`, `chemdner_TEXT:MESH:D007631)`, `chemdner_TEXT:MESH:D011084)`, `medmentions_full_ner:B-T080)`, `bionlp_st_2013_gro_NER:I-PositiveRegulationOfTranscription)`, `bionlp_st_2013_gro_ner:B-TranscriptionCorepressor)`, `ehr_rel_sts:4)`, `mlee_ner:I-Gene_or_gene_product)`, `chemdner_TEXT:MESH:D003474)`, `medmentions_full_ner:B-T098)`, `scicite_TEXT:method)`, `medmentions_full_ner:B-T100)`, `chemdner_TEXT:MESH:D011849)`, `medmentions_full_ner:I-T039)`, `anat_em_ner:B-Organism_subdivision)`, `bionlp_st_2013_gro_ner:I-Nucleus)`, `mlee_NER:I-Regulation)`, `bionlp_st_2013_gro_ner:I-NuclearReceptor)`, `bionlp_st_2013_ge_RE:None)`, `chemdner_TEXT:MESH:D019483)`, `bionlp_st_2013_cg_ner:B-Cell)`, `bionlp_st_2013_gro_ner:B-Holoenzyme)`, `bionlp_st_2011_epi_NER:I-Methylation)`, `bionlp_shared_task_2009_ner:B-Protein)`, `medmentions_st21pv_ner:I-T038)`, `bionlp_st_2013_gro_ner:I-DNARegion)`, `bionlp_st_2013_gro_NER:I-CellCyclePhase)`, `bionlp_st_2013_gro_ner:I-tRNA)`, `mlee_ner:I-Multi-tissue_structure)`, `chemprot_ner:O)`, `medmentions_full_ner:B-T094)`, `bionlp_st_2013_gro_RE:fromSpecies)`, `bionlp_st_2013_gro_NER:O)`, `bionlp_st_2013_gro_NER:B-Acetylation)`, `bioinfer_ner:I-Protein_family_or_group)`, `medmentions_st21pv_ner:I-T098)`, `pdr_ner:B-Disease)`, `chemdner_ner:I-Chemical)`, 
`bionlp_st_2013_cg_NER:B-Negative_regulation)`, `chebi_nactem_fullpaper_ner:B-Chemical_Structure)`, `bionlp_st_2011_ge_NER:I-Negative_regulation)`, `diann_iber_eval_en_ner:O)`, `bionlp_shared_task_2009_NER:I-Binding)`, `mlee_NER:I-Cell_proliferation)`, `chebi_nactem_fullpaper_ner:B-Protein)`, `bionlp_st_2013_gro_NER:B-Phosphorylation)`, `bionlp_st_2011_epi_COREF:coref)`, `medmentions_full_ner:B-T200)`, `bionlp_st_2013_cg_ner:B-Tissue)`, `chemdner_TEXT:MESH:D000082)`, `chemdner_TEXT:MESH:D037201)`, `bionlp_st_2013_gro_ner:B-ComplexMolecularEntity)`, `bionlp_st_2011_ge_RE:ToLoc)`, `diann_iber_eval_en_ner:B-Neg)`, `bionlp_st_2013_gro_ner:B-RibosomalRNA)`, `bionlp_shared_task_2009_NER:I-Protein_catabolism)`, `chemdner_TEXT:MESH:D016912)`, `medmentions_full_ner:B-T017)`, `bionlp_st_2013_gro_ner:B-CpGIsland)`, `mlee_ner:I-Organism_substance)`, `medmentions_full_ner:I-T075)`, `bionlp_st_2013_gro_ner:I-SecondMessenger)`, `bioinfer_ner:B-Protein_family_or_group)`, `bionlp_st_2013_cg_NER:I-Negative_regulation)`, `mantra_gsc_en_emea_ner:B-CHEM)`, `genia_term_corpus_ner:B-DNA_NA)`, `chemdner_TEXT:MESH:D057888)`, `chemdner_TEXT:MESH:D006495)`, `chemdner_TEXT:MESH:D006575)`, `geokhoj_v1_TEXT:0)`, `bionlp_st_2013_gro_RE:locatedIn)`, `genia_term_corpus_ner:B-virus)`, `bionlp_st_2013_gro_ner:B-RuntLikeDomain)`, `medmentions_full_ner:B-T131)`, `bionlp_st_2013_gro_ner:I-ProteinCodingRegion)`, `chemdner_TEXT:MESH:D015525)`, `genia_term_corpus_ner:I-mono_cell)`, `chemdner_TEXT:MESH:D007840)`, `medmentions_full_ner:I-T098)`, `chemdner_TEXT:MESH:D009930)`, `genia_term_corpus_ner:I-polynucleotide)`, `biorelex_ner:I-protein-region)`, `bionlp_st_2011_id_NER:I-Process)`, `bionlp_st_2013_gro_NER:I-CellularProcess)`, `medmentions_full_ner:B-T023)`, `chemdner_TEXT:MESH:D008942)`, `medmentions_full_ner:I-T070)`, `biorelex_ner:B-organelle)`, `bionlp_st_2013_gro_NER:I-Decrease)`, `verspoor_2013_ner:I-size)`, `chemdner_TEXT:MESH:D002945)`, `ebm_pico_ner:B-Intervention_Other)`, 
`bionlp_st_2013_cg_ner:I-Simple_chemical)`, `chemdner_TEXT:MESH:D008751)`, `chia_RE:AND)`, `medmentions_full_ner:I-T028)`, `ebm_pico_ner:I-Intervention_Other)`, `chemdner_TEXT:MESH:D005472)`, `chemdner_TEXT:MESH:D005070)`, `gnormplus_ner:B-Gene)`, `medmentions_full_ner:I-T190)`, `mlee_NER:B-Breakdown)`, `bioinfer_ner:B-GeneproteinRNA)`, `bioinfer_ner:B-Gene)`, `chemdner_TEXT:MESH:D006835)`, `chemdner_TEXT:MESH:D004298)`, `chemdner_TEXT:MESH:D002951)`, `chia_ner:I-Device)`, `bionlp_st_2013_pc_NER:B-Conversion)`, `bionlp_shared_task_2009_NER:I-Transcription)`, `mlee_NER:B-DNA_methylation)`, `pubmed_qa_labeled_fold0_CLF:no)`, `minimayosrs_sts:1)`, `chemdner_TEXT:MESH:D002166)`, `chemdner_TEXT:MESH:D005934)`, `bionlp_st_2013_gro_NER:B-CatabolicPathway)`, `tmvar_v1_ner:I-ProteinMutation)`, `verspoor_2013_ner:I-Phenomena)`, `medmentions_full_ner:B-T011)`, `chemdner_TEXT:MESH:D001218)`, `medmentions_full_ner:B-T185)`, `mantra_gsc_en_patents_ner:I-PROC)`, `medmentions_full_ner:I-T120)`, `chia_ner:I-Procedure)`, `genia_term_corpus_ner:I-ANDcell_typecell_type)`, `bionlp_st_2011_id_ner:I-Entity)`, `pcr_ner:B-Chemical)`, `bionlp_st_2013_gro_NER:B-PositiveRegulation)`, `mlee_RE:Theme)`, `bionlp_st_2011_epi_ner:B-Protein)`, `medmentions_full_ner:B-T055)`, `spl_adr_200db_train_ner:I-Severity)`, `bionlp_st_2013_gro_ner:I-Ion)`, `bionlp_st_2011_id_RE:Cause)`, `bc5cdr_ner:I-Disease)`, `bionlp_st_2013_gro_ner:I-bHLH)`, `chemdner_TEXT:MESH:D001058)`, `bionlp_st_2013_gro_ner:I-AminoAcid)`, `bionlp_st_2011_epi_NER:B-Phosphorylation)`, `medmentions_full_ner:B-T086)`, `chemdner_TEXT:MESH:D004441)`, `medmentions_st21pv_ner:I-T007)`, `biorelex_ner:B-drug)`, `mantra_gsc_en_patents_ner:I-DISO)`, `medmentions_full_ner:I-T197)`, `bionlp_st_2011_ge_RE:AtLoc)`, `bionlp_st_2013_gro_NER:B-MolecularProcess)`, `bionlp_st_2011_ge_NER:B-Phosphorylation)`, `bionlp_st_2013_gro_NER:B-FormationOfTranscriptionInitiationComplex)`, `bionlp_st_2011_ge_NER:I-Binding)`, `mirna_ner:B-GenesProteins)`, 
`mirna_ner:B-Diseases)`, `mantra_gsc_en_emea_ner:I-DISO)`, `anat_em_ner:I-Multi-tissue_structure)`, `bioinfer_ner:O)`, `chemdner_TEXT:MESH:D017673)`, `bionlp_st_2013_gro_NER:B-Methylation)`, `genia_term_corpus_ner:I-AND_NOTcell_typecell_type)`, `bionlp_st_2013_cg_NER:I-Positive_regulation)`, `bionlp_st_2013_cg_NER:B-Carcinogenesis)`, `chemdner_TEXT:MESH:D009543)`, `gnormplus_ner:I-Gene)`, `bionlp_st_2013_cg_RE:Participant)`, `chemdner_TEXT:MESH:D019804)`, `seth_corpus_RE:Equals)`, `medmentions_full_ner:I-T082)`, `hprd50_ner:O)`, `bionlp_st_2013_gro_ner:B-OxidativeStress)`, `chemdner_TEXT:MESH:D014227)`, `bio_sim_verb_sts:7)`, `bionlp_st_2011_ge_NER:I-Protein_catabolism)`, `bionlp_st_2011_ge_NER:B-Localization)`, `chemdner_TEXT:MESH:D001224)`, `chemdner_TEXT:MESH:D009842)`, `bionlp_st_2013_cg_ner:B-Amino_acid)`, `bionlp_st_2013_gro_NER:B-CellCyclePhase)`, `chemdner_TEXT:MESH:D002245)`, `bionlp_st_2013_ge_NER:I-Ubiquitination)`, `bionlp_st_2013_cg_NER:I-Cell_death)`, `pico_extraction_ner:O)`, `chemdner_TEXT:MESH:D000596)`, `chemdner_TEXT:MESH:D000638)`, `an_em_ner:B-Developing_anatomical_structure)`, `bionlp_st_2019_bb_ner:I-Phenotype)`, `bionlp_st_2013_gro_NER:I-CellDeath)`, `mantra_gsc_en_patents_ner:B-PHYS)`, `chemdner_TEXT:MESH:D009705)`, `genia_term_corpus_ner:B-protein_molecule)`, `mantra_gsc_en_medline_ner:B-PHEN)`, `bionlp_st_2013_gro_NER:I-PosttranslationalModification)`, `ddi_corpus_ner:B-BRAND)`, `mantra_gsc_en_medline_ner:B-DEVI)`, `mlee_NER:I-Planned_process)`, `tmvar_v1_ner:O)`, `bionlp_st_2011_ge_NER:I-Phosphorylation)`, `genia_term_corpus_ner:I-ANDprotein_substructureprotein_substructure)`, `medmentions_st21pv_ner:B-T007)`, `bionlp_st_2013_cg_ner:I-Pathological_formation)`, `bionlp_st_2013_gro_ner:B-Organism)`, `bionlp_st_2013_gro_ner:I-NucleicAcid)`, `medmentions_full_ner:I-T044)`, `chia_ner:I-Person)`, `chemdner_TEXT:MESH:D016572)`, `scai_disease_ner:O)`, `bionlp_st_2013_gro_ner:B-TranscriptionCofactor)`, `chemdner_TEXT:MESH:D002762)`, 
`chemdner_TEXT:MESH:D011685)`, `chemdner_TEXT:MESH:D005031)`, `scai_disease_ner:I-ADVERSE)`, `biorelex_ner:I-protein-isoform)`, `bionlp_shared_task_2009_COREF:None)`, `genia_term_corpus_ner:I-lipid)`, `biorelex_ner:B-RNA)`, `chemdner_TEXT:MESH:D018020)`, `scai_chemical_ner:B-FAMILY)`, `chemdner_TEXT:MESH:D017382)`, `chemdner_TEXT:MESH:D006027)`, `chemdner_TEXT:MESH:D018942)`, `medmentions_full_ner:I-T024)`, `chemdner_TEXT:MESH:D008050)`, `bionlp_st_2013_cg_NER:B-Glycosylation)`, `chemdner_TEXT:MESH:D019342)`, `chemdner_TEXT:MESH:D008774)`, `bionlp_st_2011_ge_RE:CSite)`, `bionlp_st_2013_gro_ner:B-HMGTF)`, `chemdner_ner:B-Chemical)`, `bioscope_papers_ner:B-negation)`, `biorelex_RE:bind)`, `bioinfer_ner:B-Protein_complex)`, `bionlp_st_2011_epi_NER:B-Ubiquitination)`, `bionlp_st_2013_gro_NER:I-RegulationOfTranscription)`, `chemdner_TEXT:MESH:D011134)`, `bionlp_st_2011_rel_ner:I-Entity)`, `mantra_gsc_en_medline_ner:I-PROC)`, `ncbi_disease_ner:I-DiseaseClass)`, `chemdner_TEXT:MESH:D014315)`, `bionlp_st_2013_gro_ner:I-Chromosome)`, `chemdner_TEXT:MESH:D000639)`, `chemdner_TEXT:MESH:D005740)`, `bionlp_st_2013_gro_ner:I-MolecularFunction)`, `verspoor_2013_ner:B-gene)`, `bionlp_st_2013_gro_ner:B-ZincCoordinatingDomainTF)`, `bionlp_st_2013_gro_ner:B-DNARegion)`, `ebm_pico_ner:B-Intervention_Educational)`, `medmentions_st21pv_ner:B-T005)`, `medmentions_full_ner:I-T022)`, `gnormplus_ner:B-FamilyName)`, `bionlp_st_2011_epi_RE:Contextgene)`, `bionlp_st_2013_pc_NER:B-Demethylation)`, `chia_ner:I-Observation)`, `medmentions_full_ner:I-T089)`, `bionlp_st_2013_gro_ner:I-ComplexMolecularEntity)`, `bionlp_st_2013_gro_ner:B-Lipid)`, `biorelex_ner:I-gene)`, `chemdner_TEXT:MESH:D003300)`, `chemdner_TEXT:MESH:D008903)`, `verspoor_2013_RE:relatedTo)`, `bionlp_st_2011_epi_NER:I-DNA_methylation)`, `genia_term_corpus_ner:I-cell_component)`, `bionlp_st_2011_ge_COREF:None)`, `ebm_pico_ner:B-Participant_Sample-size)`, `chemdner_TEXT:MESH:D043823)`, `chemdner_TEXT:MESH:D004958)`, 
`bionlp_st_2013_gro_ner:I-RNA)`, `chemdner_TEXT:MESH:D006150)`, `bionlp_st_2013_gro_ner:B-MolecularStructure)`, `chemdner_TEXT:MESH:D007457)`, `bionlp_st_2013_gro_ner:I-OxidativeStress)`, `scai_chemical_ner:B-PARTIUPAC)`, `mlee_NER:I-Blood_vessel_development)`, `bionlp_shared_task_2009_ner:B-Entity)`, `bionlp_st_2013_ge_RE:CSite)`, `medmentions_full_ner:B-T058)`, `chemdner_TEXT:MESH:D000628)`, `ebm_pico_ner:I-Intervention_Surgical)`, `an_em_ner:I-Organ)`, `bionlp_st_2013_gro_NER:B-Increase)`, `iepa_RE:PPI)`, `mlee_NER:B-Positive_regulation)`, `chemdner_TEXT:MESH:D014284)`, `chemdner_TEXT:MESH:D014260)`, `bionlp_st_2011_epi_NER:I-Glycosylation)`, `bionlp_st_2013_gro_NER:B-BindingToProtein)`, `bionlp_st_2013_gro_NER:B-BindingToRNA)`, `medmentions_full_ner:I-T047)`, `bionlp_st_2013_gro_NER:B-Localization)`, `bionlp_st_2013_gro_NER:I-NegativeRegulationOfGeneExpression)`, `medmentions_full_ner:I-T051)`, `bionlp_st_2011_id_COREF:None)`, `chemdner_TEXT:MESH:D011744)`, `bionlp_st_2013_gro_NER:B-BindingOfProteinToDNA)`, `bionlp_st_2013_gro_ner:B-CatalyticActivity)`, `chebi_nactem_abstr_ann1_ner:I-Biological_Activity)`, `bio_sim_verb_sts:1)`, `chemdner_TEXT:MESH:D012402)`, `bionlp_st_2013_gro_ner:B-bZIPTF)`, `chemdner_TEXT:MESH:D003913)`, `bionlp_shared_task_2009_RE:Site)`, `bionlp_st_2013_gro_ner:I-AntisenseRNA)`, `bionlp_st_2013_gro_NER:B-ProteinTargeting)`, `bionlp_st_2013_gro_NER:B-GeneExpression)`, `bionlp_st_2013_cg_NER:I-Blood_vessel_development)`, `mantra_gsc_en_patents_ner:I-CHEM)`, `mayosrs_sts:2)`, `chemdner_TEXT:MESH:D001645)`, `bionlp_st_2011_ge_NER:I-Transcription)`, `bionlp_st_2011_epi_NER:B-Acetylation)`, `medmentions_full_ner:B-T002)`, `verspoor_2013_ner:I-Concepts_Ideas)`, `hprd50_RE:None)`, `ddi_corpus_ner:O)`, `chemdner_TEXT:MESH:D014131)`, `ebm_pico_ner:B-Outcome_Physical)`, `medmentions_st21pv_ner:B-T103)`, `chemdner_TEXT:MESH:D016650)`, `mlee_NER:B-Cell_proliferation)`, `bionlp_st_2013_gro_ner:I-TranscriptionCoactivator)`, 
`chebi_nactem_fullpaper_ner:I-Chemical)`, `chemdner_TEXT:MESH:D013256)`, `biorelex_ner:I-protein-DNA-complex)`, `chemdner_TEXT:MESH:D008767)`, `bioinfer_RE:None)`, `nlm_gene_ner:B-Gene)`, `bionlp_st_2013_gro_ner:B-ReporterGene)`, `biosses_sts:1)`, `chemdner_TEXT:MESH:D000493)`, `chemdner_TEXT:MESH:D011374)`, `ebm_pico_ner:B-Intervention_Control)`, `bionlp_st_2013_pc_NER:I-Pathway)`, `chemprot_RE:CPR:3)`, `bionlp_st_2013_cg_ner:I-Amino_acid)`, `chemdner_TEXT:MESH:D005557)`, `bionlp_st_2011_ge_RE:Site)`, `bionlp_st_2013_pc_NER:B-Phosphorylation)`, `bionlp_st_2013_gro_NER:I-Elongation)`, `bionlp_st_2011_ge_NER:I-Localization)`, `spl_adr_200db_train_ner:B-Negation)`, `chemdner_TEXT:MESH:D010455)`, `nlm_gene_ner:B-GENERIF)`, `mlee_RE:Site)`, `bionlp_st_2013_gro_NER:B-BindingOfTFToTFBindingSiteOfDNA)`, `chemdner_TEXT:MESH:D017953)`, `bionlp_st_2013_gro_NER:B-NegativeRegulationOfTranscription)`, `osiris_ner:B-gene)`, `bionlp_st_2013_gro_ner:I-TranscriptionRepressor)`, `medmentions_full_ner:I-T131)`, `genia_term_corpus_ner:B-protein_family_or_group)`, `genia_term_corpus_ner:B-cell_type)`, `chemdner_TEXT:MESH:D013759)`, `chemdner_TEXT:MESH:D002247)`, `scai_chemical_ner:I-FAMILY)`, `chemdner_TEXT:MESH:D006020)`, `biorelex_ner:B-DNA)`, `chebi_nactem_abstr_ann1_ner:I-Spectral_Data)`, `mantra_gsc_en_medline_ner:B-DISO)`, `chemdner_TEXT:MESH:D019829)`, `ncbi_disease_ner:I-CompositeMention)`, `chemdner_TEXT:MESH:D013876)`, `chebi_nactem_fullpaper_ner:I-Spectral_Data)`, `biorelex_ner:I-DNA)`, `chemdner_TEXT:MESH:D005492)`, `chemdner_TEXT:MESH:D011810)`, `chemdner_TEXT:MESH:D008563)`, `chemdner_TEXT:MESH:D015735)`, `bionlp_st_2019_bb_ner:B-Microorganism)`, `ddi_corpus_RE:INT)`, `medmentions_st21pv_ner:B-T038)`, `bionlp_st_2013_gro_NER:B-CellCyclePhaseTransition)`, `cellfinder_ner:B-CellLine)`, `pdr_RE:Cause)`, `chemdner_TEXT:MESH:D011433)`, `chemdner_TEXT:MESH:D011720)`, `chemdner_TEXT:MESH:D020156)`, `ebm_pico_ner:O)`, `mlee_ner:B-Organ)`, `chemdner_TEXT:MESH:D012721)`, 
`chebi_nactem_fullpaper_ner:I-Biological_Activity)`, `bionlp_st_2013_cg_COREF:coref)`, `chemdner_TEXT:MESH:D006918)`, `medmentions_full_ner:B-T092)`, `genia_term_corpus_ner:B-protein_NA)`, `bionlp_st_2013_ge_ner:B-Entity)`, `an_em_ner:B-Multi-tissue_structure)`, `chia_ner:I-Measurement)`, `chia_RE:Has_temporal)`, `bionlp_st_2011_id_NER:B-Protein_catabolism)`, `bionlp_st_2013_gro_NER:B-CellAdhesion)`, `bionlp_st_2013_gro_ner:B-DNABindingSite)`, `biorelex_ner:B-organism)`, `scai_disease_ner:I-DISEASE)`, `bionlp_st_2013_gro_ner:I-DNABindingSite)`, `chemdner_TEXT:MESH:D016607)`, `chemdner_TEXT:MESH:D030421)`, `bionlp_st_2013_pc_NER:I-Binding)`, `medmentions_full_ner:I-T029)`, `chemdner_TEXT:MESH:D001569)`, `genia_term_corpus_ner:B-ANDcell_typecell_type)`, `scai_chemical_ner:B-SUM)`, `chemdner_TEXT:MESH:D007656)`, `medmentions_full_ner:B-T082)`, `chemdner_TEXT:MESH:D009525)`, `medmentions_full_ner:B-T079)`, `bionlp_st_2013_cg_NER:B-Synthesis)`, `biorelex_ner:B-process)`, `bionlp_st_2013_ge_RE:Theme)`, `chemdner_TEXT:MESH:D012825)`, `chemdner_TEXT:MESH:D005462)`, `bionlp_st_2013_cg_NER:B-Localization)`, `bionlp_st_2013_gro_NER:B-CellCycle)`, `cellfinder_ner:I-CellLine)`, `bionlp_st_2013_gro_ner:I-DNABindingDomainOfProtein)`, `medmentions_st21pv_ner:B-T168)`, `genia_term_corpus_ner:B-body_part)`, `genia_term_corpus_ner:B-ANDprotein_family_or_groupprotein_family_or_group)`, `mlee_ner:B-Tissue)`, `mlee_NER:I-Localization)`, `medmentions_full_ner:B-T125)`, `bionlp_st_2013_cg_NER:B-Infection)`, `chebi_nactem_abstr_ann1_ner:I-Protein)`, `chemdner_TEXT:MESH:D009570)`, `medmentions_full_ner:I-T045)`, `bionlp_st_2013_gro_ner:B-TranscriptionActivator)`, `verspoor_2013_ner:B-disease)`, `medmentions_full_ner:I-T056)`, `medmentions_full_ner:B-T050)`, `bionlp_st_2013_gro_ner:B-MolecularFunction)`, `medmentions_full_ner:B-T060)`, `bionlp_st_2013_gro_ner:B-Cell)`, `medmentions_full_ner:I-T060)`, `bionlp_st_2013_pc_NER:I-Gene_expression)`, `genia_term_corpus_ner:B-RNA_NA)`, 
`bionlp_st_2013_gro_ner:I-MessengerRNA)`, `medmentions_full_ner:I-T086)`, `an_em_RE:Part-of)`, `bionlp_st_2013_gro_NER:B-NegativeRegulationOfTranscriptionOfGene)`, `bionlp_st_2013_gro_NER:I-Splicing)`, `bioinfer_RE:PPI)`, `bioscope_papers_ner:I-speculation)`, `bionlp_st_2013_gro_ner:B-HomeoBox)`, `medmentions_full_ner:B-T004)`, `chia_ner:I-Drug)`, `bionlp_st_2013_gro_ner:B-FusionOfGeneWithReporterGene)`, `genia_term_corpus_ner:I-cell_line)`, `chebi_nactem_abstr_ann1_ner:I-Metabolite)`, `bionlp_st_2013_gro_ner:I-ExpressionProfiling)`, `chemdner_TEXT:MESH:D004390)`, `medmentions_full_ner:B-T016)`, `bionlp_st_2013_cg_NER:B-Growth)`, `medmentions_full_ner:I-T170)`, `medmentions_full_ner:B-T093)`, `genia_term_corpus_ner:I-inorganic)`, `mlee_NER:B-Planned_process)`, `bionlp_st_2013_gro_RE:hasPart)`, `bionlp_st_2013_gro_ner:B-BasicDomain)`, `chemdner_TEXT:MESH:D050091)`, `medmentions_st21pv_ner:B-T037)`, `chemdner_TEXT:MESH:D011522)`, `bionlp_st_2013_ge_NER:B-Deacetylation)`, `chemdner_TEXT:MESH:D004008)`, `chemdner_TEXT:MESH:D013972)`, `bionlp_st_2013_gro_NER:B-SignalingPathway)`, `bionlp_st_2013_gro_ner:B-Promoter)`, `chemdner_TEXT:MESH:D012701)`, `an_em_COREF:None)`, `bionlp_st_2019_bb_RE:None)`, `mlee_NER:I-Positive_regulation)`, `bionlp_st_2013_gro_NER:I-Translation)`, `chemdner_TEXT:MESH:D013453)`, `genia_term_corpus_ner:I-ANDprotein_moleculeprotein_molecule)`, `chemdner_TEXT:MESH:D002746)`, `chebi_nactem_abstr_ann1_ner:O)`, `bionlp_st_2013_pc_ner:O)`, `mayosrs_sts:7)`, `bionlp_st_2013_cg_NER:B-Pathway)`, `verspoor_2013_ner:I-age)`, `biorelex_ner:I-peptide)`, `medmentions_full_ner:I-T096)`, `chebi_nactem_fullpaper_ner:I-Chemical_Structure)`, `chemdner_TEXT:MESH:D007211)`, `medmentions_full_ner:I-T018)`, `medmentions_full_ner:B-T201)`, `bionlp_st_2013_gro_NER:B-BindingOfTFToTFBindingSiteOfProtein)`, `medmentions_full_ner:B-T054)`, `ebm_pico_ner:I-Intervention_Pharmacological)`, `chemdner_TEXT:MESH:D010672)`, `chemdner_TEXT:MESH:D004492)`, 
`chemdner_TEXT:MESH:D008094)`, `chemdner_TEXT:MESH:D002227)`, `chemdner_TEXT:MESH:D009553)`, `bionlp_st_2013_gro_NER:I-ResponseProcess)`, `chemdner_TEXT:MESH:D006046)`, `ebm_pico_ner:B-Participant_Condition)`, `nlm_gene_ner:I-Gene)`, `bionlp_st_2019_bb_ner:I-Habitat)`, `bionlp_shared_task_2009_COREF:coref)`, `chemdner_TEXT:MESH:D005640)`, `mantra_gsc_en_emea_ner:B-PHYS)`, `mantra_gsc_en_patents_ner:B-DISO)`, `bionlp_st_2013_gro_ner:B-Heterochromatin)`, `bionlp_st_2013_gro_NER:I-CellCycle)`, `bionlp_st_2013_cg_NER:I-Cell_proliferation)`, `bionlp_st_2013_cg_ner:B-Simple_chemical)`, `genia_term_corpus_ner:I-cell_type)`, `chemdner_TEXT:MESH:D003553)`, `bionlp_st_2013_ge_RE:Theme2)`, `tmvar_v1_ner:B-ProteinMutation)`, `chemdner_TEXT:MESH:D012717)`, `chemdner_TEXT:MESH:D026121)`, `chemdner_TEXT:MESH:D008687)`, `bionlp_st_2013_gro_NER:I-TranscriptionTermination)`, `medmentions_full_ner:B-T028)`, `biorelex_ner:B-assay)`, `genia_term_corpus_ner:B-tissue)`, `chemdner_TEXT:MESH:D009173)`, `bionlp_st_2013_gro_ner:B-TranscriptionCoactivator)`, `genia_term_corpus_ner:B-amino_acid_monomer)`, `mantra_gsc_en_emea_ner:B-DEVI)`, `bionlp_st_2013_gro_NER:B-Growth)`, `chemdner_TEXT:MESH:D017374)`, `genia_term_corpus_ner:B-other_artificial_source)`, `medmentions_full_ner:B-T072)`, `bionlp_st_2013_gro_NER:B-CellGrowth)`, `bionlp_st_2013_gro_ner:I-DoubleStrandDNA)`, `chemdner_ner:O)`, `bionlp_shared_task_2009_NER:I-Localization)`, `bionlp_st_2013_gro_NER:B-RegulationOfPathway)`, `genia_term_corpus_ner:I-amino_acid_monomer)`, `bionlp_st_2013_gro_NER:I-SPhase)`, `an_em_ner:B-Organism_substance)`, `medmentions_full_ner:B-T052)`, `genia_term_corpus_ner:B-ANDprotein_subunitprotein_subunit)`, `medmentions_full_ner:B-T096)`, `chemdner_TEXT:MESH:D056831)`, `chemdner_TEXT:MESH:D010755)`, `pdr_NER:I-Cause_of_disease)`, `mlee_NER:B-Phosphorylation)`, `medmentions_full_ner:I-T064)`, `chemdner_TEXT:MESH:D005978)`, `mantra_gsc_en_medline_ner:I-PHEN)`, `bionlp_st_2013_cg_ner:B-Pathological_formation)`, 
`bionlp_st_2013_gro_NER:B-Modification)`, `bionlp_st_2013_gro_ner:B-ProteinComplex)`, `bionlp_st_2013_gro_ner:B-DoubleStrandDNA)`, `medmentions_full_ner:B-T068)`, `medmentions_full_ner:I-T034)`, `bionlp_st_2011_epi_NER:B-Catalysis)`, `biosses_sts:0)`, `bionlp_st_2013_cg_ner:B-Organism_substance)`, `chemdner_TEXT:MESH:D055549)`, `bionlp_st_2013_cg_NER:B-Glycolysis)`, `chemdner_TEXT:MESH:D001761)`, `chemdner_TEXT:MESH:D011728)`, `bionlp_st_2013_gro_ner:B-Function)`, `medmentions_full_ner:I-T033)`, `bionlp_st_2013_gro_NER:I-PositiveRegulationOfTranscriptionOfGene)`, `medmentions_full_ner:I-T053)`, `bionlp_st_2013_gro_ner:B-Protein)`, `genia_term_corpus_ner:I-ANDprotein_family_or_groupprotein_family_or_group)`, `bionlp_st_2013_gro_NER:I-CatabolicPathway)`, `biorelex_ner:I-chemical)`, `chemdner_TEXT:MESH:D013185)`, `biorelex_ner:I-RNA)`, `chemdner_TEXT:MESH:D009838)`, `medmentions_full_ner:I-T008)`, `chemdner_TEXT:MESH:D002104)`, `bionlp_st_2013_gro_NER:B-RNABiosynthesis)`, `verspoor_2013_ner:I-ethnicity)`, `bionlp_st_2013_gro_ner:I-SmallInterferingRNA)`, `chemdner_TEXT:MESH:D026023)`, `mlee_ner:O)`, `bionlp_st_2013_gro_NER:I-CellHomeostasis)`, `bionlp_st_2013_pc_NER:B-Pathway)`, `gnormplus_ner:I-DomainMotif)`, `bionlp_st_2013_gro_ner:I-OpenReadingFrame)`, `bionlp_st_2013_gro_NER:I-RegulationOfGeneExpression)`, `muchmore_en_ner:O)`, `chemdner_TEXT:MESH:D000911)`, `bionlp_st_2011_epi_NER:B-DNA_demethylation)`, `bionlp_st_2013_gro_ner:I-RuntLikeDomain)`, `chemdner_TEXT:MESH:D010748)`, `medmentions_full_ner:B-T008)`, `biorelex_ner:B-protein-RNA-complex)`, `bionlp_st_2013_cg_NER:I-Planned_process)`, `chemdner_TEXT:MESH:D014867)`, `mantra_gsc_en_patents_ner:I-LIVB)`, `bionlp_st_2013_gro_NER:I-Silencing)`, `chemdner_TEXT:MESH:D015306)`, `chemdner_TEXT:MESH:D001679)`, `bionlp_shared_task_2009_NER:I-Positive_regulation)`, `linnaeus_filtered_ner:O)`, `chia_RE:Has_multiplier)`, `medmentions_full_ner:B-T116)`, `bionlp_shared_task_2009_NER:B-Positive_regulation)`, 
`anat_em_ner:B-Developing_anatomical_structure)`, `chemdner_TEXT:MESH:D011137)`, `chemdner_TEXT:MESH:D048271)`, `chemdner_TEXT:MESH:D003975)`, `bionlp_st_2013_gro_ner:B-TranscriptionRepressorActivity)`, `bionlp_st_2011_id_ner:B-Protein)`, `bionlp_st_2013_gro_NER:I-Mutation)`, `chemdner_TEXT:MESH:D001572)`, `mantra_gsc_en_patents_ner:B-CHEM)`, `mantra_gsc_en_medline_ner:I-DEVI)`, `bionlp_st_2013_gro_ner:B-Enzyme)`, `medmentions_full_ner:B-T056)`, `mantra_gsc_en_patents_ner:B-OBJC)`, `medmentions_full_ner:B-T073)`, `anat_em_ner:I-Tissue)`, `chemdner_TEXT:MESH:D047310)`, `chia_ner:I-Scope)`, `ncbi_disease_ner:B-Modifier)`, `medmentions_st21pv_ner:B-T082)`, `medmentions_full_ner:I-T054)`, `genia_term_corpus_ner:I-carbohydrate)`, `bionlp_st_2013_cg_RE:Theme)`, `chemdner_TEXT:MESH:D009538)`, `chemdner_TEXT:MESH:D008691)`, `genia_term_corpus_ner:B-ANDprotein_substructureprotein_substructure)`, `bionlp_st_2013_cg_ner:I-Tissue)`, `chia_ner:B-Device)`, `chemdner_TEXT:MESH:D002784)`, `medmentions_full_ner:I-T007)`, `bionlp_st_2013_gro_ner:I-DNAFragment)`, `mlee_RE:ToLoc)`, `spl_adr_200db_train_ner:I-AdverseReaction)`, `bionlp_st_2013_cg_NER:B-Catabolism)`, `chemdner_TEXT:MESH:D013779)`, `bionlp_st_2013_pc_NER:B-Regulation)`, `bionlp_st_2013_gro_NER:I-Disease)`, `chia_ner:I-Condition)`, `chemdner_TEXT:MESH:D012370)`, `bionlp_st_2013_ge_NER:O)`, `bionlp_st_2013_pc_NER:B-Deubiquitination)`, `bionlp_st_2013_pc_NER:I-Translation)`, `bionlp_st_2013_gro_NER:B-PositiveRegulationOfTranscriptionOfGene)`, `bionlp_st_2013_cg_NER:B-DNA_methylation)`, `bioscope_papers_ner:B-speculation)`, `chemdner_TEXT:MESH:D018130)`, `bionlp_st_2013_gro_ner:B-RNAPolymeraseII)`, `medmentions_st21pv_ner:B-T098)`, `bionlp_st_2013_gro_NER:B-Elongation)`, `bionlp_st_2013_pc_RE:Cause)`, `seth_corpus_ner:B-RS)`, `bionlp_st_2013_ge_RE:ToLoc)`, `chemdner_TEXT:MESH:D000538)`, `medmentions_full_ner:B-T192)`, `medmentions_full_ner:B-T061)`, `medmentions_full_ner:B-T032)`, `bionlp_st_2013_gro_NER:B-Transport)`, 
`medmentions_full_ner:I-T014)`, `chemdner_TEXT:MESH:D004137)`, `medmentions_full_ner:B-T101)`, `bionlp_st_2013_gro_NER:B-Transcription)`, `bionlp_st_2013_pc_NER:B-Transport)`, `medmentions_full_ner:I-T203)`, `ebm_pico_ner:I-Intervention_Control)`, `genia_term_corpus_ner:I-atom)`, `chemdner_TEXT:MESH:D014230)`, `osiris_ner:I-gene)`, `mantra_gsc_en_patents_ner:B-ANAT)`, `ncbi_disease_ner:I-SpecificDisease)`, `bionlp_st_2013_gro_NER:I-CellGrowth)`, `chemdner_TEXT:MESH:D001205)`, `chemdner_TEXT:MESH:D016627)`, `genia_term_corpus_ner:B-protein_subunit)`, `bionlp_st_2013_gro_ner:I-CellComponent)`, `medmentions_full_ner:B-T049)`, `scai_chemical_ner:O)`, `chemdner_TEXT:MESH:D010840)`, `chemdner_TEXT:MESH:D008694)`, `mantra_gsc_en_patents_ner:B-PHEN)`, `bionlp_st_2013_cg_RE:Cause)`, `chemdner_TEXT:MESH:D012293)`, `bionlp_st_2013_gro_NER:B-Homodimerization)`, `chemdner_TEXT:MESH:D008070)`, `chia_RE:OR)`, `bionlp_st_2013_cg_ner:I-Gene_or_gene_product)`, `verspoor_2013_ner:I-disease)`, `muchmore_en_ner:B-umlsterm)`, `chemdner_TEXT:MESH:D011794)`, `medmentions_full_ner:I-T002)`, `chemdner_TEXT:MESH:D007649)`, `genia_term_corpus_ner:B-AND_NOTcell_typecell_type)`, `medmentions_full_ner:I-T023)`, `chemprot_RE:CPR:1)`, `chemdner_TEXT:MESH:D001786)`, `bionlp_st_2013_gro_ner:B-HomeoboxTF)`, `bionlp_st_2013_cg_ner:I-Developing_anatomical_structure)`, `bionlp_st_2013_gro_ner:I-Attenuator)`, `bionlp_st_2019_bb_ner:B-Habitat)`, `chemdner_TEXT:MESH:D017931)`, `medmentions_full_ner:B-T047)`, `chemdner_TEXT:MESH:D006886)`, `genia_term_corpus_ner:I-)`, `medmentions_full_ner:B-T039)`, `chemdner_TEXT:MESH:D004220)`, `bionlp_st_2013_pc_RE:FromLoc)`, `nlm_gene_ner:I-GENERIF)`, `bionlp_st_2013_ge_NER:I-Protein_modification)`, `genia_term_corpus_ner:B-RNA_molecule)`, `chemdner_TEXT:MESH:D006854)`, `chemdner_TEXT:MESH:D006493)`, `chia_ner:B-Qualifier)`, `medmentions_full_ner:I-T013)`, `ehr_rel_sts:8)`, `an_em_RE:frag)`, `genia_term_corpus_ner:I-DNA_substructure)`, `chemdner_TEXT:MESH:D063065)`, 
`genia_term_corpus_ner:I-ANDprotein_complexprotein_complex)`, `bionlp_st_2013_pc_NER:I-Dissociation)`, `medmentions_full_ner:I-T004)`, `bionlp_st_2013_cg_ner:B-DNA_domain_or_region)`, `chemdner_TEXT:MESH:D010069)`, `bionlp_st_2013_gro_NER:I-Homodimerization)`, `chemdner_TEXT:MESH:D006147)`, `medmentions_full_ner:I-T041)`, `bionlp_st_2011_id_NER:B-Regulation)`, `bionlp_st_2013_gro_ner:O)`, `chemdner_TEXT:MESH:D008623)`, `bionlp_st_2013_ge_ner:I-Protein)`, `scai_chemical_ner:I-TRIVIAL)`, `an_em_ner:B-Organism_subdivision)`, `bionlp_st_2013_gro_ner:B-BindingAssay)`, `bionlp_st_2013_gro_ner:I-HMG)`, `anat_em_ner:I-Anatomical_system)`, `chemdner_TEXT:MESH:D015034)`, `mlee_NER:B-Catabolism)`, `mantra_gsc_en_medline_ner:B-LIVB)`, `ddi_corpus_ner:I-BRAND)`, `chia_ner:I-Multiplier)`, `bionlp_st_2013_gro_ner:I-SequenceHomologyAnalysis)`, `seth_corpus_RE:None)`, `bionlp_st_2013_cg_NER:B-Binding)`, `bioscope_papers_ner:I-negation)`, `chemdner_TEXT:MESH:D008741)`, `chemdner_TEXT:MESH:D052998)`, `chemdner_TEXT:MESH:D005227)`, `chemdner_TEXT:MESH:D009828)`, `spl_adr_200db_train_ner:B-Animal)`, `chemdner_TEXT:MESH:D010616)`, `bionlp_st_2013_gro_ner:I-ProteinComplex)`, `pico_extraction_ner:B-outcome)`, `mlee_NER:B-Negative_regulation)`, `chemdner_TEXT:MESH:D007093)`, `bionlp_st_2013_gro_NER:I-RNAProcessing)`, `bionlp_st_2013_gro_RE:hasAgent2)`, `biorelex_ner:I-reagent)`, `medmentions_st21pv_ner:I-T074)`, `bionlp_st_2013_gro_NER:B-BindingOfMolecularEntity)`, `chemdner_TEXT:MESH:D008911)`, `medmentions_full_ner:B-T033)`, `genia_term_corpus_ner:B-ANDprotein_complexprotein_complex)`, `medmentions_full_ner:I-T100)`, `chemdner_TEXT:MESH:D019259)`, `genia_term_corpus_ner:I-BUT_NOTother_nameother_name)`, `geokhoj_v1_TEXT:1)`, `bionlp_st_2013_cg_RE:Site)`, `medmentions_full_ner:B-T184)`, `bionlp_st_2013_gro_ner:B-ForkheadWingedHelixTF)`, `bionlp_st_2013_cg_ner:I-Protein_domain_or_region)`, `genia_term_corpus_ner:I-other_organic_compound)`, `chemdner_TEXT:MESH:D010793)`, 
`bionlp_st_2011_id_NER:B-Phosphorylation)`, `chemdner_TEXT:MESH:D002482)`, `bionlp_st_2013_cg_NER:B-Breakdown)`, `biorelex_ner:I-disease)`, `genia_term_corpus_ner:B-DNA_substructure)`, `bionlp_st_2013_gro_RE:hasPatient)`, `medmentions_full_ner:B-T127)`, `medmentions_full_ner:I-T185)`, `bionlp_shared_task_2009_RE:AtLoc)`, `medmentions_full_ner:I-T201)`, `chemdner_TEXT:MESH:D005290)`, `mlee_NER:I-Breakdown)`, `medmentions_full_ner:I-T063)`, `chemdner_TEXT:MESH:D017964)`, `an_em_ner:I-Tissue)`, `mlee_ner:I-Organism)`, `mantra_gsc_en_emea_ner:I-CHEM)`, `bionlp_st_2013_cg_ner:B-Anatomical_system)`, `genia_term_corpus_ner:B-ORDNA_domain_or_regionDNA_domain_or_region)`, `bionlp_st_2013_pc_NER:B-Degradation)`, `chemprot_RE:CPR:0)`, `genia_term_corpus_ner:B-inorganic)`, `chemdner_TEXT:MESH:D005466)`, `chia_ner:O)`, `medmentions_full_ner:B-T078)`, `mlee_NER:B-Growth)`, `mantra_gsc_en_emea_ner:B-PHEN)`, `chemdner_TEXT:MESH:D012545)`, `bionlp_st_2013_gro_NER:B-G1Phase)`, `chemdner_TEXT:MESH:D009841)`, `bionlp_st_2013_gro_ner:B-Chromatin)`, `bionlp_st_2011_epi_RE:Site)`, `medmentions_full_ner:B-T066)`, `genetaggold_ner:O)`, `bionlp_st_2013_cg_NER:I-Gene_expression)`, `medmentions_st21pv_ner:B-T092)`, `chemprot_RE:CPR:8)`, `bionlp_st_2013_cg_RE:Instrument)`, `nlm_gene_ner:I-Domain)`, `chemdner_TEXT:MESH:D006151)`, `bionlp_st_2011_id_ner:I-Protein)`, `mlee_NER:B-Synthesis)`, `bionlp_st_2013_gro_NER:B-CellMotility)`, `scai_chemical_ner:B-MODIFIER)`, `bionlp_st_2013_gro_NER:I-NegativeRegulationOfTranscription)`, `osiris_ner:O)`, `mlee_NER:B-Acetylation)`, `medmentions_st21pv_ner:B-T062)`, `chemdner_TEXT:MESH:D017705)`, `bionlp_st_2013_gro_NER:I-TranscriptionOfGene)`, `genia_term_corpus_ner:I-protein_complex)`, `chemprot_RE:CPR:10)`, `medmentions_full_ner:B-T102)`, `medmentions_full_ner:I-T171)`, `chia_ner:B-Reference_point)`, `medmentions_full_ner:B-T015)`, `bionlp_st_2013_gro_ner:I-RNAPolymerase)`, `chebi_nactem_abstr_ann1_ner:B-Metabolite)`, 
`bionlp_st_2013_gro_NER:I-CellDifferentiation)`, `chemdner_TEXT:MESH:D006861)`, `pubmed_qa_labeled_fold0_CLF:maybe)`, `bionlp_st_2013_gro_ner:I-Sequence)`, `mlee_NER:B-Transcription)`, `bc5cdr_ner:B-Chemical)`, `chemdner_TEXT:MESH:D000072317)`, `bionlp_st_2013_gro_NER:B-Producing)`, `genia_term_corpus_ner:B-ANDprotein_moleculeprotein_molecule)`, `bionlp_st_2011_id_NER:B-Localization)`, `bionlp_st_2013_gro_NER:B-MolecularInteraction)`, `chemdner_TEXT:MESH:D014639)`, `bionlp_st_2013_gro_NER:I-Increase)`, `mlee_NER:I-Translation)`, `medmentions_full_ner:B-T087)`, `bioscope_abstracts_ner:B-speculation)`, `ebm_pico_ner:B-Outcome_Adverse-effects)`, `mantra_gsc_en_medline_ner:B-PHYS)`, `bionlp_st_2013_gro_ner:I-Lipid)`, `bionlp_st_2011_ge_NER:B-Positive_regulation)`, `chemdner_TEXT:MESH:D005278)`, `bionlp_shared_task_2009_NER:B-Phosphorylation)`, `mlee_NER:I-Gene_expression)`, `bionlp_st_2011_epi_NER:I-Deacetylation)`, `chemdner_TEXT:MESH:D002110)`, `medmentions_full_ner:I-T121)`, `bionlp_st_2011_epi_ner:I-Entity)`, `bionlp_st_2019_bb_RE:Lives_In)`, `chemdner_TEXT:MESH:D001710)`, `anat_em_ner:B-Cancer)`, `bionlp_st_2013_gro_NER:B-RNASplicing)`, `mantra_gsc_en_medline_ner:I-ANAT)`, `chemdner_TEXT:MESH:D024508)`, `chemdner_TEXT:MESH:D000537)`, `mantra_gsc_en_medline_ner:I-DISO)`, `bionlp_st_2013_gro_ner:I-Prokaryote)`, `bionlp_st_2013_gro_ner:I-Chromatin)`, `bionlp_st_2013_gro_ner:B-Nucleotide)`, `linnaeus_ner:I-species)`, `verspoor_2013_ner:I-body-part)`, `bionlp_st_2013_gro_ner:B-DNAFragment)`, `bionlp_st_2013_gro_ner:B-PositiveTranscriptionRegulator)`, `medmentions_full_ner:I-T049)`, `bionlp_st_2011_ge_ner:B-Entity)`, `medmentions_full_ner:I-T017)`, `bionlp_st_2013_gro_NER:B-TranscriptionOfGene)`, `chemdner_TEXT:MESH:D009947)`, `mlee_NER:B-Dephosphorylation)`, `bionlp_st_2013_gro_NER:B-GeneSilencing)`, `pdr_RE:None)`, `scai_chemical_ner:I-TRIVIALVAR)`, `bionlp_st_2011_epi_NER:O)`, `bionlp_st_2013_cg_ner:I-Cell)`, `sciq_SEQ:None)`, `chemdner_TEXT:MESH:D019913)`, 
`mlee_RE:Participant)`, `chia_ner:I-Negation)`, `chemdner_TEXT:MESH:D014801)`, `chemdner_TEXT:MESH:D058846)`, `chemdner_TEXT:MESH:D011809)`, `bionlp_st_2011_epi_ner:O)`, `bionlp_st_2013_cg_NER:I-Metastasis)`, `chemdner_TEXT:MESH:D012643)`, `an_em_ner:I-Cell)`, `bionlp_st_2013_gro_ner:I-CatalyticActivity)`, `anat_em_ner:B-Anatomical_system)`, `mlee_ner:I-Pathological_formation)`, `bionlp_st_2013_gro_ner:I-ChromosomalDNA)`, `anat_em_ner:B-Cell)`, `chemdner_TEXT:MESH:D000242)`, `chemdner_TEXT:MESH:D017641)`, `bioscope_abstracts_ner:I-negation)`, `medmentions_st21pv_ner:B-T058)`, `chemdner_TEXT:MESH:D008744)`, `bionlp_st_2013_gro_ner:B-UpstreamRegulatorySequence)`, `chemdner_TEXT:MESH:D008012)`, `medmentions_full_ner:B-T013)`, `bionlp_st_2011_epi_NER:B-Glycosylation)`, `chemdner_TEXT:MESH:D052999)`, `chemdner_TEXT:MESH:D002329)`, `ebm_pico_ner:I-Intervention_Physical)`, `bionlp_st_2013_pc_ner:B-Complex)`, `medmentions_st21pv_ner:I-T005)`, `chemdner_TEXT:MESH:D064704)`, `bionlp_st_2013_gro_ner:I-ZincCoordinatingDomainTF)`, `bionlp_st_2013_pc_ner:I-Cellular_component)`, `genia_term_corpus_ner:B-ANDDNA_domain_or_regionDNA_domain_or_region)`, `bionlp_st_2013_gro_ner:B-Chromosome)`, `chemdner_TEXT:MESH:D007546)`, `bionlp_st_2013_gro_NER:I-PositiveRegulationOfGeneExpression)`, `medmentions_full_ner:I-T010)`, `pdr_NER:B-Treatment_of_disease)`, `medmentions_full_ner:B-T081)`, `bionlp_st_2011_epi_NER:B-Demethylation)`, `chemdner_TEXT:MESH:D013261)`, `bionlp_st_2013_gro_ner:I-RibosomalRNA)`, `verspoor_2013_ner:O)`, `bionlp_st_2013_gro_NER:B-DevelopmentalProcess)`, `chemdner_TEXT:MESH:D009270)`, `medmentions_full_ner:I-T130)`, `bionlp_st_2013_cg_ner:B-Organism)`, `medmentions_full_ner:B-T014)`, `chemdner_TEXT:MESH:D003374)`, `chemdner_TEXT:MESH:D011078)`, `cellfinder_ner:B-GeneProtein)`, `mayosrs_sts:6)`, `chemdner_TEXT:MESH:D005576)`, `bionlp_st_2013_ge_RE:Cause)`, `an_em_RE:None)`, `sciq_SEQ:answer)`, `bionlp_st_2013_cg_NER:B-Dissociation)`, `mlee_RE:frag)`, 
`bionlp_st_2013_pc_COREF:coref)`, `chemdner_TEXT:MESH:D008469)`, `ncbi_disease_ner:O)`, `bionlp_st_2011_epi_ner:I-Protein)`, `chemdner_TEXT:MESH:D011140)`, `chemdner_TEXT:MESH:D020001)`, `bionlp_st_2013_gro_ner:I-ThreeDimensionalMolecularStructure)`, `bionlp_st_2013_cg_ner:B-Cancer)`, `genia_term_corpus_ner:B-BUT_NOTother_nameother_name)`, `chemdner_TEXT:MESH:D006862)`, `medmentions_full_ner:B-T104)`, `bionlp_st_2011_epi_RE:Theme)`, `cellfinder_ner:B-Anatomy)`, `chemdner_TEXT:MESH:D010545)`, `biorelex_ner:B-RNA-family)`, `pico_extraction_ner:I-outcome)`, `mantra_gsc_en_patents_ner:I-PHYS)`, `bionlp_st_2013_pc_NER:I-Transcription)`, `bionlp_shared_task_2009_RE:Cause)`, `bionlp_st_2013_gro_ner:B-Vitamin)`, `bionlp_shared_task_2009_RE:CSite)`, `bionlp_st_2011_ge_ner:I-Protein)`, `mlee_COREF:coref)`, `bionlp_st_2013_gro_ner:I-ForkheadWingedHelix)`, `bioinfer_ner:I-Gene)`, `bionlp_st_2013_gro_ner:B-TranscriptionActivatorActivity)`, `chemdner_TEXT:MESH:D054439)`, `chemdner_TEXT:MESH:D011621)`, `ddi_corpus_ner:I-DRUG_N)`, `chemdner_TEXT:MESH:D019308)`, `bionlp_st_2013_gro_ner:I-Locus)`, `bionlp_shared_task_2009_RE:ToLoc)`, `bionlp_st_2013_cg_NER:B-Development)`, `bionlp_st_2013_gro_NER:I-CellularDevelopmentalProcess)`, `bionlp_st_2013_gro_ner:B-Eukaryote)`, `bionlp_st_2013_ge_NER:B-Negative_regulation)`, `seth_corpus_ner:I-SNP)`, `hprd50_ner:B-protein)`, `bionlp_st_2013_gro_NER:B-BindingOfProtein)`, `mlee_NER:I-Negative_regulation)`, `bionlp_st_2011_ge_NER:B-Protein_catabolism)`, `bionlp_st_2013_pc_ner:B-Cellular_component)`, `bionlp_st_2011_id_ner:I-Chemical)`, `chemdner_TEXT:MESH:D013831)`, `biorelex_COREF:None)`, `chemdner_TEXT:MESH:D005609)`, `bionlp_st_2013_gro_ner:B-TranscriptionFactor)`, `mlee_NER:B-Regulation)`, `chemdner_TEXT:MESH:D059808)`, `bionlp_st_2013_gro_ner:I-bHLHTF)`, `chemdner_TEXT:MESH:D010121)`, `chemdner_TEXT:MESH:D017608)`, `chemdner_TEXT:MESH:D007455)`, `mlee_NER:B-Blood_vessel_development)`, `bionlp_st_2013_gro_ner:B-TranscriptionFactorComplex)`, 
`biorelex_ner:B-disease)`, `bionlp_st_2013_cg_NER:B-Cell_differentiation)`, `medmentions_st21pv_ner:I-T092)`, `chemdner_TEXT:MESH:D007477)`, `medmentions_full_ner:B-T168)`, `pcr_ner:I-Chemical)`, `chemdner_TEXT:MESH:D009636)`, `chemdner_TEXT:MESH:D008051)`, `bionlp_shared_task_2009_NER:I-Gene_expression)`, `chemprot_ner:I-GENE-N)`, `biorelex_ner:B-reagent)`, `chemdner_TEXT:MESH:D020123)`, `nlmchem_ner:O)`, `ebm_pico_ner:I-Outcome_Mental)`, `chemdner_TEXT:MESH:D004040)`, `chemdner_TEXT:MESH:D000450)`, `chebi_nactem_fullpaper_ner:O)`, `biorelex_ner:B-protein-isoform)`, `chemdner_TEXT:MESH:D001564)`, `medmentions_full_ner:I-T095)`, `mlee_NER:I-Remodeling)`, `bionlp_st_2013_cg_RE:None)`, `biorelex_ner:O)`, `seth_corpus_RE:AssociatedTo)`, `bioscope_abstracts_ner:B-negation)`, `chebi_nactem_fullpaper_ner:I-Metabolite)`, `bionlp_st_2013_gro_ner:I-TranscriptionRepressorActivity)`, `bionlp_st_2013_cg_NER:B-Transcription)`, `bionlp_st_2011_ge_ner:B-Protein)`, `bionlp_st_2013_ge_ner:B-Protein)`, `bionlp_st_2013_gro_ner:I-Tissue)`, `chemdner_TEXT:MESH:D044005)`, `genia_term_corpus_ner:I-protein_substructure)`, `bionlp_st_2013_gro_ner:I-TranslationFactor)`, `minimayosrs_sts:5)`, `chemdner_TEXT:MESH:D012834)`, `ncbi_disease_ner:I-Modifier)`, `mlee_NER:B-Death)`, `medmentions_full_ner:B-T196)`, `bio_sim_verb_sts:4)`, `bionlp_st_2013_gro_NER:B-CellHomeostasis)`, `chemdner_TEXT:MESH:D006001)`, `bionlp_st_2013_gro_RE:encodes)`, `biorelex_ner:B-fusion-protein)`, `mlee_COREF:None)`, `chemdner_TEXT:MESH:D001623)`, `chemdner_TEXT:MESH:D000812)`, `medmentions_full_ner:B-T046)`, `bionlp_shared_task_2009_NER:O)`, `chemdner_TEXT:MESH:D000735)`, `gnormplus_ner:O)`, `chemdner_TEXT:MESH:D014635)`, `bionlp_st_2013_gro_NER:B-Mitosis)`, `chemdner_TEXT:MESH:D003847)`, `chemdner_TEXT:MESH:D002809)`, `medmentions_full_ner:I-T116)`, `chemdner_TEXT:MESH:D060406)`, `chemprot_ner:B-CHEMICAL)`, `chemdner_TEXT:MESH:D016642)`, `bionlp_st_2013_cg_NER:B-Phosphorylation)`, `an_em_ner:B-Organ)`, 
`chemdner_TEXT:MESH:D013431)`, `bionlp_shared_task_2009_RE:None)`, `medmentions_full_ner:B-T041)`, `mlee_ner:I-Tissue)`, `chemdner_TEXT:MESH:D023303)`, `ebm_pico_ner:I-Participant_Condition)`, `bionlp_st_2013_gro_ner:I-TATAbox)`, `bionlp_st_2013_gro_ner:I-bZIP)`, `bionlp_st_2011_epi_RE:Sidechain)`, `bionlp_st_2013_gro_ner:B-LivingEntity)`, `mantra_gsc_en_medline_ner:B-CHEM)`, `chemdner_TEXT:MESH:D007659)`, `medmentions_full_ner:I-T085)`, `bionlp_st_2013_cg_ner:I-Organism_substance)`, `medmentions_full_ner:B-T067)`, `chemdner_TEXT:MESH:D057846)`, `bionlp_st_2013_gro_NER:I-SignalingPathway)`, `bc5cdr_ner:I-Chemical)`, `nlm_gene_ner:I-STARGENE)`, `medmentions_full_ner:B-T090)`, `medmentions_full_ner:I-T037)`, `medmentions_full_ner:B-T037)`, `minimayosrs_sts:6)`, `medmentions_full_ner:I-T020)`, `chebi_nactem_fullpaper_ner:B-Species)`, `mirna_ner:O)`, `bionlp_st_2011_id_RE:Participant)`, `bionlp_st_2013_ge_NER:B-Binding)`, `ddi_corpus_ner:B-DRUG)`, `medmentions_full_ner:I-T078)`, `chemdner_TEXT:MESH:D012965)`, `bionlp_st_2013_cg_ner:I-Organ)`, `bionlp_st_2011_id_NER:B-Binding)`, `chemdner_TEXT:MESH:D006571)`, `mayosrs_sts:4)`, `chemdner_TEXT:MESH:D026422)`, `genia_term_corpus_ner:I-RNA_NA)`, `bionlp_st_2011_epi_RE:None)`, `chemdner_TEXT:MESH:D012265)`, `medmentions_full_ner:B-T195)`, `chemdner_TEXT:MESH:D014443)`, `bionlp_st_2013_gro_ner:I-OrganicChemical)`, `ebm_pico_ner:B-Participant_Age)`, `chemdner_TEXT:MESH:D009584)`, `chemdner_TEXT:MESH:D010862)`, `verspoor_2013_ner:B-Concepts_Ideas)`, `bionlp_st_2013_gro_NER:B-ActivationOfProcess)`, `chemdner_TEXT:MESH:D010118)`, `biorelex_COREF:coref)`, `bionlp_st_2013_gro_ner:I-Enzyme)`, `chemdner_TEXT:MESH:D012530)`, `chemdner_TEXT:MESH:D002351)`, `biorelex_ner:B-gene)`, `chemdner_TEXT:MESH:D013213)`, `medmentions_full_ner:B-T103)`, `chemdner_TEXT:MESH:D010091)`, `ebm_pico_ner:B-Participant_Sex)`, `bionlp_st_2013_gro_ner:B-ComplexOfProteinAndDNA)`, `bionlp_st_2013_gro_ner:B-Phenotype)`, `chemdner_TEXT:MESH:D019791)`, 
`chemdner_TEXT:MESH:D014280)`, `chemdner_TEXT:MESH:D011094)`, `chia_RE:None)`, `biorelex_RE:None)`, `chemdner_TEXT:MESH:D005230)`, `verspoor_2013_ner:B-cohort-patient)`, `chemdner_TEXT:MESH:D013645)`, `bionlp_st_2013_gro_ner:B-SecondMessenger)`, `mlee_ner:B-Cellular_component)`, `bionlp_shared_task_2009_NER:I-Phosphorylation)`, `mlee_ner:B-DNA_domain_or_region)`, `chemdner_TEXT:MESH:D017275)`, `chemdner_TEXT:MESH:D007053)`, `bionlp_st_2013_ge_RE:Site)`, `genia_term_corpus_ner:O)`, `chemprot_RE:CPR:6)`, `chemdner_TEXT:MESH:D006859)`, `genia_term_corpus_ner:I-other_name)`, `medmentions_full_ner:I-T042)`, `pdr_ner:O)`, `medmentions_full_ner:I-T057)`, `bionlp_st_2013_pc_RE:Product)`, `verspoor_2013_ner:B-size)`, `bionlp_st_2013_pc_NER:B-Acetylation)`, `medmentions_st21pv_ner:B-T017)`, `chia_ner:B-Temporal)`, `chemdner_TEXT:MESH:D003404)`, `bionlp_st_2013_gro_RE:None)`, `bionlp_shared_task_2009_NER:B-Gene_expression)`, `mqp_sts:3)`, `bionlp_st_2013_gro_ner:B-Chemical)`, `chemdner_TEXT:MESH:D013754)`, `mantra_gsc_en_medline_ner:B-GEOG)`, `mirna_ner:B-Specific_miRNAs)`, `chemdner_TEXT:MESH:D012492)`, `medmentions_full_ner:B-T190)`, `bionlp_st_2013_cg_ner:B-Developing_anatomical_structure)`, `bionlp_st_2013_gro_ner:B-RNA)`, `chemdner_TEXT:MESH:D011743)`, `chemdner_TEXT:MESH:D010795)`, `bionlp_st_2013_gro_NER:I-PositiveRegulation)`, `chemdner_TEXT:MESH:D002241)`, `medmentions_full_ner:B-T038)`, `bionlp_st_2013_gro_RE:hasAgent)`, `mlee_ner:B-Organism)`, `medmentions_full_ner:I-T168)`, `bioscope_abstracts_ner:O)`, `chemdner_TEXT:MESH:D002599)`, `bionlp_st_2013_pc_ner:I-Simple_chemical)`, `medmentions_full_ner:I-T066)`, `chemdner_TEXT:MESH:D019695)`, `bionlp_st_2013_ge_NER:I-Transcription)`, `mantra_gsc_en_emea_ner:B-DISO)`, `bionlp_st_2013_gro_NER:B-CellDeath)`, `medmentions_st21pv_ner:I-T031)`, `chemdner_TEXT:MESH:D004317)`, `bionlp_st_2013_gro_ner:B-TATAbox)`, `chemdner_TEXT:MESH:D052203)`, `bionlp_st_2013_gro_NER:B-CellFateDetermination)`, `medmentions_st21pv_ner:I-T022)`, 
`bionlp_st_2013_ge_NER:B-Protein_catabolism)`, `bionlp_st_2011_epi_NER:I-Catalysis)`, `verspoor_2013_ner:I-cohort-patient)`, `chemdner_TEXT:MESH:D010100)`, `an_em_ner:I-Developing_anatomical_structure)`, `chemdner_TEXT:MESH:D045162)`, `chia_RE:Has_qualifier)`, `verspoor_2013_RE:has)`, `chemdner_TEXT:MESH:D021382)`, `bionlp_st_2013_ge_NER:B-Acetylation)`, `medmentions_full_ner:I-T079)`, `bionlp_st_2013_gro_NER:B-Maintenance)`, `biorelex_ner:I-protein-domain)`, `chebi_nactem_abstr_ann1_ner:I-Chemical)`, `bioscope_papers_ner:O)`, `chia_RE:Has_scope)`, `bc5cdr_ner:B-Disease)`, `mlee_ner:I-Cellular_component)`, `medmentions_full_ner:I-T195)`, `spl_adr_200db_train_ner:B-AdverseReaction)`, `bionlp_st_2013_gro_ner:I-Promoter)`, `medmentions_full_ner:B-T040)`, `chemdner_TEXT:MESH:D005960)`, `chemdner_TEXT:MESH:D004164)`, `chemdner_TEXT:MESH:D015032)`, `chemdner_TEXT:MESH:D014255)`, `ebm_pico_ner:B-Outcome_Pain)`, `bionlp_st_2013_gro_ner:I-UpstreamRegulatorySequence)`, `bionlp_st_2013_pc_NER:I-Positive_regulation)`, `bionlp_st_2013_cg_NER:I-Regulation)`, `chemdner_TEXT:MESH:D001151)`, `medmentions_full_ner:I-T077)`, `chemdner_TEXT:MESH:D000081)`, `bionlp_st_2013_gro_NER:B-Stabilization)`, `mayosrs_sts:1)`, `biorelex_ner:B-mutation)`, `chemdner_TEXT:MESH:D000241)`, `chemdner_TEXT:MESH:D007930)`, `bionlp_st_2013_gro_NER:B-MetabolicPathway)`, `chemdner_TEXT:MESH:D013629)`, `chemdner_TEXT:MESH:D016202)`, `tmvar_v1_ner:I-DNAMutation)`, `chemdner_TEXT:MESH:D012502)`, `chemdner_TEXT:MESH:D044945)`, `bionlp_st_2013_cg_ner:I-Cellular_component)`, `mlee_ner:B-Developing_anatomical_structure)`, `bionlp_st_2013_gro_ner:I-AP2EREBPRelatedDomain)`, `chemdner_TEXT:MESH:D002338)`, `mayosrs_sts:5)`, `bionlp_st_2013_gro_ner:B-Intron)`, `genia_term_corpus_ner:I-DNA_domain_or_region)`, `anat_em_ner:I-Immaterial_anatomical_entity)`, `bionlp_st_2013_gro_ner:B-MutatedProtein)`, `ebm_pico_ner:I-Outcome_Mortality)`, `bionlp_st_2013_gro_ner:B-ProteinCodingRegion)`, `chemdner_TEXT:MESH:D005047)`, 
`chia_ner:B-Mood)`, `medmentions_st21pv_ner:O)`, `cellfinder_ner:I-Species)`, `bionlp_st_2013_gro_ner:I-InorganicChemical)`, `bionlp_st_2011_id_ner:B-Entity)`, `bionlp_st_2013_cg_NER:I-Catabolism)`, `an_em_ner:I-Cellular_component)`, `medmentions_full_ner:B-T021)`, `bionlp_st_2013_gro_NER:B-Heterodimerization)`, `chemdner_TEXT:MESH:D008315)`, `medmentions_st21pv_ner:I-T170)`, `chemdner_TEXT:MESH:D050112)`, `chia_RE:Subsumes)`, `medmentions_full_ner:I-T099)`, `bionlp_st_2013_gro_ner:I-Protein)`, `chemdner_TEXT:MESH:D047071)`, `bionlp_st_2013_gro_ner:B-TranscriptionFactorActivity)`, `mlee_ner:B-Organism_subdivision)`, `chemdner_TEXT:MESH:D016559)`, `medmentions_full_ner:B-T129)`, `genia_term_corpus_ner:I-protein_molecule)`, `mlee_ner:B-Drug_or_compound)`, `bionlp_st_2013_gro_NER:B-Silencing)`, `bionlp_st_2013_gro_ner:I-MolecularStructure)`, `genia_term_corpus_ner:B-nucleotide)`, `chemdner_TEXT:MESH:D003042)`, `mantra_gsc_en_emea_ner:B-ANAT)`, `chemdner_TEXT:MESH:D006690)`, `genia_term_corpus_ner:I-ANDcell_linecell_line)`, `chemdner_TEXT:MESH:D005473)`, `mantra_gsc_en_medline_ner:I-PHYS)`, `bionlp_st_2013_cg_NER:B-Blood_vessel_development)`, `bionlp_st_2013_gro_ner:B-BetaScaffoldDomain_WithMinorGrooveContacts)`, `chemdner_TEXT:MESH:D001549)`, `chia_ner:B-Measurement)`, `bionlp_st_2011_id_ner:B-Regulon-operon)`, `bionlp_st_2013_cg_NER:B-Acetylation)`, `pdr_ner:B-Plant)`, `mlee_NER:B-Development)`, `linnaeus_filtered_ner:B-species)`, `bionlp_st_2013_pc_RE:AtLoc)`, `medmentions_full_ner:I-T192)`, `bionlp_st_2013_gro_ner:B-BindingSiteOfProtein)`, `bionlp_st_2013_ge_NER:B-Ubiquitination)`, `bionlp_st_2013_gro_ner:I-ProteinCodingDNARegion)`, `chemdner_TEXT:MESH:D009647)`, `bionlp_st_2013_gro_ner:I-Ligand)`, `bionlp_st_2011_id_ner:O)`, `bionlp_st_2013_gro_NER:I-RNASplicing)`, `bionlp_st_2013_gro_ner:I-ComplexOfProteinAndRNA)`, `bionlp_st_2011_id_NER:B-Gene_expression)`, `chemdner_TEXT:MESH:D007501)`, `ehr_rel_sts:5)`, `bionlp_st_2013_gro_ner:B-TranscriptionRegulator)`, 
`medmentions_full_ner:B-T089)`, `bionlp_st_2011_epi_NER:I-DNA_demethylation)`, `mirna_ner:B-Species)`, `bionlp_st_2013_gro_ner:I-TranscriptionRegulator)`, `bionlp_st_2013_gro_NER:B-ProteinBiosynthesis)`, `scai_chemical_ner:B-ABBREVIATION)`, `bionlp_st_2013_gro_ner:I-Virus)`, `bionlp_st_2011_ge_NER:O)`, `medmentions_full_ner:B-T203)`, `bionlp_st_2013_cg_NER:I-Mutation)`, `bionlp_st_2013_gro_ner:B-ThreeDimensionalMolecularStructure)`, `genetaggold_ner:I-NEWGENE)`, `chemdner_TEXT:MESH:D010705)`, `chia_ner:I-Mood)`, `medmentions_full_ner:I-T068)`, `minimayosrs_sts:4)`, `medmentions_full_ner:I-T097)`, `bionlp_st_2013_gro_ner:I-BetaScaffoldDomain_WithMinorGrooveContacts)`, `mantra_gsc_en_emea_ner:I-PHYS)`, `medmentions_full_ner:I-T104)`, `bio_sim_verb_sts:5)`, `chebi_nactem_abstr_ann1_ner:B-Biological_Activity)`, `bionlp_st_2013_gro_NER:B-IntraCellularProcess)`, `mantra_gsc_en_emea_ner:I-PHEN)`, `mlee_ner:B-Cell)`, `chemdner_TEXT:MESH:D045784)`, `bionlp_st_2013_gro_ner:I-Vitamin)`, `chemdner_TEXT:MESH:D010416)`, `bionlp_st_2013_gro_ner:B-FusionGene)`, `bionlp_st_2013_gro_ner:I-FusionProtein)`, `mlee_NER:B-Remodeling)`, `minimayosrs_sts:8)`, `bionlp_st_2013_gro_ner:B-Enhancer)`, `mantra_gsc_en_emea_ner:O)`, `bionlp_st_2013_gro_ner:B-OpenReadingFrame)`, `bionlp_st_2013_pc_COREF:None)`, `medmentions_full_ner:I-T123)`, `bionlp_st_2013_gro_NER:I-RegulatoryProcess)`, `bionlp_st_2013_gro_NER:B-PositiveRegulationOfGeneExpression)`, `nlm_gene_ner:B-Domain)`, `bionlp_st_2013_pc_NER:B-Methylation)`, `medmentions_full_ner:B-T057)`, `chemdner_TEXT:MESH:D010226)`, `bionlp_st_2013_gro_ner:B-GeneProduct)`, `ebm_pico_ner:I-Outcome_Other)`, `chemdner_TEXT:MESH:D005223)`, `pdr_RE:Theme)`, `bionlp_shared_task_2009_NER:B-Protein_catabolism)`, `chemdner_TEXT:MESH:D019344)`, `gnormplus_ner:I-FamilyName)`, `verspoor_2013_ner:B-gender)`, `bionlp_st_2013_gro_NER:B-TranscriptionInitiation)`, `spl_adr_200db_train_ner:B-Severity)`, `medmentions_st21pv_ner:B-T097)`, 
`anat_em_ner:I-Organism_subdivision)`, `bionlp_st_2013_gro_NER:I-RNAMetabolism)`, `bioinfer_ner:I-Protein_complex)`, `anat_em_ner:I-Cell)`, `bionlp_st_2013_gro_ner:B-ProteinDomain)`, `bionlp_st_2013_gro_ner:I-PrimaryStructure)`, `genia_term_corpus_ner:I-other_artificial_source)`, `chemdner_TEXT:MESH:D010098)`, `bionlp_st_2013_gro_ner:I-Enhancer)`, `bionlp_st_2013_gro_ner:I-PositiveTranscriptionRegulator)`, `chemdner_TEXT:MESH:D004051)`, `chemdner_TEXT:MESH:D013853)`, `chebi_nactem_fullpaper_ner:B-Metabolite)`, `diann_iber_eval_en_ner:B-Disability)`, `biorelex_ner:B-peptide)`, `medmentions_full_ner:B-T048)`, `bionlp_st_2013_gro_ner:I-Function)`, `genia_term_corpus_ner:I-DNA_NA)`, `mlee_ner:I-Anatomical_system)`, `bioinfer_ner:B-Individual_protein)`, `verspoor_2013_ner:I-Physiology)`, `genia_term_corpus_ner:I-RNA_molecule)`, `chemdner_TEXT:MESH:D000255)`, `minimayosrs_sts:7)`, `mlee_NER:B-Localization)`, `bionlp_st_2013_gro_NER:B-ResponseProcess)`, `mantra_gsc_en_medline_ner:I-LIVB)`, `chemdner_TEXT:MESH:D010649)`, `seth_corpus_ner:B-Gene)`, `bionlp_st_2013_gro_ner:B-Attenuator)`, `chemdner_TEXT:MESH:D015363)`, `bionlp_st_2013_pc_NER:B-Inactivation)`, `medmentions_full_ner:I-T191)`, `mlee_ner:I-Organ)`, `chemdner_TEXT:MESH:D011765)`, `bionlp_shared_task_2009_NER:B-Binding)`, `an_em_ner:B-Cellular_component)`, `genia_term_corpus_ner:I-RNA_substructure)`, `medmentions_full_ner:B-T051)`, `anat_em_ner:I-Pathological_formation)`, `bionlp_st_2013_gro_RE:hasPatient3)`, `chemdner_TEXT:MESH:D013634)`, `chemdner_TEXT:MESH:D014414)`, `chia_RE:Has_index)`, `ddi_corpus_ner:B-GROUP)`, `bionlp_st_2013_gro_ner:B-MutantProtein)`, `bionlp_st_2013_ge_NER:I-Negative_regulation)`, `biorelex_ner:I-amino-acid)`, `chemdner_TEXT:MESH:D053279)`, `chemprot_RE:CPR:2)`, `bionlp_st_2013_gro_ner:B-bHLHTF)`, `bionlp_st_2013_cg_NER:I-Breakdown)`, `scai_chemical_ner:I-ABBREVIATION)`, `pdr_NER:B-Cause_of_disease)`, `chemdner_TEXT:MESH:D002219)`, `medmentions_full_ner:B-T044)`, 
`mirna_ner:B-Non-Specific_miRNAs)`, `chemdner_TEXT:MESH:D020748)`, `bionlp_shared_task_2009_RE:Theme)`, `chemdner_TEXT:MESH:D001647)`, `bionlp_st_2011_ge_NER:I-Regulation)`, `bionlp_st_2013_pc_ner:B-Gene_or_gene_product)`, `biorelex_ner:I-protein)`, `mantra_gsc_en_medline_ner:B-PROC)`, `medmentions_full_ner:I-T081)`, `medmentions_st21pv_ner:B-T022)`, `chia_ner:B-Multiplier)`, `bionlp_st_2013_gro_NER:B-GeneMutation)`, `chemdner_TEXT:MESH:D002232)`, `chemdner_TEXT:MESH:D010456)`, `biosses_sts:7)`, `medmentions_full_ner:B-T071)`, `chemdner_TEXT:MESH:D008628)`, `biorelex_ner:I-protein-complex)`, `chemdner_TEXT:MESH:D007328)`, `bionlp_st_2013_pc_NER:I-Activation)`, `bionlp_st_2013_cg_NER:B-Metabolism)`, `scai_chemical_ner:I-PARTIUPAC)`, `verspoor_2013_ner:B-age)`, `medmentions_full_ner:B-T122)`, `medmentions_full_ner:I-T050)`, `genia_term_corpus_ner:B-ANDother_nameother_name)`, `bionlp_st_2013_gro_NER:B-SPhase)`, `chemdner_TEXT:MESH:D012500)`, `mlee_NER:B-Metabolism)`, `bionlp_st_2011_id_NER:B-Positive_regulation)`, `chemdner_TEXT:MESH:D002794)`, `bionlp_st_2013_gro_NER:B-ProteinTransport)`, `chemdner_TEXT:MESH:D006028)`, `bionlp_st_2013_gro_RE:hasPatient2)`, `chemdner_TEXT:MESH:D009822)`, `bionlp_st_2013_cg_ner:I-Cancer)`, `bionlp_shared_task_2009_ner:I-Entity)`, `pcr_ner:B-Herb)`, `pubmed_qa_labeled_fold0_CLF:yes)`, `bionlp_st_2013_gro_NER:I-NegativeRegulation)`, `bionlp_st_2013_cg_NER:B-Dephosphorylation)`, `anat_em_ner:B-Multi-tissue_structure)`, `chemdner_TEXT:MESH:D008274)`, `medmentions_full_ner:B-T025)`, `chemprot_RE:CPR:9)`, `bionlp_st_2013_pc_RE:Participant)`, `bionlp_st_2013_pc_ner:B-Simple_chemical)`, `genia_term_corpus_ner:B-DNA_family_or_group)`, `bionlp_st_2013_gro_ner:B-bZIP)`, `bionlp_st_2013_gro_ner:I-Eukaryote)`, `bionlp_st_2013_pc_ner:I-Complex)`, `hprd50_ner:I-protein)`, `medmentions_full_ner:B-T020)`, `bionlp_st_2013_gro_ner:B-Agonist)`, `medmentions_full_ner:B-T030)`, `chemdner_TEXT:MESH:D009536)`, `medmentions_full_ner:B-T169)`, 
`genia_term_corpus_ner:I-nucleotide)`, `bionlp_st_2013_gro_NER:I-ProteinCatabolism)`, `bc5cdr_ner:O)`, `chemdner_TEXT:MESH:D003078)`, `medmentions_full_ner:I-T040)`, `chemdner_TEXT:MESH:D005963)`, `bionlp_st_2013_gro_ner:B-ExpressionProfiling)`, `mantra_gsc_en_emea_ner:I-DEVI)`, `mlee_NER:B-Cell_division)`, `ebm_pico_ner:B-Intervention_Pharmacological)`, `chemdner_TEXT:MESH:D008790)`, `mantra_gsc_en_emea_ner:I-ANAT)`, `mantra_gsc_en_medline_ner:B-ANAT)`, `chemdner_TEXT:MESH:D003545)`, `bionlp_st_2013_gro_NER:I-IntraCellularTransport)`, `bionlp_st_2013_gro_NER:I-CellDivision)`, `chemdner_TEXT:MESH:D013438)`, `bionlp_st_2011_id_NER:I-Negative_regulation)`, `bionlp_st_2013_gro_NER:I-DevelopmentalProcess)`, `mlee_ner:B-Protein_domain_or_region)`, `chemdner_TEXT:MESH:D014978)`, `bionlp_st_2011_id_NER:O)`, `bionlp_st_2013_gro_ner:I-ReporterGeneConstruction)`, `medmentions_full_ner:I-T025)`, `bionlp_st_2019_bb_RE:Exhibits)`, `ddi_corpus_ner:I-GROUP)`, `chemdner_TEXT:MESH:D011241)`, `chemdner_TEXT:MESH:D010446)`, `bionlp_st_2013_gro_ner:I-ExperimentalMethod)`, `anat_em_ner:B-Tissue)`, `chemdner_TEXT:MESH:D000470)`, `bionlp_st_2013_pc_NER:I-Inactivation)`, `bionlp_st_2013_gro_ner:I-Agonist)`, `medmentions_full_ner:B-T024)`, `mlee_NER:I-Transcription)`, `bionlp_st_2011_epi_NER:B-Deglycosylation)`, `bionlp_st_2013_cg_NER:B-Cell_death)`, `chemdner_TEXT:MESH:D000266)`, `chemdner_TEXT:MESH:D019833)`, `genia_term_corpus_ner:I-RNA_family_or_group)`, `biosses_sts:8)`, `lll_RE:genic_interaction)`, `bionlp_st_2013_gro_ner:B-OrganicChemical)`, `chemdner_TEXT:MESH:D013267)`, `bionlp_st_2013_gro_ner:I-TranscriptionCofactor)`, `biorelex_ner:B-protein-region)`, `chemdner_TEXT:MESH:D001565)`, `genia_term_corpus_ner:B-cell_line)`, `bionlp_st_2013_gro_NER:B-Cleavage)`, `ddi_corpus_RE:EFFECT)`, `bionlp_st_2013_cg_NER:B-Planned_process)`, `bionlp_st_2013_cg_ner:I-Immaterial_anatomical_entity)`, `chemdner_TEXT:MESH:D007660)`, `medmentions_full_ner:I-T090)`, 
`bionlp_st_2013_gro_ner:I-CpGIsland)`, `bionlp_st_2013_gro_ner:B-AminoAcid)`, `chemdner_TEXT:MESH:D001095)`, `mlee_NER:I-Death)`, `bionlp_st_2013_cg_ner:I-Anatomical_system)`, `bionlp_st_2013_gro_NER:B-Decrease)`, `bionlp_st_2013_pc_NER:B-Hydroxylation)`, `chemdner_TEXT:None)`, `bio_sim_verb_sts:3)`, `biorelex_ner:B-protein)`, `bionlp_st_2013_gro_ner:I-BasicDomain)`, `bionlp_st_2011_ge_ner:I-Entity)`, `bionlp_st_2013_gro_ner:B-PhysicalContinuant)`, `chemprot_RE:CPR:4)`, `chemdner_TEXT:MESH:D003345)`, `chemdner_TEXT:MESH:D010080)`, `mantra_gsc_en_patents_ner:O)`, `bionlp_st_2013_gro_ner:B-AntisenseRNA)`, `bionlp_st_2013_gro_ner:B-ProteinCodingDNARegion)`, `chemdner_TEXT:MESH:D010768)`, `chebi_nactem_fullpaper_ner:I-Protein)`, `genia_term_corpus_ner:I-multi_cell)`, `bionlp_st_2013_gro_ner:I-Gene)`, `medmentions_full_ner:B-T042)`, `chemdner_TEXT:MESH:D006034)`, `biorelex_ner:I-brand)`, `chebi_nactem_abstr_ann1_ner:I-Species)`, `chemdner_TEXT:MESH:D012236)`, `bionlp_st_2013_gro_ner:I-GeneProduct)`, `chemdner_TEXT:MESH:D005665)`, `chemdner_TEXT:MESH:D008715)`, `medmentions_st21pv_ner:I-T103)`, `ddi_corpus_RE:None)`, `medmentions_st21pv_ner:I-T091)`, `chemdner_TEXT:MESH:D019158)`, `chemdner_TEXT:MESH:D001280)`, `chemdner_TEXT:MESH:D009249)`, `medmentions_full_ner:I-T067)`, `medmentions_full_ner:B-T005)`, `bionlp_st_2013_cg_NER:I-Remodeling)`, `chemdner_TEXT:MESH:D000166)`, `osiris_ner:B-variant)`, `spl_adr_200db_train_ner:I-DrugClass)`, `mirna_ner:I-Species)`, `medmentions_st21pv_ner:I-T033)`, `ebm_pico_ner:I-Participant_Age)`, `medmentions_full_ner:B-T095)`, `bionlp_st_2013_gro_NER:B-RNAMetabolism)`, `chemdner_TEXT:MESH:D005231)`, `medmentions_full_ner:B-T062)`, `bionlp_st_2011_ge_NER:I-Gene_expression)`, `bionlp_st_2013_gro_ner:I-TranscriptionFactor)`, `genia_term_corpus_ner:B-protein_domain_or_region)`, `mantra_gsc_en_emea_ner:B-PROC)`, `mlee_NER:I-Pathway)`, `bionlp_st_2013_gro_NER:I-BindingOfProteinToProteinBindingSiteOfProtein)`, `bionlp_st_2011_id_COREF:coref)`, 
`biosses_sts:6)`, `biorelex_ner:I-organism)`, `chia_ner:B-Value)`, `verspoor_2013_ner:B-body-part)`, `chemdner_TEXT:MESH:D004974)`, `chia_RE:Has_mood)`, `medmentions_st21pv_ner:B-T074)`, `chemdner_TEXT:MESH:D000535)`, `verspoor_2013_ner:I-Disorder)`, `bionlp_st_2013_gro_NER:B-BindingToMolecularEntity)`, `bionlp_st_2013_gro_ner:I-ReporterGene)`, `mayosrs_sts:8)`, `bionlp_st_2013_cg_ner:I-DNA_domain_or_region)`, `bionlp_st_2013_gro_NER:I-Pathway)`, `medmentions_st21pv_ner:I-T168)`, `bionlp_st_2013_gro_NER:B-NegativeRegulation)`, `medmentions_full_ner:B-T123)`, `bionlp_st_2013_pc_NER:B-Positive_regulation)`, `bionlp_st_2013_gro_NER:I-FormationOfProteinDNAComplex)`, `chemdner_TEXT:MESH:D000577)`, `mlee_NER:B-Ubiquitination)`, `chemdner_TEXT:MESH:D003630)`, `bionlp_st_2013_gro_ner:B-Transcript)`, `bionlp_st_2013_cg_NER:I-Transcription)`, `anat_em_ner:B-Organ)`, `anat_em_ner:I-Organism_substance)`, `spl_adr_200db_train_ner:B-DrugClass)`, `bionlp_st_2013_gro_ner:I-ProteinSubunit)`, `biorelex_ner:B-protein-domain)`, `chemdner_TEXT:MESH:D006051)`, `bionlp_st_2011_id_NER:B-Process)`, `bionlp_st_2013_pc_NER:B-Ubiquitination)`, `bionlp_st_2013_pc_NER:B-Transcription)`, `chemdner_TEXT:MESH:D006838)`, `bionlp_st_2013_gro_RE:hasPatient5)`, `bionlp_st_2013_ge_NER:B-Localization)`, `chemdner_TEXT:MESH:D011759)`, `chemdner_TEXT:MESH:D053243)`, `biorelex_ner:I-mutation)`, `mantra_gsc_en_emea_ner:I-LIVB)`, `bionlp_st_2013_gro_NER:I-Transport)`, `bionlp_st_2011_id_RE:Site)`, `chemdner_TEXT:MESH:D015474)`, `bionlp_st_2013_gro_NER:B-Dimerization)`, `bionlp_st_2013_cg_NER:I-Localization)`, `medmentions_full_ner:I-T032)`, `chemdner_TEXT:MESH:D018036)`, `medmentions_full_ner:I-T167)`, `chemprot_RE:CPR:5)`, `minimayosrs_sts:2)`, `biorelex_ner:B-protein-DNA-complex)`, `cellfinder_ner:I-CellComponent)`, `nlm_gene_ner:B-Other)`, `medmentions_full_ner:I-T019)`, `chebi_nactem_abstr_ann1_ner:B-Spectral_Data)`, `bionlp_st_2013_cg_ner:I-Multi-tissue_structure)`, `medmentions_full_ner:B-T010)`, 
`mantra_gsc_en_medline_ner:I-GEOG)`, `chemprot_ner:I-GENE-Y)`, `mirna_ner:I-Diseases)`, `an_em_ner:O)`, `bionlp_st_2013_cg_NER:B-Remodeling)`, `medmentions_st21pv_ner:I-T058)`, `scicite_TEXT:background)`, `bionlp_st_2013_cg_NER:B-Mutation)`, `genia_term_corpus_ner:B-mono_cell)`, `bionlp_st_2013_gro_ner:B-DNA)`, `medmentions_full_ner:I-T114)`, `bionlp_st_2011_id_RE:Theme)`, `genetaggold_ner:B-NEWGENE)`, `mlee_ner:I-Organism_subdivision)`, `bionlp_shared_task_2009_NER:I-Regulation)`, `bionlp_st_2013_gro_ner:B-Microorganism)`, `chemdner_TEXT:MESH:D006108)`, `biorelex_ner:B-amino-acid)`, `bioinfer_ner:I-DNA_family_or_group)`, `bionlp_st_2013_gro_ner:I-Chemical)`, `mantra_gsc_en_patents_ner:I-DEVI)`, `mantra_gsc_en_medline_ner:O)`, `bionlp_st_2013_pc_NER:I-Regulation)`, `medmentions_full_ner:B-T043)`, `scicite_TEXT:result)`, `bionlp_st_2013_ge_NER:I-Binding)`, `chemdner_TEXT:MESH:D011441)`, `genia_term_corpus_ner:I-protein_domain_or_region)`, `bionlp_st_2011_epi_RE:Cause)`, `bionlp_st_2013_gro_ner:B-Nucleosome)`, `chemdner_TEXT:MESH:D011223)`, `chebi_nactem_abstr_ann1_ner:B-Protein)`, `bionlp_st_2013_gro_RE:hasFunction)`, `bionlp_st_2013_gro_ner:I-TranscriptionFactorActivity)`, `biorelex_ner:B-protein-family)`, `bionlp_st_2013_cg_ner:B-Gene_or_gene_product)`, `tmvar_v1_ner:B-SNP)`, `bionlp_st_2013_gro_ner:B-ExperimentalMethod)`, `bionlp_st_2013_gro_ner:B-ReporterGeneConstruction)`, `bionlp_st_2011_ge_NER:B-Transcription)`, `chemdner_TEXT:MESH:D004041)`, `chemdner_TEXT:MESH:D000631)`, `chebi_nactem_fullpaper_ner:I-Species)`, `medmentions_full_ner:B-T170)`, `bionlp_st_2013_gro_ner:B-ForkheadWingedHelix)`, `bionlp_st_2013_cg_ner:B-Organism_subdivision)`, `genia_term_corpus_ner:I-DNA_molecule)`, `bionlp_st_2013_cg_NER:I-Glycolysis)`, `an_em_ner:B-Pathological_formation)`, `bionlp_st_2013_gro_NER:B-TranscriptionTermination)`, `bionlp_st_2013_gro_NER:B-CellAging)`, `bionlp_st_2013_cg_ner:B-Protein_domain_or_region)`, `anat_em_ner:B-Organism_substance)`, 
`medmentions_full_ner:B-T053)`, `mlee_ner:B-Multi-tissue_structure)`, `biosses_sts:4)`, `bioscope_abstracts_ner:I-speculation)`, `chemdner_TEXT:MESH:D053644)`, `bionlp_st_2013_cg_NER:I-Translation)`, `tmvar_v1_ner:B-DNAMutation)`, `genia_term_corpus_ner:B-RNA_substructure)`, `an_em_ner:B-Anatomical_system)`, `bionlp_st_2013_gro_ner:B-Conformation)`, `bionlp_st_2013_gro_NER:I-NegativeRegulationOfTranscriptionOfGene)`, `medmentions_full_ner:I-T069)`, `chemdner_TEXT:MESH:D006820)`, `chemdner_TEXT:MESH:D015725)`, `chemdner_TEXT:MESH:D010281)`, `mlee_NER:B-Pathway)`, `bionlp_st_2011_id_NER:I-Regulation)`, `bionlp_st_2013_gro_NER:I-GeneExpression)`, `medmentions_full_ner:I-T073)`, `biosses_sts:2)`, `medmentions_full_ner:I-T043)`, `chemdner_TEXT:MESH:D001152)`, `bionlp_st_2013_gro_ner:I-DNAMolecule)`, `chemdner_TEXT:MESH:D015636)`, `chemdner_TEXT:MESH:D000666)`, `chemprot_RE:None)`, `bionlp_st_2013_gro_ner:B-Sequence)`, `chemdner_TEXT:MESH:D009151)`, `chia_ner:B-Observation)`, `an_em_COREF:coref)`, `medmentions_full_ner:B-T120)`, `bionlp_st_2013_gro_ner:B-Tissue)`, `bionlp_st_2013_gro_ner:B-MolecularEntity)`, `bionlp_st_2013_pc_NER:B-Dephosphorylation)`, `chemdner_TEXT:MESH:D044242)`, `bionlp_st_2013_gro_ner:B-FusionProtein)`, `biorelex_ner:B-cell)`, `bionlp_st_2013_gro_NER:B-Disease)`, `bionlp_st_2011_id_RE:None)`, `biorelex_ner:B-protein-motif)`, `bionlp_st_2013_pc_NER:I-Localization)`, `bionlp_st_2013_gro_ner:B-ZincCoordinatingDomain)`, `bionlp_st_2013_gro_ner:B-Locus)`, `genia_term_corpus_ner:B-other_organic_compound)`, `seth_corpus_ner:B-SNP)`, `pcr_ner:O)`, `genia_term_corpus_ner:I-virus)`, `bionlp_st_2013_gro_ner:I-Peptide)`, `chebi_nactem_abstr_ann1_ner:B-Chemical)`, `bionlp_st_2013_gro_ner:B-RNAMolecule)`, `bionlp_st_2013_gro_ner:B-SequenceHomologyAnalysis)`, `chemdner_TEXT:MESH:D005054)`, `bionlp_st_2013_ge_NER:B-Phosphorylation)`, `bionlp_st_2013_gro_NER:B-CellularProcess)`, `bionlp_st_2013_ge_RE:Site2)`, `verspoor_2013_ner:B-Phenomena)`, 
`chia_ner:I-Temporal)`, `bionlp_st_2013_gro_NER:I-Localization)`, `bionlp_st_2013_cg_NER:B-Ubiquitination)`, `chemdner_TEXT:MESH:D009020)`, `bionlp_st_2013_cg_RE:FromLoc)`, `mlee_ner:B-Organism_substance)`, `genia_term_corpus_ner:I-tissue)`, `medmentions_st21pv_ner:I-T082)`, `chemdner_TEXT:MESH:D054358)`, `medmentions_full_ner:I-T052)`, `chemdner_TEXT:MESH:D005459)`, `chemdner_TEXT:MESH:D047188)`, `medmentions_full_ner:I-T031)`, `chemdner_TEXT:MESH:D013890)`, `chemdner_TEXT:MESH:D004573)`, `genia_term_corpus_ner:B-peptide)`, `an_em_ner:I-Organism_subdivision)`, `bionlp_st_2013_gro_ner:B-MessengerRNA)`, `medmentions_full_ner:B-T171)`, `bionlp_st_2013_gro_NER:B-Affecting)`, `genia_term_corpus_ner:I-body_part)`, `bionlp_st_2013_gro_ner:B-Prokaryote)`, `chemdner_TEXT:MESH:D013844)`, `medmentions_full_ner:I-T061)`, `bionlp_st_2013_pc_NER:B-Negative_regulation)`, `bionlp_st_2013_gro_ner:I-EukaryoticCell)`, `pdr_ner:I-Plant)`, `chemdner_TEXT:MESH:D024341)`, `medmentions_full_ner:I-T092)`, `chemdner_TEXT:MESH:D020319)`, `bionlp_st_2013_cg_NER:B-Cell_transformation)`, `bionlp_st_2013_gro_NER:B-BindingOfTranscriptionFactorToDNA)`, `an_em_ner:I-Anatomical_system)`, `bionlp_st_2011_epi_NER:B-Hydroxylation)`, `bionlp_st_2013_gro_ner:I-Exon)`, `cellfinder_ner:B-Species)`, `bionlp_st_2013_gro_NER:B-Pathway)`, `bionlp_st_2013_ge_NER:B-Protein_modification)`, `bionlp_st_2013_gro_ner:I-FusionGene)`, `bionlp_st_2011_rel_ner:B-Entity)`, `bionlp_st_2011_id_RE:CSite)`, `bionlp_st_2013_ge_NER:B-Positive_regulation)`, `bionlp_st_2013_gro_ner:I-BindingAssay)`, `bionlp_st_2013_gro_NER:B-CellDivision)`, `bionlp_st_2019_bb_ner:I-Microorganism)`, `medmentions_full_ner:I-T059)`, `chemdner_TEXT:MESH:D011108)`, `bionlp_st_2013_gro_NER:B-PositiveRegulationOfTranscription)`, `bionlp_st_2013_gro_ner:B-GeneRegion)`, `bionlp_st_2013_cg_COREF:None)`, `chemdner_TEXT:MESH:D010261)`, `mlee_NER:B-Binding)`, `chemprot_ner:I-CHEMICAL)`, `bionlp_st_2011_id_RE:ToLoc)`, `biorelex_ner:I-organelle)`, 
`chemdner_TEXT:MESH:D004318)`, `genia_term_corpus_ner:I-DNA_family_or_group)`, `bionlp_st_2013_gro_ner:B-RNAPolymerase)`, `bionlp_st_2013_gro_ner:B-CellComponent)`, `bionlp_st_2013_gro_NER:B-RegulationOfGeneExpression)`, `bionlp_st_2013_gro_ner:B-Peptide)`, `bionlp_shared_task_2009_NER:B-Transcription)`, `biorelex_ner:B-tissue)`, `pico_extraction_ner:B-participant)`, `chia_ner:I-Visit)`, `chemdner_TEXT:MESH:D011807)`, `chemdner_TEXT:MESH:D014501)`, `bionlp_st_2013_gro_NER:I-IntraCellularProcess)`, `ehr_rel_sts:7)`, `pico_extraction_ner:I-intervention)`, `chemdner_TEXT:MESH:D001599)`, `bionlp_st_2013_gro_ner:I-RegulatoryDNARegion)`, `medmentions_st21pv_ner:I-T037)`, `chemdner_TEXT:MESH:D055768)`, `bionlp_st_2013_gro_ner:B-ChromosomalDNA)`, `chemdner_TEXT:MESH:D008550)`, `bionlp_st_2013_pc_RE:Site)`, `medmentions_full_ner:I-T087)`, `chemdner_TEXT:MESH:D001583)`, `bionlp_st_2011_epi_NER:B-Dehydroxylation)`, `ehr_rel_sts:3)`, `bionlp_st_2013_gro_ner:I-MutantProtein)`, `chemdner_TEXT:MESH:D011804)`, `medmentions_full_ner:B-T091)`, `bionlp_st_2013_cg_RE:CSite)`, `linnaeus_ner:O)`, `medmentions_st21pv_ner:B-T201)`, `verspoor_2013_ner:B-Disorder)`, `bionlp_st_2013_cg_NER:I-Death)`, `bioinfer_ner:I-Individual_protein)`, `medmentions_full_ner:B-T191)`, `verspoor_2013_ner:B-ethnicity)`, `chemdner_TEXT:MESH:D002083)`, `genia_term_corpus_ner:B-carbohydrate)`, `genia_term_corpus_ner:B-DNA_molecule)`, `medmentions_full_ner:B-T069)`, `pdr_NER:I-Treatment_of_disease)`, `mlee_ner:B-Anatomical_system)`, `chebi_nactem_fullpaper_ner:B-Spectral_Data)`, `chemdner_TEXT:MESH:D005419)`, `bionlp_st_2013_gro_ner:I-Nucleotide)`, `medmentions_full_ner:B-T194)`, `chemdner_TEXT:MESH:D005947)`, `chemdner_TEXT:MESH:D008627)`, `bionlp_st_2013_gro_NER:B-ExperimentalIntervention)`, `chemdner_TEXT:MESH:D011073)`, `chia_RE:Has_negation)`, `verspoor_2013_ner:I-mutation)`, `chemdner_TEXT:MESH:D004224)`, `chemdner_TEXT:MESH:D005663)`, `medmentions_full_ner:I-T094)`, `chemdner_TEXT:MESH:D006877)`, 
`ebm_pico_ner:B-Outcome_Mortality)`, `bionlp_st_2013_gro_ner:B-TranscriptionRepressor)`, `biorelex_ner:I-cell)`, `bionlp_st_2013_gro_NER:I-BindingOfProteinToDNA)`, `verspoor_2013_RE:None)`, `bionlp_st_2013_gro_NER:B-ProteinModification)`, `chemdner_TEXT:MESH:D047090)`, `medmentions_full_ner:I-T204)`, `chemdner_TEXT:MESH:D006843)`, `biorelex_ner:I-protein-family)`, `chemdner_TEXT:MESH:D012694)`, `bionlp_st_2013_gro_ner:B-TranslationFactor)`, `scai_chemical_ner:B-)`, `bionlp_st_2013_gro_ner:B-Exon)`, `medmentions_full_ner:I-T083)`, `bionlp_st_2013_gro_ner:I-TranscriptionActivatorActivity)`, `medmentions_full_ner:I-T101)`, `medmentions_full_ner:B-T034)`, `bionlp_st_2013_gro_ner:I-Histone)`, `ddi_corpus_RE:MECHANISM)`, `mantra_gsc_en_emea_ner:I-PROC)`, `genia_term_corpus_ner:I-peptide)`, `bionlp_st_2013_cg_NER:B-Cell_proliferation)`, `chemdner_TEXT:MESH:D004140)`, `medmentions_full_ner:B-T083)`, `diann_iber_eval_en_ner:I-Disability)`, `bionlp_st_2013_gro_NER:B-PosttranslationalModification)`, `biorelex_ner:I-fusion-protein)`, `chemdner_TEXT:MESH:D020910)`, `chemdner_TEXT:MESH:D014747)`, `bionlp_st_2013_ge_NER:B-Gene_expression)`, `biorelex_ner:I-tissue)`, `mantra_gsc_en_patents_ner:B-LIVB)`, `medmentions_full_ner:O)`, `medmentions_full_ner:B-T077)`, `bionlp_st_2013_gro_ner:I-Operon)`, `chemdner_TEXT:MESH:D002392)`, `chemdner_TEXT:MESH:D014498)`, `chemdner_TEXT:MESH:D002368)`, `chemdner_TEXT:MESH:D018817)`, `bionlp_st_2013_ge_NER:I-Regulation)`, `genia_term_corpus_ner:B-atom)`, `chemdner_TEXT:MESH:D011092)`, `chemdner_TEXT:MESH:D015283)`, `chemdner_TEXT:MESH:D018698)`, `chemdner_TEXT:MESH:D009569)`, `muchmore_en_ner:I-umlsterm)`, `bionlp_st_2013_cg_NER:B-Death)`, `nlm_gene_ner:I-Other)`, `medmentions_full_ner:B-T109)`, `osiris_ner:I-variant)`, `ehr_rel_sts:6)`, `chemdner_TEXT:MESH:D001120)`, `mlee_ner:I-Protein_domain_or_region)`, `bionlp_st_2013_pc_NER:B-Dissociation)`, `bionlp_st_2013_cg_NER:B-Metastasis)`, `chemdner_TEXT:MESH:D014204)`, `chemdner_TEXT:MESH:D005857)`, 
`medmentions_full_ner:I-T030)`, `chemdner_TEXT:MESH:D019256)`, `bionlp_st_2013_gro_ner:B-Polymerase)`, `chia_ner:B-Negation)`, `bionlp_st_2013_gro_NER:B-CellularMetabolicProcess)`, `bionlp_st_2013_gro_NER:B-CellDifferentiation)`, `biorelex_ner:I-protein-motif)`, `medmentions_full_ner:I-T093)`, `chemdner_TEXT:MESH:D019820)`, `anat_em_ner:B-Pathological_formation)`, `bionlp_shared_task_2009_NER:B-Localization)`, `genia_term_corpus_ner:B-RNA_domain_or_region)`, `chemdner_TEXT:MESH:D014668)`, `bionlp_st_2013_pc_ner:I-Gene_or_gene_product)`, `chemdner_TEXT:MESH:D019207)`, `bionlp_st_2013_gro_NER:B-BindingOfProteinToProteinBindingSiteOfDNA)`, `medmentions_full_ner:B-T059)`, `bionlp_st_2013_gro_ner:B-Ligand)`, `bio_sim_verb_sts:6)`, `biorelex_ner:B-experimental-construct)`, `bionlp_st_2013_gro_ner:I-DNA)`, `pdr_NER:O)`, `chemdner_TEXT:MESH:D008670)`, `bionlp_st_2011_ge_RE:Cause)`, `chemdner_TEXT:MESH:D015232)`, `bionlp_st_2013_pc_NER:O)`, `bionlp_st_2013_gro_NER:B-FormationOfProteinDNAComplex)`, `medmentions_full_ner:B-T121)`, `bionlp_shared_task_2009_NER:B-Regulation)`, `chemdner_TEXT:MESH:D009534)`, `chemdner_TEXT:MESH:D014451)`, `bionlp_st_2011_id_RE:AtLoc)`, `chemdner_TEXT:MESH:D011799)`, `medmentions_st21pv_ner:B-T204)`, `genia_term_corpus_ner:I-protein_subunit)`, `biorelex_ner:I-assay)`, `chemdner_TEXT:MESH:D005680)`, `an_em_ner:I-Organism_substance)`, `chemdner_TEXT:MESH:D010368)`, `chemdner_TEXT:MESH:D000872)`, `bionlp_st_2011_id_NER:I-Gene_expression)`, `bionlp_st_2013_cg_NER:B-Regulation)`, `mlee_ner:I-DNA_domain_or_region)`, `chemdner_TEXT:MESH:D001393)`, `medmentions_full_ner:I-T038)`, `chemdner_TEXT:MESH:D047311)`, `chemdner_TEXT:MESH:D011453)`, `chemdner_TEXT:MESH:D020106)`, `chemdner_TEXT:MESH:D019257)`, `bionlp_st_2013_gro_ner:B-NuclearReceptor)`, `chemdner_TEXT:MESH:D002117)`, `genia_term_corpus_ner:B-lipid)`, `bionlp_st_2013_gro_ner:B-SmallInterferingRNA)`, `chemdner_TEXT:MESH:D011205)`, `chemdner_TEXT:MESH:D002686)`, 
`bionlp_st_2013_gro_NER:B-Translation)`, `ebm_pico_ner:I-Intervention_Psychological)`, `mlee_ner:I-Drug_or_compound)`, `bionlp_st_2013_gro_ner:I-TranscriptionFactorBindingSiteOfDNA)`, `chemdner_TEXT:MESH:D000688)`, `bionlp_st_2011_ge_RE:None)`, `bionlp_st_2013_gro_ner:B-ProteinSubunit)`, `genia_term_corpus_ner:I-ANDother_nameother_name)`, `bionlp_st_2013_gro_NER:I-Heterodimerization)`, `pico_extraction_ner:B-intervention)`, `bionlp_st_2013_cg_ner:I-Organism)`, `bionlp_st_2013_gro_ner:I-ProteinDomain)`, `bionlp_st_2013_gro_NER:I-BindingToProtein)`, `scai_chemical_ner:I-)`, `biorelex_ner:B-experiment-tag)`, `ebm_pico_ner:B-Intervention_Physical)`, `bionlp_st_2013_cg_RE:ToLoc)`, `bionlp_st_2013_gro_NER:B-FormationOfTranscriptionFactorComplex)`, `linnaeus_ner:B-species)`, `medmentions_full_ner:I-T062)`, `chemdner_TEXT:MESH:D014640)`, `mlee_NER:B-Gene_expression)`, `chemdner_TEXT:MESH:D008701)`, `mlee_NER:O)`, `chemdner_TEXT:MESH:D014302)`, `genia_term_corpus_ner:B-RNA_family_or_group)`, `medmentions_full_ner:I-T091)`, `medmentions_full_ner:B-T022)`, `medmentions_full_ner:B-T074)`, `bionlp_st_2013_gro_NER:B-ProteinCatabolism)`, `bionlp_st_2013_gro_RE:hasPatient4)`, `chemdner_TEXT:MESH:D011388)`, `bionlp_st_2013_ge_NER:I-Phosphorylation)`, `bionlp_st_2013_gro_NER:I-CellAdhesion)`, `anat_em_ner:I-Organ)`, `medmentions_full_ner:B-T045)`, `chemdner_TEXT:MESH:D008727)`, `chebi_nactem_abstr_ann1_ner:B-Species)`, `bionlp_st_2013_gro_ner:I-RNAPolymeraseII)`, `nlm_gene_ner:B-STARGENE)`, `mantra_gsc_en_emea_ner:B-OBJC)`, `bionlp_st_2013_gro_ner:B-DNABindingDomainOfProtein)`, `chemdner_TEXT:MESH:D010636)`, `chemdner_TEXT:MESH:D004061)`, `mlee_NER:I-Binding)`, `medmentions_full_ner:B-T075)`, `medmentions_full_ner:B-UnknownType)`, `chemdner_TEXT:MESH:D019081)`, `bionlp_st_2013_gro_NER:I-Binding)`, `medmentions_full_ner:I-T005)`, `chemdner_TEXT:MESH:D009821)` + +{:.btn-box} + + 
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_biomuppet_en_5.2.0_3.0_1699292355718.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_biomuppet_en_5.2.0_3.0_1699292355718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+tokenizer = Tokenizer() \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_biomuppet","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+# assumes an active SparkSession named `spark`, e.g. from sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+result = pipeline.fit(data).transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLModel
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_biomuppet","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+val data = Seq("PUT YOUR STRING HERE").toDF("text") // requires import spark.implicits._
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.biomuppet.by_leonweber").predict("""PUT YOUR STRING HERE""")
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_biomuppet| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|420.4 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/leonweber/biomuppet \ No newline at end of file From a2d8b7dac589b98b37b41d944e6a5cd1f33e920c Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:52:03 +0700 Subject: [PATCH 307/667] Add model 2023-11-06-bert_ner_scibert_scivocab_uncased_ft_tv_sdu21_ai_en --- ...bert_scivocab_uncased_ft_tv_sdu21_ai_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_scibert_scivocab_uncased_ft_tv_sdu21_ai_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_scibert_scivocab_uncased_ft_tv_sdu21_ai_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_scibert_scivocab_uncased_ft_tv_sdu21_ai_en.md new file mode 100644 index 00000000000000..71efa0119070a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_scibert_scivocab_uncased_ft_tv_sdu21_ai_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_scibert_scivocab_uncased_ft_tv_sdu21_ai BertForTokenClassification from napsternxg +author: John Snow Labs +name: bert_ner_scibert_scivocab_uncased_ft_tv_sdu21_ai +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark 
NLP. `bert_ner_scibert_scivocab_uncased_ft_tv_sdu21_ai` is an English model originally trained by napsternxg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_scibert_scivocab_uncased_ft_tv_sdu21_ai_en_5.2.0_3.0_1699300157518.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_scibert_scivocab_uncased_ft_tv_sdu21_ai_en_5.2.0_3.0_1699300157518.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
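+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_scibert_scivocab_uncased_ft_tv_sdu21_ai","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# assumes an active SparkSession named `spark`, e.g. from sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineDF = pipeline.fit(data).transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_scibert_scivocab_uncased_ft_tv_sdu21_ai", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("PUT YOUR STRING HERE").toDF("text") // requires import spark.implicits._
+val pipelineDF = pipeline.fit(data).transform(data)
+```
+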
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_scibert_scivocab_uncased_ft_tv_sdu21_ai| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/napsternxg/scibert_scivocab_uncased_ft_tv_SDU21_AI \ No newline at end of file From 6fbe9f98a2d65be80cf8760a998a325dca9eaf29 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:53:03 +0700 Subject: [PATCH 308/667] Add model 2023-11-06-bert_sayula_popoluca_codeswitch_spaeng_sayula_popoluca_lince_en --- ...eswitch_spaeng_sayula_popoluca_lince_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_codeswitch_spaeng_sayula_popoluca_lince_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_codeswitch_spaeng_sayula_popoluca_lince_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_codeswitch_spaeng_sayula_popoluca_lince_en.md new file mode 100644 index 00000000000000..83cc14082f8648 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_codeswitch_spaeng_sayula_popoluca_lince_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_codeswitch_spaeng_sayula_popoluca_lince BertForTokenClassification from sagorsarker +author: John Snow Labs +name: bert_sayula_popoluca_codeswitch_spaeng_sayula_popoluca_lince +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and 
production-readiness using Spark NLP. `bert_sayula_popoluca_codeswitch_spaeng_sayula_popoluca_lince` is an English model originally trained by sagorsarker. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_codeswitch_spaeng_sayula_popoluca_lince_en_5.2.0_3.0_1699307513091.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_codeswitch_spaeng_sayula_popoluca_lince_en_5.2.0_3.0_1699307513091.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
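+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_codeswitch_spaeng_sayula_popoluca_lince","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# assumes an active SparkSession named `spark`, e.g. from sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineDF = pipeline.fit(data).transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_codeswitch_spaeng_sayula_popoluca_lince", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("PUT YOUR STRING HERE").toDF("text") // requires import spark.implicits._
+val pipelineDF = pipeline.fit(data).transform(data)
+```
+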
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_codeswitch_spaeng_sayula_popoluca_lince| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|665.1 MB| + +## References + +https://huggingface.co/sagorsarker/codeswitch-spaeng-pos-lince \ No newline at end of file From 011c0136ee0fa194f5492ec9a3315dd62896aaa7 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:54:03 +0700 Subject: [PATCH 309/667] Add model 2023-11-06-bert_ner_legalbert_beneficiary_single_en --- ...ert_ner_legalbert_beneficiary_single_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_legalbert_beneficiary_single_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_legalbert_beneficiary_single_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_legalbert_beneficiary_single_en.md new file mode 100644 index 00000000000000..33e7f450375432 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_legalbert_beneficiary_single_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from Anery) +author: John Snow Labs +name: bert_ner_legalbert_beneficiary_single +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `legalbert_beneficiary_single` is a English model originally trained by `Anery`. 
+
+## Predicted Entities
+
+`AC`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_legalbert_beneficiary_single_en_5.2.0_3.0_1699296397141.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_legalbert_beneficiary_single_en_5.2.0_3.0_1699296397141.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_legalbert_beneficiary_single","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_legalbert_beneficiary_single","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.legal").predict("""PUT YOUR STRING HERE""")
+```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_legalbert_beneficiary_single| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.3 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/Anery/legalbert_beneficiary_single \ No newline at end of file From 03e9d0d7ef617f8bec65b15ea8d45326f0afb014 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:55:03 +0700 Subject: [PATCH 310/667] Add model 2023-11-06-bert_ner_mateocolina_bert_finetuned_ner_en --- ...t_ner_mateocolina_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_mateocolina_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mateocolina_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mateocolina_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..459eb6d3b2fb72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mateocolina_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from mateocolina) +author: John Snow Labs +name: bert_ner_mateocolina_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is a English model originally trained by `mateocolina`. 
+
+## Predicted Entities
+
+`MISC`, `ORG`, `LOC`, `PER`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_mateocolina_bert_finetuned_ner_en_5.2.0_3.0_1699294025892.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_mateocolina_bert_finetuned_ner_en_5.2.0_3.0_1699294025892.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_mateocolina_bert_finetuned_ner","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_mateocolina_bert_finetuned_ner","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.conll.finetuned.by_mateocolina").predict("""PUT YOUR STRING HERE""")
+```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_mateocolina_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/mateocolina/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From b108b0bca33c61e939096b6bab63ba09d3d5e2ce Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:56:04 +0700 Subject: [PATCH 311/667] Add model 2023-11-06-bert_ner_amasi_wikineural_multilingual_ner_en --- ...er_amasi_wikineural_multilingual_ner_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_amasi_wikineural_multilingual_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_amasi_wikineural_multilingual_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_amasi_wikineural_multilingual_ner_en.md new file mode 100644 index 00000000000000..534e3b46644edd --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_amasi_wikineural_multilingual_ner_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from amasi) +author: John Snow Labs +name: bert_ner_amasi_wikineural_multilingual_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. 
`wikineural-multilingual-ner` is a English model originally trained by `amasi`. + +## Predicted Entities + +`ORG`, `PER`, `LOC`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_amasi_wikineural_multilingual_ner_en_5.2.0_3.0_1699282412379.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_amasi_wikineural_multilingual_ner_en_5.2.0_3.0_1699282412379.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_amasi_wikineural_multilingual_ner","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_amasi_wikineural_multilingual_ner","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.wikineural.multilingual.by_amasi").predict("""PUT YOUR STRING HERE""")
+```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_amasi_wikineural_multilingual_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|665.1 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/amasi/wikineural-multilingual-ner \ No newline at end of file From 95fefb5879bd99c2448bf32a70fe60f372be9575 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:57:04 +0700 Subject: [PATCH 312/667] Add model 2023-11-06-bert_ner_tiny_bert_for_token_classification_en --- ...r_tiny_bert_for_token_classification_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_tiny_bert_for_token_classification_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_tiny_bert_for_token_classification_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_tiny_bert_for_token_classification_en.md new file mode 100644 index 00000000000000..4abec8a913df7e --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_tiny_bert_for_token_classification_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Tiny Cased model (from hf-internal-testing) +author: John Snow Labs +name: bert_ner_tiny_bert_for_token_classification +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. 
`tiny-bert-for-token-classification` is a English model originally trained by `hf-internal-testing`. + +## Predicted Entities + +`ORG`, `PER`, `LOC`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_tiny_bert_for_token_classification_en_5.2.0_3.0_1699301009832.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_tiny_bert_for_token_classification_en_5.2.0_3.0_1699301009832.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_tiny_bert_for_token_classification","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_tiny_bert_for_token_classification","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.tiny.by_hf_internal_testing").predict("""PUT YOUR STRING HERE""")
+```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_tiny_bert_for_token_classification| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|527.6 KB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/hf-internal-testing/tiny-bert-for-token-classification \ No newline at end of file From d9e9d6f24ca55deaa0b9e82012662693174488bb Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:58:04 +0700 Subject: [PATCH 313/667] Add model 2023-11-06-bert_italian_finetuned_ner_it --- ...023-11-06-bert_italian_finetuned_ner_it.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_italian_finetuned_ner_it.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_italian_finetuned_ner_it.md b/docs/_posts/ahmedlone127/2023-11-06-bert_italian_finetuned_ner_it.md new file mode 100644 index 00000000000000..d11875f0915520 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_italian_finetuned_ner_it.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Italian bert_italian_finetuned_ner BertForTokenClassification from nickprock +author: John Snow Labs +name: bert_italian_finetuned_ner +date: 2023-11-06 +tags: [bert, it, open_source, token_classification, onnx] +task: Named Entity Recognition +language: it +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_italian_finetuned_ner` is a Italian model originally trained by nickprock. 
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_italian_finetuned_ner_it_5.2.0_3.0_1699307848390.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_italian_finetuned_ner_it_5.2.0_3.0_1699307848390.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_italian_finetuned_ner","it") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_italian_finetuned_ner", "it")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_italian_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|it| +|Size:|409.7 MB| + +## References + +https://huggingface.co/nickprock/bert-italian-finetuned-ner \ No newline at end of file From 3515cd09e5d3a774a8b3289432d01280cc3b9146 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 04:59:05 +0700 Subject: [PATCH 314/667] Add model 2023-11-06-bert_sayula_popoluca_tiny_toto_punctuator_en --- ...sayula_popoluca_tiny_toto_punctuator_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_toto_punctuator_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_toto_punctuator_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_toto_punctuator_en.md new file mode 100644 index 00000000000000..f3a4edf0d05a2f --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_toto_punctuator_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_tiny_toto_punctuator BertForTokenClassification from kktoto +author: John Snow Labs +name: bert_sayula_popoluca_tiny_toto_punctuator +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_sayula_popoluca_tiny_toto_punctuator` is a English model originally trained by kktoto. 
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_tiny_toto_punctuator_en_5.2.0_3.0_1699307918818.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_tiny_toto_punctuator_en_5.2.0_3.0_1699307918818.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_tiny_toto_punctuator","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_tiny_toto_punctuator", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_tiny_toto_punctuator| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|42.7 MB| + +## References + +https://huggingface.co/kktoto/tiny_toto_punctuator \ No newline at end of file From 365d09f64803bbfa1bea33803acd1bc303610e4b Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:00:05 +0700 Subject: [PATCH 315/667] Add model 2023-11-06-bert_addresses_en --- .../2023-11-06-bert_addresses_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_addresses_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_addresses_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_addresses_en.md new file mode 100644 index 00000000000000..f27aa8f38a7a0a --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_addresses_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_addresses BertForTokenClassification from ctrlbuzz +author: John Snow Labs +name: bert_addresses +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_addresses` is a English model originally trained by ctrlbuzz. 
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_addresses_en_5.2.0_3.0_1699304551042.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_addresses_en_5.2.0_3.0_1699304551042.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_addresses","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_addresses", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_addresses| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/ctrlbuzz/bert-addresses \ No newline at end of file From df58daec16d7b596f7f0bc77c02e29aaaa258299 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:01:05 +0700 Subject: [PATCH 316/667] Add model 2023-11-06-bert_ner_buntan_bert_finetuned_ner_en --- ...6-bert_ner_buntan_bert_finetuned_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_buntan_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_buntan_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_buntan_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..85665eeb3f54b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_buntan_bert_finetuned_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_buntan_bert_finetuned_ner BertForTokenClassification from Buntan +author: John Snow Labs +name: bert_ner_buntan_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_ner_buntan_bert_finetuned_ner` is a English model originally trained by Buntan. 
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_buntan_bert_finetuned_ner_en_5.2.0_3.0_1699276868703.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_buntan_bert_finetuned_ner_en_5.2.0_3.0_1699276868703.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_buntan_bert_finetuned_ner","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_buntan_bert_finetuned_ner", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_buntan_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Buntan/bert-finetuned-ner \ No newline at end of file From 50cebcb5b89322ecfd0856856ca98414917f9466 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:02:06 +0700 Subject: [PATCH 317/667] Add model 2023-11-06-bert_ner_craft_original_pubmedbert_384_en --- ...rt_ner_craft_original_pubmedbert_384_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_craft_original_pubmedbert_384_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_craft_original_pubmedbert_384_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_craft_original_pubmedbert_384_en.md new file mode 100644 index 00000000000000..ae76ba111e3fb4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_craft_original_pubmedbert_384_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_craft_original_pubmedbert_384 BertForTokenClassification from ghadeermobasher +author: John Snow Labs +name: bert_ner_craft_original_pubmedbert_384 +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_ner_craft_original_pubmedbert_384` is a English model originally trained by ghadeermobasher. 
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_craft_original_pubmedbert_384_en_5.2.0_3.0_1699277790730.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_craft_original_pubmedbert_384_en_5.2.0_3.0_1699277790730.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_craft_original_pubmedbert_384","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_craft_original_pubmedbert_384", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_craft_original_pubmedbert_384| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.2 MB| + +## References + +https://huggingface.co/ghadeermobasher/CRAFT-Original-PubMedBERT-384 \ No newline at end of file From 0126dfd18b39b860ccd52f4be9e24ce269c18807 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:03:06 +0700 Subject: [PATCH 318/667] Add model 2023-11-06-bert_ner_kamalkraj_bert_base_cased_ner_conll2003_en --- ...alkraj_bert_base_cased_ner_conll2003_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_kamalkraj_bert_base_cased_ner_conll2003_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_kamalkraj_bert_base_cased_ner_conll2003_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_kamalkraj_bert_base_cased_ner_conll2003_en.md new file mode 100644 index 00000000000000..8e07d4af10edc4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_kamalkraj_bert_base_cased_ner_conll2003_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Base Cased model (from kamalkraj) +author: John Snow Labs +name: bert_ner_kamalkraj_bert_base_cased_ner_conll2003 +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-base-cased-ner-conll2003` is an English model originally trained by `kamalkraj`.
+ +## Predicted Entities + +`ORG`, `MISC`, `LOC`, `PER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_kamalkraj_bert_base_cased_ner_conll2003_en_5.2.0_3.0_1699295382664.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_kamalkraj_bert_base_cased_ner_conll2003_en_5.2.0_3.0_1699295382664.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_kamalkraj_bert_base_cased_ner_conll2003","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_kamalkraj_bert_base_cased_ner_conll2003","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.cased_base").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_kamalkraj_bert_base_cased_ner_conll2003| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + + + +- https://huggingface.co/kamalkraj/bert-base-cased-ner-conll2003 +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From a17e1facca86c0a65c51ef9d447b43587668bc5b Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:04:06 +0700 Subject: [PATCH 319/667] Add model 2023-11-06-bert_ner_kurama_bert_finetuned_ner_en --- ...6-bert_ner_kurama_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_kurama_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_kurama_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_kurama_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..e885db189cf5af --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_kurama_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from kurama) +author: John Snow Labs +name: bert_ner_kurama_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.
`bert-finetuned-ner` is an English model originally trained by `kurama`. + +## Predicted Entities + +`MISC`, `ORG`, `LOC`, `PER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_kurama_bert_finetuned_ner_en_5.2.0_3.0_1699295552710.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_kurama_bert_finetuned_ner_en_5.2.0_3.0_1699295552710.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_kurama_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_kurama_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned.by_kurama").predict("""PUT YOUR STRING HERE""") +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_kurama_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + + + +- https://huggingface.co/kurama/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From fd6b915e5344745e2be4a889ad53a177d0e941ac Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:05:06 +0700 Subject: [PATCH 320/667] Add model 2023-11-06-bert_ner_ysharma_bert_finetuned_ner_en --- ...-bert_ner_ysharma_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_ysharma_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ysharma_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ysharma_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..ddea248bf671dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ysharma_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from ysharma) +author: John Snow Labs +name: bert_ner_ysharma_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.
`bert-finetuned-ner` is an English model originally trained by `ysharma`. + +## Predicted Entities + +`ORG`, `LOC`, `PER`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_ysharma_bert_finetuned_ner_en_5.2.0_3.0_1699301535436.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_ysharma_bert_finetuned_ner_en_5.2.0_3.0_1699301535436.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_ysharma_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_ysharma_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned.by_ysharma").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_ysharma_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + + + +- https://huggingface.co/ysharma/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From 5797a8d847a8af652505215cad6d5da96a3369c9 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:06:06 +0700 Subject: [PATCH 321/667] Add model 2023-11-06-bert_token_classifier_named_entity_recognition_nerkor_hungarian_hungarian_hu --- ...cognition_nerkor_hungarian_hungarian_hu.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_named_entity_recognition_nerkor_hungarian_hungarian_hu.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_named_entity_recognition_nerkor_hungarian_hungarian_hu.md b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_named_entity_recognition_nerkor_hungarian_hungarian_hu.md new file mode 100644 index 00000000000000..7703ce6118a447 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_named_entity_recognition_nerkor_hungarian_hungarian_hu.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Hungarian bert_token_classifier_named_entity_recognition_nerkor_hungarian_hungarian BertForTokenClassification from NYTK +author: John Snow Labs +name: bert_token_classifier_named_entity_recognition_nerkor_hungarian_hungarian +date: 2023-11-06 +tags: [bert, hu, open_source, token_classification, onnx] +task: Named Entity Recognition +language: hu +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification
+article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_token_classifier_named_entity_recognition_nerkor_hungarian_hungarian` is a Hungarian model originally trained by NYTK. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_named_entity_recognition_nerkor_hungarian_hungarian_hu_5.2.0_3.0_1699308204116.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_named_entity_recognition_nerkor_hungarian_hungarian_hu_5.2.0_3.0_1699308204116.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_named_entity_recognition_nerkor_hungarian_hungarian","hu") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_token_classifier_named_entity_recognition_nerkor_hungarian_hungarian", "hu") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_named_entity_recognition_nerkor_hungarian_hungarian| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|hu| +|Size:|412.5 MB| + +## References + +https://huggingface.co/NYTK/named-entity-recognition-nerkor-hubert-hungarian \ No newline at end of file From 1a037342ac0e0909cb355b4b0a533a8bca46712f Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:07:07 +0700 Subject: [PATCH 322/667] Add model 2023-11-06-bert_sayula_popoluca_bert_ancient_chinese_base_upos_zh --- ...oluca_bert_ancient_chinese_base_upos_zh.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_ancient_chinese_base_upos_zh.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_ancient_chinese_base_upos_zh.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_ancient_chinese_base_upos_zh.md new file mode 100644 index 00000000000000..8ca19e080789e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_ancient_chinese_base_upos_zh.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Chinese bert_sayula_popoluca_bert_ancient_chinese_base_upos BertForTokenClassification from KoichiYasuoka +author: John Snow Labs +name: bert_sayula_popoluca_bert_ancient_chinese_base_upos +date: 2023-11-06 +tags: [bert, zh, open_source, token_classification, onnx] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark 
NLP. `bert_sayula_popoluca_bert_ancient_chinese_base_upos` is a Chinese model originally trained by KoichiYasuoka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_ancient_chinese_base_upos_zh_5.2.0_3.0_1699297246594.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_ancient_chinese_base_upos_zh_5.2.0_3.0_1699297246594.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_ancient_chinese_base_upos","zh") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bert_ancient_chinese_base_upos", "zh") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_ancient_chinese_base_upos| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|430.7 MB| + +## References + +https://huggingface.co/KoichiYasuoka/bert-ancient-chinese-base-upos \ No newline at end of file From 79db0891221400b0bbf73025abdeb2f71164f728 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:08:07 +0700 Subject: [PATCH 323/667] Add model 2023-11-06-bert_ner_bert_base_swedish_cased_neriob_sv --- ...t_ner_bert_base_swedish_cased_neriob_sv.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_swedish_cased_neriob_sv.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_swedish_cased_neriob_sv.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_swedish_cased_neriob_sv.md new file mode 100644 index 00000000000000..5d613e643cd506 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_base_swedish_cased_neriob_sv.md @@ -0,0 +1,114 @@ +--- +layout: model +title: Swedish BertForTokenClassification Base Cased model (from KBLab) +author: John Snow Labs +name: bert_ner_bert_base_swedish_cased_neriob +date: 2023-11-06 +tags: [bert, ner, open_source, sv, onnx] +task: Named Entity Recognition +language: sv +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-base-swedish-cased-neriob` is a Swedish model originally trained by `KBLab`. 
+ +## Predicted Entities + +`PER`, `LOC`, `LOCORG`, `EVN`, `TME`, `WRK`, `MSR`, `OBJ`, `PRSWRK`, `OBJORG`, `ORG`, `ORGPRS`, `LOCPRS` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_base_swedish_cased_neriob_sv_5.2.0_3.0_1699288022037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_base_swedish_cased_neriob_sv_5.2.0_3.0_1699288022037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_base_swedish_cased_neriob","sv") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["Jag älskar Spark NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_base_swedish_cased_neriob","sv") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("Jag älskar Spark NLP").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("sv.ner.bert.cased_base.neriob.by_kblab").predict("""Jag älskar Spark NLP""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bert_base_swedish_cased_neriob| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|sv| +|Size:|465.3 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + + + +- https://huggingface.co/KBLab/bert-base-swedish-cased-neriob \ No newline at end of file From 4e7070343b424de56a643a9b213fcd2b8b249463 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:09:07 +0700 Subject: [PATCH 324/667] Add model 2023-11-06-bert_ner_offlangdetectionturkish_tr --- ...-06-bert_ner_offlangdetectionturkish_tr.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_offlangdetectionturkish_tr.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_offlangdetectionturkish_tr.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_offlangdetectionturkish_tr.md new file mode 100644 index 00000000000000..20b1d3aea6317d --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_offlangdetectionturkish_tr.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Turkish bert_ner_offlangdetectionturkish BertForTokenClassification from savasy +author: John Snow Labs +name: bert_ner_offlangdetectionturkish +date: 2023-11-06 +tags: [bert, tr, open_source, token_classification, onnx] +task: Named Entity Recognition +language: tr +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_offlangdetectionturkish` is a Turkish model originally trained by savasy.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_offlangdetectionturkish_tr_5.2.0_3.0_1699298650328.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_offlangdetectionturkish_tr_5.2.0_3.0_1699298650328.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_offlangdetectionturkish","tr") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_offlangdetectionturkish", "tr") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_offlangdetectionturkish| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|tr| +|Size:|412.5 MB| + +## References + +https://huggingface.co/savasy/offLangDetectionTurkish \ No newline at end of file From 3166c74990c7a7838a363d1a9ad9cb4d0d106fcc Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:10:08 +0700 Subject: [PATCH 325/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_msa_ar --- ...amelbert_catalan_sayula_popoluca_msa_ar.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_msa_ar.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_msa_ar.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_msa_ar.md new file mode 100644 index 00000000000000..053e348e1d710b --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_msa_ar.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Arabic bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_msa BertForTokenClassification from CAMeL-Lab +author: John Snow Labs +name: bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_msa +date: 2023-11-06 +tags: [bert, ar, open_source, token_classification, onnx] +task: Named Entity Recognition +language: ar +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification 
model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_msa` is an Arabic model originally trained by CAMeL-Lab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_msa_ar_5.2.0_3.0_1699302606661.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_msa_ar_5.2.0_3.0_1699302606661.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_msa","ar") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_msa", "ar") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_base_arabic_camelbert_catalan_sayula_popoluca_msa| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|ar| +|Size:|406.7 MB| + +## References + +https://huggingface.co/CAMeL-Lab/bert-base-arabic-camelbert-ca-pos-msa \ No newline at end of file From 6191fcc8161aa59cd56162f3d8dd461258a5fec2 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:11:08 +0700 Subject: [PATCH 326/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_dutch_cased_finetuned_udlassy_sayula_popoluca_nl --- ...ed_finetuned_udlassy_sayula_popoluca_nl.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_dutch_cased_finetuned_udlassy_sayula_popoluca_nl.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_dutch_cased_finetuned_udlassy_sayula_popoluca_nl.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_dutch_cased_finetuned_udlassy_sayula_popoluca_nl.md new file mode 100644 index 00000000000000..d28daffaf766aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_dutch_cased_finetuned_udlassy_sayula_popoluca_nl.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Dutch, Flemish bert_sayula_popoluca_bert_base_dutch_cased_finetuned_udlassy_sayula_popoluca BertForTokenClassification from wietsedv +author: John Snow Labs +name: bert_sayula_popoluca_bert_base_dutch_cased_finetuned_udlassy_sayula_popoluca +date: 2023-11-06 +tags: [bert, nl, open_source, token_classification, onnx] +task: Named Entity Recognition +language: nl +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: 
"Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_sayula_popoluca_bert_base_dutch_cased_finetuned_udlassy_sayula_popoluca` is a Dutch, Flemish model originally trained by wietsedv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_dutch_cased_finetuned_udlassy_sayula_popoluca_nl_5.2.0_3.0_1699297812004.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_dutch_cased_finetuned_udlassy_sayula_popoluca_nl_5.2.0_3.0_1699297812004.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_dutch_cased_finetuned_udlassy_sayula_popoluca","nl") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bert_base_dutch_cased_finetuned_udlassy_sayula_popoluca", "nl") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_base_dutch_cased_finetuned_udlassy_sayula_popoluca| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|nl| +|Size:|406.7 MB| + +## References + +https://huggingface.co/wietsedv/bert-base-dutch-cased-finetuned-udlassy-pos \ No newline at end of file From 75395d905565cf6e3d8abcd10e4bc01ce13fa380 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:12:08 +0700 Subject: [PATCH 327/667] Add model 2023-11-06-bert_ner_bert_srb_ner_setimes_en --- ...-11-06-bert_ner_bert_srb_ner_setimes_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_srb_ner_setimes_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_srb_ner_setimes_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_srb_ner_setimes_en.md new file mode 100644 index 00000000000000..7126b8356d39b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_srb_ner_setimes_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from Aleksandar) +author: John Snow Labs +name: bert_ner_bert_srb_ner_setimes +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-srb-ner-setimes` is an English model originally trained by `Aleksandar`.
+ +## Predicted Entities + +`misc`, `deriv`, `org`, `loc`, `per` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_srb_ner_setimes_en_5.2.0_3.0_1699289138381.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_srb_ner_setimes_en_5.2.0_3.0_1699289138381.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_srb_ner_setimes","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_srb_ner_setimes","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.by_aleksandar").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bert_srb_ner_setimes| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/Aleksandar/bert-srb-ner-setimes \ No newline at end of file From 266b661d40f0909f8bb1486f808f7b6d859e6efb Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:13:08 +0700 Subject: [PATCH 328/667] Add model 2023-11-06-bert_token_classifier_base_han_chinese_ws_xiandai_zh --- ...assifier_base_han_chinese_ws_xiandai_zh.md | 101 ++++++++++++++++++ 1 file changed, 101 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_han_chinese_ws_xiandai_zh.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_han_chinese_ws_xiandai_zh.md b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_han_chinese_ws_xiandai_zh.md new file mode 100644 index 00000000000000..701ff0e2dfa80a --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_han_chinese_ws_xiandai_zh.md @@ -0,0 +1,101 @@ +--- +layout: model +title: Chinese BertForTokenClassification Base Cased model (from ckiplab) +author: John Snow Labs +name: bert_token_classifier_base_han_chinese_ws_xiandai +date: 2023-11-06 +tags: [zh, open_source, bert, token_classification, ner, onnx] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. 
`bert-base-han-chinese-ws-xiandai` is a Chinese model originally trained by `ckiplab`. + +## Predicted Entities + +`B`, `I` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_han_chinese_ws_xiandai_zh_5.2.0_3.0_1699303132711.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_han_chinese_ws_xiandai_zh_5.2.0_3.0_1699303132711.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_base_han_chinese_ws_xiandai","zh") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_base_han_chinese_ws_xiandai","zh") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_base_han_chinese_ws_xiandai| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|395.4 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/ckiplab/bert-base-han-chinese-ws-xiandai +- https://github.com/ckiplab/han-transformers \ No newline at end of file From d99c51a1101ed43f35b0284f7587253a8c1c069b Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:14:08 +0700 Subject: [PATCH 329/667] Add model 2023-11-06-bert_token_classifier_base_han_chinese_ws_shanggu_zh --- ...assifier_base_han_chinese_ws_shanggu_zh.md | 101 ++++++++++++++++++ 1 file changed, 101 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_han_chinese_ws_shanggu_zh.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_han_chinese_ws_shanggu_zh.md b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_han_chinese_ws_shanggu_zh.md new file mode 100644 index 00000000000000..32630e75e1082d --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_han_chinese_ws_shanggu_zh.md @@ -0,0 +1,101 @@ +--- +layout: model +title: Chinese BertForTokenClassification Base Cased model (from ckiplab) +author: John Snow Labs +name: bert_token_classifier_base_han_chinese_ws_shanggu +date: 2023-11-06 +tags: [zh, open_source, bert, token_classification, ner, onnx] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and 
production-readiness using Spark NLP. `bert-base-han-chinese-ws-shanggu` is a Chinese model originally trained by `ckiplab`. + +## Predicted Entities + +`B`, `I` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_han_chinese_ws_shanggu_zh_5.2.0_3.0_1699302841534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_han_chinese_ws_shanggu_zh_5.2.0_3.0_1699302841534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_base_han_chinese_ws_shanggu","zh") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_base_han_chinese_ws_shanggu","zh") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_base_han_chinese_ws_shanggu| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|395.4 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/ckiplab/bert-base-han-chinese-ws-shanggu +- https://github.com/ckiplab/han-transformers \ No newline at end of file From 2d630f8dbd5bde7b152d963b8bd10f42dce444d5 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:15:09 +0700 Subject: [PATCH 330/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_multilingual_cased_sayula_popoluca_english_en --- ...ingual_cased_sayula_popoluca_english_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_multilingual_cased_sayula_popoluca_english_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_multilingual_cased_sayula_popoluca_english_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_multilingual_cased_sayula_popoluca_english_en.md new file mode 100644 index 00000000000000..e31c741d265085 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_multilingual_cased_sayula_popoluca_english_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_bert_base_multilingual_cased_sayula_popoluca_english BertForTokenClassification from QCRI +author: John Snow Labs +name: bert_sayula_popoluca_bert_base_multilingual_cased_sayula_popoluca_english +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + 
type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_base_multilingual_cased_sayula_popoluca_english` is an English model originally trained by QCRI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_multilingual_cased_sayula_popoluca_english_en_5.2.0_3.0_1699303340728.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_multilingual_cased_sayula_popoluca_english_en_5.2.0_3.0_1699303340728.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_multilingual_cased_sayula_popoluca_english","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bert_base_multilingual_cased_sayula_popoluca_english", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_base_multilingual_cased_sayula_popoluca_english| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|665.2 MB| + +## References + +https://huggingface.co/QCRI/bert-base-multilingual-cased-pos-english \ No newline at end of file From c6bb84562a2e2ce1dd2fe499c9b22762fc1fa1ab Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:16:09 +0700 Subject: [PATCH 331/667] Add model 2023-11-06-bert_ner_butchland_bert_finetuned_ner_en --- ...ert_ner_butchland_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_butchland_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_butchland_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_butchland_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..945ca52090f135 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_butchland_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from butchland) +author: John Snow Labs +name: bert_ner_butchland_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `butchland`.
+ +## Predicted Entities + +`ORG`, `LOC`, `MISC`, `PER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_butchland_bert_finetuned_ner_en_5.2.0_3.0_1699290573795.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_butchland_bert_finetuned_ner_en_5.2.0_3.0_1699290573795.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_butchland_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_butchland_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned.by_butchland").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_butchland_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/butchland/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From b27b77cd7e6690598002572aa245088321c87c4a Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:17:09 +0700 Subject: [PATCH 332/667] Add model 2023-11-06-bert_ner_original_bluebert_linnaeus_en --- ...-bert_ner_original_bluebert_linnaeus_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_bluebert_linnaeus_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_bluebert_linnaeus_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_bluebert_linnaeus_en.md new file mode 100644 index 00000000000000..83ac9b7d7d69cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_bluebert_linnaeus_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_original_bluebert_linnaeus BertForTokenClassification from ghadeermobasher +author: John Snow Labs +name: bert_ner_original_bluebert_linnaeus +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark 
NLP. `bert_ner_original_bluebert_linnaeus` is an English model originally trained by ghadeermobasher. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_original_bluebert_linnaeus_en_5.2.0_3.0_1699281191949.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_original_bluebert_linnaeus_en_5.2.0_3.0_1699281191949.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_original_bluebert_linnaeus","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_original_bluebert_linnaeus", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_original_bluebert_linnaeus| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/ghadeermobasher/Original-BlueBERT-Linnaeus \ No newline at end of file From c33f9ea4e53ac288a4e478fbd85b56efc4e00d58 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:18:10 +0700 Subject: [PATCH 333/667] Add model 2023-11-06-jobbert_knowledge_extraction_en --- ...3-11-06-jobbert_knowledge_extraction_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-jobbert_knowledge_extraction_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-jobbert_knowledge_extraction_en.md b/docs/_posts/ahmedlone127/2023-11-06-jobbert_knowledge_extraction_en.md new file mode 100644 index 00000000000000..63ea24878b9f05 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-jobbert_knowledge_extraction_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English jobbert_knowledge_extraction BertForTokenClassification from jjzha +author: John Snow Labs +name: jobbert_knowledge_extraction +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `jobbert_knowledge_extraction` is an English model originally trained by jjzha.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jobbert_knowledge_extraction_en_5.2.0_3.0_1699304016062.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jobbert_knowledge_extraction_en_5.2.0_3.0_1699304016062.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("jobbert_knowledge_extraction","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("jobbert_knowledge_extraction", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jobbert_knowledge_extraction| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|402.2 MB| + +## References + +https://huggingface.co/jjzha/jobbert_knowledge_extraction \ No newline at end of file From 7126471adab5309a2ea4e126a1361c9e5a3954a3 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:19:10 +0700 Subject: [PATCH 334/667] Add model 2023-11-06-bert_ner_bert_mention_english_vera_pro_en --- ...rt_ner_bert_mention_english_vera_pro_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_mention_english_vera_pro_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_mention_english_vera_pro_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_mention_english_vera_pro_en.md new file mode 100644 index 00000000000000..7e50446bc010fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_mention_english_vera_pro_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_bert_mention_english_vera_pro BertForTokenClassification from vera-pro +author: John Snow Labs +name: bert_ner_bert_mention_english_vera_pro +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_bert_mention_english_vera_pro` is an English model originally trained by vera-pro.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_mention_english_vera_pro_en_5.2.0_3.0_1699288592892.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_mention_english_vera_pro_en_5.2.0_3.0_1699288592892.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
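+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification reads a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_mention_english_vera_pro","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// assumes an active SparkSession named `spark`, as in spark-shell
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification reads a "token" column, so tokenize the documents first
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_bert_mention_english_vera_pro", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+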
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_bert_mention_english_vera_pro|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+
+## References
+
+https://huggingface.co/vera-pro/bert-mention-en
\ No newline at end of file
From 0435e436bb2c4637e0cd6a713bac569533450bcc Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 05:20:10 +0700
Subject: [PATCH 335/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_thai_upos_th

---
 ..._sayula_popoluca_bert_base_thai_upos_th.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_thai_upos_th.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_thai_upos_th.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_thai_upos_th.md
new file mode 100644
index 00000000000000..15c0b4738f0bef
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_thai_upos_th.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Thai bert_sayula_popoluca_bert_base_thai_upos BertForTokenClassification from KoichiYasuoka
+author: John Snow Labs
+name: bert_sayula_popoluca_bert_base_thai_upos
+date: 2023-11-06
+tags: [bert, th, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: th
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_base_thai_upos` is a Thai model originally trained by KoichiYasuoka.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_thai_upos_th_5.2.0_3.0_1699303051514.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_thai_upos_th_5.2.0_3.0_1699303051514.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
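+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification reads a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_thai_upos","th") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// assumes an active SparkSession named `spark`, as in spark-shell
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification reads a "token" column, so tokenize the documents first
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_bert_base_thai_upos", "th")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+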
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_sayula_popoluca_bert_base_thai_upos|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|th|
+|Size:|345.3 MB|
+
+## References
+
+https://huggingface.co/KoichiYasuoka/bert-base-thai-upos
\ No newline at end of file
From 985a60f70fd07b0b6b7b1fca93300c492018c870 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 05:21:11 +0700
Subject: [PATCH 336/667] Add model 2023-11-06-bert_ner_ner_test_en

---
 .../2023-11-06-bert_ner_ner_test_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_test_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_test_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_test_en.md
new file mode 100644
index 00000000000000..eae3c1d517e667
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_ner_test_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from fgravelaine)
+author: John Snow Labs
+name: bert_ner_ner_test
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `ner-test` is an English model originally trained by `fgravelaine`.
+
+## Predicted Entities
+
+`MADIN`, `TAG`, `COLOR`, `LOC`, `CAT`, `COUNTRY`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_ner_test_en_5.2.0_3.0_1699297211148.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_ner_test_en_5.2.0_3.0_1699297211148.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_ner_test","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// assumes an active SparkSession named `spark`, as in spark-shell
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_ner_test","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.by_fgravelaine").predict("""PUT YOUR STRING HERE""")
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_ner_test|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|407.2 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+- https://huggingface.co/fgravelaine/ner-test
\ No newline at end of file
From b515e99a9f03bb5d075447a35c2a43df30cf0dd9 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 05:22:11 +0700
Subject: [PATCH 337/667] Add model 2023-11-06-bert_sayula_popoluca_mbert_grammatical_error_tagger_en

---
 ...oluca_mbert_grammatical_error_tagger_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_mbert_grammatical_error_tagger_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_mbert_grammatical_error_tagger_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_mbert_grammatical_error_tagger_en.md
new file mode 100644
index 00000000000000..59ad87bdce9acf
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_mbert_grammatical_error_tagger_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_sayula_popoluca_mbert_grammatical_error_tagger BertForTokenClassification from alice-hml
+author: John Snow Labs
+name: bert_sayula_popoluca_mbert_grammatical_error_tagger
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_mbert_grammatical_error_tagger` is an English model originally trained by alice-hml.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_mbert_grammatical_error_tagger_en_5.2.0_3.0_1699309288172.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_mbert_grammatical_error_tagger_en_5.2.0_3.0_1699309288172.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
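+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification reads a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_mbert_grammatical_error_tagger","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// assumes an active SparkSession named `spark`, as in spark-shell
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification reads a "token" column, so tokenize the documents first
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_mbert_grammatical_error_tagger", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+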
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_sayula_popoluca_mbert_grammatical_error_tagger|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|665.1 MB|
+
+## References
+
+https://huggingface.co/alice-hml/mBERT_grammatical_error_tagger
\ No newline at end of file
From 0a1c180624218df1cbf95640ce2d6fc08dfd4845 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 05:23:11 +0700
Subject: [PATCH 338/667] Add model 2023-11-06-bert_ner_yv_bert_finetuned_ner_en

---
 ...11-06-bert_ner_yv_bert_finetuned_ner_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_yv_bert_finetuned_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_yv_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_yv_bert_finetuned_ner_en.md
new file mode 100644
index 00000000000000..68be942356ba42
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_yv_bert_finetuned_ner_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_ner_yv_bert_finetuned_ner BertForTokenClassification from Yv
+author: John Snow Labs
+name: bert_ner_yv_bert_finetuned_ner
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_yv_bert_finetuned_ner` is an English model originally trained by Yv.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_yv_bert_finetuned_ner_en_5.2.0_3.0_1699282015925.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_yv_bert_finetuned_ner_en_5.2.0_3.0_1699282015925.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
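+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification reads a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_yv_bert_finetuned_ner","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// assumes an active SparkSession named `spark`, as in spark-shell
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification reads a "token" column, so tokenize the documents first
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_yv_bert_finetuned_ner", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+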
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_yv_bert_finetuned_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+
+## References
+
+https://huggingface.co/Yv/bert-finetuned-ner
\ No newline at end of file
From 7ffafd12e16492efabdc3f9eeee26c59bd174fb9 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 05:24:11 +0700
Subject: [PATCH 339/667] Add model 2023-11-06-bert_ner_labse_ner_nerel_ru

---
 .../2023-11-06-bert_ner_labse_ner_nerel_ru.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_labse_ner_nerel_ru.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_labse_ner_nerel_ru.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_labse_ner_nerel_ru.md
new file mode 100644
index 00000000000000..3a5e136135b26a
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_labse_ner_nerel_ru.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Russian bert_ner_labse_ner_nerel BertForTokenClassification from surdan
+author: John Snow Labs
+name: bert_ner_labse_ner_nerel
+date: 2023-11-06
+tags: [bert, ru, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: ru
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_labse_ner_nerel` is a Russian model originally trained by surdan.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_labse_ner_nerel_ru_5.2.0_3.0_1699280332092.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_labse_ner_nerel_ru_5.2.0_3.0_1699280332092.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
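+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification reads a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_labse_ner_nerel","ru") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// assumes an active SparkSession named `spark`, as in spark-shell
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification reads a "token" column, so tokenize the documents first
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_labse_ner_nerel", "ru")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+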
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_labse_ner_nerel|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|ru|
+|Size:|480.5 MB|
+
+## References
+
+https://huggingface.co/surdan/LaBSE_ner_nerel
\ No newline at end of file
From 6757ceccd19afe414b0406cbb1ca9518715cf848 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 05:25:12 +0700
Subject: [PATCH 340/667] Add model 2023-11-06-bert_sayula_popoluca_wwdd_tiny_en

---
 ...11-06-bert_sayula_popoluca_wwdd_tiny_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_wwdd_tiny_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_wwdd_tiny_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_wwdd_tiny_en.md
new file mode 100644
index 00000000000000..d4c03d50d23c4c
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_wwdd_tiny_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_sayula_popoluca_wwdd_tiny BertForTokenClassification from kktoto
+author: John Snow Labs
+name: bert_sayula_popoluca_wwdd_tiny
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_wwdd_tiny` is an English model originally trained by kktoto.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_wwdd_tiny_en_5.2.0_3.0_1699308782051.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_wwdd_tiny_en_5.2.0_3.0_1699308782051.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
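+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification reads a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_wwdd_tiny","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// assumes an active SparkSession named `spark`, as in spark-shell
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification reads a "token" column, so tokenize the documents first
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_wwdd_tiny", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+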
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_sayula_popoluca_wwdd_tiny|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|42.7 MB|
+
+## References
+
+https://huggingface.co/kktoto/wwdd_tiny
\ No newline at end of file
From 1214d12eb79cb78260b5a549e2f9935da94ae758 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 05:26:12 +0700
Subject: [PATCH 341/667] Add model 2023-11-06-bert_ner_scibert_scivocab_uncased_tv_sdu21_ai_en

---
 ...scibert_scivocab_uncased_tv_sdu21_ai_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_scibert_scivocab_uncased_tv_sdu21_ai_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_scibert_scivocab_uncased_tv_sdu21_ai_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_scibert_scivocab_uncased_tv_sdu21_ai_en.md
new file mode 100644
index 00000000000000..786ac6be07f8c7
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_scibert_scivocab_uncased_tv_sdu21_ai_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_ner_scibert_scivocab_uncased_tv_sdu21_ai BertForTokenClassification from napsternxg
+author: John Snow Labs
+name: bert_ner_scibert_scivocab_uncased_tv_sdu21_ai
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_scibert_scivocab_uncased_tv_sdu21_ai` is an English model originally trained by napsternxg.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_scibert_scivocab_uncased_tv_sdu21_ai_en_5.2.0_3.0_1699299559527.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_scibert_scivocab_uncased_tv_sdu21_ai_en_5.2.0_3.0_1699299559527.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
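+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification reads a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_scibert_scivocab_uncased_tv_sdu21_ai","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// assumes an active SparkSession named `spark`, as in spark-shell
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification reads a "token" column, so tokenize the documents first
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_scibert_scivocab_uncased_tv_sdu21_ai", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+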
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_scibert_scivocab_uncased_tv_sdu21_ai|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|410.0 MB|
+
+## References
+
+https://huggingface.co/napsternxg/scibert_scivocab_uncased_tv_SDU21_AI
\ No newline at end of file
From 65d159c35ebb677696e7c9e1ff35bebbf8c63d4b Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 05:27:12 +0700
Subject: [PATCH 342/667] Add model 2023-11-06-negation_and_uncertainty_scope_detection_mbert_fine_tuned_en

---
 ...nty_scope_detection_mbert_fine_tuned_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-negation_and_uncertainty_scope_detection_mbert_fine_tuned_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-negation_and_uncertainty_scope_detection_mbert_fine_tuned_en.md b/docs/_posts/ahmedlone127/2023-11-06-negation_and_uncertainty_scope_detection_mbert_fine_tuned_en.md
new file mode 100644
index 00000000000000..f977aec616f4e5
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-negation_and_uncertainty_scope_detection_mbert_fine_tuned_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English negation_and_uncertainty_scope_detection_mbert_fine_tuned BertForTokenClassification from ajtamayoh
+author: John Snow Labs
+name: negation_and_uncertainty_scope_detection_mbert_fine_tuned
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `negation_and_uncertainty_scope_detection_mbert_fine_tuned` is an English model originally trained by ajtamayoh.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/negation_and_uncertainty_scope_detection_mbert_fine_tuned_en_5.2.0_3.0_1699309597062.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/negation_and_uncertainty_scope_detection_mbert_fine_tuned_en_5.2.0_3.0_1699309597062.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
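+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification reads a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("negation_and_uncertainty_scope_detection_mbert_fine_tuned","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// assumes an active SparkSession named `spark`, as in spark-shell
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification reads a "token" column, so tokenize the documents first
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("negation_and_uncertainty_scope_detection_mbert_fine_tuned", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+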
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|negation_and_uncertainty_scope_detection_mbert_fine_tuned|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|665.1 MB|
+
+## References
+
+https://huggingface.co/ajtamayoh/Negation_and_Uncertainty_Scope_Detection_mBERT_fine_tuned
\ No newline at end of file
From ec447ab8b9294213dc323034b398ca584f0b4efd Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 05:28:13 +0700
Subject: [PATCH 343/667] Add model 2023-11-06-bert_sayula_popoluca_bert_large_german_upos_de

---
 ...yula_popoluca_bert_large_german_upos_de.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_large_german_upos_de.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_large_german_upos_de.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_large_german_upos_de.md
new file mode 100644
index 00000000000000..51aeab9d7f5aec
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_large_german_upos_de.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: German bert_sayula_popoluca_bert_large_german_upos BertForTokenClassification from KoichiYasuoka
+author: John Snow Labs
+name: bert_sayula_popoluca_bert_large_german_upos
+date: 2023-11-06
+tags: [bert, de, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: de
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_large_german_upos` is a German model originally trained by KoichiYasuoka.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_large_german_upos_de_5.2.0_3.0_1699303896635.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_large_german_upos_de_5.2.0_3.0_1699303896635.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
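+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification reads a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_large_german_upos","de") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// assumes an active SparkSession named `spark`, as in spark-shell
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification reads a "token" column, so tokenize the documents first
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_bert_large_german_upos", "de")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+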
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_large_german_upos| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|1.3 GB| + +## References + +https://huggingface.co/KoichiYasuoka/bert-large-german-upos \ No newline at end of file From f6f6863ff8e9f4a47b5102c264063864e401c260 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:29:13 +0700 Subject: [PATCH 344/667] Add model 2023-11-06-bert_ner_imvladikon_bert_large_cased_finetuned_conll03_english_en --- ...arge_cased_finetuned_conll03_english_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_imvladikon_bert_large_cased_finetuned_conll03_english_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_imvladikon_bert_large_cased_finetuned_conll03_english_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_imvladikon_bert_large_cased_finetuned_conll03_english_en.md new file mode 100644 index 00000000000000..f17330bbfcdc22 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_imvladikon_bert_large_cased_finetuned_conll03_english_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Large Cased model (from imvladikon) +author: John Snow Labs +name: bert_ner_imvladikon_bert_large_cased_finetuned_conll03_english +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. 
`bert-large-cased-finetuned-conll03-english` is an English model originally trained by `imvladikon`. + +## Predicted Entities + +`PER`, `LOC`, `MISC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_imvladikon_bert_large_cased_finetuned_conll03_english_en_5.2.0_3.0_1699293022370.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_imvladikon_bert_large_cased_finetuned_conll03_english_en_5.2.0_3.0_1699293022370.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_imvladikon_bert_large_cased_finetuned_conll03_english","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_imvladikon_bert_large_cased_finetuned_conll03_english","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.cased_large_finetuned.by_imvladikon").predict("""PUT YOUR STRING HERE""") +``` +
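The `ner` output column holds one tag per token. Spark NLP's `NerConverter` annotator is the usual way to merge those tags into entity chunks, and the sketch below only illustrates that idea in plain Python. It assumes IOB-style tags such as `B-PER`, `I-PER` and `O`; the exact tag set this model emits is an assumption. `group_entities` is a hypothetical helper, not NerConverter's implementation. An `I-` tag that does not continue a chunk with the same label starts a new chunk, so both IOB1- and IOB2-style sequences are handled. For simplicity, tokens are joined with single spaces rather than sliced from the original text.

```python
from typing import List, Optional, Tuple


def group_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge per-token IOB tags (B-X / I-X / O) into (chunk text, label) pairs."""
    chunks = []
    words: List[str] = []
    label: Optional[str] = None
    for token, tag in zip(tokens, tags):
        prefix, _, tag_label = tag.partition("-")
        # An "O" tag, a "B-" tag, or a label change closes the chunk being built
        if words and (tag == "O" or prefix == "B" or tag_label != label):
            chunks.append((" ".join(words), label))
            words, label = [], None
        if tag != "O":
            words.append(token)
            label = tag_label
    if words:
        chunks.append((" ".join(words), label))
    return chunks


tokens = ["Angela", "Merkel", "visited", "New", "York", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O"]
print(group_entities(tokens, tags))
# [('Angela Merkel', 'PER'), ('New York', 'LOC')]
```

With the Spark pipeline above, the token and tag lists would come from the `result` fields of the `token` and `ner` annotation columns, for example via `result.select("token.result", "ner.result")`.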
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_imvladikon_bert_large_cased_finetuned_conll03_english| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.2 GB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/imvladikon/bert-large-cased-finetuned-conll03-english \ No newline at end of file From 0110108f9de5efd00d07e9f72601ee5928a58a9c Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:30:13 +0700 Subject: [PATCH 345/667] Add model 2023-11-06-jobbert_skill_extraction_en --- .../2023-11-06-jobbert_skill_extraction_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-jobbert_skill_extraction_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-jobbert_skill_extraction_en.md b/docs/_posts/ahmedlone127/2023-11-06-jobbert_skill_extraction_en.md new file mode 100644 index 00000000000000..f5cbcfe3141ce4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-jobbert_skill_extraction_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English jobbert_skill_extraction BertForTokenClassification from jjzha +author: John Snow Labs +name: jobbert_skill_extraction +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `jobbert_skill_extraction` is an English model originally trained by jjzha.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jobbert_skill_extraction_en_5.2.0_3.0_1699304183375.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jobbert_skill_extraction_en_5.2.0_3.0_1699304183375.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
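+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads a "token" column, so a Tokenizer must produce it +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("jobbert_skill_extraction","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification reads a "token" column, so a Tokenizer must produce it +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("jobbert_skill_extraction", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +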
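The Download and Copy S3 URI links on this card point to the same archive. Its file name appears to follow the pattern `<model name>_<language>_<Spark NLP version>_<Spark version>_<number>.zip`. The trailing number looks like a millisecond Unix timestamp; for this card it falls on 2023-11-06, the post date. This pattern is inferred from the links on this page, not taken from a documented specification, and `parse_model_archive` is a hypothetical helper. The sketch splits the name from the right, because model names themselves contain underscores.

```python
import posixpath
from datetime import datetime, timezone
from typing import NamedTuple
from urllib.parse import urlparse


class ModelArchive(NamedTuple):
    name: str
    language: str
    spark_nlp_version: str
    spark_version: str
    built_at: datetime  # assumption: trailing number is a millisecond Unix timestamp


def parse_model_archive(url: str) -> ModelArchive:
    """Split <name>_<lang>_<nlp version>_<spark version>_<millis>.zip into its parts."""
    filename = posixpath.basename(urlparse(url).path)  # works for https:// and s3:// URIs
    stem = filename[: -len(".zip")] if filename.endswith(".zip") else filename
    # Split from the right: the model name may itself contain underscores
    parts = stem.rsplit("_", 4)
    if len(parts) != 5:
        raise ValueError(f"unexpected archive name: {filename}")
    name, language, nlp_version, spark_version, millis = parts
    built_at = datetime.fromtimestamp(int(millis) / 1000, tz=timezone.utc)
    return ModelArchive(name, language, nlp_version, spark_version, built_at)


archive = parse_model_archive(
    "https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/"
    "jobbert_skill_extraction_en_5.2.0_3.0_1699304183375.zip"
)
print(archive.name, archive.language, archive.spark_nlp_version, archive.spark_version)
# jobbert_skill_extraction en 5.2.0 3.0
print(archive.built_at.date())
# 2023-11-06
```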
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jobbert_skill_extraction| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|402.2 MB| + +## References + +https://huggingface.co/jjzha/jobbert_skill_extraction \ No newline at end of file From 8dd0e6dc4e18b458b2f4bf6bbd376ac1c63b05a7 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:31:13 +0700 Subject: [PATCH 346/667] Add model 2023-11-06-bert_token_classifier_berturk_sunlp_ner_turkish_tr --- ...classifier_berturk_sunlp_ner_turkish_tr.md | 102 ++++++++++++++++++ 1 file changed, 102 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_berturk_sunlp_ner_turkish_tr.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_berturk_sunlp_ner_turkish_tr.md b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_berturk_sunlp_ner_turkish_tr.md new file mode 100644 index 00000000000000..81edfca44a15b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_berturk_sunlp_ner_turkish_tr.md @@ -0,0 +1,102 @@ +--- +layout: model +title: Turkish BertForTokenClassification Cased model (from busecarik) +author: John Snow Labs +name: bert_token_classifier_berturk_sunlp_ner_turkish +date: 2023-11-06 +tags: [tr, open_source, bert, token_classification, ner, onnx] +task: Named Entity Recognition +language: tr +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `berturk-sunlp-ner-turkish` is a Turkish model originally trained by `busecarik`. 
+ +## Predicted Entities + +`ORGANIZATION`, `TVSHOW`, `MONEY`, `LOCATION`, `PRODUCT`, `TIME`, `PERSON` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_berturk_sunlp_ner_turkish_tr_5.2.0_3.0_1699304172705.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_berturk_sunlp_ner_turkish_tr_5.2.0_3.0_1699304172705.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_berturk_sunlp_ner_turkish","tr") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_berturk_sunlp_ner_turkish","tr") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_berturk_sunlp_ner_turkish| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|tr| +|Size:|689.0 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/busecarik/berturk-sunlp-ner-turkish +- https://github.com/SU-NLP/SUNLP-Twitter-NER-Dataset +- http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.484.pdf \ No newline at end of file From f34167bfd57946c0f58d3d4b942a8751aa196ff8 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:32:14 +0700 Subject: [PATCH 347/667] Add model 2023-11-06-bert_ner_tiny_dbmdz_bert_large_cased_finetuned_conll03_english_en --- ...arge_cased_finetuned_conll03_english_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_tiny_dbmdz_bert_large_cased_finetuned_conll03_english_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_tiny_dbmdz_bert_large_cased_finetuned_conll03_english_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_tiny_dbmdz_bert_large_cased_finetuned_conll03_english_en.md new file mode 100644 index 00000000000000..86ce7b7d4d2f31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_tiny_dbmdz_bert_large_cased_finetuned_conll03_english_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Tiny Cased model (from sshleifer) +author: John Snow Labs +name: bert_ner_tiny_dbmdz_bert_large_cased_finetuned_conll03_english +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java"
+--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `tiny-dbmdz-bert-large-cased-finetuned-conll03-english` is an English model originally trained by `sshleifer`. + +## Predicted Entities + +`ORG`, `PER`, `LOC`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_tiny_dbmdz_bert_large_cased_finetuned_conll03_english_en_5.2.0_3.0_1699301132855.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_tiny_dbmdz_bert_large_cased_finetuned_conll03_english_en_5.2.0_3.0_1699301132855.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_tiny_dbmdz_bert_large_cased_finetuned_conll03_english","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_tiny_dbmdz_bert_large_cased_finetuned_conll03_english","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.cased_large_tiny_finetuned").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_tiny_dbmdz_bert_large_cased_finetuned_conll03_english| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|528.1 KB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/sshleifer/tiny-dbmdz-bert-large-cased-finetuned-conll03-english \ No newline at end of file From 3d678a8b197920d7b7f86ae29ce41900b7ae0c89 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:33:14 +0700 Subject: [PATCH 348/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_cased_ccg_en --- ..._sayula_popoluca_bert_base_cased_ccg_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_cased_ccg_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_cased_ccg_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_cased_ccg_en.md new file mode 100644 index 00000000000000..b259c7f79f3a68 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_cased_ccg_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_bert_base_cased_ccg BertForTokenClassification from QCRI +author: John Snow Labs +name: bert_sayula_popoluca_bert_base_cased_ccg +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark
NLP. `bert_sayula_popoluca_bert_base_cased_ccg` is an English model originally trained by QCRI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_cased_ccg_en_5.2.0_3.0_1699300808412.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_cased_ccg_en_5.2.0_3.0_1699300808412.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads a "token" column, so a Tokenizer must produce it +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_cased_ccg","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification reads a "token" column, so a Tokenizer must produce it +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bert_base_cased_ccg", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_base_cased_ccg| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.5 MB| + +## References + +https://huggingface.co/QCRI/bert-base-cased-ccg \ No newline at end of file From 00fd50ee31f386482f8829ef6b41928982c169fe Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:34:14 +0700 Subject: [PATCH 349/667] Add model 2023-11-06-bert_token_classifier_autotrain_final_784824206_en --- ...classifier_autotrain_final_784824206_en.md | 100 ++++++++++++++++++ 1 file changed, 100 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_autotrain_final_784824206_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_autotrain_final_784824206_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_autotrain_final_784824206_en.md new file mode 100644 index 00000000000000..db5f336ad224b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_autotrain_final_784824206_en.md @@ -0,0 +1,100 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from Lucifermorningstar011) +author: John Snow Labs +name: bert_token_classifier_autotrain_final_784824206 +date: 2023-11-06 +tags: [en, open_source, bert, token_classification, ner, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `autotrain-final-784824206` is an English model originally trained by `Lucifermorningstar011`.
+ +## Predicted Entities + +`9`, `0` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_autotrain_final_784824206_en_5.2.0_3.0_1699308782243.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_autotrain_final_784824206_en_5.2.0_3.0_1699308782243.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_autotrain_final_784824206","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_autotrain_final_784824206","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_autotrain_final_784824206| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.6 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/Lucifermorningstar011/autotrain-final-784824206 \ No newline at end of file From a29863e7f95304c05e1825802fb98575a43746c4 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:35:15 +0700 Subject: [PATCH 350/667] Add model 2023-11-06-bert_ner_mbert_base_uncased_ner_nigerian_pidgin_pcm --- ...rt_base_uncased_ner_nigerian_pidgin_pcm.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_uncased_ner_nigerian_pidgin_pcm.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_uncased_ner_nigerian_pidgin_pcm.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_uncased_ner_nigerian_pidgin_pcm.md new file mode 100644 index 00000000000000..c2c69cc99267b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_uncased_ner_nigerian_pidgin_pcm.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Nigerian Pidgin bert_ner_mbert_base_uncased_ner_nigerian_pidgin BertForTokenClassification from arnolfokam +author: John Snow Labs +name: bert_ner_mbert_base_uncased_ner_nigerian_pidgin +date: 2023-11-06 +tags: [bert, pcm, open_source, token_classification, onnx] +task: Named Entity Recognition +language: pcm +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and
production-readiness using Spark NLP. `bert_ner_mbert_base_uncased_ner_nigerian_pidgin` is a Nigerian Pidgin model originally trained by arnolfokam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_mbert_base_uncased_ner_nigerian_pidgin_pcm_5.2.0_3.0_1699297293688.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_mbert_base_uncased_ner_nigerian_pidgin_pcm_5.2.0_3.0_1699297293688.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads a "token" column, so a Tokenizer must produce it +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_mbert_base_uncased_ner_nigerian_pidgin","pcm") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification reads a "token" column, so a Tokenizer must produce it +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_mbert_base_uncased_ner_nigerian_pidgin", "pcm") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_mbert_base_uncased_ner_nigerian_pidgin| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|pcm| +|Size:|665.1 MB| + +## References + +https://huggingface.co/arnolfokam/mbert-base-uncased-ner-pcm \ No newline at end of file From 75de466d8802e77a7ef4126d3e3b8945c28042e6 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:36:15 +0700 Subject: [PATCH 351/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_swedish_cased_sayula_popoluca_sv --- ...t_base_swedish_cased_sayula_popoluca_sv.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_swedish_cased_sayula_popoluca_sv.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_swedish_cased_sayula_popoluca_sv.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_swedish_cased_sayula_popoluca_sv.md new file mode 100644 index 00000000000000..bb00c38e717d24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_swedish_cased_sayula_popoluca_sv.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Swedish bert_sayula_popoluca_bert_base_swedish_cased_sayula_popoluca BertForTokenClassification from KBLab +author: John Snow Labs +name: bert_sayula_popoluca_bert_base_swedish_cased_sayula_popoluca +date: 2023-11-06 +tags: [bert, sv, open_source, token_classification, onnx] +task: Named Entity Recognition +language: sv +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and 
production-readiness using Spark NLP. `bert_sayula_popoluca_bert_base_swedish_cased_sayula_popoluca` is a Swedish model originally trained by KBLab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_swedish_cased_sayula_popoluca_sv_5.2.0_3.0_1699303862586.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_swedish_cased_sayula_popoluca_sv_5.2.0_3.0_1699303862586.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads a "token" column, so a Tokenizer must produce it +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_swedish_cased_sayula_popoluca","sv") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification reads a "token" column, so a Tokenizer must produce it +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bert_base_swedish_cased_sayula_popoluca", "sv") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_base_swedish_cased_sayula_popoluca| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|sv| +|Size:|465.3 MB| + +## References + +https://huggingface.co/KBLab/bert-base-swedish-cased-pos \ No newline at end of file From 2f7a58e2ce3ea69435efd0533b9df0acbaac12bc Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:37:15 +0700 Subject: [PATCH 352/667] Add model 2023-11-06-bert_ner_m_bert_ner_en --- .../2023-11-06-bert_ner_m_bert_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_m_bert_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_m_bert_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_m_bert_ner_en.md new file mode 100644 index 00000000000000..ea6352ea650aaa --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_m_bert_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_m_bert_ner BertForTokenClassification from Andrija +author: John Snow Labs +name: bert_ner_m_bert_ner +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_m_bert_ner` is an English model originally trained by Andrija.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_m_bert_ner_en_5.2.0_3.0_1699278976314.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_m_bert_ner_en_5.2.0_3.0_1699278976314.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
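+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification reads a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_m_bert_ner","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes an active SparkSession named `spark`, as in spark-shell
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification reads a "token" column, so tokenize the documents first
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_m_bert_ner", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```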
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_m_bert_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|665.1 MB|
+
+## References
+
+https://huggingface.co/Andrija/M-bert-NER
\ No newline at end of file
From d650cc2eb501d5471704a784c85ee72e248848c1 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 05:38:16 +0700
Subject: [PATCH 353/667] Add model 2023-11-06-bert_token_classifier_autotrain_turkmen_1181244086_en

---
 ...ssifier_autotrain_turkmen_1181244086_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_autotrain_turkmen_1181244086_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_autotrain_turkmen_1181244086_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_autotrain_turkmen_1181244086_en.md
new file mode 100644
index 00000000000000..d3f3dbae0eede5
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_autotrain_turkmen_1181244086_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_token_classifier_autotrain_turkmen_1181244086 BertForTokenClassification from Shenzy2
+author: John Snow Labs
+name: bert_token_classifier_autotrain_turkmen_1181244086
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_token_classifier_autotrain_turkmen_1181244086` is an English model originally trained by Shenzy2.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_autotrain_turkmen_1181244086_en_5.2.0_3.0_1699310235474.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_autotrain_turkmen_1181244086_en_5.2.0_3.0_1699310235474.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
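+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification reads a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_autotrain_turkmen_1181244086","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes an active SparkSession named `spark`, as in spark-shell
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification reads a "token" column, so tokenize the documents first
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_token_classifier_autotrain_turkmen_1181244086", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```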
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_token_classifier_autotrain_turkmen_1181244086|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+
+## References
+
+https://huggingface.co/Shenzy2/autotrain-tk-1181244086
\ No newline at end of file
From 8d0bc83c76828b3f85a066e29cfe007ed8ac29e3 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 05:39:16 +0700
Subject: [PATCH 354/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_gronings_nl

---
 ...ase_dutch_cased_upos_alpino_gronings_nl.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_gronings_nl.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_gronings_nl.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_gronings_nl.md
new file mode 100644
index 00000000000000..c27ec8abdd21c6
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_gronings_nl.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Dutch, Flemish bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_gronings BertForTokenClassification from GroNLP
+author: John Snow Labs
+name: bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_gronings
+date: 2023-11-06
+tags: [bert, nl, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: nl
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_gronings` is a Dutch, Flemish model originally trained by GroNLP.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_gronings_nl_5.2.0_3.0_1699302820080.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_gronings_nl_5.2.0_3.0_1699302820080.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
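+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification reads a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_gronings","nl") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes an active SparkSession named `spark`, as in spark-shell
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification reads a "token" column, so tokenize the documents first
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_gronings", "nl")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```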
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_sayula_popoluca_bert_base_dutch_cased_upos_alpino_gronings|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|nl|
+|Size:|348.9 MB|
+
+## References
+
+https://huggingface.co/GroNLP/bert-base-dutch-cased-upos-alpino-gronings
\ No newline at end of file
From 98bd2d12c8896e3d2fdb8f8c862121d8c30f20dd Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 05:40:16 +0700
Subject: [PATCH 355/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_slavic_cyrillic_upos_uk

---
 ...oluca_bert_base_slavic_cyrillic_upos_uk.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_slavic_cyrillic_upos_uk.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_slavic_cyrillic_upos_uk.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_slavic_cyrillic_upos_uk.md
new file mode 100644
index 00000000000000..ca566302e12aa9
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_slavic_cyrillic_upos_uk.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Ukrainian bert_sayula_popoluca_bert_base_slavic_cyrillic_upos BertForTokenClassification from KoichiYasuoka
+author: John Snow Labs
+name: bert_sayula_popoluca_bert_base_slavic_cyrillic_upos
+date: 2023-11-06
+tags: [bert, uk, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: uk
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_base_slavic_cyrillic_upos` is a Ukrainian model originally trained by KoichiYasuoka.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_slavic_cyrillic_upos_uk_5.2.0_3.0_1699303611590.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_slavic_cyrillic_upos_uk_5.2.0_3.0_1699303611590.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
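+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification reads a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_slavic_cyrillic_upos","uk") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes an active SparkSession named `spark`, as in spark-shell
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification reads a "token" column, so tokenize the documents first
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_bert_base_slavic_cyrillic_upos", "uk")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```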
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_sayula_popoluca_bert_base_slavic_cyrillic_upos|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|uk|
+|Size:|667.5 MB|
+
+## References
+
+https://huggingface.co/KoichiYasuoka/bert-base-slavic-cyrillic-upos
\ No newline at end of file
From ff087da9e6c1bc20fc6cbecf8c8e6a6a8e1f8888 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 05:41:17 +0700
Subject: [PATCH 356/667] Add model 2023-11-06-bert_ner_tiny_distilbert_base_cased_en

---
 ...-bert_ner_tiny_distilbert_base_cased_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_tiny_distilbert_base_cased_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_tiny_distilbert_base_cased_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_tiny_distilbert_base_cased_en.md
new file mode 100644
index 00000000000000..15838595a1bd7a
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_tiny_distilbert_base_cased_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: English BertForTokenClassification Tiny Cased model (from sshleifer)
+author: John Snow Labs
+name: bert_ner_tiny_distilbert_base_cased
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `tiny-distilbert-base-cased` is an English model originally trained by `sshleifer`.
+
+## Predicted Entities
+
+`ORG`, `PER`, `LOC`, `MISC`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_tiny_distilbert_base_cased_en_5.2.0_3.0_1699300582474.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_tiny_distilbert_base_cased_en_5.2.0_3.0_1699300582474.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_tiny_distilbert_base_cased","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLModel
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes an active SparkSession named `spark`, as in spark-shell
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_tiny_distilbert_base_cased","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.distilled_cased_base_tiny").predict("""PUT YOUR STRING HERE""")
+```
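The `ner` column produced above holds one tag per token. Assuming the tags follow the IOB2 scheme (`B-PER`, `I-PER`, `O`, and so on), consecutive tags can be merged into whole entities. The card lists only the entity types, so the tagging scheme is an assumption here. `iob_to_chunks` below is a hypothetical plain-Python sketch of that grouping. It works on the token and tag arrays you get from `result.select("token.result", "ner.result")`. Inside a Spark NLP pipeline, the `NerConverter` annotator is the usual way to do this instead.

```python
from typing import List, Optional, Tuple


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB2-tagged tokens into (entity_text, label) pairs.

    Hypothetical helper: assumes tags look like "B-PER", "I-PER" or "O".
    """
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    for token, tag in zip(tokens, tags):
        if tag == "O":
            # An "outside" tag closes any chunk that is still open
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None
            continue

        prefix, _, label = tag.partition("-")
        if prefix == "B" or label != current_label:
            # A chunk starts on B-, or on an I- whose label differs from the open chunk
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [token], label
        else:
            # I- tag continuing the open chunk
            current_tokens.append(token)

    # Flush the last open chunk, if any
    if current_tokens:
        chunks.append((" ".join(current_tokens), current_label))
    return chunks


tokens = ["John", "Snow", "Labs", "is", "in", "Delaware"]
tags = ["B-ORG", "I-ORG", "I-ORG", "O", "O", "B-LOC"]
print(iob_to_chunks(tokens, tags))  # [('John Snow Labs', 'ORG'), ('Delaware', 'LOC')]
```

An `I-` tag whose label differs from the open chunk starts a new chunk. This keeps an inconsistent tag sequence from merging two different entity types into one.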
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_tiny_distilbert_base_cased|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|528.1 KB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+- https://huggingface.co/sshleifer/tiny-distilbert-base-cased
\ No newline at end of file
From 9e7dadd696672f5e6fcce69da497ec93b6ab1ee9 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 05:42:17 +0700
Subject: [PATCH 357/667] Add model 2023-11-06-bert_ner_mbateman_bert_finetuned_ner_accelerate_en

---
 ...ateman_bert_finetuned_ner_accelerate_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbateman_bert_finetuned_ner_accelerate_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbateman_bert_finetuned_ner_accelerate_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbateman_bert_finetuned_ner_accelerate_en.md
new file mode 100644
index 00000000000000..8dde76068c98ee
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbateman_bert_finetuned_ner_accelerate_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from mbateman)
+author: John Snow Labs
+name: bert_ner_mbateman_bert_finetuned_ner_accelerate
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner-accelerate` is an English model originally trained by `mbateman`.
+
+## Predicted Entities
+
+`LOC`, `PER`, `ORG`, `MISC`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_mbateman_bert_finetuned_ner_accelerate_en_5.2.0_3.0_1699296100140.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_mbateman_bert_finetuned_ner_accelerate_en_5.2.0_3.0_1699296100140.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_mbateman_bert_finetuned_ner_accelerate","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLModel
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes an active SparkSession named `spark`, as in spark-shell
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_mbateman_bert_finetuned_ner_accelerate","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.finetuned.by_mbateman").predict("""PUT YOUR STRING HERE""")
+```
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_mbateman_bert_finetuned_ner_accelerate|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+- https://huggingface.co/mbateman/bert-finetuned-ner-accelerate
\ No newline at end of file
From 2e1de3753839427d8101e48df012dfba8e47e821 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 05:43:17 +0700
Subject: [PATCH 358/667] Add model 2023-11-06-bert_ner_biored_dis_modified_pubmedbert_384_5_en

---
 ...biored_dis_modified_pubmedbert_384_5_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_biored_dis_modified_pubmedbert_384_5_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biored_dis_modified_pubmedbert_384_5_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biored_dis_modified_pubmedbert_384_5_en.md
new file mode 100644
index 00000000000000..49bfb8654332d8
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_biored_dis_modified_pubmedbert_384_5_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_ner_biored_dis_modified_pubmedbert_384_5 BertForTokenClassification from ghadeermobasher
+author: John Snow Labs
+name: bert_ner_biored_dis_modified_pubmedbert_384_5
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_biored_dis_modified_pubmedbert_384_5` is an English model originally trained by ghadeermobasher.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_biored_dis_modified_pubmedbert_384_5_en_5.2.0_3.0_1699278284185.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_biored_dis_modified_pubmedbert_384_5_en_5.2.0_3.0_1699278284185.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
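+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification reads a "token" column, so tokenize the documents first
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_biored_dis_modified_pubmedbert_384_5","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes an active SparkSession named `spark`, as in spark-shell
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// BertForTokenClassification reads a "token" column, so tokenize the documents first
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_biored_dis_modified_pubmedbert_384_5", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```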
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_biored_dis_modified_pubmedbert_384_5|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|408.2 MB|
+
+## References
+
+https://huggingface.co/ghadeermobasher/BioRed-Dis-Modified-PubMedBERT-384-5
\ No newline at end of file
From 988d8d5e85bea39fca17eaa69d0bc2c3b9d8c8cc Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 05:44:17 +0700
Subject: [PATCH 359/667] Add model 2023-11-06-bert_ner_hing_bert_lid_hi

---
 .../2023-11-06-bert_ner_hing_bert_lid_hi.md | 110 ++++++++++++++++++
 1 file changed, 110 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_hing_bert_lid_hi.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_hing_bert_lid_hi.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_hing_bert_lid_hi.md
new file mode 100644
index 00000000000000..cbc3dad9842868
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_hing_bert_lid_hi.md
@@ -0,0 +1,110 @@
+---
+layout: model
+title: Hindi Named Entity Recognition (from l3cube-pune)
+author: John Snow Labs
+name: bert_ner_hing_bert_lid
+date: 2023-11-06
+tags: [bert, ner, token_classification, hi, open_source, onnx]
+task: Named Entity Recognition
+language: hi
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained Named Entity Recognition model, uploaded to Hugging Face, adapted and imported into Spark NLP. `hing-bert-lid` is a Hindi model originally trained by `l3cube-pune`.
+
+## Predicted Entities
+
+`EN`, `HI`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_hing_bert_lid_hi_5.2.0_3.0_1699292024423.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_hing_bert_lid_hi_5.2.0_3.0_1699292024423.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_hing_bert_lid","hi") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["मुझे स्पार्क एनएलपी बहुत पसंद है"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLModel
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes an active SparkSession named `spark`, as in spark-shell
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_hing_bert_lid","hi")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("मुझे स्पार्क एनएलपी बहुत पसंद है").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
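Judging by its predicted entities, `hing-bert-lid` tags each token with a language (`EN` or `HI`) rather than marking entity spans. Assuming `ner.result` holds one such label per token, runs of same-language tokens can be merged into monolingual segments. `language_segments` below is a hypothetical plain-Python sketch of that post-processing step and is not part of Spark NLP. The example labels are made up for illustration, not produced by the model.

```python
from itertools import groupby
from typing import List, Tuple


def language_segments(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge consecutive tokens sharing a language label into (text, label) segments."""
    segments: List[Tuple[str, str]] = []
    # groupby only merges *adjacent* pairs with equal labels, which is what we want
    for label, group in groupby(zip(tokens, labels), key=lambda pair: pair[1]):
        segments.append((" ".join(token for token, _ in group), label))
    return segments


tokens = ["mujhe", "Spark", "NLP", "bahut", "pasand", "hai"]
labels = ["HI", "EN", "EN", "HI", "HI", "HI"]  # illustrative labels only
print(language_segments(tokens, labels))
# [('mujhe', 'HI'), ('Spark NLP', 'EN'), ('bahut pasand hai', 'HI')]
```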
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_hing_bert_lid|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|hi|
+|Size:|407.2 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+- https://huggingface.co/l3cube-pune/hing-bert-lid
+- https://github.com/l3cube-pune/code-mixed-nlp
+- https://arxiv.org/abs/2204.08398
\ No newline at end of file
From 4f7cac7db1d6ee16cbc20a0b7c8e5c408bfe4b91 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 05:45:18 +0700
Subject: [PATCH 360/667] Add model 2023-11-06-bert_ner_prot_bert_bfd_ss3_en

---
 ...023-11-06-bert_ner_prot_bert_bfd_ss3_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_prot_bert_bfd_ss3_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_prot_bert_bfd_ss3_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_prot_bert_bfd_ss3_en.md
new file mode 100644
index 00000000000000..2d49ba04c598ce
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_prot_bert_bfd_ss3_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from Rostlab)
+author: John Snow Labs
+name: bert_ner_prot_bert_bfd_ss3
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `prot_bert_bfd_ss3` is an English model originally trained by `Rostlab`.
+
+## Predicted Entities
+
+`H`, `C`, `E`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_prot_bert_bfd_ss3_en_5.2.0_3.0_1699297853422.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_prot_bert_bfd_ss3_en_5.2.0_3.0_1699297853422.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_prot_bert_bfd_ss3","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLModel
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes an active SparkSession named `spark`, as in spark-shell
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_prot_bert_bfd_ss3","en")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.bert.by_rostlab").predict("""PUT YOUR STRING HERE""")
+```
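The upstream Rostlab ProtBert model cards describe the input format: uppercase one-letter amino-acid codes separated by single spaces, with the rare residues U, Z, O and B replaced by X. This sketch assumes the Spark NLP export expects the same format. `prepare_protein_sequence` is a hypothetical helper, not part of Spark NLP. Its output could replace `PUT YOUR STRING HERE` in the `data` DataFrame of the pipeline above.

```python
import re


def prepare_protein_sequence(sequence: str) -> str:
    """Format a raw protein sequence the way ProtBert-style models expect.

    Assumption: mirrors the preprocessing shown on the upstream Rostlab
    model cards (uppercase residues, U/Z/O/B mapped to X, one space
    between residues).
    """
    cleaned = re.sub(r"\s+", "", sequence.upper())  # drop existing spaces and newlines
    cleaned = re.sub(r"[UZOB]", "X", cleaned)        # map rare amino acids to X
    return " ".join(cleaned)                         # one residue per whitespace token


print(prepare_protein_sequence("aetcZao"))  # A E T C X A X
```

With one residue per whitespace-separated token, the `Tokenizer` stage should produce one token per amino acid. Each residue then receives its own `H`, `C` or `E` label.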
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_ner_prot_bert_bfd_ss3|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|1.6 GB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+- https://huggingface.co/Rostlab/prot_bert_bfd_ss3
\ No newline at end of file
From 45fb253d5f655c8ada27b467ab5443061810c7c6 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 05:46:18 +0700
Subject: [PATCH 361/667] Add model 2023-11-06-bert_ner_umlsbert_ner_en

---
 .../2023-11-06-bert_ner_umlsbert_ner_en.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_umlsbert_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_umlsbert_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_umlsbert_ner_en.md
new file mode 100644
index 00000000000000..d39eee548b88e8
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_umlsbert_ner_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from RohanVB)
+author: John Snow Labs
+name: bert_ner_umlsbert_ner
+date: 2023-11-06
+tags: [bert, ner, open_source, en, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `umlsbert_ner` is an English model originally trained by `RohanVB`.
+ +## Predicted Entities + +`test`, `problem`, `treatment` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_umlsbert_ner_en_5.2.0_3.0_1699298642614.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_umlsbert_ner_en_5.2.0_3.0_1699298642614.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_umlsbert_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_umlsbert_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.by_rohanvb").predict("""PUT YOUR STRING HERE""") +``` +
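The card lists `test`, `problem` and `treatment` as predicted entities, but the `ner` column holds one label per token. This sketch assumes the labels follow the common IOB scheme (`B-problem`, `I-problem`, `O`). The card does not confirm that scheme. Under that assumption, the pure-Python sketch below groups token labels into entity spans. In a Spark NLP pipeline, the `NerConverter` annotator does this grouping. The `iob_to_spans` helper and the sample sentence are hypothetical illustrations, not part of the model or the library.

```python
# Hypothetical helper (not part of Spark NLP): groups parallel token/IOB-tag
# lists into (label, text) entity spans, assuming tags like "B-problem".
from typing import List, Optional, Tuple


def iob_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    spans: List[Tuple[str, str]] = []
    current_label: Optional[str] = None
    current_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_label
        )
        if starts_new:
            # Close any open entity, then open a new one (a stray I- tag
            # with a different label is treated as a new entity).
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O" (or any other tag) closes the open entity.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_label is not None:
        spans.append((current_label, " ".join(current_tokens)))
    return spans


tokens = ["Patient", "denies", "chest", "pain", "after", "aspirin", "."]
tags = ["O", "O", "B-problem", "I-problem", "O", "B-treatment", "O"]
print(iob_to_spans(tokens, tags))
# [('problem', 'chest pain'), ('treatment', 'aspirin')]
```

If the model emits bare labels without `B-`/`I-` prefixes, this grouping does not apply as written. Inspect a few rows of `ner.result` before relying on it.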
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_umlsbert_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.2 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/RohanVB/umlsbert_ner \ No newline at end of file From c86e3257520559c131a590fd153fe335df602d27 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:47:18 +0700 Subject: [PATCH 362/667] Add model 2023-11-06-bert_ner_original_scibert_linnaeus_en --- ...6-bert_ner_original_scibert_linnaeus_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_scibert_linnaeus_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_scibert_linnaeus_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_scibert_linnaeus_en.md new file mode 100644 index 00000000000000..fc375800807bab --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_scibert_linnaeus_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_original_scibert_linnaeus BertForTokenClassification from ghadeermobasher +author: John Snow Labs +name: bert_ner_original_scibert_linnaeus +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_original_scibert_linnaeus` is an English model originally trained by ghadeermobasher.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_original_scibert_linnaeus_en_5.2.0_3.0_1699282219957.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_original_scibert_linnaeus_en_5.2.0_3.0_1699282219957.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") # produces the "token" column the classifier reads + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_original_scibert_linnaeus","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") // produces the "token" column the classifier reads + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_original_scibert_linnaeus", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
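A Spark ML `Pipeline` connects its stages only through column names. Every entry passed to `setInputCols` must therefore name a column that the input DataFrame or an earlier stage produces. The pure-Python sketch below shows that check. The `missing_inputs` helper and the `(name, input_cols, output_col)` tuples are hypothetical, not a Spark or Spark NLP API. The sketch compares a wiring where the `DocumentAssembler` writes `embeddings` and no `Tokenizer` stage exists against a wiring that feeds the classifier correctly.

```python
# Hypothetical wiring check (not a Spark API): report every input column a
# stage reads that neither the source DataFrame nor an earlier stage provides.
from typing import List, Tuple

Stage = Tuple[str, List[str], str]  # (stage name, input columns, output column)


def missing_inputs(stages: List[Stage], source_cols: List[str]) -> List[Tuple[str, str]]:
    available = set(source_cols)
    problems: List[Tuple[str, str]] = []
    for name, inputs, output in stages:
        for col in inputs:
            if col not in available:
                problems.append((name, col))
        # The stage's output becomes visible to all later stages.
        available.add(output)
    return problems


# DocumentAssembler writes "embeddings" and no Tokenizer is present.
broken = [
    ("DocumentAssembler", ["text"], "embeddings"),
    ("BertForTokenClassification", ["documents", "token"], "ner"),
]
# DocumentAssembler writes "documents" and a Tokenizer supplies "token".
fixed = [
    ("DocumentAssembler", ["text"], "documents"),
    ("Tokenizer", ["documents"], "token"),
    ("BertForTokenClassification", ["documents", "token"], "ner"),
]

print(missing_inputs(broken, ["text"]))
# [('BertForTokenClassification', 'documents'), ('BertForTokenClassification', 'token')]
print(missing_inputs(fixed, ["text"]))
# []
```

This sketch checks only column names. Spark NLP annotators also validate annotation types at run time, and this sketch does not model that check.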
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_original_scibert_linnaeus| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/ghadeermobasher/Original-SciBERT-Linnaeus \ No newline at end of file From 9285c60677bc6876eb063efd4a121025db5de38a Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:48:19 +0700 Subject: [PATCH 363/667] Add model 2023-11-06-bert_sayula_popoluca_bert_finetuned_sayula_popoluca_en --- ...oluca_bert_finetuned_sayula_popoluca_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_finetuned_sayula_popoluca_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_finetuned_sayula_popoluca_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_finetuned_sayula_popoluca_en.md new file mode 100644 index 00000000000000..f4d8320af73332 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_finetuned_sayula_popoluca_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_bert_finetuned_sayula_popoluca BertForTokenClassification from Fredvv +author: John Snow Labs +name: bert_sayula_popoluca_bert_finetuned_sayula_popoluca +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_finetuned_sayula_popoluca`
 is an English model originally trained by Fredvv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_finetuned_sayula_popoluca_en_5.2.0_3.0_1699303573296.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_finetuned_sayula_popoluca_en_5.2.0_3.0_1699303573296.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") # produces the "token" column the classifier reads + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_finetuned_sayula_popoluca","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") // produces the "token" column the classifier reads + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bert_finetuned_sayula_popoluca", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_finetuned_sayula_popoluca| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Fredvv/bert-finetuned-pos \ No newline at end of file From e1aaf4e1985d9c8cd62b6f80a7b9baa5e3301c32 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:49:19 +0700 Subject: [PATCH 364/667] Add model 2023-11-06-bert_ner_bioformer_cased_v1.0_bc2gm_en --- ...-bert_ner_bioformer_cased_v1.0_bc2gm_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bioformer_cased_v1.0_bc2gm_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bioformer_cased_v1.0_bc2gm_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bioformer_cased_v1.0_bc2gm_en.md new file mode 100644 index 00000000000000..ff3d4e3163e60b --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bioformer_cased_v1.0_bc2gm_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from bioformers) +author: John Snow Labs +name: bert_ner_bioformer_cased_v1.0_bc2gm +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bioformer-cased-v1.0-bc2gm` is an English model originally trained by `bioformers`.
+ +## Predicted Entities + +`bio` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bioformer_cased_v1.0_bc2gm_en_5.2.0_3.0_1699292053667.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bioformer_cased_v1.0_bc2gm_en_5.2.0_3.0_1699292053667.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bioformer_cased_v1.0_bc2gm","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bioformer_cased_v1.0_bc2gm","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bioformer.bc2gm.cased").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bioformer_cased_v1.0_bc2gm| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|158.5 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/bioformers/bioformer-cased-v1.0-bc2gm +- https://doi.org/10.1186/gb-2008-9-s2-s2 \ No newline at end of file From f18b3853d3ebcd763f502eb7ef09ecc892452ad9 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:50:19 +0700 Subject: [PATCH 365/667] Add model 2023-11-06-bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_es --- ...nish_cased_finetuned_sayula_popoluca_es.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_es.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_es.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_es.md new file mode 100644 index 00000000000000..aa70ab1f5dadce --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_es.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Castilian, Spanish bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca BertForTokenClassification from mrm8488 +author: John Snow Labs +name: bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca +date: 2023-11-06 +tags: [bert, es, open_source, token_classification, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + 
+## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca` is a Spanish (Castilian) model originally trained by mrm8488. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_es_5.2.0_3.0_1699303777052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca_es_5.2.0_3.0_1699303777052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") # produces the "token" column the classifier reads + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca","es") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") // produces the "token" column the classifier reads + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca", "es") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_spanish_cased_finetuned_sayula_popoluca| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|409.7 MB| + +## References + +https://huggingface.co/mrm8488/bert-spanish-cased-finetuned-pos \ No newline at end of file From 2518f2e7b893e3734215ea4a0b850d0e69c2ca82 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:51:19 +0700 Subject: [PATCH 366/667] Add model 2023-11-06-bert_ner_small2_en --- .../2023-11-06-bert_ner_small2_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_small2_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_small2_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_small2_en.md new file mode 100644 index 00000000000000..91dd5ca5382d32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_small2_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Small Cased model (from Narsil) +author: John Snow Labs +name: bert_ner_small2 +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `small2` is an English model originally trained by `Narsil`.
+ +## Predicted Entities + +`ORG`, `PER`, `LOC`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_small2_en_5.2.0_3.0_1699299785612.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_small2_en_5.2.0_3.0_1699299785612.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_small2","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_small2","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.small.by_narsil").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_small2| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|527.6 KB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/Narsil/small2 \ No newline at end of file From 60fecd22815047e94b1d9f69619a38c55e853ebf Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:52:20 +0700 Subject: [PATCH 367/667] Add model 2023-11-06-bert_ner_peterhsu_bert_finetuned_ner_accelerate_en --- ...terhsu_bert_finetuned_ner_accelerate_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_peterhsu_bert_finetuned_ner_accelerate_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_peterhsu_bert_finetuned_ner_accelerate_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_peterhsu_bert_finetuned_ner_accelerate_en.md new file mode 100644 index 00000000000000..0e27b2b7d8ed7a --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_peterhsu_bert_finetuned_ner_accelerate_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from peterhsu) +author: John Snow Labs +name: bert_ner_peterhsu_bert_finetuned_ner_accelerate +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner-accelerate` is an English model originally trained by `peterhsu`.
+ +## Predicted Entities + +`LOC`, `PER`, `ORG`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_peterhsu_bert_finetuned_ner_accelerate_en_5.2.0_3.0_1699299193052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_peterhsu_bert_finetuned_ner_accelerate_en_5.2.0_3.0_1699299193052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_peterhsu_bert_finetuned_ner_accelerate","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_peterhsu_bert_finetuned_ner_accelerate","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.accelerate.by_peterhsu").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_peterhsu_bert_finetuned_ner_accelerate| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +References + +- https://huggingface.co/peterhsu/bert-finetuned-ner-accelerate \ No newline at end of file From 1a6d98591fe89ee90a4066597c5ec4db55087b89 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:53:20 +0700 Subject: [PATCH 368/667] Add model 2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_glf_ar --- ...ic_camelbert_msa_sayula_popoluca_glf_ar.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_glf_ar.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_glf_ar.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_glf_ar.md new file mode 100644 index 00000000000000..4024f513460710 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_glf_ar.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Arabic bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_glf BertForTokenClassification from CAMeL-Lab +author: John Snow Labs +name: bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_glf +date: 2023-11-06 +tags: [bert, ar, open_source, token_classification, onnx] +task: Named Entity Recognition +language: ar +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + 
+## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_glf` is an Arabic model originally trained by CAMeL-Lab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_glf_ar_5.2.0_3.0_1699302805972.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_glf_ar_5.2.0_3.0_1699302805972.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") # produces the "token" column the classifier reads + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_glf","ar") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") // produces the "token" column the classifier reads + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_glf", "ar") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bert_base_arabic_camelbert_msa_sayula_popoluca_glf| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|ar| +|Size:|406.4 MB| + +## References + +https://huggingface.co/CAMeL-Lab/bert-base-arabic-camelbert-msa-pos-glf \ No newline at end of file From 8c4fc51b26ab1cd26fe6f61e03575c8647ca032f Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:54:20 +0700 Subject: [PATCH 369/667] Add model 2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate2_en --- ...inetuned_sayula_popoluca_accelerate2_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate2_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate2_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate2_en.md new file mode 100644 index 00000000000000..01d8f1e3d09217 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate2_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate2 BertForTokenClassification from Deborah +author: John Snow Labs +name: bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate2 +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained 
 BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate2` is an English model originally trained by Deborah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate2_en_5.2.0_3.0_1699303981929.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate2_en_5.2.0_3.0_1699303981929.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") # produces the "token" column the classifier reads + +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") // produces the "token" column the classifier reads + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate2| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|406.0 MB| + +## References + +https://huggingface.co/Deborah/bertimbau-finetuned-pos-accelerate2 \ No newline at end of file From b3b76df478a978b778978ee55e9969078784b940 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:55:21 +0700 Subject: [PATCH 370/667] Add model 2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_en --- ...finetuned_sayula_popoluca_accelerate_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_en.md new file mode 100644 index 00000000000000..bdc92ea860030b --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate BertForTokenClassification from Deborah +author: John Snow Labs +name: bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, 
adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate` is an English model originally trained by Deborah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_en_5.2.0_3.0_1699304519838.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate_en_5.2.0_3.0_1699304519838.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
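+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer must run first.
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark`, as in spark-shell.
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+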
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_bertimbau_finetuned_sayula_popoluca_accelerate| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|406.0 MB| + +## References + +https://huggingface.co/Deborah/bertimbau-finetuned-pos-accelerate \ No newline at end of file From c1f80997bf4fa1527538149c280255481994d923 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:56:21 +0700 Subject: [PATCH 371/667] Add model 2023-11-06-bert_token_classifier_satellite_instrument_ner_pt --- ..._classifier_satellite_instrument_ner_pt.md | 102 ++++++++++++++++++ 1 file changed, 102 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_satellite_instrument_ner_pt.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_satellite_instrument_ner_pt.md b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_satellite_instrument_ner_pt.md new file mode 100644 index 00000000000000..c3d434d14ebf07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_satellite_instrument_ner_pt.md @@ -0,0 +1,102 @@ +--- +layout: model +title: Portuguese BertForTokenClassification Cased model (from m-lin20) +author: John Snow Labs +name: bert_token_classifier_satellite_instrument_ner +date: 2023-11-06 +tags: [pt, open_source, bert, token_classification, ner, onnx] +task: Named Entity Recognition +language: pt +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. 
`satellite-instrument-bert-NER` is a Portuguese model originally trained by `m-lin20`. + +## Predicted Entities + +`instrument`, `satellite` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_satellite_instrument_ner_pt_5.2.0_3.0_1699303598163.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_satellite_instrument_ner_pt_5.2.0_3.0_1699303598163.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_satellite_instrument_ner","pt") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_satellite_instrument_ner","pt") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
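The `ner` column holds one tag per token. If the tags follow the common IOB scheme (for example `B-satellite`, `I-instrument`, `O`), which this card does not state and is assumed here, consecutive tokens can be merged into entity spans. Spark NLP provides `NerConverter` for this step inside a pipeline; the hypothetical `iob_to_chunks` function below is only a pure-Python illustration of the same grouping, run on hand-written tags rather than real model output.

```python
from typing import List, Optional, Tuple


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge IOB-tagged tokens into (entity_text, label) chunks."""
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Close the open chunk, if any, and reset the state.
        nonlocal current_tokens, current_label
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, _, label = tag.partition("-")
        # Start a new chunk on B-, or on an I- whose label differs from the open chunk.
        if prefix == "B" or label != current_label:
            flush()
            current_label = label
        current_tokens.append(token)
    flush()
    return chunks


# Hand-written example tags (assumed label names), not actual model output.
tokens = ["Sentinel", "2", "carries", "the", "MultiSpectral", "Instrument", "."]
tags = ["B-satellite", "I-satellite", "O", "O", "B-instrument", "I-instrument", "O"]
print(iob_to_chunks(tokens, tags))
```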
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_satellite_instrument_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|pt| +|Size:|1.2 GB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/m-lin20/satellite-instrument-bert-NER +- https://github.com/THU-EarthInformationScienceLab/Satellite-Instrument-NER +- https://www.tandfonline.com/doi/full/10.1080/17538947.2022.2107098 \ No newline at end of file From 9e4e1ed6ad7fa618214378b24673f42a59e24878 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:57:21 +0700 Subject: [PATCH 372/667] Add model 2023-11-06-bert_ner_autonlp_tele_nepal_bhasa_5k_557515810_en --- ...utonlp_tele_nepal_bhasa_5k_557515810_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_autonlp_tele_nepal_bhasa_5k_557515810_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_autonlp_tele_nepal_bhasa_5k_557515810_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_autonlp_tele_nepal_bhasa_5k_557515810_en.md new file mode 100644 index 00000000000000..97c59fb75dfa31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_autonlp_tele_nepal_bhasa_5k_557515810_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_autonlp_tele_nepal_bhasa_5k_557515810 BertForTokenClassification from kSaluja +author: John Snow Labs +name: bert_ner_autonlp_tele_nepal_bhasa_5k_557515810 +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained 
BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_autonlp_tele_nepal_bhasa_5k_557515810` is an English model originally trained by kSaluja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_autonlp_tele_nepal_bhasa_5k_557515810_en_5.2.0_3.0_1699283291210.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_autonlp_tele_nepal_bhasa_5k_557515810_en_5.2.0_3.0_1699283291210.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
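+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer must run first.
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_autonlp_tele_nepal_bhasa_5k_557515810","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark`, as in spark-shell.
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_autonlp_tele_nepal_bhasa_5k_557515810", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+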
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_autonlp_tele_nepal_bhasa_5k_557515810| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/kSaluja/autonlp-tele_new_5k-557515810 \ No newline at end of file From 7af04457786af4a649532f9785a54699165f6940 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:58:22 +0700 Subject: [PATCH 373/667] Add model 2023-11-06-bert_ner_bionlp13cg_chem_imbalancedpubmedbert_en --- ...bionlp13cg_chem_imbalancedpubmedbert_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bionlp13cg_chem_imbalancedpubmedbert_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bionlp13cg_chem_imbalancedpubmedbert_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bionlp13cg_chem_imbalancedpubmedbert_en.md new file mode 100644 index 00000000000000..82fab6c08e15ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bionlp13cg_chem_imbalancedpubmedbert_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_bionlp13cg_chem_imbalancedpubmedbert BertForTokenClassification from ghadeermobasher +author: John Snow Labs +name: bert_ner_bionlp13cg_chem_imbalancedpubmedbert +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_bionlp13cg_chem_imbalancedpubmedbert` is an English model originally 
trained by ghadeermobasher. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bionlp13cg_chem_imbalancedpubmedbert_en_5.2.0_3.0_1699274624854.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bionlp13cg_chem_imbalancedpubmedbert_en_5.2.0_3.0_1699274624854.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
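+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer must run first.
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bionlp13cg_chem_imbalancedpubmedbert","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark`, as in spark-shell.
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_bionlp13cg_chem_imbalancedpubmedbert", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+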
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bionlp13cg_chem_imbalancedpubmedbert| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.2 MB| + +## References + +https://huggingface.co/ghadeermobasher/BioNLP13CG-Chem_ImbalancedPubMedBERT \ No newline at end of file From 1473f3118fbb5ad34b7644674d62d3672d22b0c8 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 05:59:22 +0700 Subject: [PATCH 374/667] Add model 2023-11-06-bert_ner_keyword_tag_model_10000_9_16_more_ingredient_en --- ...tag_model_10000_9_16_more_ingredient_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_10000_9_16_more_ingredient_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_10000_9_16_more_ingredient_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_10000_9_16_more_ingredient_en.md new file mode 100644 index 00000000000000..56b023b5d3287f --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_keyword_tag_model_10000_9_16_more_ingredient_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from Media1129) +author: John Snow Labs +name: bert_ner_keyword_tag_model_10000_9_16_more_ingredient +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. 
`keyword-tag-model-10000-9-16_more_ingredient` is an English model originally trained by `Media1129`. + +## Predicted Entities + +`negingredient`, `occasion`, `mealcourse`, `cuisines`, `ingredient` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_keyword_tag_model_10000_9_16_more_ingredient_en_5.2.0_3.0_1699293647025.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_keyword_tag_model_10000_9_16_more_ingredient_en_5.2.0_3.0_1699293647025.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_keyword_tag_model_10000_9_16_more_ingredient","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_keyword_tag_model_10000_9_16_more_ingredient","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.ingredient.").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_keyword_tag_model_10000_9_16_more_ingredient| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/Media1129/keyword-tag-model-10000-9-16_more_ingredient \ No newline at end of file From 2c7f0ea24ae949e18681375f82ea26cc5d72dad5 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 06:00:22 +0700 Subject: [PATCH 375/667] Add model 2023-11-06-bert_ner_tinybert_spanish_uncased_finetuned_ner_es --- ...nybert_spanish_uncased_finetuned_ner_es.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_tinybert_spanish_uncased_finetuned_ner_es.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_tinybert_spanish_uncased_finetuned_ner_es.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_tinybert_spanish_uncased_finetuned_ner_es.md new file mode 100644 index 00000000000000..4db752355ba7f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_tinybert_spanish_uncased_finetuned_ner_es.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Castilian, Spanish bert_ner_tinybert_spanish_uncased_finetuned_ner BertForTokenClassification from mrm8488 +author: John Snow Labs +name: bert_ner_tinybert_spanish_uncased_finetuned_ner +date: 2023-11-06 +tags: [bert, es, open_source, token_classification, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and 
production-readiness using Spark NLP. `bert_ner_tinybert_spanish_uncased_finetuned_ner` is a Castilian, Spanish model originally trained by mrm8488. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_tinybert_spanish_uncased_finetuned_ner_es_5.2.0_3.0_1699283940718.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_tinybert_spanish_uncased_finetuned_ner_es_5.2.0_3.0_1699283940718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
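+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer must run first.
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_tinybert_spanish_uncased_finetuned_ner","es") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark`, as in spark-shell.
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_tinybert_spanish_uncased_finetuned_ner", "es")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+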
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_tinybert_spanish_uncased_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|54.3 MB| + +## References + +https://huggingface.co/mrm8488/TinyBERT-spanish-uncased-finetuned-ner \ No newline at end of file From b93455a611c3b6462f09b8c77435ae90121f9749 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 06:01:22 +0700 Subject: [PATCH 376/667] Add model 2023-11-06-bert_token_classifier_base_chinese_sayula_popoluca_zh --- ...ssifier_base_chinese_sayula_popoluca_zh.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_chinese_sayula_popoluca_zh.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_chinese_sayula_popoluca_zh.md b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_chinese_sayula_popoluca_zh.md new file mode 100644 index 00000000000000..4b443c578f3a45 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_chinese_sayula_popoluca_zh.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Chinese bert_token_classifier_base_chinese_sayula_popoluca BertForTokenClassification from ckiplab +author: John Snow Labs +name: bert_token_classifier_base_chinese_sayula_popoluca +date: 2023-11-06 +tags: [bert, zh, open_source, token_classification, onnx] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark 
NLP. `bert_token_classifier_base_chinese_sayula_popoluca` is a Chinese model originally trained by ckiplab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_chinese_sayula_popoluca_zh_5.2.0_3.0_1699311615246.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_chinese_sayula_popoluca_zh_5.2.0_3.0_1699311615246.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
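+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer must run first.
+# Tokenizer splits mainly on whitespace and punctuation, so unsegmented
+# Chinese text may need a word-segmentation step before this stage.
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_base_chinese_sayula_popoluca","zh") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark`, as in spark-shell.
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_token_classifier_base_chinese_sayula_popoluca", "zh")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+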
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_base_chinese_sayula_popoluca| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|381.1 MB| + +## References + +https://huggingface.co/ckiplab/bert-base-chinese-pos \ No newline at end of file From 1785c4e433e4806dcce3dc54a70e5a2ab002e359 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 06:02:23 +0700 Subject: [PATCH 377/667] Add model 2023-11-06-bert_ner_shivanand_wikineural_multilingual_ner_en --- ...hivanand_wikineural_multilingual_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_shivanand_wikineural_multilingual_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_shivanand_wikineural_multilingual_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_shivanand_wikineural_multilingual_ner_en.md new file mode 100644 index 00000000000000..83c8fa9bf6bd19 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_shivanand_wikineural_multilingual_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_shivanand_wikineural_multilingual_ner BertForTokenClassification from Shivanand +author: John Snow Labs +name: bert_ner_shivanand_wikineural_multilingual_ner +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_shivanand_wikineural_multilingual_ner` is an English model originally 
trained by Shivanand. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_shivanand_wikineural_multilingual_ner_en_5.2.0_3.0_1699282400699.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_shivanand_wikineural_multilingual_ner_en_5.2.0_3.0_1699282400699.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
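+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer must run first.
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_ner_shivanand_wikineural_multilingual_ner","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark`, as in spark-shell.
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_ner_shivanand_wikineural_multilingual_ner", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+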
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_shivanand_wikineural_multilingual_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Shivanand/wikineural-multilingual-ner \ No newline at end of file From 09ae2070f8f73ef9f40bf4dd4f0a7934bac6a3bc Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 06:03:24 +0700 Subject: [PATCH 378/667] Add model 2023-11-06-bert_ner_original_scibert_bc5cdr_chemical_en --- ...ner_original_scibert_bc5cdr_chemical_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_scibert_bc5cdr_chemical_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_scibert_bc5cdr_chemical_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_scibert_bc5cdr_chemical_en.md new file mode 100644 index 00000000000000..d7e2e59f3d093c --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_original_scibert_bc5cdr_chemical_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_original_scibert_bc5cdr_chemical BertForTokenClassification from ghadeermobasher +author: John Snow Labs +name: bert_ner_original_scibert_bc5cdr_chemical +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_original_scibert_bc5cdr_chemical` is an English model originally trained by ghadeermobasher.
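The Download and Copy S3 URI links on these cards appear to share one file-naming pattern, `<model name>_<language>_<Spark NLP version>_<Spark version>_<number>.zip`. This pattern is inferred from the links themselves rather than documented on the page, and reading the final number as a Unix timestamp in milliseconds is an assumption (for every link on this page it falls on 2023-11-06 UTC, matching the `date` field). The hypothetical `parse_model_url` helper below splits a link from the right, because in these links only the model name contains underscores.

```python
import posixpath
from datetime import datetime, timezone
from typing import NamedTuple
from urllib.parse import urlparse


class ModelArchive(NamedTuple):
    name: str
    language: str
    nlp_version: str
    spark_version: str
    built_at: datetime


def parse_model_url(url: str) -> ModelArchive:
    """Split a model archive link into the fields of its (inferred) naming pattern."""
    # Works for both https:// and s3:// links: only the path's last segment matters.
    filename = posixpath.basename(urlparse(url).path)
    if not filename.endswith(".zip"):
        raise ValueError(f"not a .zip archive: {filename}")
    stem = filename[: -len(".zip")]
    # Split from the right: only the model name itself may contain underscores.
    name, language, nlp_version, spark_version, stamp = stem.rsplit("_", 4)
    # Assumption: the last field is a Unix timestamp in milliseconds.
    built_at = datetime.fromtimestamp(int(stamp) / 1000, tz=timezone.utc)
    return ModelArchive(name, language, nlp_version, spark_version, built_at)


url = ("https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/"
       "bert_ner_original_scibert_bc5cdr_chemical_en_5.2.0_3.0_1699282021654.zip")
info = parse_model_url(url)
print(info.name, info.language, info.nlp_version, info.spark_version)
print(info.built_at.date())
```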
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_original_scibert_bc5cdr_chemical_en_5.2.0_3.0_1699282021654.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_original_scibert_bc5cdr_chemical_en_5.2.0_3.0_1699282021654.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
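+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_original_scibert_bc5cdr_chemical","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_original_scibert_bc5cdr_chemical", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +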
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_original_scibert_bc5cdr_chemical| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/ghadeermobasher/Original-SciBERT-BC5CDR-Chemical \ No newline at end of file From ffd68608bba9073e272fc5c98860d610e1d57921 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 06:04:23 +0700 Subject: [PATCH 379/667] Add model 2023-11-06-bert_ner_bert_keyword_extractor_en --- ...1-06-bert_ner_bert_keyword_extractor_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_keyword_extractor_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_keyword_extractor_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_keyword_extractor_en.md new file mode 100644 index 00000000000000..63fa0f0653fab2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_bert_keyword_extractor_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from yanekyuk) +author: John Snow Labs +name: bert_ner_bert_keyword_extractor +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-keyword-extractor` is an English model originally trained by `yanekyuk`.
+ +## Predicted Entities + +`KEY` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_bert_keyword_extractor_en_5.2.0_3.0_1699286944350.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_bert_keyword_extractor_en_5.2.0_3.0_1699286944350.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_keyword_extractor","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_bert_keyword_extractor","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.by_yanekyuk").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_bert_keyword_extractor| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + + + +- https://huggingface.co/yanekyuk/bert-keyword-extractor \ No newline at end of file From c12522d5748091a6457ec773c5b8c15b09e9bddb Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 06:05:23 +0700 Subject: [PATCH 380/667] Add model 2023-11-06-bert_ner_winson_bert_finetuned_ner_accelerate_en --- ...winson_bert_finetuned_ner_accelerate_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_winson_bert_finetuned_ner_accelerate_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_winson_bert_finetuned_ner_accelerate_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_winson_bert_finetuned_ner_accelerate_en.md new file mode 100644 index 00000000000000..b6b2396e207278 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_winson_bert_finetuned_ner_accelerate_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from winson) +author: John Snow Labs +name: bert_ner_winson_bert_finetuned_ner_accelerate +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner-accelerate` is an English model originally trained by `winson`.
+ +## Predicted Entities + +`LOC`, `PER`, `ORG`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_winson_bert_finetuned_ner_accelerate_en_5.2.0_3.0_1699296108481.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_winson_bert_finetuned_ner_accelerate_en_5.2.0_3.0_1699296108481.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_winson_bert_finetuned_ner_accelerate","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_winson_bert_finetuned_ner_accelerate","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.finetuned.by_winson").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_winson_bert_finetuned_ner_accelerate| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + + + +- https://huggingface.co/winson/bert-finetuned-ner-accelerate \ No newline at end of file From 7142a830439d00940a71888c6ff7fe7d40f61122 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 06:06:23 +0700 Subject: [PATCH 381/667] Add model 2023-11-06-bert_sayula_popoluca_tiny_focal_ckpt_en --- ...bert_sayula_popoluca_tiny_focal_ckpt_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_focal_ckpt_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_focal_ckpt_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_focal_ckpt_en.md new file mode 100644 index 00000000000000..8389c50a94e931 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_focal_ckpt_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_tiny_focal_ckpt BertForTokenClassification from kktoto +author: John Snow Labs +name: bert_sayula_popoluca_tiny_focal_ckpt +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_tiny_focal_ckpt` is an English model originally trained by
kktoto. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_tiny_focal_ckpt_en_5.2.0_3.0_1699311196571.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_tiny_focal_ckpt_en_5.2.0_3.0_1699311196571.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
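+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_tiny_focal_ckpt","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_tiny_focal_ckpt", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +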
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_tiny_focal_ckpt| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|42.7 MB| + +## References + +https://huggingface.co/kktoto/tiny_focal_ckpt \ No newline at end of file From 4589f048b391022c10f727aec9ab0408e713cbf9 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 06:07:24 +0700 Subject: [PATCH 382/667] Add model 2023-11-06-bert_ner_xkang_bert_finetuned_ner_en --- ...06-bert_ner_xkang_bert_finetuned_ner_en.md | 115 ++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_xkang_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_xkang_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_xkang_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..d2f8d9ff78628d --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_xkang_bert_finetuned_ner_en.md @@ -0,0 +1,115 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from xkang) +author: John Snow Labs +name: bert_ner_xkang_bert_finetuned_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-finetuned-ner` is an English model originally trained by `xkang`.
+ +## Predicted Entities + +`ORG`, `LOC`, `PER`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_xkang_bert_finetuned_ner_en_5.2.0_3.0_1699301262858.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_xkang_bert_finetuned_ner_en_5.2.0_3.0_1699301262858.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_xkang_bert_finetuned_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_xkang_bert_finetuned_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.conll.finetuned.by_xkang").predict("""PUT YOUR STRING HERE""") +``` +
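The `ner` column produced above holds one tag per token, not one annotation per entity. The card lists only bare entity types (`ORG`, `LOC`, `PER`, `MISC`), so the following is an assumption: the model is taken to emit IOB2-style tags such as `B-PER`, `I-PER` and `O`. Check the actual labels before relying on it. Under that assumption, the plain-Python sketch below merges consecutive tags into `(text, label)` spans. The input is the token and tag strings, for example the `result` fields of the `token` and `ner` columns. `group_iob` is a hypothetical helper, not part of Spark NLP. Inside a pipeline, the usual way to build entity chunks is Spark NLP's `NerConverter` annotator, fed the `sentence`, `token` and `ner` columns.

```python
from typing import List, Optional, Tuple


def group_iob(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge per-token IOB2 tags (e.g. B-PER, I-PER, O) into (text, label) spans.

    Hypothetical helper: assumes IOB2 tags. An I-X tag that does not continue
    an open span with the same label X starts a new span.
    """
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    label: Optional[str] = None

    for token, tag in zip(tokens, tags):
        if tag == "O" or "-" not in tag:
            # Outside any entity: close the open span, if there is one.
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
            continue
        prefix, tag_label = tag.split("-", 1)
        if prefix == "B" or tag_label != label:
            # A new entity begins: close the previous span first.
            if current:
                spans.append((" ".join(current), label))
            current, label = [], tag_label
        current.append(token)

    # Close a span that runs to the end of the input.
    if current:
        spans.append((" ".join(current), label))
    return spans


tokens = ["John", "Snow", "lives", "in", "London", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "O"]
print(group_iob(tokens, tags))  # [('John Snow', 'PER'), ('London', 'LOC')]
```

The sketch rejoins tokens with single spaces, so original spacing and punctuation attachment are lost. Spark NLP token annotations also carry `begin`/`end` character offsets, which can recover the exact substring when that matters.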
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_xkang_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + + + +- https://huggingface.co/xkang/bert-finetuned-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file From 3c69ef1f0d8592a26a0ccd772b3e9987ce20eed2 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 06:08:24 +0700 Subject: [PATCH 383/667] Add model 2023-11-06-bert_ner_agro_ner_en --- .../2023-11-06-bert_ner_agro_ner_en.md | 114 ++++++++++++++++++ 1 file changed, 114 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_agro_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_agro_ner_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_agro_ner_en.md new file mode 100644 index 00000000000000..b1226bb611abaf --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_agro_ner_en.md @@ -0,0 +1,114 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from gauravnuti) +author: John Snow Labs +name: bert_ner_agro_ner +date: 2023-11-06 +tags: [bert, ner, open_source, en, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `agro-ner` is an English model originally trained by `gauravnuti`.
+ +## Predicted Entities + +`ITEM`, `REGION`, `METRIC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_agro_ner_en_5.2.0_3.0_1699283913171.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_agro_ner_en_5.2.0_3.0_1699283913171.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_agro_ner","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_ner_agro_ner","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.bert.by_gauravnuti").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_agro_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + + + +- https://huggingface.co/gauravnuti/agro-ner \ No newline at end of file From 029689e0c600eeacbad9e4b1a4823f9d0f44851b Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 06:09:24 +0700 Subject: [PATCH 384/667] Add model 2023-11-06-bert_ner_mbert_base_uncased_nigerian_pidgin_pcm --- ..._mbert_base_uncased_nigerian_pidgin_pcm.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_uncased_nigerian_pidgin_pcm.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_uncased_nigerian_pidgin_pcm.md b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_uncased_nigerian_pidgin_pcm.md new file mode 100644 index 00000000000000..f4366b7e6a5681 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_ner_mbert_base_uncased_nigerian_pidgin_pcm.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Nigerian Pidgin bert_ner_mbert_base_uncased_nigerian_pidgin BertForTokenClassification from arnolfokam +author: John Snow Labs +name: bert_ner_mbert_base_uncased_nigerian_pidgin +date: 2023-11-06 +tags: [bert, pcm, open_source, token_classification, onnx] +task: Named Entity Recognition +language: pcm +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_mbert_base_uncased_nigerian_pidgin` is a Nigerian
Pidgin model originally trained by arnolfokam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_mbert_base_uncased_nigerian_pidgin_pcm_5.2.0_3.0_1699297500464.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_mbert_base_uncased_nigerian_pidgin_pcm_5.2.0_3.0_1699297500464.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
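+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_mbert_base_uncased_nigerian_pidgin","pcm") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_mbert_base_uncased_nigerian_pidgin", "pcm") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +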
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_mbert_base_uncased_nigerian_pidgin| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|pcm| +|Size:|665.1 MB| + +## References + +https://huggingface.co/arnolfokam/mbert-base-uncased-pcm \ No newline at end of file From ca41f8d6b7acebecb91f00434e83d61e948a6cee Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 06:10:25 +0700 Subject: [PATCH 385/667] Add model 2023-11-06-bert_token_classifier_base_multilingual_cased_finetuned_albanian_ner_sq --- ...lingual_cased_finetuned_albanian_ner_sq.md | 100 ++++++++++++++++++ 1 file changed, 100 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_multilingual_cased_finetuned_albanian_ner_sq.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_multilingual_cased_finetuned_albanian_ner_sq.md b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_multilingual_cased_finetuned_albanian_ner_sq.md new file mode 100644 index 00000000000000..3e51978ec2231b --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_multilingual_cased_finetuned_albanian_ner_sq.md @@ -0,0 +1,100 @@ +--- +layout: model +title: Albanian BertForTokenClassification Base Cased model (from Kushtrim) +author: John Snow Labs +name: bert_token_classifier_base_multilingual_cased_finetuned_albanian_ner +date: 2023-11-06 +tags: [sq, open_source, bert, token_classification, ner, onnx] +task: Named Entity Recognition +language: sq +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and 
production-readiness using Spark NLP. `bert-base-multilingual-cased-finetuned-albanian-ner` is an Albanian model originally trained by `Kushtrim`. + +## Predicted Entities + +`LOC`, `ORG`, `PER`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_multilingual_cased_finetuned_albanian_ner_sq_5.2.0_3.0_1699303539654.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_multilingual_cased_finetuned_albanian_ner_sq_5.2.0_3.0_1699303539654.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_base_multilingual_cased_finetuned_albanian_ner","sq") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_base_multilingual_cased_finetuned_albanian_ner","sq") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_base_multilingual_cased_finetuned_albanian_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|sq| +|Size:|665.1 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + + + +- https://huggingface.co/Kushtrim/bert-base-multilingual-cased-finetuned-albanian-ner \ No newline at end of file From ff3782e6dbef45c60664183e4aa9768e3fae1548 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 06:11:25 +0700 Subject: [PATCH 386/667] Add model 2023-11-06-bert_sayula_popoluca_tiny_focal_alpah_en --- ...ert_sayula_popoluca_tiny_focal_alpah_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_focal_alpah_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_focal_alpah_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_focal_alpah_en.md new file mode 100644 index 00000000000000..416c053b4a476f --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_focal_alpah_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_sayula_popoluca_tiny_focal_alpah BertForTokenClassification from kktoto +author: John Snow Labs +name: bert_sayula_popoluca_tiny_focal_alpah +date: 2023-11-06 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark
NLP. `bert_sayula_popoluca_tiny_focal_alpah` is an English model originally trained by kktoto. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_tiny_focal_alpah_en_5.2.0_3.0_1699310254283.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_tiny_focal_alpah_en_5.2.0_3.0_1699310254283.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
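+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") +tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_tiny_focal_alpah","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") +val tokenClassifier = BertForTokenClassification + .pretrained("bert_sayula_popoluca_tiny_focal_alpah", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("PUT YOUR STRING HERE").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +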
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sayula_popoluca_tiny_focal_alpah| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|42.7 MB| + +## References + +https://huggingface.co/kktoto/tiny_focal_alpah \ No newline at end of file From 058c4e770a2eeec5f716f4818a3ddab61b53c66b Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 06:12:25 +0700 Subject: [PATCH 387/667] Add model 2023-11-06-vietnamese_ner_v1_4_0a2_vi --- .../2023-11-06-vietnamese_ner_v1_4_0a2_vi.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-06-vietnamese_ner_v1_4_0a2_vi.md diff --git a/docs/_posts/ahmedlone127/2023-11-06-vietnamese_ner_v1_4_0a2_vi.md b/docs/_posts/ahmedlone127/2023-11-06-vietnamese_ner_v1_4_0a2_vi.md new file mode 100644 index 00000000000000..0bcadcac4d3785 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-06-vietnamese_ner_v1_4_0a2_vi.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Vietnamese vietnamese_ner_v1_4_0a2 BertForTokenClassification from undertheseanlp +author: John Snow Labs +name: vietnamese_ner_v1_4_0a2 +date: 2023-11-06 +tags: [bert, vi, open_source, token_classification, onnx] +task: Named Entity Recognition +language: vi +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `vietnamese_ner_v1_4_0a2` is a Vietnamese model originally trained by undertheseanlp.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/vietnamese_ner_v1_4_0a2_vi_5.2.0_3.0_1699312310697.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/vietnamese_ner_v1_4_0a2_vi_5.2.0_3.0_1699312310697.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
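+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads both "documents" and "token", so a Tokenizer stage is required.
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("vietnamese_ner_v1_4_0a2","vi") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes a SparkSession named `spark` is in scope, as in spark-shell.
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// The classifier reads both "documents" and "token", so a Tokenizer stage is required.
+val tokenizer = new Tokenizer()
+    .setInputCols("documents")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("vietnamese_ner_v1_4_0a2", "vi")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+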
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|vietnamese_ner_v1_4_0a2|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|vi|
+|Size:|428.8 MB|
+
+## References
+
+https://huggingface.co/undertheseanlp/vietnamese-ner-v1.4.0a2
\ No newline at end of file

From 974e2a7cac45f3851f6900c5d6258e8a40ce27c7 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 06:17:06 +0700
Subject: [PATCH 388/667] Add model 2023-11-06-bert_sayula_popoluca_tiny_focal_v2_label_en

---
 ..._sayula_popoluca_tiny_focal_v2_label_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_focal_v2_label_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_focal_v2_label_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_focal_v2_label_en.md
new file mode 100644
index 00000000000000..d0a11d16e166ac
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_focal_v2_label_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_sayula_popoluca_tiny_focal_v2_label BertForTokenClassification from kktoto
+author: John Snow Labs
+name: bert_sayula_popoluca_tiny_focal_v2_label
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_tiny_focal_v2_label` is an English model originally trained by kktoto.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_tiny_focal_v2_label_en_5.2.0_3.0_1699312620695.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_tiny_focal_v2_label_en_5.2.0_3.0_1699312620695.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
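+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads both "documents" and "token", so a Tokenizer stage is required.
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_tiny_focal_v2_label","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes a SparkSession named `spark` is in scope, as in spark-shell.
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// The classifier reads both "documents" and "token", so a Tokenizer stage is required.
+val tokenizer = new Tokenizer()
+    .setInputCols("documents")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_tiny_focal_v2_label", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+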
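The Download and Copy S3 URI links in these cards appear to share one filename pattern: model name, language code, Spark NLP version, Spark version, and a trailing number that looks like an upload time in epoch milliseconds (for `vietnamese_ner_v1_4_0a2`, 1699312310697 falls shortly before that patch's commit time). This pattern is inferred from the links on this page, not from any documented contract, so the Python sketch below is only a helper under that assumption; `parse_archive_name` and `ModelArchive` are hypothetical names.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class ModelArchive(NamedTuple):
    name: str
    lang: str
    sparknlp_version: str
    spark_version: str
    uploaded: datetime  # assumption: the trailing field is epoch milliseconds


def parse_archive_name(url: str) -> ModelArchive:
    """Split a model archive URL (https:// or s3://) into its assumed parts."""
    filename = url.rsplit("/", 1)[-1]
    if filename.endswith(".zip"):
        filename = filename[: -len(".zip")]
    # Split from the right, because model names themselves contain underscores.
    name, lang, nlp_version, spark_version, millis = filename.rsplit("_", 4)
    uploaded = datetime.fromtimestamp(int(millis) / 1000, tz=timezone.utc)
    return ModelArchive(name, lang, nlp_version, spark_version, uploaded)


archive = parse_archive_name(
    "s3://auxdata.johnsnowlabs.com/public/models/"
    "vietnamese_ner_v1_4_0a2_vi_5.2.0_3.0_1699312310697.zip"
)
print(archive.name, archive.lang, archive.uploaded.date())
```

Splitting from the right with a fixed count of four keeps names such as `bert_sayula_popoluca_tiny_focal_v2_label` intact without needing a list of known model names.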
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_sayula_popoluca_tiny_focal_v2_label|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|42.7 MB|
+
+## References
+
+https://huggingface.co/kktoto/tiny_focal_v2_label
\ No newline at end of file

From 08846e4044a54f63f343f75148f79512fcaacb20 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 06:22:52 +0700
Subject: [PATCH 389/667] Add model 2023-11-06-bert_token_classifier_base_han_chinese_sayula_popoluca_zhonggu_zh

---
 ..._han_chinese_sayula_popoluca_zhonggu_zh.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_han_chinese_sayula_popoluca_zhonggu_zh.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_han_chinese_sayula_popoluca_zhonggu_zh.md b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_han_chinese_sayula_popoluca_zhonggu_zh.md
new file mode 100644
index 00000000000000..60a99832522a88
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_han_chinese_sayula_popoluca_zhonggu_zh.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Chinese bert_token_classifier_base_han_chinese_sayula_popoluca_zhonggu BertForTokenClassification from ckiplab
+author: John Snow Labs
+name: bert_token_classifier_base_han_chinese_sayula_popoluca_zhonggu
+date: 2023-11-06
+tags: [bert, zh, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: zh
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_token_classifier_base_han_chinese_sayula_popoluca_zhonggu` is a Chinese model originally trained by ckiplab.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_han_chinese_sayula_popoluca_zhonggu_zh_5.2.0_3.0_1699312965297.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_han_chinese_sayula_popoluca_zhonggu_zh_5.2.0_3.0_1699312965297.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
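+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads both "documents" and "token", so a Tokenizer stage is required.
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_base_han_chinese_sayula_popoluca_zhonggu","zh") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes a SparkSession named `spark` is in scope, as in spark-shell.
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// The classifier reads both "documents" and "token", so a Tokenizer stage is required.
+val tokenizer = new Tokenizer()
+    .setInputCols("documents")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_token_classifier_base_han_chinese_sayula_popoluca_zhonggu", "zh")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+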
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_token_classifier_base_han_chinese_sayula_popoluca_zhonggu|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|zh|
+|Size:|395.7 MB|
+
+## References
+
+https://huggingface.co/ckiplab/bert-base-han-chinese-pos-zhonggu
\ No newline at end of file

From 5689803a4dd3b769aebe79246728242cb0a9e13f Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 06:32:10 +0700
Subject: [PATCH 390/667] Add model 2023-11-06-bert_sayula_popoluca_tiny_focal_v3_en

---
 ...6-bert_sayula_popoluca_tiny_focal_v3_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_focal_v3_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_focal_v3_en.md b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_focal_v3_en.md
new file mode 100644
index 00000000000000..0a135dd93bd29e
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_sayula_popoluca_tiny_focal_v3_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_sayula_popoluca_tiny_focal_v3 BertForTokenClassification from kktoto
+author: John Snow Labs
+name: bert_sayula_popoluca_tiny_focal_v3
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_sayula_popoluca_tiny_focal_v3` is an English model originally trained by kktoto.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_tiny_focal_v3_en_5.2.0_3.0_1699313528503.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sayula_popoluca_tiny_focal_v3_en_5.2.0_3.0_1699313528503.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
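+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads both "documents" and "token", so a Tokenizer stage is required.
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_sayula_popoluca_tiny_focal_v3","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes a SparkSession named `spark` is in scope, as in spark-shell.
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// The classifier reads both "documents" and "token", so a Tokenizer stage is required.
+val tokenizer = new Tokenizer()
+    .setInputCols("documents")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_sayula_popoluca_tiny_focal_v3", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+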
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_sayula_popoluca_tiny_focal_v3|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|42.7 MB|
+
+## References
+
+https://huggingface.co/kktoto/tiny_focal_v3
\ No newline at end of file

From a8c75dfdc7eef8a46199b86f67bbacd4b88c067a Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 06:41:05 +0700
Subject: [PATCH 391/667] Add model 2023-11-06-bent_pubmedbert_ner_disease_en

---
 ...23-11-06-bent_pubmedbert_ner_disease_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bent_pubmedbert_ner_disease_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bent_pubmedbert_ner_disease_en.md b/docs/_posts/ahmedlone127/2023-11-06-bent_pubmedbert_ner_disease_en.md
new file mode 100644
index 00000000000000..86932b12de7d43
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bent_pubmedbert_ner_disease_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bent_pubmedbert_ner_disease BertForTokenClassification from pruas
+author: John Snow Labs
+name: bent_pubmedbert_ner_disease
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bent_pubmedbert_ner_disease` is an English model originally trained by pruas.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bent_pubmedbert_ner_disease_en_5.2.0_3.0_1699314054888.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bent_pubmedbert_ner_disease_en_5.2.0_3.0_1699314054888.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
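+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads both "documents" and "token", so a Tokenizer stage is required.
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bent_pubmedbert_ner_disease","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes a SparkSession named `spark` is in scope, as in spark-shell.
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// The classifier reads both "documents" and "token", so a Tokenizer stage is required.
+val tokenizer = new Tokenizer()
+    .setInputCols("documents")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bent_pubmedbert_ner_disease", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+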
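The `ner` column produced by the pipeline above holds one label per token. If a model's labels follow the IOB2 convention (`B-` opens an entity, `I-` continues it, `O` is outside), neighbouring tokens can be merged into entity spans. This card does not list the label set of `bent_pubmedbert_ner_disease`, so the `B-Disease`/`I-Disease` tags below are hypothetical. Spark NLP also provides a `NerConverter` annotator for this grouping inside a pipeline; the plain-Python sketch here, built around a hypothetical helper `iob_to_spans`, only illustrates the logic.

```python
from typing import List, Optional, Tuple


def iob_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge IOB2-tagged tokens into (label, text) entity spans.

    An "I-X" tag that does not continue an open "X" span starts a new span,
    and any tag without a "B-"/"I-" prefix (such as "O") closes the open span.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None
    words: List[str] = []
    for token, tag in zip(tokens, tags):
        prefix, _, name = tag.partition("-")
        if prefix == "I" and name == label:
            words.append(token)  # continue the open span
            continue
        if label is not None:
            spans.append((label, " ".join(words)))  # close the open span
        if prefix in ("B", "I") and name:
            label, words = name, [token]  # open a new span
        else:
            label, words = None, []
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans


tokens = ["Patients", "with", "type", "2", "diabetes", "and", "asthma"]
tags = ["O", "O", "B-Disease", "I-Disease", "I-Disease", "O", "B-Disease"]
print(iob_to_spans(tokens, tags))
# [('Disease', 'type 2 diabetes'), ('Disease', 'asthma')]
```

Joining with single spaces assumes word-level tokens, as produced by the `Tokenizer` stage; rebuilding exact surface text would instead use each token annotation's `begin`/`end` offsets.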
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bent_pubmedbert_ner_disease|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|408.1 MB|
+
+## References
+
+https://huggingface.co/pruas/BENT-PubMedBERT-NER-Disease
\ No newline at end of file

From c1609a0241f7671e113ba517ec91463cdd452e8e Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 06:42:05 +0700
Subject: [PATCH 392/667] Add model 2023-11-06-bent_pubmedbert_ner_chemical_en

---
 ...3-11-06-bent_pubmedbert_ner_chemical_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bent_pubmedbert_ner_chemical_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bent_pubmedbert_ner_chemical_en.md b/docs/_posts/ahmedlone127/2023-11-06-bent_pubmedbert_ner_chemical_en.md
new file mode 100644
index 00000000000000..e558fa336232b4
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bent_pubmedbert_ner_chemical_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bent_pubmedbert_ner_chemical BertForTokenClassification from pruas
+author: John Snow Labs
+name: bent_pubmedbert_ner_chemical
+date: 2023-11-06
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bent_pubmedbert_ner_chemical` is an English model originally trained by pruas.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bent_pubmedbert_ner_chemical_en_5.2.0_3.0_1699314054977.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bent_pubmedbert_ner_chemical_en_5.2.0_3.0_1699314054977.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
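+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads both "documents" and "token", so a Tokenizer stage is required.
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bent_pubmedbert_ner_chemical","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes a SparkSession named `spark` is in scope, as in spark-shell.
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// The classifier reads both "documents" and "token", so a Tokenizer stage is required.
+val tokenizer = new Tokenizer()
+    .setInputCols("documents")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bent_pubmedbert_ner_chemical", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+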
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bent_pubmedbert_ner_chemical|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|408.0 MB|
+
+## References
+
+https://huggingface.co/pruas/BENT-PubMedBERT-NER-Chemical
\ No newline at end of file

From 87441b23c45c6e3bb9596f8c0f9bb6fefe55a713 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 06:45:44 +0700
Subject: [PATCH 393/667] Add model 2023-11-06-bert_token_classifier_base_han_chinese_ws_jindai_zh

---
 ...lassifier_base_han_chinese_ws_jindai_zh.md | 101 ++++++++++++++++++
 1 file changed, 101 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_han_chinese_ws_jindai_zh.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_han_chinese_ws_jindai_zh.md b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_han_chinese_ws_jindai_zh.md
new file mode 100644
index 00000000000000..9ac15f78f98386
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_han_chinese_ws_jindai_zh.md
@@ -0,0 +1,101 @@
+---
+layout: model
+title: Chinese BertForTokenClassification Base Cased model (from ckiplab)
+author: John Snow Labs
+name: bert_token_classifier_base_han_chinese_ws_jindai
+date: 2023-11-06
+tags: [zh, open_source, bert, token_classification, ner, onnx]
+task: Named Entity Recognition
+language: zh
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-base-han-chinese-ws-jindai` is a Chinese model originally trained by `ckiplab`.
+
+## Predicted Entities
+
+`B`, `I`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_han_chinese_ws_jindai_zh_5.2.0_3.0_1699314337007.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_han_chinese_ws_jindai_zh_5.2.0_3.0_1699314337007.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_base_han_chinese_ws_jindai","zh") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes a SparkSession named `spark` is in scope, as in spark-shell.
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_base_han_chinese_ws_jindai","zh")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
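This word-segmentation model predicts only `B` (a character that begins a word) and `I` (a character inside the current word), so its output has to be folded back into words. The Python sketch below assumes one token per character; Spark NLP's default `Tokenizer` splits mainly on whitespace and punctuation, so unspaced Chinese text may not reach the classifier as single characters, and how the input is split is left as an assumption here. `segment_words` is a hypothetical helper, not part of Spark NLP.

```python
from typing import List


def segment_words(chars: List[str], tags: List[str]) -> List[str]:
    """Rebuild words from per-character B/I tags.

    "B" starts a new word and "I" extends the current one; an "I" with no
    open word (for example on the first character) is treated as a word start.
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    words: List[str] = []
    for ch, tag in zip(chars, tags):
        if tag not in ("B", "I"):
            raise ValueError(f"unexpected tag: {tag!r}")
        if tag == "B" or not words:
            words.append(ch)
        else:
            words[-1] += ch
    return words


print(segment_words(list("今天天氣很好"), ["B", "I", "B", "I", "B", "B"]))
# ['今天', '天氣', '很', '好']
```

Treating a leading `I` as a word start keeps the function total over any B/I sequence, while any other label is rejected outright.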
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_token_classifier_base_han_chinese_ws_jindai|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|zh|
+|Size:|395.4 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+- https://huggingface.co/ckiplab/bert-base-han-chinese-ws-jindai
+- https://github.com/ckiplab/han-transformers
\ No newline at end of file

From e8c92964935c101848a9e36ff428f021aa70844c Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 06:49:34 +0700
Subject: [PATCH 394/667] Add model 2023-11-06-bert_token_classifier_base_chinese_ws_zh

---
 ...ert_token_classifier_base_chinese_ws_zh.md | 105 ++++++++++++++++++
 1 file changed, 105 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_chinese_ws_zh.md

diff --git a/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_chinese_ws_zh.md b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_chinese_ws_zh.md
new file mode 100644
index 00000000000000..9b7c728b8c3cdc
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-06-bert_token_classifier_base_chinese_ws_zh.md
@@ -0,0 +1,105 @@
+---
+layout: model
+title: Chinese BertForTokenClassification Base Cased model (from ckiplab)
+author: John Snow Labs
+name: bert_token_classifier_base_chinese_ws
+date: 2023-11-06
+tags: [zh, open_source, bert, token_classification, ner, onnx]
+task: Named Entity Recognition
+language: zh
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-base-chinese-ws` is a Chinese model originally trained by `ckiplab`.
+
+## Predicted Entities
+
+`B`, `I`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_chinese_ws_zh_5.2.0_3.0_1699314567003.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_chinese_ws_zh_5.2.0_3.0_1699314567003.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_base_chinese_ws","zh") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes a SparkSession named `spark` is in scope, as in spark-shell.
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_base_chinese_ws","zh")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_token_classifier_base_chinese_ws|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|zh|
+|Size:|381.0 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+- https://huggingface.co/ckiplab/bert-base-chinese-ws
+- https://github.com/ckiplab/ckip-transformers
+- https://muyang.pro
+- https://ckip.iis.sinica.edu.tw
\ No newline at end of file

From b1193cb38dde81be5556d16df63f578e564af6e2 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 07:07:43 +0700
Subject: [PATCH 395/667] Add model 2023-11-07-gilbert_en

---
 .../ahmedlone127/2023-11-07-gilbert_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-gilbert_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-gilbert_en.md b/docs/_posts/ahmedlone127/2023-11-07-gilbert_en.md
new file mode 100644
index 00000000000000..bfcb510b766d20
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-gilbert_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English gilbert BertForTokenClassification from rajpurkarlab
+author: John Snow Labs
+name: gilbert
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `gilbert` is an English model originally trained by rajpurkarlab.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gilbert_en_5.2.0_3.0_1699315652248.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gilbert_en_5.2.0_3.0_1699315652248.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
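+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads both "documents" and "token", so a Tokenizer stage is required.
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("gilbert","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes a SparkSession named `spark` is in scope, as in spark-shell.
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// The classifier reads both "documents" and "token", so a Tokenizer stage is required.
+val tokenizer = new Tokenizer()
+    .setInputCols("documents")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("gilbert", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+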
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|gilbert|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.6 MB|
+
+## References
+
+https://huggingface.co/rajpurkarlab/gilbert
\ No newline at end of file

From bbae686ec12af85cf3d7f81c8b15e5774421aa95 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Tue, 7 Nov 2023 07:08:43 +0700
Subject: [PATCH 396/667] Add model 2023-11-07-bent_pubmedbert_ner_bioprocess_en

---
 ...11-07-bent_pubmedbert_ner_bioprocess_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bent_pubmedbert_ner_bioprocess_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-bent_pubmedbert_ner_bioprocess_en.md b/docs/_posts/ahmedlone127/2023-11-07-bent_pubmedbert_ner_bioprocess_en.md
new file mode 100644
index 00000000000000..998aa5840b03a8
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-bent_pubmedbert_ner_bioprocess_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bent_pubmedbert_ner_bioprocess BertForTokenClassification from pruas
+author: John Snow Labs
+name: bent_pubmedbert_ner_bioprocess
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bent_pubmedbert_ner_bioprocess` is an English model originally trained by pruas.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bent_pubmedbert_ner_bioprocess_en_5.2.0_3.0_1699315652198.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bent_pubmedbert_ner_bioprocess_en_5.2.0_3.0_1699315652198.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
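+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads both "documents" and "token", so a Tokenizer stage is required.
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bent_pubmedbert_ner_bioprocess","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes a SparkSession named `spark` is in scope, as in spark-shell.
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// The classifier reads both "documents" and "token", so a Tokenizer stage is required.
+val tokenizer = new Tokenizer()
+    .setInputCols("documents")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bent_pubmedbert_ner_bioprocess", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+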
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bent_pubmedbert_ner_bioprocess| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.1 MB| + +## References + +https://huggingface.co/pruas/BENT-PubMedBERT-NER-Bioprocess \ No newline at end of file From 7440ecdd563d13cfd2ca869afa04911a7514aaf8 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 07:23:52 +0700 Subject: [PATCH 397/667] Add model 2023-11-07-bert_token_classifier_base_swedish_cased_ner_sv --- ...en_classifier_base_swedish_cased_ner_sv.md | 100 ++++++++++++++++++ 1 file changed, 100 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_base_swedish_cased_ner_sv.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_base_swedish_cased_ner_sv.md b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_base_swedish_cased_ner_sv.md new file mode 100644 index 00000000000000..26dd104b556a71 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_base_swedish_cased_ner_sv.md @@ -0,0 +1,100 @@ +--- +layout: model +title: Swedish BertForTokenClassification Base Cased model (from KBLab) +author: John Snow Labs +name: bert_token_classifier_base_swedish_cased_ner +date: 2023-11-07 +tags: [sv, open_source, bert, token_classification, ner, onnx] +task: Named Entity Recognition +language: sv +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-base-swedish-cased-ner` is a Swedish model originally trained by `KBLab`. 
+ +## Predicted Entities + +`PER`, `LOC`, `TME`, `WRK`, `PRS/WRK`, `LOC/ORG`, `MSR`, `ORG`, `OBJ/ORG`, `ORG/PRS`, `OBJ`, `LOC/PRS`, `EVN` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_swedish_cased_ner_sv_5.2.0_3.0_1699316623845.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_swedish_cased_ner_sv_5.2.0_3.0_1699316623845.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_base_swedish_cased_ner","sv") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_base_swedish_cased_ner","sv") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_base_swedish_cased_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|sv| +|Size:|465.3 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + + + +- https://huggingface.co/KBLab/bert-base-swedish-cased-ner \ No newline at end of file From 9b8fd6d7470be356c81580a9efbda912374509f2 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 07:29:41 +0700 Subject: [PATCH 398/667] Add model 2023-11-07-bent_pubmedbert_ner_variant_en --- ...23-11-07-bent_pubmedbert_ner_variant_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bent_pubmedbert_ner_variant_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bent_pubmedbert_ner_variant_en.md b/docs/_posts/ahmedlone127/2023-11-07-bent_pubmedbert_ner_variant_en.md new file mode 100644 index 00000000000000..60c20acd113568 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bent_pubmedbert_ner_variant_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bent_pubmedbert_ner_variant BertForTokenClassification from pruas +author: John Snow Labs +name: bent_pubmedbert_ner_variant +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bent_pubmedbert_ner_variant` is an English model originally trained by pruas.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bent_pubmedbert_ner_variant_en_5.2.0_3.0_1699316972395.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bent_pubmedbert_ner_variant_en_5.2.0_3.0_1699316972395.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bent_pubmedbert_ner_variant","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols("documents") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bent_pubmedbert_ner_variant", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bent_pubmedbert_ner_variant| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.1 MB| + +## References + +https://huggingface.co/pruas/BENT-PubMEdBERT-NER-Variant \ No newline at end of file From f3697e46ebe7d1d6daf84544daeab78862a131e0 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 07:30:42 +0700 Subject: [PATCH 399/667] Add model 2023-11-07-bert_token_classifier_base_han_chinese_ws_zhonggu_zh --- ...assifier_base_han_chinese_ws_zhonggu_zh.md | 101 ++++++++++++++++++ 1 file changed, 101 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_base_han_chinese_ws_zhonggu_zh.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_base_han_chinese_ws_zhonggu_zh.md b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_base_han_chinese_ws_zhonggu_zh.md new file mode 100644 index 00000000000000..ba80ea90e29450 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_base_han_chinese_ws_zhonggu_zh.md @@ -0,0 +1,101 @@ +--- +layout: model +title: Chinese BertForTokenClassification Base Cased model (from ckiplab) +author: John Snow Labs +name: bert_token_classifier_base_han_chinese_ws_zhonggu +date: 2023-11-07 +tags: [zh, open_source, bert, token_classification, ner, onnx] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-base-han-chinese-ws-zhonggu` is a Chinese model originally trained by `ckiplab`. 
+ +## Predicted Entities + +`B`, `I` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_han_chinese_ws_zhonggu_zh_5.2.0_3.0_1699316982060.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_han_chinese_ws_zhonggu_zh_5.2.0_3.0_1699316982060.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_base_han_chinese_ws_zhonggu","zh") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_base_han_chinese_ws_zhonggu","zh") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_base_han_chinese_ws_zhonggu| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|395.5 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + + + +- https://huggingface.co/ckiplab/bert-base-han-chinese-ws-zhonggu +- https://github.com/ckiplab/han-transformers \ No newline at end of file From ffdfe1bc6b3e2bf848df7426138791fb2cb09224 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 07:31:42 +0700 Subject: [PATCH 400/667] Add model 2023-11-07-phibert_finetuned_ner_girinlp_i2i_en --- ...07-phibert_finetuned_ner_girinlp_i2i_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-phibert_finetuned_ner_girinlp_i2i_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-phibert_finetuned_ner_girinlp_i2i_en.md b/docs/_posts/ahmedlone127/2023-11-07-phibert_finetuned_ner_girinlp_i2i_en.md new file mode 100644 index 00000000000000..c3f8c7fba04fb9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-phibert_finetuned_ner_girinlp_i2i_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English phibert_finetuned_ner_girinlp_i2i BertForTokenClassification from girinlp-i2i +author: John Snow Labs +name: phibert_finetuned_ner_girinlp_i2i +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `phibert_finetuned_ner_girinlp_i2i` is an
English model originally trained by girinlp-i2i. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phibert_finetuned_ner_girinlp_i2i_en_5.2.0_3.0_1699316986375.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phibert_finetuned_ner_girinlp_i2i_en_5.2.0_3.0_1699316986375.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("phibert_finetuned_ner_girinlp_i2i","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols("documents") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("phibert_finetuned_ner_girinlp_i2i", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phibert_finetuned_ner_girinlp_i2i| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.2 MB| + +## References + +https://huggingface.co/girinlp-i2i/phibert-finetuned-ner \ No newline at end of file From 70f96eb665be8cc9a3d20e33f90b93928b623df0 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 07:40:42 +0700 Subject: [PATCH 401/667] Add model 2023-11-07-bert_token_classifier_arabic_ner_ar --- ...-07-bert_token_classifier_arabic_ner_ar.md | 101 ++++++++++++++++++ 1 file changed, 101 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_arabic_ner_ar.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_arabic_ner_ar.md b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_arabic_ner_ar.md new file mode 100644 index 00000000000000..0aa97b36f1e68b --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_arabic_ner_ar.md @@ -0,0 +1,101 @@ +--- +layout: model +title: Arabic BertForTokenClassification Cased model (from hatmimoha) +author: John Snow Labs +name: bert_token_classifier_arabic_ner +date: 2023-11-07 +tags: [ar, open_source, bert, token_classification, ner, onnx] +task: Named Entity Recognition +language: ar +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `arabic-ner` is an Arabic model originally trained by `hatmimoha`.
+ +## Predicted Entities + +`PRODUCT`, `COMPETITION`, `DATE`, `LOCATION`, `PERSON`, `ORGANIZATION`, `DISEASE`, `PRICE`, `EVENT` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_arabic_ner_ar_5.2.0_3.0_1699317634318.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_arabic_ner_ar_5.2.0_3.0_1699317634318.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_arabic_ner","ar") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_arabic_ner","ar") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_arabic_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|ar| +|Size:|412.0 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + + + +- https://huggingface.co/hatmimoha/arabic-ner +- https://github.com/hatmimoha/arabic-ner \ No newline at end of file From 3d3f6d384cc787624bd66e35faeea7d18fed20b5 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 07:53:15 +0700 Subject: [PATCH 402/667] Add model 2023-11-07-bert_token_classifier_base_token_classification_for_atc_english_uwb_atcc_en --- ...ssification_for_atc_english_uwb_atcc_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_base_token_classification_for_atc_english_uwb_atcc_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_base_token_classification_for_atc_english_uwb_atcc_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_base_token_classification_for_atc_english_uwb_atcc_en.md new file mode 100644 index 00000000000000..eef00480a7c7c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_base_token_classification_for_atc_english_uwb_atcc_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_token_classifier_base_token_classification_for_atc_english_uwb_atcc BertForTokenClassification from Jzuluaga +author: John Snow Labs +name: bert_token_classifier_base_token_classification_for_atc_english_uwb_atcc +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher:
"Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_token_classifier_base_token_classification_for_atc_english_uwb_atcc` is an English model originally trained by Jzuluaga. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_token_classification_for_atc_english_uwb_atcc_en_5.2.0_3.0_1699318386290.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_token_classification_for_atc_english_uwb_atcc_en_5.2.0_3.0_1699318386290.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_base_token_classification_for_atc_english_uwb_atcc","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols("documents") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_token_classifier_base_token_classification_for_atc_english_uwb_atcc", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_base_token_classification_for_atc_english_uwb_atcc| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Jzuluaga/bert-base-token-classification-for-atc-en-uwb-atcc \ No newline at end of file From a2ca7c6e80f61c71aad7e839378a49ba686ed59f Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 07:55:22 +0700 Subject: [PATCH 403/667] Add model 2023-11-07-drbert_casm2_fr --- .../2023-11-07-drbert_casm2_fr.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-drbert_casm2_fr.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-drbert_casm2_fr.md b/docs/_posts/ahmedlone127/2023-11-07-drbert_casm2_fr.md new file mode 100644 index 00000000000000..d8d990d5f63f29 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-drbert_casm2_fr.md @@ -0,0 +1,93 @@ +--- +layout: model +title: French drbert_casm2 BertForTokenClassification from camila-ud +author: John Snow Labs +name: drbert_casm2 +date: 2023-11-07 +tags: [bert, fr, open_source, token_classification, onnx] +task: Named Entity Recognition +language: fr +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `drbert_casm2` is a French model originally trained by camila-ud.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/drbert_casm2_fr_5.2.0_3.0_1699318515122.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/drbert_casm2_fr_5.2.0_3.0_1699318515122.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("drbert_casm2","fr") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols("documents") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("drbert_casm2", "fr") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|drbert_casm2| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|fr| +|Size:|408.2 MB| + +## References + +https://huggingface.co/camila-ud/DrBERT-CASM2 \ No newline at end of file From 7834b5b3195ef857e1a7b98f183330f536061afc Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 07:57:22 +0700 Subject: [PATCH 404/667] Add model 2023-11-07-clinicalnerpt_disorder_pt --- .../2023-11-07-clinicalnerpt_disorder_pt.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_disorder_pt.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_disorder_pt.md b/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_disorder_pt.md new file mode 100644 index 00000000000000..e6ac7305d90e61 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_disorder_pt.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Portuguese clinicalnerpt_disorder BertForTokenClassification from pucpr +author: John Snow Labs +name: clinicalnerpt_disorder +date: 2023-11-07 +tags: [bert, pt, open_source, token_classification, onnx] +task: Named Entity Recognition +language: pt +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `clinicalnerpt_disorder` is a Portuguese model originally trained by pucpr.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinicalnerpt_disorder_pt_5.2.0_3.0_1699318631290.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinicalnerpt_disorder_pt_5.2.0_3.0_1699318631290.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("clinicalnerpt_disorder","pt") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols("documents") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("clinicalnerpt_disorder", "pt") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinicalnerpt_disorder| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|pt| +|Size:|664.8 MB| + +## References + +https://huggingface.co/pucpr/clinicalnerpt-disorder \ No newline at end of file From 8c2a886a02e2a4bf77b00821904d4160beef7c66 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 08:06:04 +0700 Subject: [PATCH 405/667] Add model 2023-11-07-bert_token_classifier_autotrain_oms_ner_bislama_1044135953_en --- ...autotrain_oms_ner_bislama_1044135953_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_autotrain_oms_ner_bislama_1044135953_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_autotrain_oms_ner_bislama_1044135953_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_autotrain_oms_ner_bislama_1044135953_en.md new file mode 100644 index 00000000000000..7bc982877f571e --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_autotrain_oms_ner_bislama_1044135953_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_token_classifier_autotrain_oms_ner_bislama_1044135953 BertForTokenClassification from danielmantisnlp +author: John Snow Labs +name: bert_token_classifier_autotrain_oms_ner_bislama_1044135953 +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark 
NLP. `bert_token_classifier_autotrain_oms_ner_bislama_1044135953` is an English model originally trained by danielmantisnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_autotrain_oms_ner_bislama_1044135953_en_5.2.0_3.0_1699319157111.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_autotrain_oms_ner_bislama_1044135953_en_5.2.0_3.0_1699319157111.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_autotrain_oms_ner_bislama_1044135953","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols("documents") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_token_classifier_autotrain_oms_ner_bislama_1044135953", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_autotrain_oms_ner_bislama_1044135953| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielmantisnlp/autotrain-oms-ner-bi-1044135953 \ No newline at end of file From 0e5226605e6f7d665e165a6ced7df700e43c3360 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 08:28:19 +0700 Subject: [PATCH 406/667] Add model 2023-11-07-clinicalnerpt_diagnostic_pt --- .../2023-11-07-clinicalnerpt_diagnostic_pt.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_diagnostic_pt.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_diagnostic_pt.md b/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_diagnostic_pt.md new file mode 100644 index 00000000000000..42bfe77f3b8fca --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_diagnostic_pt.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Portuguese clinicalnerpt_diagnostic BertForTokenClassification from pucpr +author: John Snow Labs +name: clinicalnerpt_diagnostic +date: 2023-11-07 +tags: [bert, pt, open_source, token_classification, onnx] +task: Named Entity Recognition +language: pt +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `clinicalnerpt_diagnostic` is a Portuguese model originally trained by pucpr.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinicalnerpt_diagnostic_pt_5.2.0_3.0_1699320480550.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinicalnerpt_diagnostic_pt_5.2.0_3.0_1699320480550.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("clinicalnerpt_diagnostic","pt") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("clinicalnerpt_diagnostic", "pt") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinicalnerpt_diagnostic| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|pt| +|Size:|664.8 MB| + +## References + +https://huggingface.co/pucpr/clinicalnerpt-diagnostic \ No newline at end of file From b194d9b730701199aaab4ef7c1df5d3bebb33e0b Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 08:29:20 +0700 Subject: [PATCH 407/667] Add model 2023-11-07-bert_base_multilingual_cased_fine_tuned_ner_wikineural_multilingual_xx --- ...ne_tuned_ner_wikineural_multilingual_xx.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_base_multilingual_cased_fine_tuned_ner_wikineural_multilingual_xx.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_base_multilingual_cased_fine_tuned_ner_wikineural_multilingual_xx.md b/docs/_posts/ahmedlone127/2023-11-07-bert_base_multilingual_cased_fine_tuned_ner_wikineural_multilingual_xx.md new file mode 100644 index 00000000000000..f375089d38cdcf --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_base_multilingual_cased_fine_tuned_ner_wikineural_multilingual_xx.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_fine_tuned_ner_wikineural_multilingual BertForTokenClassification from DunnBC22 +author: John Snow Labs +name: bert_base_multilingual_cased_fine_tuned_ner_wikineural_multilingual +date: 2023-11-07 +tags: [bert, xx, open_source, token_classification, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide 
scalability and production-readiness using Spark NLP. `bert_base_multilingual_cased_fine_tuned_ner_wikineural_multilingual` is a multilingual model originally trained by DunnBC22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_fine_tuned_ner_wikineural_multilingual_xx_5.2.0_3.0_1699320480543.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_fine_tuned_ner_wikineural_multilingual_xx_5.2.0_3.0_1699320480543.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_multilingual_cased_fine_tuned_ner_wikineural_multilingual","xx") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_base_multilingual_cased_fine_tuned_ner_wikineural_multilingual", "xx") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_fine_tuned_ner_wikineural_multilingual| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/DunnBC22/bert-base-multilingual-cased-fine_tuned-ner-WikiNeural_Multilingual \ No newline at end of file From bfd4984aeb950a4da47cc6a8c91ebb52e934715f Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 08:31:46 +0700 Subject: [PATCH 408/667] Add model 2023-11-07-bert_token_classifier_base_han_chinese_sayula_popoluca_jindai_zh --- ...e_han_chinese_sayula_popoluca_jindai_zh.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_base_han_chinese_sayula_popoluca_jindai_zh.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_base_han_chinese_sayula_popoluca_jindai_zh.md b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_base_han_chinese_sayula_popoluca_jindai_zh.md new file mode 100644 index 00000000000000..a793b2f78ef819 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_base_han_chinese_sayula_popoluca_jindai_zh.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Chinese bert_token_classifier_base_han_chinese_sayula_popoluca_jindai BertForTokenClassification from ckiplab +author: John Snow Labs +name: bert_token_classifier_base_han_chinese_sayula_popoluca_jindai +date: 2023-11-07 +tags: [bert, zh, open_source, token_classification, onnx] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from 
Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_token_classifier_base_han_chinese_sayula_popoluca_jindai` is a Chinese model originally trained by ckiplab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_han_chinese_sayula_popoluca_jindai_zh_5.2.0_3.0_1699320699279.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_han_chinese_sayula_popoluca_jindai_zh_5.2.0_3.0_1699320699279.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_base_han_chinese_sayula_popoluca_jindai","zh") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_token_classifier_base_han_chinese_sayula_popoluca_jindai", "zh") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_base_han_chinese_sayula_popoluca_jindai| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|395.7 MB| + +## References + +https://huggingface.co/ckiplab/bert-base-han-chinese-pos-jindai \ No newline at end of file From b2d3a38aaccf8508ea22eea470ff06f2a3a0e364 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 08:57:46 +0700 Subject: [PATCH 409/667] Add model 2023-11-07-bert_token_classifier_base_han_chinese_sayula_popoluca_shanggu_zh --- ..._han_chinese_sayula_popoluca_shanggu_zh.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_base_han_chinese_sayula_popoluca_shanggu_zh.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_base_han_chinese_sayula_popoluca_shanggu_zh.md b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_base_han_chinese_sayula_popoluca_shanggu_zh.md new file mode 100644 index 00000000000000..703590249fd79a --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_base_han_chinese_sayula_popoluca_shanggu_zh.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Chinese bert_token_classifier_base_han_chinese_sayula_popoluca_shanggu BertForTokenClassification from ckiplab +author: John Snow Labs +name: bert_token_classifier_base_han_chinese_sayula_popoluca_shanggu +date: 2023-11-07 +tags: [bert, zh, open_source, token_classification, onnx] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide 
scalability and production-readiness using Spark NLP. `bert_token_classifier_base_han_chinese_sayula_popoluca_shanggu` is a Chinese model originally trained by ckiplab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_han_chinese_sayula_popoluca_shanggu_zh_5.2.0_3.0_1699322259275.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_han_chinese_sayula_popoluca_shanggu_zh_5.2.0_3.0_1699322259275.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_base_han_chinese_sayula_popoluca_shanggu","zh") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_token_classifier_base_han_chinese_sayula_popoluca_shanggu", "zh") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_base_han_chinese_sayula_popoluca_shanggu| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|396.6 MB| + +## References + +https://huggingface.co/ckiplab/bert-base-han-chinese-pos-shanggu \ No newline at end of file From abd7b43910fdbe6bcd3ab77fe8d4e24fcfa961f5 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 09:22:11 +0700 Subject: [PATCH 410/667] Add model 2023-11-07-bert_token_classifier_base_han_chinese_sayula_popoluca_xiandai_zh --- ..._han_chinese_sayula_popoluca_xiandai_zh.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_base_han_chinese_sayula_popoluca_xiandai_zh.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_base_han_chinese_sayula_popoluca_xiandai_zh.md b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_base_han_chinese_sayula_popoluca_xiandai_zh.md new file mode 100644 index 00000000000000..0161dd2ec7db00 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_base_han_chinese_sayula_popoluca_xiandai_zh.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Chinese bert_token_classifier_base_han_chinese_sayula_popoluca_xiandai BertForTokenClassification from ckiplab +author: John Snow Labs +name: bert_token_classifier_base_han_chinese_sayula_popoluca_xiandai +date: 2023-11-07 +tags: [bert, zh, open_source, token_classification, onnx] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide 
scalability and production-readiness using Spark NLP. `bert_token_classifier_base_han_chinese_sayula_popoluca_xiandai` is a Chinese model originally trained by ckiplab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_han_chinese_sayula_popoluca_xiandai_zh_5.2.0_3.0_1699323722187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_base_han_chinese_sayula_popoluca_xiandai_zh_5.2.0_3.0_1699323722187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_base_han_chinese_sayula_popoluca_xiandai","zh") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_token_classifier_base_han_chinese_sayula_popoluca_xiandai", "zh") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_base_han_chinese_sayula_popoluca_xiandai| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|395.6 MB| + +## References + +https://huggingface.co/ckiplab/bert-base-han-chinese-pos-xiandai \ No newline at end of file From 0d9f1eca33f95dc74c3aca8f6ce06864729b20a3 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 09:26:26 +0700 Subject: [PATCH 411/667] Add model 2023-11-07-ncbi_bc5cdr_disease_en --- .../2023-11-07-ncbi_bc5cdr_disease_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-ncbi_bc5cdr_disease_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-ncbi_bc5cdr_disease_en.md b/docs/_posts/ahmedlone127/2023-11-07-ncbi_bc5cdr_disease_en.md new file mode 100644 index 00000000000000..9b8c3982f6658e --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-ncbi_bc5cdr_disease_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English ncbi_bc5cdr_disease BertForTokenClassification from datummd +author: John Snow Labs +name: ncbi_bc5cdr_disease +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `ncbi_bc5cdr_disease` is an English model originally trained by datummd.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ncbi_bc5cdr_disease_en_5.2.0_3.0_1699323955872.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ncbi_bc5cdr_disease_en_5.2.0_3.0_1699323955872.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("ncbi_bc5cdr_disease","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("ncbi_bc5cdr_disease", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ncbi_bc5cdr_disease| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.1 MB| + +## References + +https://huggingface.co/datummd/NCBI_BC5CDR_disease \ No newline at end of file From beead3dec4606ade7b031b2fbbf6ce9a4c75b957 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 09:31:46 +0700 Subject: [PATCH 412/667] Add model 2023-11-07-bert_base_multilingual_cased_finetuned_sayula_popoluca_xx --- ...gual_cased_finetuned_sayula_popoluca_xx.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_base_multilingual_cased_finetuned_sayula_popoluca_xx.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_base_multilingual_cased_finetuned_sayula_popoluca_xx.md b/docs/_posts/ahmedlone127/2023-11-07-bert_base_multilingual_cased_finetuned_sayula_popoluca_xx.md new file mode 100644 index 00000000000000..955fee9e14c127 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_base_multilingual_cased_finetuned_sayula_popoluca_xx.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_finetuned_sayula_popoluca BertForTokenClassification from MayaGalvez +author: John Snow Labs +name: bert_base_multilingual_cased_finetuned_sayula_popoluca +date: 2023-11-07 +tags: [bert, xx, open_source, token_classification, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark 
NLP. `bert_base_multilingual_cased_finetuned_sayula_popoluca` is a multilingual model originally trained by MayaGalvez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_finetuned_sayula_popoluca_xx_5.2.0_3.0_1699324295140.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_finetuned_sayula_popoluca_xx_5.2.0_3.0_1699324295140.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_multilingual_cased_finetuned_sayula_popoluca","xx") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_base_multilingual_cased_finetuned_sayula_popoluca", "xx") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_finetuned_sayula_popoluca| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/MayaGalvez/bert-base-multilingual-cased-finetuned-pos \ No newline at end of file From 9f4bc02b8e9fb80eb31d05e328e0aa03ccb2080e Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 09:45:11 +0700 Subject: [PATCH 413/667] Add model 2023-11-07-bioformer_8l_ncbi_disease_en --- ...2023-11-07-bioformer_8l_ncbi_disease_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bioformer_8l_ncbi_disease_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bioformer_8l_ncbi_disease_en.md b/docs/_posts/ahmedlone127/2023-11-07-bioformer_8l_ncbi_disease_en.md new file mode 100644 index 00000000000000..bb349619d43ff3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bioformer_8l_ncbi_disease_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bioformer_8l_ncbi_disease BertForTokenClassification from bioformers +author: John Snow Labs +name: bioformer_8l_ncbi_disease +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bioformer_8l_ncbi_disease` is an English model originally trained by bioformers.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bioformer_8l_ncbi_disease_en_5.2.0_3.0_1699325104597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bioformer_8l_ncbi_disease_en_5.2.0_3.0_1699325104597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bioformer_8l_ncbi_disease","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bioformer_8l_ncbi_disease", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bioformer_8l_ncbi_disease| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|158.5 MB| + +## References + +https://huggingface.co/bioformers/bioformer-8L-ncbi-disease \ No newline at end of file From 39503e3415e90a643fc86e7ca8337b6f3eb973c5 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 09:47:44 +0700 Subject: [PATCH 414/667] Add model 2023-11-07-bert_token_classifier_danish_ner_base_da --- ...ert_token_classifier_danish_ner_base_da.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_danish_ner_base_da.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_danish_ner_base_da.md b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_danish_ner_base_da.md new file mode 100644 index 00000000000000..8f20202106a210 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_danish_ner_base_da.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Danish bert_token_classifier_danish_ner_base BertForTokenClassification from alexandrainst +author: John Snow Labs +name: bert_token_classifier_danish_ner_base +date: 2023-11-07 +tags: [bert, da, open_source, token_classification, onnx] +task: Named Entity Recognition +language: da +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_token_classifier_danish_ner_base` is a Danish model originally trained by alexandrainst.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_danish_ner_base_da_5.2.0_3.0_1699325255783.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_danish_ner_base_da_5.2.0_3.0_1699325255783.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_danish_ner_base","da") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_token_classifier_danish_ner_base", "da") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_danish_ner_base| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|da| +|Size:|412.3 MB| + +## References + +https://huggingface.co/alexandrainst/da-ner-base \ No newline at end of file From 474c8b59849781450c31a57e5dd7232a3e92d945 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 09:54:46 +0700 Subject: [PATCH 415/667] Add model 2023-11-07-bert_base_cased_finetuned_conll03_english_en --- ...base_cased_finetuned_conll03_english_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_base_cased_finetuned_conll03_english_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_base_cased_finetuned_conll03_english_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_base_cased_finetuned_conll03_english_en.md new file mode 100644 index 00000000000000..37eb94f1668e0c --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_base_cased_finetuned_conll03_english_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_base_cased_finetuned_conll03_english BertForTokenClassification from dbmdz +author: John Snow Labs +name: bert_base_cased_finetuned_conll03_english +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_cased_finetuned_conll03_english` is an English model originally trained by dbmdz.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_conll03_english_en_5.2.0_3.0_1699325678056.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_conll03_english_en_5.2.0_3.0_1699325678056.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_cased_finetuned_conll03_english","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols("documents").setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_base_cased_finetuned_conll03_english", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_conll03_english| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/dbmdz/bert-base-cased-finetuned-conll03-english \ No newline at end of file From 98821ea23cb6fff1700ab3390487bcd4edf33f81 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 10:12:36 +0700 Subject: [PATCH 416/667] Add model 2023-11-07-deepct_en --- .../ahmedlone127/2023-11-07-deepct_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-deepct_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-deepct_en.md b/docs/_posts/ahmedlone127/2023-11-07-deepct_en.md new file mode 100644 index 00000000000000..c89265c5468952 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-deepct_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English deepct BertForTokenClassification from macavaney +author: John Snow Labs +name: deepct +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `deepct` is an English model originally trained by macavaney.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deepct_en_5.2.0_3.0_1699326749454.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deepct_en_5.2.0_3.0_1699326749454.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("deepct","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols("documents").setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("deepct", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deepct| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/macavaney/deepct \ No newline at end of file From f3527df4da7e809f997af89063c81a160fd14e8b Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 10:52:40 +0700 Subject: [PATCH 417/667] Add model 2023-11-07-skillner_en --- .../ahmedlone127/2023-11-07-skillner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-skillner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-skillner_en.md b/docs/_posts/ahmedlone127/2023-11-07-skillner_en.md new file mode 100644 index 00000000000000..4e168483732348 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-skillner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English skillner BertForTokenClassification from ihk +author: John Snow Labs +name: skillner +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `skillner` is an English model originally trained by ihk.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/skillner_en_5.2.0_3.0_1699329137492.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/skillner_en_5.2.0_3.0_1699329137492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("skillner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols("documents").setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("skillner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|skillner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|402.2 MB| + +## References + +https://huggingface.co/ihk/skillner \ No newline at end of file From ecbc47759252ed2670a034fc5b11b406984c30a2 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 11:01:54 +0700 Subject: [PATCH 418/667] Add model 2023-11-07-bert_token_classifier_sentcore_zh --- ...11-07-bert_token_classifier_sentcore_zh.md | 100 ++++++++++++++++++ 1 file changed, 100 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_sentcore_zh.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_sentcore_zh.md b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_sentcore_zh.md new file mode 100644 index 00000000000000..b867a44b0f28aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_sentcore_zh.md @@ -0,0 +1,100 @@ +--- +layout: model +title: Chinese BertForTokenClassification Cased model (from theta) +author: John Snow Labs +name: bert_token_classifier_sentcore +date: 2023-11-07 +tags: [zh, open_source, bert, token_classification, ner, onnx] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `sentcore` is a Chinese model originally trained by `theta`. 
+ +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_sentcore_zh_5.2.0_3.0_1699329704958.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_sentcore_zh_5.2.0_3.0_1699329704958.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_sentcore","zh") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_sentcore","zh") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_sentcore| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|381.1 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/theta/sentcore \ No newline at end of file From 73ec8cbb16a4b9af49c9bad8fc5986089de34a6d Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 11:09:32 +0700 Subject: [PATCH 419/667] Add model 2023-11-07-bert_token_classifier_berturk_uncased_keyword_discriminator_tr --- ...erturk_uncased_keyword_discriminator_tr.md | 100 ++++++++++++++++++ 1 file changed, 100 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_berturk_uncased_keyword_discriminator_tr.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_berturk_uncased_keyword_discriminator_tr.md b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_berturk_uncased_keyword_discriminator_tr.md new file mode 100644 index 00000000000000..82e15693e365eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_berturk_uncased_keyword_discriminator_tr.md @@ -0,0 +1,100 @@ +--- +layout: model +title: Turkish BertForTokenClassification Uncased model (from yanekyuk) +author: John Snow Labs +name: bert_token_classifier_berturk_uncased_keyword_discriminator +date: 2023-11-07 +tags: [tr, open_source, bert, token_classification, ner, onnx] +task: Named Entity Recognition +language: tr +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark
NLP. `berturk-uncased-keyword-discriminator` is a Turkish model originally trained by `yanekyuk`. + +## Predicted Entities + +`ENT`, `CON` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_berturk_uncased_keyword_discriminator_tr_5.2.0_3.0_1699330162546.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_berturk_uncased_keyword_discriminator_tr_5.2.0_3.0_1699330162546.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_berturk_uncased_keyword_discriminator","tr") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_berturk_uncased_keyword_discriminator","tr") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_berturk_uncased_keyword_discriminator| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|tr| +|Size:|412.5 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/yanekyuk/berturk-uncased-keyword-discriminator \ No newline at end of file From 03626ff19b8caeeeb2a85a3a74aded2f781cdf39 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 11:15:06 +0700 Subject: [PATCH 420/667] Add model 2023-11-07-spanish_capitalization_punctuation_restoration_es --- ...pitalization_punctuation_restoration_es.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-spanish_capitalization_punctuation_restoration_es.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-spanish_capitalization_punctuation_restoration_es.md b/docs/_posts/ahmedlone127/2023-11-07-spanish_capitalization_punctuation_restoration_es.md new file mode 100644 index 00000000000000..a7d2dac1df2613 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-spanish_capitalization_punctuation_restoration_es.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Castilian, Spanish spanish_capitalization_punctuation_restoration BertForTokenClassification from UMUTeam +author: John Snow Labs +name: spanish_capitalization_punctuation_restoration +date: 2023-11-07 +tags: [bert, es, open_source, token_classification, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and
production-readiness using Spark NLP. `spanish_capitalization_punctuation_restoration` is a Spanish model originally trained by UMUTeam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spanish_capitalization_punctuation_restoration_es_5.2.0_3.0_1699330499358.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spanish_capitalization_punctuation_restoration_es_5.2.0_3.0_1699330499358.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("spanish_capitalization_punctuation_restoration","es") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols("documents").setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("spanish_capitalization_punctuation_restoration", "es") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spanish_capitalization_punctuation_restoration| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|409.6 MB| + +## References + +https://huggingface.co/UMUTeam/spanish_capitalization_punctuation_restoration \ No newline at end of file From 243d77ec27f09593501560556e27a1bd4d03a7aa Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 11:27:38 +0700 Subject: [PATCH 421/667] Add model 2023-11-07-bert_token_classifier_uncased_keyword_extractor_en --- ...classifier_uncased_keyword_extractor_en.md | 98 +++++++++++++++++++ 1 file changed, 98 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_uncased_keyword_extractor_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_uncased_keyword_extractor_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_uncased_keyword_extractor_en.md new file mode 100644 index 00000000000000..e2796954f3fc04 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_uncased_keyword_extractor_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English BertForTokenClassification Uncased model (from yanekyuk) +author: John Snow Labs +name: bert_token_classifier_uncased_keyword_extractor +date: 2023-11-07 +tags: [en, open_source, bert, token_classification, ner, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. 
 `bert-uncased-keyword-extractor` is an English model originally trained by `yanekyuk`. + +## Predicted Entities + +`KEY` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_uncased_keyword_extractor_en_5.2.0_3.0_1699331251723.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_uncased_keyword_extractor_en_5.2.0_3.0_1699331251723.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_uncased_keyword_extractor","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_uncased_keyword_extractor","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_uncased_keyword_extractor| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +- https://huggingface.co/yanekyuk/bert-uncased-keyword-extractor \ No newline at end of file From f5258a4447cd9943c28f1df3b5904e54e96702af Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 11:31:24 +0700 Subject: [PATCH 422/667] Add model 2023-11-07-clinicalnerpt_chemical_pt --- .../2023-11-07-clinicalnerpt_chemical_pt.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_chemical_pt.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_chemical_pt.md b/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_chemical_pt.md new file mode 100644 index 00000000000000..f7432e88155fc2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_chemical_pt.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Portuguese clinicalnerpt_chemical BertForTokenClassification from pucpr +author: John Snow Labs +name: clinicalnerpt_chemical +date: 2023-11-07 +tags: [bert, pt, open_source, token_classification, onnx] +task: Named Entity Recognition +language: pt +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `clinicalnerpt_chemical` is a Portuguese model originally trained by pucpr.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinicalnerpt_chemical_pt_5.2.0_3.0_1699331473378.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinicalnerpt_chemical_pt_5.2.0_3.0_1699331473378.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("clinicalnerpt_chemical","pt") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols("documents").setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("clinicalnerpt_chemical", "pt") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinicalnerpt_chemical| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|pt| +|Size:|664.8 MB| + +## References + +https://huggingface.co/pucpr/clinicalnerpt-chemical \ No newline at end of file From f4f27cba7aa9177dc941158577dd1b1544e5b190 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Tue, 7 Nov 2023 11:37:10 +0700 Subject: [PATCH 423/667] Add model 2023-11-07-bert_base_finnish_uncased_ner_fi --- ...-11-07-bert_base_finnish_uncased_ner_fi.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_base_finnish_uncased_ner_fi.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_base_finnish_uncased_ner_fi.md b/docs/_posts/ahmedlone127/2023-11-07-bert_base_finnish_uncased_ner_fi.md new file mode 100644 index 00000000000000..97498a8b604790 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_base_finnish_uncased_ner_fi.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Finnish bert_base_finnish_uncased_ner BertForTokenClassification from iguanodon-ai +author: John Snow Labs +name: bert_base_finnish_uncased_ner +date: 2023-11-07 +tags: [bert, fi, open_source, token_classification, onnx] +task: Named Entity Recognition +language: fi +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_finnish_uncased_ner` is a Finnish model originally trained by iguanodon-ai.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_finnish_uncased_ner_fi_5.2.0_3.0_1699331821322.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_finnish_uncased_ner_fi_5.2.0_3.0_1699331821322.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_finnish_uncased_ner","fi") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols("documents").setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_base_finnish_uncased_ner", "fi") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_finnish_uncased_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|fi| +|Size:|464.7 MB| + +## References + +https://huggingface.co/iguanodon-ai/bert-base-finnish-uncased-ner \ No newline at end of file From 52735467cdfd1806430ee3a4270da9b6434a4a76 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 01:49:59 +0700 Subject: [PATCH 424/667] Add model 2023-11-07-bert_token_classifier_german_intensifiers_tagging_de --- ...assifier_german_intensifiers_tagging_de.md | 98 +++++++++++++++++++ 1 file changed, 98 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_german_intensifiers_tagging_de.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_german_intensifiers_tagging_de.md b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_german_intensifiers_tagging_de.md new file mode 100644 index 00000000000000..71b98438a344ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_german_intensifiers_tagging_de.md @@ -0,0 +1,98 @@ +--- +layout: model +title: German BertForTokenClassification Cased model (from TariqYousef) +author: John Snow Labs +name: bert_token_classifier_german_intensifiers_tagging +date: 2023-11-07 +tags: [de, open_source, bert, token_classification, ner, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `german-intensifiers-tagging` is a German model originally trained by `TariqYousef`. 
+ +## Predicted Entities + +`INT` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_german_intensifiers_tagging_de_5.2.0_3.0_1699382987270.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_german_intensifiers_tagging_de_5.2.0_3.0_1699382987270.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_german_intensifiers_tagging","de") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_german_intensifiers_tagging","de") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_token_classifier_german_intensifiers_tagging| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|409.9 MB| + +## References + +- https://huggingface.co/TariqYousef/german-intensifiers-tagging \ No newline at end of file From 50314f600182f315f4da8675afd93a3001ae43c6 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 01:50:58 +0700 Subject: [PATCH 425/667] Add model 2023-11-07-bert_token_classifier_wg_bert_en --- ...-11-07-bert_token_classifier_wg_bert_en.md | 98 +++++++++++++++++++ 1 file changed, 98 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_wg_bert_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_wg_bert_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_wg_bert_en.md new file mode 100644 index 00000000000000..72e11eed4e9ec7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_wg_bert_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English BertForTokenClassification Cased model (from krishjothi) +author: John Snow Labs +name: bert_token_classifier_wg_bert +date: 2023-11-07 +tags: [en, open_source, bert, token_classification, ner, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `WG_Bert` is an English model originally trained by `krishjothi`.
+
+## Predicted Entities
+
+`LOC`, `TYPE`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_wg_bert_en_5.2.0_3.0_1699382984736.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_wg_bert_en_5.2.0_3.0_1699382984736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_wg_bert","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_wg_bert","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_token_classifier_wg_bert|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|407.2 MB|
+
+## References
+
+
+
+- https://huggingface.co/krishjothi/WG_Bert
\ No newline at end of file

From eb6446512720ff0144623da62385df331673a74d Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 01:51:59 +0700
Subject: [PATCH 426/667] Add model 2023-11-07-bert_finetuned_unpunctual_text_segmentation_v2_en

---
 ...uned_unpunctual_text_segmentation_v2_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_unpunctual_text_segmentation_v2_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_unpunctual_text_segmentation_v2_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_unpunctual_text_segmentation_v2_en.md
new file mode 100644
index 00000000000000..52d0e8407d724b
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_unpunctual_text_segmentation_v2_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_finetuned_unpunctual_text_segmentation_v2 BertForTokenClassification from TankuVie
+author: John Snow Labs
+name: bert_finetuned_unpunctual_text_segmentation_v2
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_unpunctual_text_segmentation_v2` is an English model originally trained by TankuVie.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_unpunctual_text_segmentation_v2_en_5.2.0_3.0_1699382983044.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_unpunctual_text_segmentation_v2_en_5.2.0_3.0_1699382983044.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_unpunctual_text_segmentation_v2","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_finetuned_unpunctual_text_segmentation_v2", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_finetuned_unpunctual_text_segmentation_v2|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|665.0 MB|
+
+## References
+
+https://huggingface.co/TankuVie/bert-finetuned-unpunctual-text-segmentation-v2
\ No newline at end of file

From 9809a964bcbe16bc686e79b76a039826b3c9f072 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 01:52:59 +0700
Subject: [PATCH 427/667] Add model 2023-11-07-scibert_scivocab_uncased_finetuned_ner_jsylee_en

---
 ...civocab_uncased_finetuned_ner_jsylee_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-scibert_scivocab_uncased_finetuned_ner_jsylee_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-scibert_scivocab_uncased_finetuned_ner_jsylee_en.md b/docs/_posts/ahmedlone127/2023-11-07-scibert_scivocab_uncased_finetuned_ner_jsylee_en.md
new file mode 100644
index 00000000000000..6d8d7371ca164d
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-scibert_scivocab_uncased_finetuned_ner_jsylee_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English scibert_scivocab_uncased_finetuned_ner_jsylee BertForTokenClassification from jsylee
+author: John Snow Labs
+name: scibert_scivocab_uncased_finetuned_ner_jsylee
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `scibert_scivocab_uncased_finetuned_ner_jsylee` is an English model originally trained by jsylee.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scibert_scivocab_uncased_finetuned_ner_jsylee_en_5.2.0_3.0_1699383047707.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scibert_scivocab_uncased_finetuned_ner_jsylee_en_5.2.0_3.0_1699383047707.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("scibert_scivocab_uncased_finetuned_ner_jsylee","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("scibert_scivocab_uncased_finetuned_ner_jsylee", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|scibert_scivocab_uncased_finetuned_ner_jsylee|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|410.0 MB|
+
+## References
+
+https://huggingface.co/jsylee/scibert_scivocab_uncased-finetuned-ner
\ No newline at end of file

From d9ecf4c034556c3d0340df43bb5bb0d131289850 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 01:54:01 +0700
Subject: [PATCH 428/667] Add model 2023-11-07-bert_tiny_chinese_ws_zh

---
 .../2023-11-07-bert_tiny_chinese_ws_zh.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_tiny_chinese_ws_zh.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_tiny_chinese_ws_zh.md b/docs/_posts/ahmedlone127/2023-11-07-bert_tiny_chinese_ws_zh.md
new file mode 100644
index 00000000000000..c61290efabee5c
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-bert_tiny_chinese_ws_zh.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Chinese bert_tiny_chinese_ws BertForTokenClassification from ckiplab
+author: John Snow Labs
+name: bert_tiny_chinese_ws
+date: 2023-11-07
+tags: [bert, zh, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: zh
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_tiny_chinese_ws` is a Chinese model originally trained by ckiplab.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tiny_chinese_ws_zh_5.2.0_3.0_1699383183911.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tiny_chinese_ws_zh_5.2.0_3.0_1699383183911.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_tiny_chinese_ws","zh") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_tiny_chinese_ws", "zh")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_tiny_chinese_ws|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|zh|
+|Size:|43.0 MB|
+
+## References
+
+https://huggingface.co/ckiplab/bert-tiny-chinese-ws
\ No newline at end of file

From 94976ec71b5e7e19c08e229efd449b47fea1d35f Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 01:55:00 +0700
Subject: [PATCH 429/667] Add model 2023-11-07-bert_token_classifier_instafood_ner_en

---
 ...-bert_token_classifier_instafood_ner_en.md | 100 ++++++++++++++++++
 1 file changed, 100 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_instafood_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_instafood_ner_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_instafood_ner_en.md
new file mode 100644
index 00000000000000..6ec8f75c2785cc
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_instafood_ner_en.md
@@ -0,0 +1,100 @@
+---
+layout: model
+title: English BertForTokenClassification Cased model (from Dizex)
+author: John Snow Labs
+name: bert_token_classifier_instafood_ner
+date: 2023-11-07
+tags: [en, open_source, bert, token_classification, ner, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `InstaFoodBERT-NER` is an English model originally trained by `Dizex`.
+
+## Predicted Entities
+
+`FOOD`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_instafood_ner_en_5.2.0_3.0_1699383278855.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_instafood_ner_en_5.2.0_3.0_1699383278855.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_instafood_ner","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_instafood_ner","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_token_classifier_instafood_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+
+
+- https://huggingface.co/Dizex/InstaFoodBERT-NER
\ No newline at end of file

From 569f5e0ee4ef7376332010a5057e0746eb356c06 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 01:56:00 +0700
Subject: [PATCH 430/667] Add model 2023-11-07-hebert_medical_ner_fixed_labels_v3_en

---
 ...7-hebert_medical_ner_fixed_labels_v3_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-hebert_medical_ner_fixed_labels_v3_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-hebert_medical_ner_fixed_labels_v3_en.md b/docs/_posts/ahmedlone127/2023-11-07-hebert_medical_ner_fixed_labels_v3_en.md
new file mode 100644
index 00000000000000..2a01f206d189bd
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-hebert_medical_ner_fixed_labels_v3_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English hebert_medical_ner_fixed_labels_v3 BertForTokenClassification from cp500
+author: John Snow Labs
+name: hebert_medical_ner_fixed_labels_v3
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `hebert_medical_ner_fixed_labels_v3` is an English model originally trained by cp500.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hebert_medical_ner_fixed_labels_v3_en_5.2.0_3.0_1699383333088.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hebert_medical_ner_fixed_labels_v3_en_5.2.0_3.0_1699383333088.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("hebert_medical_ner_fixed_labels_v3","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("hebert_medical_ner_fixed_labels_v3", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|hebert_medical_ner_fixed_labels_v3|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|690.6 MB|
+
+## References
+
+https://huggingface.co/cp500/hebert_medical_ner_fixed_labels_v3
\ No newline at end of file

From f8bf98e76fffecae68c96e069d10885e02d1b5b8 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 01:57:00 +0700
Subject: [PATCH 431/667] Add model 2023-11-07-bioner_en

---
 .../ahmedlone127/2023-11-07-bioner_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bioner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-bioner_en.md b/docs/_posts/ahmedlone127/2023-11-07-bioner_en.md
new file mode 100644
index 00000000000000..f3d8853080f659
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-bioner_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bioner BertForTokenClassification from MilosKosRad
+author: John Snow Labs
+name: bioner
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bioner` is an English model originally trained by MilosKosRad.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bioner_en_5.2.0_3.0_1699383388414.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bioner_en_5.2.0_3.0_1699383388414.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bioner","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bioner", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bioner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|408.1 MB|
+
+## References
+
+https://huggingface.co/MilosKosRad/BioNER
\ No newline at end of file

From 0302cdb93118543382caf3ebc666704803723fb3 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 01:58:01 +0700
Subject: [PATCH 432/667] Add model 2023-11-07-bert_base_chinese_stock_ner_zh

---
 ...23-11-07-bert_base_chinese_stock_ner_zh.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_base_chinese_stock_ner_zh.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_base_chinese_stock_ner_zh.md b/docs/_posts/ahmedlone127/2023-11-07-bert_base_chinese_stock_ner_zh.md
new file mode 100644
index 00000000000000..6a0cee910b098c
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-bert_base_chinese_stock_ner_zh.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Chinese bert_base_chinese_stock_ner BertForTokenClassification from JasonYan
+author: John Snow Labs
+name: bert_base_chinese_stock_ner
+date: 2023-11-07
+tags: [bert, zh, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: zh
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_chinese_stock_ner` is a Chinese model originally trained by JasonYan.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_chinese_stock_ner_zh_5.2.0_3.0_1699383470842.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_chinese_stock_ner_zh_5.2.0_3.0_1699383470842.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_base_chinese_stock_ner","zh") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_base_chinese_stock_ner", "zh")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_base_chinese_stock_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|zh|
+|Size:|381.1 MB|
+
+## References
+
+https://huggingface.co/JasonYan/bert-base-chinese-stock-ner
\ No newline at end of file

From 4e0e559f0ccb83028b2ff95e3c01f5d461fb91b6 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 01:59:01 +0700
Subject: [PATCH 433/667] Add model 2023-11-07-bert_portuguese_ner_archive_en

---
 ...23-11-07-bert_portuguese_ner_archive_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_portuguese_ner_archive_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_portuguese_ner_archive_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_portuguese_ner_archive_en.md
new file mode 100644
index 00000000000000..c261ccd5da8ee7
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-bert_portuguese_ner_archive_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_portuguese_ner_archive BertForTokenClassification from lfcc
+author: John Snow Labs
+name: bert_portuguese_ner_archive
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_portuguese_ner_archive` is an English model originally trained by lfcc.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_portuguese_ner_archive_en_5.2.0_3.0_1699383518668.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_portuguese_ner_archive_en_5.2.0_3.0_1699383518668.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_portuguese_ner_archive","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_portuguese_ner_archive", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_portuguese_ner_archive|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|406.0 MB|
+
+## References
+
+https://huggingface.co/lfcc/bert-portuguese-ner-archive
\ No newline at end of file

From c962325aa800efe7723352a64e434f51e8c53cc6 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 02:00:01 +0700
Subject: [PATCH 434/667] Add model 2023-11-07-bpmn_information_extraction_en

---
 ...23-11-07-bpmn_information_extraction_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bpmn_information_extraction_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-bpmn_information_extraction_en.md b/docs/_posts/ahmedlone127/2023-11-07-bpmn_information_extraction_en.md
new file mode 100644
index 00000000000000..aa4677a595670f
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-bpmn_information_extraction_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bpmn_information_extraction BertForTokenClassification from jtlicardo
+author: John Snow Labs
+name: bpmn_information_extraction
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bpmn_information_extraction` is an English model originally trained by jtlicardo.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bpmn_information_extraction_en_5.2.0_3.0_1699383560746.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bpmn_information_extraction_en_5.2.0_3.0_1699383560746.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bpmn_information_extraction","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bpmn_information_extraction", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bpmn_information_extraction|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+
+## References
+
+https://huggingface.co/jtlicardo/bpmn-information-extraction
\ No newline at end of file

From 8cb0923d4e5c42e6b278c0282e348e6206eb6487 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 02:01:02 +0700
Subject: [PATCH 435/667] Add model 2023-11-07-bert_medical_ner_proj_en

---
 .../2023-11-07-bert_medical_ner_proj_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_medical_ner_proj_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_medical_ner_proj_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_medical_ner_proj_en.md
new file mode 100644
index 00000000000000..d87525a3f24da9
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-bert_medical_ner_proj_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_medical_ner_proj BertForTokenClassification from medical-ner-proj
+author: John Snow Labs
+name: bert_medical_ner_proj
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_medical_ner_proj` is an English model originally trained by medical-ner-proj.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_medical_ner_proj_en_5.2.0_3.0_1699383204967.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_medical_ner_proj_en_5.2.0_3.0_1699383204967.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_medical_ner_proj","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_medical_ner_proj", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_medical_ner_proj| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/medical-ner-proj/bert-medical-ner-proj \ No newline at end of file From 78549e18a58404dec810d3fdabedc00f77639649 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:02:02 +0700 Subject: [PATCH 436/667] Add model 2023-11-07-bert_base_uncased_city_country_ner_ml6team_en --- ...ase_uncased_city_country_ner_ml6team_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_base_uncased_city_country_ner_ml6team_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_base_uncased_city_country_ner_ml6team_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_base_uncased_city_country_ner_ml6team_en.md new file mode 100644 index 00000000000000..3340f375019914 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_base_uncased_city_country_ner_ml6team_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_base_uncased_city_country_ner_ml6team BertForTokenClassification from ml6team +author: John Snow Labs +name: bert_base_uncased_city_country_ner_ml6team +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_uncased_city_country_ner_ml6team` is an English model originally trained by ml6team.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_city_country_ner_ml6team_en_5.2.0_3.0_1699383583804.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_city_country_ner_ml6team_en_5.2.0_3.0_1699383583804.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_uncased_city_country_ner_ml6team","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_base_uncased_city_country_ner_ml6team", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_city_country_ner_ml6team| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ml6team/bert-base-uncased-city-country-ner \ No newline at end of file From c7ab0804447ac58a99bbefa184627be7851794bd Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:03:02 +0700 Subject: [PATCH 437/667] Add model 2023-11-07-resumeparserbert_en --- .../2023-11-07-resumeparserbert_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-resumeparserbert_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-resumeparserbert_en.md b/docs/_posts/ahmedlone127/2023-11-07-resumeparserbert_en.md new file mode 100644 index 00000000000000..32edff2c8e4fd7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-resumeparserbert_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English resumeparserbert BertForTokenClassification from sravya-abburi +author: John Snow Labs +name: resumeparserbert +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `resumeparserbert` is an English model originally trained by sravya-abburi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/resumeparserbert_en_5.2.0_3.0_1699383698898.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/resumeparserbert_en_5.2.0_3.0_1699383698898.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("resumeparserbert","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("resumeparserbert", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|resumeparserbert| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.8 MB| + +## References + +https://huggingface.co/sravya-abburi/ResumeParserBERT \ No newline at end of file From d20c5226b3ea74262124f1849af0c97c2d41c554 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:04:02 +0700 Subject: [PATCH 438/667] Add model 2023-11-07-bent_pubmedbert_ner_organism_en --- ...3-11-07-bent_pubmedbert_ner_organism_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bent_pubmedbert_ner_organism_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bent_pubmedbert_ner_organism_en.md b/docs/_posts/ahmedlone127/2023-11-07-bent_pubmedbert_ner_organism_en.md new file mode 100644 index 00000000000000..fbc475de54c7d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bent_pubmedbert_ner_organism_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bent_pubmedbert_ner_organism BertForTokenClassification from pruas +author: John Snow Labs +name: bent_pubmedbert_ner_organism +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bent_pubmedbert_ner_organism` is an English model originally trained by pruas.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bent_pubmedbert_ner_organism_en_5.2.0_3.0_1699383678035.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bent_pubmedbert_ner_organism_en_5.2.0_3.0_1699383678035.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bent_pubmedbert_ner_organism","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bent_pubmedbert_ner_organism", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bent_pubmedbert_ner_organism| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.1 MB| + +## References + +https://huggingface.co/pruas/BENT-PubMedBERT-NER-Organism \ No newline at end of file From 851275de4ee0a14681fb33d58616c10afe7552ca Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:05:03 +0700 Subject: [PATCH 439/667] Add model 2023-11-07-rubert_ext_sum_gazeta_ru --- .../2023-11-07-rubert_ext_sum_gazeta_ru.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-rubert_ext_sum_gazeta_ru.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-rubert_ext_sum_gazeta_ru.md b/docs/_posts/ahmedlone127/2023-11-07-rubert_ext_sum_gazeta_ru.md new file mode 100644 index 00000000000000..d39b9e1760ced4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-rubert_ext_sum_gazeta_ru.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Russian rubert_ext_sum_gazeta BertForTokenClassification from IlyaGusev +author: John Snow Labs +name: rubert_ext_sum_gazeta +date: 2023-11-07 +tags: [bert, ru, open_source, token_classification, onnx] +task: Named Entity Recognition +language: ru +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `rubert_ext_sum_gazeta` is a Russian model originally trained by IlyaGusev.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_ext_sum_gazeta_ru_5.2.0_3.0_1699383839435.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_ext_sum_gazeta_ru_5.2.0_3.0_1699383839435.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("rubert_ext_sum_gazeta","ru") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("rubert_ext_sum_gazeta", "ru") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_ext_sum_gazeta| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|ru| +|Size:|664.3 MB| + +## References + +https://huggingface.co/IlyaGusev/rubert_ext_sum_gazeta \ No newline at end of file From 1c088c7afe3424556f85f4900ff58ff7c587ddec Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:06:03 +0700 Subject: [PATCH 440/667] Add model 2023-11-07-assignment2_meher_test3_en --- .../2023-11-07-assignment2_meher_test3_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-assignment2_meher_test3_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-assignment2_meher_test3_en.md b/docs/_posts/ahmedlone127/2023-11-07-assignment2_meher_test3_en.md new file mode 100644 index 00000000000000..0e70cd8060b3d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-assignment2_meher_test3_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English assignment2_meher_test3 BertForTokenClassification from mpalaval +author: John Snow Labs +name: assignment2_meher_test3 +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `assignment2_meher_test3` is an English model originally trained by mpalaval.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/assignment2_meher_test3_en_5.2.0_3.0_1699383048254.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/assignment2_meher_test3_en_5.2.0_3.0_1699383048254.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("assignment2_meher_test3","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("assignment2_meher_test3", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|assignment2_meher_test3| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/mpalaval/assignment2_meher_test3 \ No newline at end of file From 6e7248de12d51b913fae2ab5be7f9c372b9ce030 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:07:03 +0700 Subject: [PATCH 441/667] Add model 2023-11-07-pashto_word_segmentation_en --- .../2023-11-07-pashto_word_segmentation_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-pashto_word_segmentation_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-pashto_word_segmentation_en.md b/docs/_posts/ahmedlone127/2023-11-07-pashto_word_segmentation_en.md new file mode 100644 index 00000000000000..40a3024f7b57ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-pashto_word_segmentation_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English pashto_word_segmentation BertForTokenClassification from ijazulhaq +author: John Snow Labs +name: pashto_word_segmentation +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `pashto_word_segmentation` is an English model originally trained by ijazulhaq.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pashto_word_segmentation_en_5.2.0_3.0_1699383974575.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pashto_word_segmentation_en_5.2.0_3.0_1699383974575.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("pashto_word_segmentation","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("pashto_word_segmentation", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pashto_word_segmentation| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|406.5 MB| + +## References + +https://huggingface.co/ijazulhaq/pashto-word-segmentation \ No newline at end of file From 397f2b660aab11442525950d8a8d7f632d8a504a Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:08:03 +0700 Subject: [PATCH 442/667] Add model 2023-11-07-idrisi_lmr_en_random_typeless_en --- ...-11-07-idrisi_lmr_en_random_typeless_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-idrisi_lmr_en_random_typeless_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-idrisi_lmr_en_random_typeless_en.md b/docs/_posts/ahmedlone127/2023-11-07-idrisi_lmr_en_random_typeless_en.md new file mode 100644 index 00000000000000..ea0f4a669af650 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-idrisi_lmr_en_random_typeless_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English idrisi_lmr_en_random_typeless BertForTokenClassification from rsuwaileh +author: John Snow Labs +name: idrisi_lmr_en_random_typeless +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `idrisi_lmr_en_random_typeless` is an English model originally trained by rsuwaileh.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/idrisi_lmr_en_random_typeless_en_5.2.0_3.0_1699383382404.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/idrisi_lmr_en_random_typeless_en_5.2.0_3.0_1699383382404.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("idrisi_lmr_en_random_typeless","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("idrisi_lmr_en_random_typeless", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|idrisi_lmr_en_random_typeless| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/rsuwaileh/IDRISI-LMR-EN-random-typeless \ No newline at end of file From caf666972569af6260deeae984d2425b9053fe16 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:09:03 +0700 Subject: [PATCH 443/667] Add model 2023-11-07-bert_base_finetuned_sayula_popoluca_ud_english_ewt_en --- ...tuned_sayula_popoluca_ud_english_ewt_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_base_finetuned_sayula_popoluca_ud_english_ewt_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_base_finetuned_sayula_popoluca_ud_english_ewt_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_base_finetuned_sayula_popoluca_ud_english_ewt_en.md new file mode 100644 index 00000000000000..564e9c7c3cb4d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_base_finetuned_sayula_popoluca_ud_english_ewt_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_base_finetuned_sayula_popoluca_ud_english_ewt BertForTokenClassification from TokenfreeEMNLPSubmission +author: John Snow Labs +name: bert_base_finetuned_sayula_popoluca_ud_english_ewt +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark 
NLP. `bert_base_finetuned_sayula_popoluca_ud_english_ewt` is an English model originally trained by TokenfreeEMNLPSubmission. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_sayula_popoluca_ud_english_ewt_en_5.2.0_3.0_1699383854788.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_sayula_popoluca_ud_english_ewt_en_5.2.0_3.0_1699383854788.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_finetuned_sayula_popoluca_ud_english_ewt","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_base_finetuned_sayula_popoluca_ud_english_ewt", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_finetuned_sayula_popoluca_ud_english_ewt| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/TokenfreeEMNLPSubmission/bert-base-finetuned-pos-ud-english-ewt \ No newline at end of file From 49aab5d1792aea8f3ba636048656b7e6b59ee403 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:10:04 +0700 Subject: [PATCH 444/667] Add model 2023-11-07-bert_token_classifier_swedish_ner_sv --- ...07-bert_token_classifier_swedish_ner_sv.md | 101 ++++++++++++++++++ 1 file changed, 101 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_swedish_ner_sv.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_swedish_ner_sv.md b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_swedish_ner_sv.md new file mode 100644 index 00000000000000..c77d138e155ead --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_swedish_ner_sv.md @@ -0,0 +1,101 @@ +--- +layout: model +title: Swedish BertForTokenClassification Cased model (from hkaraoguz) +author: John Snow Labs +name: bert_token_classifier_swedish_ner +date: 2023-11-07 +tags: [sv, open_source, bert, token_classification, ner, onnx] +task: Named Entity Recognition +language: sv +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `BERT_swedish-ner` is a Swedish model originally trained by `hkaraoguz`. 
+ +## Predicted Entities + +`LOC`, `ORG`, `PER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_swedish_ner_sv_5.2.0_3.0_1699384090991.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_swedish_ner_sv_5.2.0_3.0_1699384090991.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_swedish_ner","sv") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_swedish_ner","sv") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_token_classifier_swedish_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|sv|
+|Size:|465.2 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+References
+
+- https://huggingface.co/hkaraoguz/BERT_swedish-ner
+- https://paperswithcode.com/sota?task=Token+Classification&dataset=wikiann
\ No newline at end of file

From 6b4f49be384b5825149d03a5ce72b05474930c5a Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 02:11:04 +0700
Subject: [PATCH 445/667] Add model 2023-11-07-ner_bert_base_cased_ontonotesv5_englishv4_en

---
 ...ert_base_cased_ontonotesv5_englishv4_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-ner_bert_base_cased_ontonotesv5_englishv4_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-ner_bert_base_cased_ontonotesv5_englishv4_en.md b/docs/_posts/ahmedlone127/2023-11-07-ner_bert_base_cased_ontonotesv5_englishv4_en.md
new file mode 100644
index 00000000000000..647ecc4d2b0e3e
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-ner_bert_base_cased_ontonotesv5_englishv4_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English ner_bert_base_cased_ontonotesv5_englishv4 BertForTokenClassification from djagatiya
+author: John Snow Labs
+name: ner_bert_base_cased_ontonotesv5_englishv4
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `ner_bert_base_cased_ontonotesv5_englishv4` is an English model originally trained by djagatiya.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_bert_base_cased_ontonotesv5_englishv4_en_5.2.0_3.0_1699384083694.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_bert_base_cased_ontonotesv5_englishv4_en_5.2.0_3.0_1699384083694.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("ner_bert_base_cased_ontonotesv5_englishv4","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("ner_bert_base_cased_ontonotesv5_englishv4", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|ner_bert_base_cased_ontonotesv5_englishv4|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.8 MB|
+
+## References
+
+https://huggingface.co/djagatiya/ner-bert-base-cased-ontonotesv5-englishv4
\ No newline at end of file

From 5d25271aa2e189c4a2b0997210875ea069fd4a0b Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 02:12:04 +0700
Subject: [PATCH 446/667] Add model 2023-11-07-bent_pubmedbert_ner_cell_line_en

---
 ...-11-07-bent_pubmedbert_ner_cell_line_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bent_pubmedbert_ner_cell_line_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-bent_pubmedbert_ner_cell_line_en.md b/docs/_posts/ahmedlone127/2023-11-07-bent_pubmedbert_ner_cell_line_en.md
new file mode 100644
index 00000000000000..2186b6892e41a7
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-bent_pubmedbert_ner_cell_line_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bent_pubmedbert_ner_cell_line BertForTokenClassification from pruas
+author: John Snow Labs
+name: bent_pubmedbert_ner_cell_line
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bent_pubmedbert_ner_cell_line` is an English model originally trained by pruas.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bent_pubmedbert_ner_cell_line_en_5.2.0_3.0_1699384184050.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bent_pubmedbert_ner_cell_line_en_5.2.0_3.0_1699384184050.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bent_pubmedbert_ner_cell_line","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bent_pubmedbert_ner_cell_line", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bent_pubmedbert_ner_cell_line|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|408.1 MB|
+
+## References
+
+https://huggingface.co/pruas/BENT-PubMedBERT-NER-Cell-Line
\ No newline at end of file

From d86cea66b3366ee14babcdcbc0b323f17e4df57f Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 02:13:05 +0700
Subject: [PATCH 447/667] Add model 2023-11-07-porttagger_base_en

---
 .../2023-11-07-porttagger_base_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-porttagger_base_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-porttagger_base_en.md b/docs/_posts/ahmedlone127/2023-11-07-porttagger_base_en.md
new file mode 100644
index 00000000000000..1fd27ae7ff1467
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-porttagger_base_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English porttagger_base BertForTokenClassification from Emanuel
+author: John Snow Labs
+name: porttagger_base
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `porttagger_base` is an English model originally trained by Emanuel.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/porttagger_base_en_5.2.0_3.0_1699384183984.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/porttagger_base_en_5.2.0_3.0_1699384183984.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("porttagger_base","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("porttagger_base", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|porttagger_base|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|406.0 MB|
+
+## References
+
+https://huggingface.co/Emanuel/porttagger-base
\ No newline at end of file

From 53e9dfe141335c056e54e2e53e26e70e565d553f Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 02:14:05 +0700
Subject: [PATCH 448/667] Add model 2023-11-07-bert_finetuned_ner_konic_en

---
 .../2023-11-07-bert_finetuned_ner_konic_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_konic_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_konic_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_konic_en.md
new file mode 100644
index 00000000000000..abc7493cb3bd7f
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_konic_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_finetuned_ner_konic BertForTokenClassification from Konic
+author: John Snow Labs
+name: bert_finetuned_ner_konic
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_ner_konic` is an English model originally trained by Konic.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_konic_en_5.2.0_3.0_1699384414045.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_konic_en_5.2.0_3.0_1699384414045.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_konic","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_finetuned_ner_konic", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_finetuned_ner_konic|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+
+## References
+
+https://huggingface.co/Konic/bert-finetuned-ner
\ No newline at end of file

From d4dd1ef5702a74429e919d030ef1fedd7ce7b72b Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 02:15:05 +0700
Subject: [PATCH 449/667] Add model 2023-11-07-clinicalnerpt_disease_pt

---
 .../2023-11-07-clinicalnerpt_disease_pt.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_disease_pt.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_disease_pt.md b/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_disease_pt.md
new file mode 100644
index 00000000000000..907b1d1ca27947
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_disease_pt.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Portuguese clinicalnerpt_disease BertForTokenClassification from pucpr
+author: John Snow Labs
+name: clinicalnerpt_disease
+date: 2023-11-07
+tags: [bert, pt, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: pt
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `clinicalnerpt_disease` is a Portuguese model originally trained by pucpr.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinicalnerpt_disease_pt_5.2.0_3.0_1699384452527.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinicalnerpt_disease_pt_5.2.0_3.0_1699384452527.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("clinicalnerpt_disease","pt") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("clinicalnerpt_disease", "pt")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|clinicalnerpt_disease|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|pt|
+|Size:|664.8 MB|
+
+## References
+
+https://huggingface.co/pucpr/clinicalnerpt-disease
\ No newline at end of file

From 97fedf30d3d1f8756b94dc7adf06c2c73cf13498 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 02:16:06 +0700
Subject: [PATCH 450/667] Add model 2023-11-07-bert_token_classifier_restore_punctuation_ptbr_pt

---
 ..._classifier_restore_punctuation_ptbr_pt.md | 104 ++++++++++++++++++
 1 file changed, 104 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_restore_punctuation_ptbr_pt.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_restore_punctuation_ptbr_pt.md b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_restore_punctuation_ptbr_pt.md
new file mode 100644
index 00000000000000..ad939d748b22a3
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-bert_token_classifier_restore_punctuation_ptbr_pt.md
@@ -0,0 +1,104 @@
+---
+layout: model
+title: Portuguese BertForTokenClassification Cased model (from dominguesm)
+author: John Snow Labs
+name: bert_token_classifier_restore_punctuation_ptbr
+date: 2023-11-07
+tags: [pt, open_source, bert, token_classification, ner, onnx]
+task: Named Entity Recognition
+language: pt
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert-restore-punctuation-ptbr` is a Portuguese model originally trained by `dominguesm`.
+
+## Predicted Entities
+
+`.U`, `!O`, `:O`, `:U`, `;O`, `OU`, `?U`, `!U`, `OO`, `.O`, `-O`, `'O`, `?O`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_token_classifier_restore_punctuation_ptbr_pt_5.2.0_3.0_1699383762732.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_token_classifier_restore_punctuation_ptbr_pt_5.2.0_3.0_1699383762732.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_restore_punctuation_ptbr","pt") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_restore_punctuation_ptbr","pt")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_token_classifier_restore_punctuation_ptbr|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|pt|
+|Size:|406.0 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+References
+
+- https://huggingface.co/dominguesm/bert-restore-punctuation-ptbr
+- https://wandb.ai/dominguesm/RestorePunctuationPTBR
+- https://github.com/DominguesM/respunct
+- https://github.com/esdurmus/Wikilingua
+- https://paperswithcode.com/sota?task=named-entity-recognition&dataset=wiki_lingua
\ No newline at end of file

From 2fd3ff7d4fe7fc4e4ff960934bbde1ecc48112ac Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 02:17:06 +0700
Subject: [PATCH 451/667] Add model 2023-11-07-nlp_tokenclass_ner_en

---
 .../2023-11-07-nlp_tokenclass_ner_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-nlp_tokenclass_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-nlp_tokenclass_ner_en.md b/docs/_posts/ahmedlone127/2023-11-07-nlp_tokenclass_ner_en.md
new file mode 100644
index 00000000000000..f90bc6d00acf7a
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-nlp_tokenclass_ner_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English nlp_tokenclass_ner BertForTokenClassification from Endika99
+author: John Snow Labs
+name: nlp_tokenclass_ner
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `nlp_tokenclass_ner` is an English model originally trained by Endika99.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_tokenclass_ner_en_5.2.0_3.0_1699384183925.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_tokenclass_ner_en_5.2.0_3.0_1699384183925.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("nlp_tokenclass_ner","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("nlp_tokenclass_ner", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|nlp_tokenclass_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+
+## References
+
+https://huggingface.co/Endika99/NLP-TokenClass-NER
\ No newline at end of file

From fa7171b9021624419bf8c192d5c396117ff3ce37 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 02:18:06 +0700
Subject: [PATCH 452/667] Add model 2023-11-07-rubert_tiny_obj_asp_en

---
 .../2023-11-07-rubert_tiny_obj_asp_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-rubert_tiny_obj_asp_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-rubert_tiny_obj_asp_en.md b/docs/_posts/ahmedlone127/2023-11-07-rubert_tiny_obj_asp_en.md
new file mode 100644
index 00000000000000..4012884002b8a9
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-rubert_tiny_obj_asp_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English rubert_tiny_obj_asp BertForTokenClassification from lilaspourpre
+author: John Snow Labs
+name: rubert_tiny_obj_asp
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `rubert_tiny_obj_asp` is an English model originally trained by lilaspourpre.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_tiny_obj_asp_en_5.2.0_3.0_1699384483183.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_tiny_obj_asp_en_5.2.0_3.0_1699384483183.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("rubert_tiny_obj_asp","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("rubert_tiny_obj_asp", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|rubert_tiny_obj_asp|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|43.8 MB|
+
+## References
+
+https://huggingface.co/lilaspourpre/rubert-tiny-obj-asp
\ No newline at end of file

From 7cf0a2979185610a71d2e61825a7b35b4d60dae8 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 02:19:07 +0700
Subject: [PATCH 453/667] Add model 2023-11-07-elhberteu_sayula_popoluca_ud1_2_eu

---
 ...1-07-elhberteu_sayula_popoluca_ud1_2_eu.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-elhberteu_sayula_popoluca_ud1_2_eu.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-elhberteu_sayula_popoluca_ud1_2_eu.md b/docs/_posts/ahmedlone127/2023-11-07-elhberteu_sayula_popoluca_ud1_2_eu.md
new file mode 100644
index 00000000000000..f6f01ad26ea84e
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-elhberteu_sayula_popoluca_ud1_2_eu.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Basque elhberteu_sayula_popoluca_ud1_2 BertForTokenClassification from orai-nlp
+author: John Snow Labs
+name: elhberteu_sayula_popoluca_ud1_2
+date: 2023-11-07
+tags: [bert, eu, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: eu
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `elhberteu_sayula_popoluca_ud1_2` is a Basque model originally trained by orai-nlp.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/elhberteu_sayula_popoluca_ud1_2_eu_5.2.0_3.0_1699384623242.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/elhberteu_sayula_popoluca_ud1_2_eu_5.2.0_3.0_1699384623242.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("elhberteu_sayula_popoluca_ud1_2","eu") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("elhberteu_sayula_popoluca_ud1_2", "eu")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|elhberteu_sayula_popoluca_ud1_2|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|eu|
+|Size:|464.7 MB|
+
+## References
+
+https://huggingface.co/orai-nlp/ElhBERTeu-pos-ud1.2
\ No newline at end of file

From ed9b7aba8e828c797670367cff0be3770c09f37f Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 02:20:06 +0700
Subject: [PATCH 454/667] Add model 2023-11-07-biobert_base_cased_v1_2_ncbi_disease_softmax_labelall_ner_en

---
 ..._2_ncbi_disease_softmax_labelall_ner_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-biobert_base_cased_v1_2_ncbi_disease_softmax_labelall_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-biobert_base_cased_v1_2_ncbi_disease_softmax_labelall_ner_en.md b/docs/_posts/ahmedlone127/2023-11-07-biobert_base_cased_v1_2_ncbi_disease_softmax_labelall_ner_en.md
new file mode 100644
index 00000000000000..8955c1d651262f
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-biobert_base_cased_v1_2_ncbi_disease_softmax_labelall_ner_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English biobert_base_cased_v1_2_ncbi_disease_softmax_labelall_ner BertForTokenClassification from jordyvl
+author: John Snow Labs
+name: biobert_base_cased_v1_2_ncbi_disease_softmax_labelall_ner
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `biobert_base_cased_v1_2_ncbi_disease_softmax_labelall_ner` is an English model originally trained by jordyvl.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biobert_base_cased_v1_2_ncbi_disease_softmax_labelall_ner_en_5.2.0_3.0_1699384773826.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biobert_base_cased_v1_2_ncbi_disease_softmax_labelall_ner_en_5.2.0_3.0_1699384773826.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("biobert_base_cased_v1_2_ncbi_disease_softmax_labelall_ner","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("biobert_base_cased_v1_2_ncbi_disease_softmax_labelall_ner", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biobert_base_cased_v1_2_ncbi_disease_softmax_labelall_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/jordyvl/biobert-base-cased-v1.2_ncbi_disease-softmax-labelall-ner \ No newline at end of file From c5b15cfd8cd3f51f907a0b58b08c41fa286fe72d Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:21:07 +0700 Subject: [PATCH 455/667] Add model 2023-11-07-tiny_random_bertfortokenclassification_hf_internal_testing_en --- ...enclassification_hf_internal_testing_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-tiny_random_bertfortokenclassification_hf_internal_testing_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-tiny_random_bertfortokenclassification_hf_internal_testing_en.md b/docs/_posts/ahmedlone127/2023-11-07-tiny_random_bertfortokenclassification_hf_internal_testing_en.md new file mode 100644 index 00000000000000..d0e912198e8a76 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-tiny_random_bertfortokenclassification_hf_internal_testing_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English tiny_random_bertfortokenclassification_hf_internal_testing BertForTokenClassification from hf-internal-testing +author: John Snow Labs +name: tiny_random_bertfortokenclassification_hf_internal_testing +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to 
+provide scalability and production-readiness using Spark NLP. `tiny_random_bertfortokenclassification_hf_internal_testing` is an English model originally trained by hf-internal-testing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tiny_random_bertfortokenclassification_hf_internal_testing_en_5.2.0_3.0_1699384782616.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tiny_random_bertfortokenclassification_hf_internal_testing_en_5.2.0_3.0_1699384782616.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
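+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +# Start a Spark session with Spark NLP loaded +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# The classifier reads a "token" column, so tokenize the documents first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("tiny_random_bertfortokenclassification_hf_internal_testing","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# Sample input; replace with your own DataFrame that has a "text" column +data = spark.createDataFrame([["My name is Clara and I live in Berkeley."]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +// Assumes an active SparkSession named `spark`, as in spark-shell +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// The classifier reads a "token" column, so tokenize the documents first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("tiny_random_bertfortokenclassification_hf_internal_testing", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// Sample input; replace with your own DataFrame that has a "text" column +val data = Seq("My name is Clara and I live in Berkeley.").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +
+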
++ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tiny_random_bertfortokenclassification_hf_internal_testing| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|349.9 KB| + +## References + +https://huggingface.co/hf-internal-testing/tiny-random-BertForTokenClassification \ No newline at end of file From 91bec0a7dbbf4fa742efd6cab2c7f9425a846b6a Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:22:07 +0700 Subject: [PATCH 456/667] Add model 2023-11-07-bert_base_named_entity_extractor_en --- ...-07-bert_base_named_entity_extractor_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_base_named_entity_extractor_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_base_named_entity_extractor_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_base_named_entity_extractor_en.md new file mode 100644 index 00000000000000..0d7f99d291bba8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_base_named_entity_extractor_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_base_named_entity_extractor BertForTokenClassification from Azma-AI +author: John Snow Labs +name: bert_base_named_entity_extractor +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_named_entity_extractor` is an English model originally trained by Azma-AI.
+
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_named_entity_extractor_en_5.2.0_3.0_1699384799757.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_named_entity_extractor_en_5.2.0_3.0_1699384799757.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
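++{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +# Start a Spark session with Spark NLP loaded +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# The classifier reads a "token" column, so tokenize the documents first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_named_entity_extractor","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# Sample input; replace with your own DataFrame that has a "text" column +data = spark.createDataFrame([["My name is Clara and I live in Berkeley."]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +// Assumes an active SparkSession named `spark`, as in spark-shell +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// The classifier reads a "token" column, so tokenize the documents first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_base_named_entity_extractor", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// Sample input; replace with your own DataFrame that has a "text" column +val data = Seq("My name is Clara and I live in Berkeley.").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +
+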
++ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_named_entity_extractor| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Azma-AI/bert-base-named-entity-extractor \ No newline at end of file From 5c4693d81df7fc2dad95b6f71a07cfe05c84bc34 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:23:07 +0700 Subject: [PATCH 457/667] Add model 2023-11-07-bert_base_spanish_wwm_cased_finetuned_ner_en --- ...base_spanish_wwm_cased_finetuned_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_base_spanish_wwm_cased_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_base_spanish_wwm_cased_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_base_spanish_wwm_cased_finetuned_ner_en.md new file mode 100644 index 00000000000000..09c073a659aa87 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_base_spanish_wwm_cased_finetuned_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_base_spanish_wwm_cased_finetuned_ner BertForTokenClassification from dccuchile +author: John Snow Labs +name: bert_base_spanish_wwm_cased_finetuned_ner +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_spanish_wwm_cased_finetuned_ner` is an English model originally trained by dccuchile.
+
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_finetuned_ner_en_5.2.0_3.0_1699384970511.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_finetuned_ner_en_5.2.0_3.0_1699384970511.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
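++{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +# Start a Spark session with Spark NLP loaded +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# The classifier reads a "token" column, so tokenize the documents first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_spanish_wwm_cased_finetuned_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# Sample input; replace with your own DataFrame that has a "text" column +data = spark.createDataFrame([["Me llamo Clara y vivo en Madrid."]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +// Assumes an active SparkSession named `spark`, as in spark-shell +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// The classifier reads a "token" column, so tokenize the documents first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_base_spanish_wwm_cased_finetuned_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// Sample input; replace with your own DataFrame that has a "text" column +val data = Seq("Me llamo Clara y vivo en Madrid.").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +
+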
++ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_spanish_wwm_cased_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/dccuchile/bert-base-spanish-wwm-cased-finetuned-ner \ No newline at end of file From 070ff60727a01ab6f73c6099d7f507977e4ce2dc Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:24:07 +0700 Subject: [PATCH 458/667] Add model 2023-11-07-polymerner_en --- .../ahmedlone127/2023-11-07-polymerner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-polymerner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-polymerner_en.md b/docs/_posts/ahmedlone127/2023-11-07-polymerner_en.md new file mode 100644 index 00000000000000..b7ffecf915ccf6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-polymerner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English polymerner BertForTokenClassification from pranav-s +author: John Snow Labs +name: polymerner +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `polymerner` is an English model originally trained by pranav-s.
+
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/polymerner_en_5.2.0_3.0_1699384979853.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/polymerner_en_5.2.0_3.0_1699384979853.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
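++{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +# Start a Spark session with Spark NLP loaded +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# The classifier reads a "token" column, so tokenize the documents first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("polymerner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# Sample input; replace with your own DataFrame that has a "text" column +data = spark.createDataFrame([["Polystyrene films were blended with poly(methyl methacrylate)."]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +// Assumes an active SparkSession named `spark`, as in spark-shell +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// The classifier reads a "token" column, so tokenize the documents first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("polymerner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// Sample input; replace with your own DataFrame that has a "text" column +val data = Seq("Polystyrene films were blended with poly(methyl methacrylate).").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +
+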
++ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|polymerner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.2 MB| + +## References + +https://huggingface.co/pranav-s/PolymerNER \ No newline at end of file From 1051c122e901190df2b8cfe63b07249a2db4db36 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:25:08 +0700 Subject: [PATCH 459/667] Add model 2023-11-07-deprem_ner_tr --- .../ahmedlone127/2023-11-07-deprem_ner_tr.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-deprem_ner_tr.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-deprem_ner_tr.md b/docs/_posts/ahmedlone127/2023-11-07-deprem_ner_tr.md new file mode 100644 index 00000000000000..2eaafed6259aec --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-deprem_ner_tr.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Turkish deprem_ner BertForTokenClassification from deprem-ml +author: John Snow Labs +name: deprem_ner +date: 2023-11-07 +tags: [bert, tr, open_source, token_classification, onnx] +task: Named Entity Recognition +language: tr +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `deprem_ner` is a Turkish model originally trained by deprem-ml.
+
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deprem_ner_tr_5.2.0_3.0_1699384420770.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deprem_ner_tr_5.2.0_3.0_1699384420770.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
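++{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +# Start a Spark session with Spark NLP loaded +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# The classifier reads a "token" column, so tokenize the documents first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("deprem_ner","tr") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# Sample input; replace with your own DataFrame that has a "text" column +data = spark.createDataFrame([["Ankara'da yaşıyorum."]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +// Assumes an active SparkSession named `spark`, as in spark-shell +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// The classifier reads a "token" column, so tokenize the documents first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("deprem_ner", "tr") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// Sample input; replace with your own DataFrame that has a "text" column +val data = Seq("Ankara'da yaşıyorum.").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +
+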
++ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deprem_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|tr| +|Size:|412.3 MB| + +## References + +https://huggingface.co/deprem-ml/deprem-ner \ No newline at end of file From 18c380017791459f936f5f2fab9c12b53aafd08d Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:26:08 +0700 Subject: [PATCH 460/667] Add model 2023-11-07-bert_finetuned_ner_pii_en --- .../2023-11-07-bert_finetuned_ner_pii_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_pii_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_pii_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_pii_en.md new file mode 100644 index 00000000000000..1f4a204dfcd176 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_pii_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_finetuned_ner_pii BertForTokenClassification from ArunaSaraswathy +author: John Snow Labs +name: bert_finetuned_ner_pii +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_ner_pii` is an English model originally trained by ArunaSaraswathy.
+
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_pii_en_5.2.0_3.0_1699385100996.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_pii_en_5.2.0_3.0_1699385100996.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
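++{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +# Start a Spark session with Spark NLP loaded +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# The classifier reads a "token" column, so tokenize the documents first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_pii","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# Sample input; replace with your own DataFrame that has a "text" column +data = spark.createDataFrame([["My name is Clara and my email address is clara@example.com."]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +// Assumes an active SparkSession named `spark`, as in spark-shell +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// The classifier reads a "token" column, so tokenize the documents first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_finetuned_ner_pii", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// Sample input; replace with your own DataFrame that has a "text" column +val data = Seq("My name is Clara and my email address is clara@example.com.").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +
+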
++ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_pii| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|404.0 MB| + +## References + +https://huggingface.co/ArunaSaraswathy/bert-finetuned-ner-pii \ No newline at end of file From 2082c01fbaf1c782e76bca1a09bee3f30940ba36 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:27:08 +0700 Subject: [PATCH 461/667] Add model 2023-11-07-ner_fine_tune_bert_en --- .../2023-11-07-ner_fine_tune_bert_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-ner_fine_tune_bert_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-ner_fine_tune_bert_en.md b/docs/_posts/ahmedlone127/2023-11-07-ner_fine_tune_bert_en.md new file mode 100644 index 00000000000000..8b7d801139aa0e --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-ner_fine_tune_bert_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English ner_fine_tune_bert BertForTokenClassification from cehongw +author: John Snow Labs +name: ner_fine_tune_bert +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `ner_fine_tune_bert` is an English model originally trained by cehongw.
+
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_fine_tune_bert_en_5.2.0_3.0_1699385195759.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_fine_tune_bert_en_5.2.0_3.0_1699385195759.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
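++{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +# Start a Spark session with Spark NLP loaded +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# The classifier reads a "token" column, so tokenize the documents first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("ner_fine_tune_bert","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# Sample input; replace with your own DataFrame that has a "text" column +data = spark.createDataFrame([["My name is Clara and I live in Berkeley."]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +// Assumes an active SparkSession named `spark`, as in spark-shell +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// The classifier reads a "token" column, so tokenize the documents first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("ner_fine_tune_bert", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// Sample input; replace with your own DataFrame that has a "text" column +val data = Seq("My name is Clara and I live in Berkeley.").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +
+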
++ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_fine_tune_bert| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/cehongw/ner-fine-tune-bert \ No newline at end of file From 7746782869200a2efefffbe77601f18f5b594157 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:28:09 +0700 Subject: [PATCH 462/667] Add model 2023-11-07-wikiser_bert_base_en --- .../2023-11-07-wikiser_bert_base_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-wikiser_bert_base_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-wikiser_bert_base_en.md b/docs/_posts/ahmedlone127/2023-11-07-wikiser_bert_base_en.md new file mode 100644 index 00000000000000..c5eaae7f10f157 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-wikiser_bert_base_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English wikiser_bert_base BertForTokenClassification from taidng +author: John Snow Labs +name: wikiser_bert_base +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `wikiser_bert_base` is an English model originally trained by taidng.
+
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wikiser_bert_base_en_5.2.0_3.0_1699384974903.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wikiser_bert_base_en_5.2.0_3.0_1699384974903.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
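++{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +# Start a Spark session with Spark NLP loaded +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# The classifier reads a "token" column, so tokenize the documents first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("wikiser_bert_base","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# Sample input; replace with your own DataFrame that has a "text" column +data = spark.createDataFrame([["We implemented the parser in Python using the NumPy library."]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +// Assumes an active SparkSession named `spark`, as in spark-shell +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// The classifier reads a "token" column, so tokenize the documents first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("wikiser_bert_base", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// Sample input; replace with your own DataFrame that has a "text" column +val data = Seq("We implemented the parser in Python using the NumPy library.").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +
+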
++ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wikiser_bert_base| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/taidng/wikiser-bert-base \ No newline at end of file From be841668135dea44f12dbbfe97335b49a5dca3f4 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:29:09 +0700 Subject: [PATCH 463/667] Add model 2023-11-07-bent_pubmedbert_ner_anatomical_en --- ...11-07-bent_pubmedbert_ner_anatomical_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bent_pubmedbert_ner_anatomical_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bent_pubmedbert_ner_anatomical_en.md b/docs/_posts/ahmedlone127/2023-11-07-bent_pubmedbert_ner_anatomical_en.md new file mode 100644 index 00000000000000..9d3ea9a3078992 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bent_pubmedbert_ner_anatomical_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bent_pubmedbert_ner_anatomical BertForTokenClassification from pruas +author: John Snow Labs +name: bent_pubmedbert_ner_anatomical +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bent_pubmedbert_ner_anatomical` is an English model originally trained by pruas.
+
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bent_pubmedbert_ner_anatomical_en_5.2.0_3.0_1699385335505.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bent_pubmedbert_ner_anatomical_en_5.2.0_3.0_1699385335505.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
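++{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +# Start a Spark session with Spark NLP loaded +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# The classifier reads a "token" column, so tokenize the documents first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bent_pubmedbert_ner_anatomical","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# Sample input; replace with your own DataFrame that has a "text" column +data = spark.createDataFrame([["A lesion was found in the left ventricle of the heart."]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +// Assumes an active SparkSession named `spark`, as in spark-shell +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// The classifier reads a "token" column, so tokenize the documents first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bent_pubmedbert_ner_anatomical", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// Sample input; replace with your own DataFrame that has a "text" column +val data = Seq("A lesion was found in the left ventricle of the heart.").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +
+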
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bent_pubmedbert_ner_anatomical| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.1 MB| + +## References + +https://huggingface.co/pruas/BENT-PubMedBERT-NER-Anatomical \ No newline at end of file From 5e358588dba8af21e0456edb0b13aff956e3ad8d Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:30:09 +0700 Subject: [PATCH 464/667] Add model 2023-11-07-emscad_skill_extraction_conference_token_classification_en --- ...tion_conference_token_classification_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-emscad_skill_extraction_conference_token_classification_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-emscad_skill_extraction_conference_token_classification_en.md b/docs/_posts/ahmedlone127/2023-11-07-emscad_skill_extraction_conference_token_classification_en.md new file mode 100644 index 00000000000000..8bd7e889ad9552 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-emscad_skill_extraction_conference_token_classification_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English emscad_skill_extraction_conference_token_classification BertForTokenClassification from Ivo +author: John Snow Labs +name: emscad_skill_extraction_conference_token_classification +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark 
+NLP. `emscad_skill_extraction_conference_token_classification` is an English model originally trained by Ivo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emscad_skill_extraction_conference_token_classification_en_5.2.0_3.0_1699385391119.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emscad_skill_extraction_conference_token_classification_en_5.2.0_3.0_1699385391119.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
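++{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +# Start a Spark session with Spark NLP loaded +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# The classifier reads a "token" column, so tokenize the documents first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("emscad_skill_extraction_conference_token_classification","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# Sample input; replace with your own DataFrame that has a "text" column +data = spark.createDataFrame([["Experience with Python and strong communication skills are required."]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +// Assumes an active SparkSession named `spark`, as in spark-shell +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// The classifier reads a "token" column, so tokenize the documents first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("emscad_skill_extraction_conference_token_classification", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// Sample input; replace with your own DataFrame that has a "text" column +val data = Seq("Experience with Python and strong communication skills are required.").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +
+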
++ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emscad_skill_extraction_conference_token_classification| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Ivo/emscad-skill-extraction-conference-token-classification \ No newline at end of file From 5e3ae0f420dc281bd7819a709c27d3e113ab27e1 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:31:09 +0700 Subject: [PATCH 465/667] Add model 2023-11-07-biobert_diseases_ner_alvaroalon2_en --- ...-07-biobert_diseases_ner_alvaroalon2_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-biobert_diseases_ner_alvaroalon2_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-biobert_diseases_ner_alvaroalon2_en.md b/docs/_posts/ahmedlone127/2023-11-07-biobert_diseases_ner_alvaroalon2_en.md new file mode 100644 index 00000000000000..02bddb8a486fef --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-biobert_diseases_ner_alvaroalon2_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English biobert_diseases_ner_alvaroalon2 BertForTokenClassification from alvaroalon2 +author: John Snow Labs +name: biobert_diseases_ner_alvaroalon2 +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `biobert_diseases_ner_alvaroalon2` is an English model originally trained by alvaroalon2.
+
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biobert_diseases_ner_alvaroalon2_en_5.2.0_3.0_1699384011144.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biobert_diseases_ner_alvaroalon2_en_5.2.0_3.0_1699384011144.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
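+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") +tokenClassifier = BertForTokenClassification.pretrained("biobert_diseases_ner_alvaroalon2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# assumes `data` is a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") +val tokenClassifier = BertForTokenClassification + .pretrained("biobert_diseases_ner_alvaroalon2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +// assumes `data` is a Spark DataFrame with a "text" column +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +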
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biobert_diseases_ner_alvaroalon2| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.1 MB| + +## References + +https://huggingface.co/alvaroalon2/biobert_diseases_ner \ No newline at end of file From ec61db99242b6b5acdfd635b587c88d81dbc9c1e Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:32:09 +0700 Subject: [PATCH 466/667] Add model 2023-11-07-rubert_base_cased_conversational_ner_v1_en --- ...ert_base_cased_conversational_ner_v1_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-rubert_base_cased_conversational_ner_v1_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-rubert_base_cased_conversational_ner_v1_en.md b/docs/_posts/ahmedlone127/2023-11-07-rubert_base_cased_conversational_ner_v1_en.md new file mode 100644 index 00000000000000..762f44e1b9fa56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-rubert_base_cased_conversational_ner_v1_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English rubert_base_cased_conversational_ner_v1 BertForTokenClassification from Data-Lab +author: John Snow Labs +name: rubert_base_cased_conversational_ner_v1 +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `rubert_base_cased_conversational_ner_v1` is an English model originally trained by Data-Lab.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_base_cased_conversational_ner_v1_en_5.2.0_3.0_1699385391187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_base_cased_conversational_ner_v1_en_5.2.0_3.0_1699385391187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
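+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") +tokenClassifier = BertForTokenClassification.pretrained("rubert_base_cased_conversational_ner_v1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# assumes `data` is a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") +val tokenClassifier = BertForTokenClassification + .pretrained("rubert_base_cased_conversational_ner_v1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +// assumes `data` is a Spark DataFrame with a "text" column +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +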
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_base_cased_conversational_ner_v1| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|662.2 MB| + +## References + +https://huggingface.co/Data-Lab/rubert-base-cased-conversational_ner-v1 \ No newline at end of file From 9445bf2bd05bcf2ba2fdb0949fd2dd8154ad4ba5 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:33:09 +0700 Subject: [PATCH 467/667] Add model 2023-11-07-biobert_base_cased_v1_2_bc2gm_ner_en --- ...07-biobert_base_cased_v1_2_bc2gm_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-biobert_base_cased_v1_2_bc2gm_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-biobert_base_cased_v1_2_bc2gm_ner_en.md b/docs/_posts/ahmedlone127/2023-11-07-biobert_base_cased_v1_2_bc2gm_ner_en.md new file mode 100644 index 00000000000000..08ebaa458f1eef --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-biobert_base_cased_v1_2_bc2gm_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English biobert_base_cased_v1_2_bc2gm_ner BertForTokenClassification from chintagunta85 +author: John Snow Labs +name: biobert_base_cased_v1_2_bc2gm_ner +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `biobert_base_cased_v1_2_bc2gm_ner` is an English model originally trained by chintagunta85.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biobert_base_cased_v1_2_bc2gm_ner_en_5.2.0_3.0_1699383762749.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biobert_base_cased_v1_2_bc2gm_ner_en_5.2.0_3.0_1699383762749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
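+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") +tokenClassifier = BertForTokenClassification.pretrained("biobert_base_cased_v1_2_bc2gm_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# assumes `data` is a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") +val tokenClassifier = BertForTokenClassification + .pretrained("biobert_base_cased_v1_2_bc2gm_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +// assumes `data` is a Spark DataFrame with a "text" column +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +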
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biobert_base_cased_v1_2_bc2gm_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/chintagunta85/biobert-base-cased-v1.2-bc2gm-ner \ No newline at end of file From ac59cac144b82f7528e1eb6516176bf100c1897c Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:34:10 +0700 Subject: [PATCH 468/667] Add model 2023-11-07-bengali_language_ner_bn --- .../2023-11-07-bengali_language_ner_bn.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bengali_language_ner_bn.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bengali_language_ner_bn.md b/docs/_posts/ahmedlone127/2023-11-07-bengali_language_ner_bn.md new file mode 100644 index 00000000000000..ecf724d7f061bf --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bengali_language_ner_bn.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Bengali bengali_language_ner BertForTokenClassification from Suchandra +author: John Snow Labs +name: bengali_language_ner +date: 2023-11-07 +tags: [bert, bn, open_source, token_classification, onnx] +task: Named Entity Recognition +language: bn +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bengali_language_ner` is a Bengali model originally trained by Suchandra.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bengali_language_ner_bn_5.2.0_3.0_1699385474779.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bengali_language_ner_bn_5.2.0_3.0_1699385474779.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
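+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") +tokenClassifier = BertForTokenClassification.pretrained("bengali_language_ner","bn") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# assumes `data` is a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") +val tokenClassifier = BertForTokenClassification + .pretrained("bengali_language_ner", "bn") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +// assumes `data` is a Spark DataFrame with a "text" column +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +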
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bengali_language_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|bn| +|Size:|665.1 MB| + +## References + +https://huggingface.co/Suchandra/bengali_language_NER \ No newline at end of file From 69cc00b858917e8940cb7239f6e32c89977e3f4d Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:35:10 +0700 Subject: [PATCH 469/667] Add model 2023-11-07-jira_bert_nerr_en --- .../2023-11-07-jira_bert_nerr_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-jira_bert_nerr_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-jira_bert_nerr_en.md b/docs/_posts/ahmedlone127/2023-11-07-jira_bert_nerr_en.md new file mode 100644 index 00000000000000..94df20c22a073a --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-jira_bert_nerr_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English jira_bert_nerr BertForTokenClassification from rouabelgacem +author: John Snow Labs +name: jira_bert_nerr +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `jira_bert_nerr` is an English model originally trained by rouabelgacem.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jira_bert_nerr_en_5.2.0_3.0_1699385661443.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jira_bert_nerr_en_5.2.0_3.0_1699385661443.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
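+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") +tokenClassifier = BertForTokenClassification.pretrained("jira_bert_nerr","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# assumes `data` is a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") +val tokenClassifier = BertForTokenClassification + .pretrained("jira_bert_nerr", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +// assumes `data` is a Spark DataFrame with a "text" column +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +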
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jira_bert_nerr| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|404.0 MB| + +## References + +https://huggingface.co/rouabelgacem/jira-bert-nerr \ No newline at end of file From 74da149eace89f896b8bf3a08d828e92014f46cb Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:36:10 +0700 Subject: [PATCH 470/667] Add model 2023-11-07-ner_bert_large_cased_portuguese_lenerbr_pt --- ..._bert_large_cased_portuguese_lenerbr_pt.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-ner_bert_large_cased_portuguese_lenerbr_pt.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-ner_bert_large_cased_portuguese_lenerbr_pt.md b/docs/_posts/ahmedlone127/2023-11-07-ner_bert_large_cased_portuguese_lenerbr_pt.md new file mode 100644 index 00000000000000..15b6bf89305554 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-ner_bert_large_cased_portuguese_lenerbr_pt.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Portuguese ner_bert_large_cased_portuguese_lenerbr BertForTokenClassification from pierreguillou +author: John Snow Labs +name: ner_bert_large_cased_portuguese_lenerbr +date: 2023-11-07 +tags: [bert, pt, open_source, token_classification, onnx] +task: Named Entity Recognition +language: pt +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `ner_bert_large_cased_portuguese_lenerbr` is a Portuguese model originally trained by pierreguillou.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_bert_large_cased_portuguese_lenerbr_pt_5.2.0_3.0_1699384462079.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_bert_large_cased_portuguese_lenerbr_pt_5.2.0_3.0_1699384462079.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
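+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") +tokenClassifier = BertForTokenClassification.pretrained("ner_bert_large_cased_portuguese_lenerbr","pt") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# assumes `data` is a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") +val tokenClassifier = BertForTokenClassification + .pretrained("ner_bert_large_cased_portuguese_lenerbr", "pt") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +// assumes `data` is a Spark DataFrame with a "text" column +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +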
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_bert_large_cased_portuguese_lenerbr| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|pt| +|Size:|1.2 GB| + +## References + +https://huggingface.co/pierreguillou/ner-bert-large-cased-pt-lenerbr \ No newline at end of file From 501439427a7502b8ca13e8a82b679faedd81f67a Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:37:10 +0700 Subject: [PATCH 471/667] Add model 2023-11-07-vila_scibert_cased_s2vl_en --- .../2023-11-07-vila_scibert_cased_s2vl_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-vila_scibert_cased_s2vl_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-vila_scibert_cased_s2vl_en.md b/docs/_posts/ahmedlone127/2023-11-07-vila_scibert_cased_s2vl_en.md new file mode 100644 index 00000000000000..5fea228cf27ef1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-vila_scibert_cased_s2vl_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English vila_scibert_cased_s2vl BertForTokenClassification from allenai +author: John Snow Labs +name: vila_scibert_cased_s2vl +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `vila_scibert_cased_s2vl` is an English model originally trained by allenai.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/vila_scibert_cased_s2vl_en_5.2.0_3.0_1699385199476.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/vila_scibert_cased_s2vl_en_5.2.0_3.0_1699385199476.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
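+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") +tokenClassifier = BertForTokenClassification.pretrained("vila_scibert_cased_s2vl","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# assumes `data` is a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") +val tokenClassifier = BertForTokenClassification + .pretrained("vila_scibert_cased_s2vl", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +// assumes `data` is a Spark DataFrame with a "text" column +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +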
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|vila_scibert_cased_s2vl| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/allenai/vila-scibert-cased-s2vl \ No newline at end of file From 3b9ccda2b568835e9686b6d15e2f7cfa51f40e97 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:38:11 +0700 Subject: [PATCH 472/667] Add model 2023-11-07-named_entity_recognition_en --- .../2023-11-07-named_entity_recognition_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-named_entity_recognition_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-named_entity_recognition_en.md b/docs/_posts/ahmedlone127/2023-11-07-named_entity_recognition_en.md new file mode 100644 index 00000000000000..77d17c03600423 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-named_entity_recognition_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English named_entity_recognition BertForTokenClassification from mdarhri00 +author: John Snow Labs +name: named_entity_recognition +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `named_entity_recognition` is an English model originally trained by mdarhri00.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/named_entity_recognition_en_5.2.0_3.0_1699385114625.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/named_entity_recognition_en_5.2.0_3.0_1699385114625.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
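+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") +tokenClassifier = BertForTokenClassification.pretrained("named_entity_recognition","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# assumes `data` is a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") +val tokenClassifier = BertForTokenClassification + .pretrained("named_entity_recognition", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +// assumes `data` is a Spark DataFrame with a "text" column +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +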
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|named_entity_recognition| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|665.1 MB| + +## References + +https://huggingface.co/mdarhri00/named-entity-recognition \ No newline at end of file From 447acab9e63741477934452ea6028b0b3e84019e Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:39:11 +0700 Subject: [PATCH 473/667] Add model 2023-11-07-bert_finetuned_ner_lightsaber689_en --- ...-07-bert_finetuned_ner_lightsaber689_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_lightsaber689_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_lightsaber689_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_lightsaber689_en.md new file mode 100644 index 00000000000000..3b5eb09aea0465 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_lightsaber689_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_finetuned_ner_lightsaber689 BertForTokenClassification from lightsaber689 +author: John Snow Labs +name: bert_finetuned_ner_lightsaber689 +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_ner_lightsaber689` is an English model originally trained by lightsaber689.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_lightsaber689_en_5.2.0_3.0_1699385847274.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_lightsaber689_en_5.2.0_3.0_1699385847274.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
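+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") +tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_lightsaber689","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# assumes `data` is a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") +val tokenClassifier = BertForTokenClassification + .pretrained("bert_finetuned_ner_lightsaber689", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +// assumes `data` is a Spark DataFrame with a "text" column +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +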
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_lightsaber689| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/lightsaber689/bert-finetuned-ner \ No newline at end of file From 7100e1628c1fb48a7968b32c90df7b4d517a98e5 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:40:11 +0700 Subject: [PATCH 474/667] Add model 2023-11-07-body_part_annotator_en --- .../2023-11-07-body_part_annotator_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-body_part_annotator_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-body_part_annotator_en.md b/docs/_posts/ahmedlone127/2023-11-07-body_part_annotator_en.md new file mode 100644 index 00000000000000..efe52363d63653 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-body_part_annotator_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English body_part_annotator BertForTokenClassification from cp500 +author: John Snow Labs +name: body_part_annotator +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `body_part_annotator` is an English model originally trained by cp500.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/body_part_annotator_en_5.2.0_3.0_1699385848533.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/body_part_annotator_en_5.2.0_3.0_1699385848533.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
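+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") +tokenClassifier = BertForTokenClassification.pretrained("body_part_annotator","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# assumes `data` is a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") +val tokenClassifier = BertForTokenClassification + .pretrained("body_part_annotator", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +// assumes `data` is a Spark DataFrame with a "text" column +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +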
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|body_part_annotator| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|690.4 MB| + +## References + +https://huggingface.co/cp500/body_part_annotator \ No newline at end of file From 1ce1e09100026ead4310a36d8e47d0e6106e6d04 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:41:12 +0700 Subject: [PATCH 475/667] Add model 2023-11-07-clinicalnerpt_medical_pt --- .../2023-11-07-clinicalnerpt_medical_pt.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_medical_pt.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_medical_pt.md b/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_medical_pt.md new file mode 100644 index 00000000000000..1fe515783d9364 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_medical_pt.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Portuguese clinicalnerpt_medical BertForTokenClassification from pucpr +author: John Snow Labs +name: clinicalnerpt_medical +date: 2023-11-07 +tags: [bert, pt, open_source, token_classification, onnx] +task: Named Entity Recognition +language: pt +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `clinicalnerpt_medical` is a Portuguese model originally trained by pucpr.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinicalnerpt_medical_pt_5.2.0_3.0_1699385973342.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinicalnerpt_medical_pt_5.2.0_3.0_1699385973342.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("clinicalnerpt_medical","pt") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("clinicalnerpt_medical", "pt") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +```
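The `ner` column holds one label per token. For models whose labels follow the IOB convention (`B-` begins an entity, `I-` continues it, `O` is outside any entity), adjacent tokens are usually merged into entity chunks; inside a Spark NLP pipeline the `NerConverter` annotator is the usual way to do this. The pure-Python sketch below only illustrates the grouping logic. The tags in the example are hypothetical, and not every token classifier in this batch is necessarily an IOB entity tagger, so check each model's label set on its Hugging Face page first.

```python
from typing import List, Optional, Tuple


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Merge IOB-tagged tokens into (chunk text, entity type) pairs."""
    chunks: List[Tuple[str, Optional[str]]] = []
    words: List[str] = []
    current: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            prefix, label = "O", None
        elif "-" in tag:
            prefix, label = tag.split("-", 1)
        else:
            prefix, label = "I", tag  # bare label: treat as a continuation
        # Close the open chunk on "O", on "B-", or when the entity type changes.
        if words and (prefix in ("O", "B") or label != current):
            chunks.append((" ".join(words), current))
            words, current = [], None
        if prefix != "O":
            words.append(token)
            current = label
    if words:
        chunks.append((" ".join(words), current))
    return chunks


tokens = ["Patient", "has", "type", "2", "diabetes", "and", "asthma", "."]
tags = ["O", "O", "B-DISEASE", "I-DISEASE", "I-DISEASE", "O", "B-DISEASE", "O"]
print(iob_to_chunks(tokens, tags))
# [('type 2 diabetes', 'DISEASE'), ('asthma', 'DISEASE')]
```

An `I-` tag that follows `O` or a different entity type starts a new chunk instead of being dropped, which is a common lenient reading of IOB output.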
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinicalnerpt_medical| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|pt| +|Size:|664.8 MB| + +## References + +https://huggingface.co/pucpr/clinicalnerpt-medical \ No newline at end of file From 7483856a270dd811d979b9dde2e42d17443af601 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:42:11 +0700 Subject: [PATCH 476/667] Add model 2023-11-07-finbert_ner_fi --- .../ahmedlone127/2023-11-07-finbert_ner_fi.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-finbert_ner_fi.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-finbert_ner_fi.md b/docs/_posts/ahmedlone127/2023-11-07-finbert_ner_fi.md new file mode 100644 index 00000000000000..9d77d7d8a32a4d --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-finbert_ner_fi.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Finnish finbert_ner BertForTokenClassification from Kansallisarkisto +author: John Snow Labs +name: finbert_ner +date: 2023-11-07 +tags: [bert, fi, open_source, token_classification, onnx] +task: Named Entity Recognition +language: fi +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `finbert_ner` is a Finnish model originally trained by Kansallisarkisto.
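The Download and Copy S3 URI links in these cards share one layout: the bucket path `auxdata.johnsnowlabs.com/public/models/` followed by `{name}_{lang}_{Spark NLP version}_{Spark version}_{build id}.zip`, where the versions match the card's `edition` and `spark_version` fields. The helper below rebuilds both links from those fields. The pattern is inferred from the links in these cards rather than from published documentation, `build_id` is our own label for the trailing number (it has to be copied from the card), and `model_archive_urls` is a hypothetical helper. In normal use `pretrained(name, lang)` resolves and downloads the archive itself, so an explicit URL is mostly useful when fetching the archive manually.

```python
from typing import Tuple


def model_archive_urls(name: str, lang: str, sparknlp_version: str,
                       spark_version: str, build_id: str) -> Tuple[str, str]:
    """Rebuild the Download (HTTPS) and Copy S3 URI links for a model card."""
    filename = f"{name}_{lang}_{sparknlp_version}_{spark_version}_{build_id}.zip"
    key = f"auxdata.johnsnowlabs.com/public/models/{filename}"
    return f"https://s3.amazonaws.com/{key}", f"s3://{key}"


https_url, s3_uri = model_archive_urls("finbert_ner", "fi", "5.2.0", "3.0", "1699385738219")
print(https_url)
# https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finbert_ner_fi_5.2.0_3.0_1699385738219.zip
print(s3_uri)
# s3://auxdata.johnsnowlabs.com/public/models/finbert_ner_fi_5.2.0_3.0_1699385738219.zip
```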
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finbert_ner_fi_5.2.0_3.0_1699385738219.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finbert_ner_fi_5.2.0_3.0_1699385738219.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("finbert_ner","fi") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("finbert_ner", "fi") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finbert_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|fi| +|Size:|464.7 MB| + +## References + +https://huggingface.co/Kansallisarkisto/finbert-ner \ No newline at end of file From 6755bf16380879ff9a0cd812f6cbe854e888bc49 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:43:11 +0700 Subject: [PATCH 477/667] Add model 2023-11-07-tempclin_biobertpt_all_pt --- .../2023-11-07-tempclin_biobertpt_all_pt.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-tempclin_biobertpt_all_pt.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-tempclin_biobertpt_all_pt.md b/docs/_posts/ahmedlone127/2023-11-07-tempclin_biobertpt_all_pt.md new file mode 100644 index 00000000000000..0df4158c574123 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-tempclin_biobertpt_all_pt.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Portuguese tempclin_biobertpt_all BertForTokenClassification from pucpr-br +author: John Snow Labs +name: tempclin_biobertpt_all +date: 2023-11-07 +tags: [bert, pt, open_source, token_classification, onnx] +task: Named Entity Recognition +language: pt +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `tempclin_biobertpt_all` is a Portuguese model originally trained by pucpr-br.
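Going the other way, an archive name from these cards can be split back into its fields. Splitting from the right matters because model names such as `tempclin_biobertpt_all` contain underscores. The trailing number appears to be a Unix timestamp in milliseconds: for this card it decodes to 2023-11-07 19:41 UTC, about two minutes before the commit that adds the card (02:43 +0700 on 8 November). Treat that interpretation as an assumption; `parse_model_archive` is a hypothetical helper, not part of Spark NLP.

```python
from datetime import datetime, timezone


def parse_model_archive(url: str) -> dict:
    """Split a model archive URL from these cards into its name-pattern fields."""
    stem = url.rsplit("/", 1)[-1]
    if stem.endswith(".zip"):
        stem = stem[: -len(".zip")]
    # Model names may contain underscores, so split from the right.
    name, lang, sparknlp_version, spark_version, build_id = stem.rsplit("_", 4)
    return {
        "name": name,
        "lang": lang,
        "sparknlp_version": sparknlp_version,
        "spark_version": spark_version,
        # Assumption: the build id is a Unix timestamp in milliseconds.
        "built_at": datetime.fromtimestamp(int(build_id) / 1000, tz=timezone.utc),
    }


info = parse_model_archive(
    "s3://auxdata.johnsnowlabs.com/public/models/"
    "tempclin_biobertpt_all_pt_5.2.0_3.0_1699386094949.zip"
)
print(info["name"], info["lang"], info["built_at"].strftime("%Y-%m-%d %H:%M %Z"))
# tempclin_biobertpt_all pt 2023-11-07 19:41 UTC
```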
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tempclin_biobertpt_all_pt_5.2.0_3.0_1699386094949.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tempclin_biobertpt_all_pt_5.2.0_3.0_1699386094949.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("tempclin_biobertpt_all","pt") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("tempclin_biobertpt_all", "pt") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tempclin_biobertpt_all| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|pt| +|Size:|664.9 MB| + +## References + +https://huggingface.co/pucpr-br/tempclin-biobertpt-all \ No newline at end of file From 6155d3d61dd4084559021f02dacd58076e2c4107 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:44:12 +0700 Subject: [PATCH 478/667] Add model 2023-11-07-unbias_ner_en --- .../ahmedlone127/2023-11-07-unbias_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-unbias_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-unbias_ner_en.md b/docs/_posts/ahmedlone127/2023-11-07-unbias_ner_en.md new file mode 100644 index 00000000000000..74f70b4e4d971f --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-unbias_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English unbias_ner BertForTokenClassification from newsmediabias +author: John Snow Labs +name: unbias_ner +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `unbias_ner` is an English model originally trained by newsmediabias.
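Every card ends with the same two-column Model Information table, so its rows can be read back into a dictionary, for example to confirm that `Model Name` and `Language` agree with the arguments passed to `pretrained()`. The parser below is a sketch that assumes exactly the `|Key:|Value|` layout used in these cards (no escaped pipes inside values); `parse_model_info` is a hypothetical helper, not part of any Spark NLP tooling.

```python
def parse_model_info(table: str) -> dict:
    """Parse the two-column Model Information table of a card into a dict."""
    info = {}
    for line in table.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip kramdown attributes such as {:.table-model}
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2 or set(cells[0]) <= set("-: "):
            continue  # skip the |---|---| separator row
        key, value = cells
        info[key.rstrip(":")] = value
    return info


card_table = """
{:.table-model}
|---|---|
|Model Name:|tempclin_biobertpt_all|
|Compatibility:|Spark NLP 5.2.0+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[documents, token]|
|Output Labels:|[ner]|
|Language:|pt|
|Size:|664.9 MB|
"""
info = parse_model_info(card_table)
print(info["Model Name"], info["Language"], info["Size"])
# tempclin_biobertpt_all pt 664.9 MB
```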
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/unbias_ner_en_5.2.0_3.0_1699386172323.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/unbias_ner_en_5.2.0_3.0_1699386172323.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("unbias_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("unbias_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|unbias_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/newsmediabias/UnBIAS-NER \ No newline at end of file From 0435bc46580dab10a0a45b5979ae4d20a2620254 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:45:12 +0700 Subject: [PATCH 479/667] Add model 2023-11-07-bulbert_ner_bsnlp_en --- .../2023-11-07-bulbert_ner_bsnlp_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bulbert_ner_bsnlp_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bulbert_ner_bsnlp_en.md b/docs/_posts/ahmedlone127/2023-11-07-bulbert_ner_bsnlp_en.md new file mode 100644 index 00000000000000..94aea5de8ce795 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bulbert_ner_bsnlp_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bulbert_ner_bsnlp BertForTokenClassification from mor40 +author: John Snow Labs +name: bulbert_ner_bsnlp +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bulbert_ner_bsnlp` is an English model originally trained by mor40.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bulbert_ner_bsnlp_en_5.2.0_3.0_1699386167194.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bulbert_ner_bsnlp_en_5.2.0_3.0_1699386167194.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bulbert_ner_bsnlp","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bulbert_ner_bsnlp", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bulbert_ner_bsnlp| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|306.1 MB| + +## References + +https://huggingface.co/mor40/BulBERT-ner-bsnlp \ No newline at end of file From a91c4265cd03a35d4eb8034d3956b7cdde499205 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:46:12 +0700 Subject: [PATCH 480/667] Add model 2023-11-07-treatment_disease_ner_en --- .../2023-11-07-treatment_disease_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-treatment_disease_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-treatment_disease_ner_en.md b/docs/_posts/ahmedlone127/2023-11-07-treatment_disease_ner_en.md new file mode 100644 index 00000000000000..1da0f0b39ee4c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-treatment_disease_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English treatment_disease_ner BertForTokenClassification from jnferfer +author: John Snow Labs +name: treatment_disease_ner +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `treatment_disease_ner` is an English model originally trained by jnferfer.
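Each patch in this series adds one file whose path combines the author directory, the card's `date`, its `name` and its language code, for example `docs/_posts/ahmedlone127/2023-11-07-treatment_disease_ner_en.md`. The helpers below build and split such paths. The layout is read off the diff headers in this series rather than from any site documentation, and `post_path` and `split_post_path` are hypothetical names.

```python
from datetime import date
from typing import Tuple


def post_path(author: str, published: date, name: str, lang: str) -> str:
    """Build the docs/_posts path used for a model card in this patch series."""
    return f"docs/_posts/{author}/{published.isoformat()}-{name}_{lang}.md"


def split_post_path(path: str) -> Tuple[str, date, str, str]:
    """Recover (author, date, name, lang) from a card path."""
    _, _, author, filename = path.split("/")
    stem = filename[: -len(".md")]
    published = date.fromisoformat(stem[:10])  # "YYYY-MM-DD" prefix
    name, lang = stem[11:].rsplit("_", 1)  # language code is the last "_" field
    return author, published, name, lang


path = post_path("ahmedlone127", date(2023, 11, 7), "treatment_disease_ner", "en")
print(path)
# docs/_posts/ahmedlone127/2023-11-07-treatment_disease_ner_en.md
print(split_post_path(path))
# ('ahmedlone127', datetime.date(2023, 11, 7), 'treatment_disease_ner', 'en')
```

Splitting the language code off with `rsplit` keeps multi-word names such as `bert_restore_punctuation_turkish` intact.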
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/treatment_disease_ner_en_5.2.0_3.0_1699386357805.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/treatment_disease_ner_en_5.2.0_3.0_1699386357805.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("treatment_disease_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("treatment_disease_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|treatment_disease_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.2 MB| + +## References + +https://huggingface.co/jnferfer/treatment-disease-NER \ No newline at end of file From b9b30611c1d606104a11812c20ce5f5848f9e3ca Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:47:12 +0700 Subject: [PATCH 481/667] Add model 2023-11-07-scbert_ser3_en --- .../ahmedlone127/2023-11-07-scbert_ser3_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-scbert_ser3_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-scbert_ser3_en.md b/docs/_posts/ahmedlone127/2023-11-07-scbert_ser3_en.md new file mode 100644 index 00000000000000..3583bd9643e3ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-scbert_ser3_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English scbert_ser3 BertForTokenClassification from havens2 +author: John Snow Labs +name: scbert_ser3 +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `scbert_ser3` is an English model originally trained by havens2.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scbert_ser3_en_5.2.0_3.0_1699385594161.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scbert_ser3_en_5.2.0_3.0_1699385594161.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("scbert_ser3","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("scbert_ser3", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scbert_ser3| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/havens2/scBERT_SER3 \ No newline at end of file From 754fcd14e8ddf667a7765e48472bd230e57db4f9 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:48:12 +0700 Subject: [PATCH 482/667] Add model 2023-11-07-bert_restore_punctuation_turkish_tr --- ...-07-bert_restore_punctuation_turkish_tr.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_restore_punctuation_turkish_tr.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_restore_punctuation_turkish_tr.md b/docs/_posts/ahmedlone127/2023-11-07-bert_restore_punctuation_turkish_tr.md new file mode 100644 index 00000000000000..1b0916bec2897a --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_restore_punctuation_turkish_tr.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Turkish bert_restore_punctuation_turkish BertForTokenClassification from uygarkurt +author: John Snow Labs +name: bert_restore_punctuation_turkish +date: 2023-11-07 +tags: [bert, tr, open_source, token_classification, onnx] +task: Named Entity Recognition +language: tr +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_restore_punctuation_turkish` is a Turkish model originally trained by uygarkurt.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_restore_punctuation_turkish_tr_5.2.0_3.0_1699385993721.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_restore_punctuation_turkish_tr_5.2.0_3.0_1699385993721.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_restore_punctuation_turkish","tr") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_restore_punctuation_turkish", "tr") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_restore_punctuation_turkish| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|tr| +|Size:|412.3 MB| + +## References + +https://huggingface.co/uygarkurt/bert-restore-punctuation-turkish \ No newline at end of file From 43bb5048273cc2696f277f68743344526752a59e Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:49:13 +0700 Subject: [PATCH 483/667] Add model 2023-11-07-bert_finetuned_ner_minea_en --- .../2023-11-07-bert_finetuned_ner_minea_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_minea_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_minea_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_minea_en.md new file mode 100644 index 00000000000000..b543b0b783d196 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_minea_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_finetuned_ner_minea BertForTokenClassification from minea +author: John Snow Labs +name: bert_finetuned_ner_minea +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_ner_minea` is an English model originally trained by minea.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_minea_en_5.2.0_3.0_1699386438023.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_minea_en_5.2.0_3.0_1699386438023.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_minea","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_finetuned_ner_minea", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_minea| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/minea/bert-finetuned-ner \ No newline at end of file From 8a7063233bef1a8afff940e37ecea311693fb58c Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:50:13 +0700 Subject: [PATCH 484/667] Add model 2023-11-07-bert_tiny_finetuned_ner_en --- .../2023-11-07-bert_tiny_finetuned_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_tiny_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_tiny_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_tiny_finetuned_ner_en.md new file mode 100644 index 00000000000000..490eb18664ee1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_tiny_finetuned_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_tiny_finetuned_ner BertForTokenClassification from gagan3012 +author: John Snow Labs +name: bert_tiny_finetuned_ner +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_tiny_finetuned_ner` is an English model originally trained by gagan3012.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tiny_finetuned_ner_en_5.2.0_3.0_1699386559106.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tiny_finetuned_ner_en_5.2.0_3.0_1699386559106.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_tiny_finetuned_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_tiny_finetuned_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tiny_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|16.7 MB| + +## References + +https://huggingface.co/gagan3012/bert-tiny-finetuned-ner \ No newline at end of file From 8d108de90d5f06c617d5eaa319639ecb75c1049f Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:51:13 +0700 Subject: [PATCH 485/667] Add model 2023-11-07-unbias_named_entity_recognition_en --- ...1-07-unbias_named_entity_recognition_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-unbias_named_entity_recognition_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-unbias_named_entity_recognition_en.md b/docs/_posts/ahmedlone127/2023-11-07-unbias_named_entity_recognition_en.md new file mode 100644 index 00000000000000..f746cbe8c17f31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-unbias_named_entity_recognition_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English unbias_named_entity_recognition BertForTokenClassification from newsmediabias +author: John Snow Labs +name: unbias_named_entity_recognition +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `unbias_named_entity_recognition` is an English model originally trained by newsmediabias.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/unbias_named_entity_recognition_en_5.2.0_3.0_1699386641857.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/unbias_named_entity_recognition_en_5.2.0_3.0_1699386641857.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("unbias_named_entity_recognition","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("unbias_named_entity_recognition", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|unbias_named_entity_recognition| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/newsmediabias/UnBIAS-Named-Entity-Recognition \ No newline at end of file From 0c3da7207ceb5443b54b882dfc52a86295e04e3f Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:52:14 +0700 Subject: [PATCH 486/667] Add model 2023-11-07-mbert_bengali_ner_bn --- .../2023-11-07-mbert_bengali_ner_bn.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-mbert_bengali_ner_bn.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-mbert_bengali_ner_bn.md b/docs/_posts/ahmedlone127/2023-11-07-mbert_bengali_ner_bn.md new file mode 100644 index 00000000000000..ad0843cb0acc53 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-mbert_bengali_ner_bn.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Bengali mbert_bengali_ner BertForTokenClassification from sagorsarker +author: John Snow Labs +name: mbert_bengali_ner +date: 2023-11-07 +tags: [bert, bn, open_source, token_classification, onnx] +task: Named Entity Recognition +language: bn +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `mbert_bengali_ner` is a Bengali model originally trained by sagorsarker.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mbert_bengali_ner_bn_5.2.0_3.0_1699386696050.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mbert_bengali_ner_bn_5.2.0_3.0_1699386696050.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
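+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.annotator import BertForTokenClassification, Tokenizer +from sparknlp.base import DocumentAssembler + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +# The classifier reads a "token" column, so tokenize the documents first +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("mbert_bengali_ner","bn") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# `data` is a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.annotator._ +import com.johnsnowlabs.nlp.base._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("mbert_bengali_ner", "bn") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +</div>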
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mbert_bengali_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|bn| +|Size:|625.5 MB| + +## References + +https://huggingface.co/sagorsarker/mbert-bengali-ner \ No newline at end of file From 93fc986ad1a68dc7a2500f72f0322285e780196c Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:53:14 +0700 Subject: [PATCH 487/667] Add model 2023-11-07-roberta_finetuned_privacy_detection_zh --- ...-roberta_finetuned_privacy_detection_zh.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-roberta_finetuned_privacy_detection_zh.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-roberta_finetuned_privacy_detection_zh.md b/docs/_posts/ahmedlone127/2023-11-07-roberta_finetuned_privacy_detection_zh.md new file mode 100644 index 00000000000000..a10cdd8aaaabf3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-roberta_finetuned_privacy_detection_zh.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Chinese roberta_finetuned_privacy_detection BertForTokenClassification from gyr66 +author: John Snow Labs +name: roberta_finetuned_privacy_detection +date: 2023-11-07 +tags: [bert, zh, open_source, token_classification, onnx] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `roberta_finetuned_privacy_detection` is a Chinese model originally trained by gyr66.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_privacy_detection_zh_5.2.0_3.0_1699386723039.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_privacy_detection_zh_5.2.0_3.0_1699386723039.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
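+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.annotator import BertForTokenClassification, Tokenizer +from sparknlp.base import DocumentAssembler + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +# The classifier reads a "token" column, so tokenize the documents first +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("roberta_finetuned_privacy_detection","zh") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# `data` is a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.annotator._ +import com.johnsnowlabs.nlp.base._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("roberta_finetuned_privacy_detection", "zh") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +</div>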
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_privacy_detection| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|1.2 GB| + +## References + +https://huggingface.co/gyr66/RoBERTa-finetuned-privacy-detection \ No newline at end of file From b1bd3959e3af88d04e24c53e76bc2ea2c659a4ff Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:54:14 +0700 Subject: [PATCH 488/667] Add model 2023-11-07-zeroshotbioner_en --- .../2023-11-07-zeroshotbioner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-zeroshotbioner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-zeroshotbioner_en.md b/docs/_posts/ahmedlone127/2023-11-07-zeroshotbioner_en.md new file mode 100644 index 00000000000000..38ec98ccaa8f74 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-zeroshotbioner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English zeroshotbioner BertForTokenClassification from ProdicusII +author: John Snow Labs +name: zeroshotbioner +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `zeroshotbioner` is an English model originally trained by ProdicusII.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/zeroshotbioner_en_5.2.0_3.0_1699386730433.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/zeroshotbioner_en_5.2.0_3.0_1699386730433.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
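+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.annotator import BertForTokenClassification, Tokenizer +from sparknlp.base import DocumentAssembler + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +# The classifier reads a "token" column, so tokenize the documents first +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("zeroshotbioner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# `data` is a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.annotator._ +import com.johnsnowlabs.nlp.base._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("zeroshotbioner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +</div>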
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|zeroshotbioner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.2 MB| + +## References + +https://huggingface.co/ProdicusII/ZeroShotBioNER \ No newline at end of file From 134c5325a741c94904ea3cf96e690066bcf68d31 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:55:14 +0700 Subject: [PATCH 489/667] Add model 2023-11-07-bert_base_uncased_finetuned_scientific_eval_en --- ...se_uncased_finetuned_scientific_eval_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_base_uncased_finetuned_scientific_eval_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_base_uncased_finetuned_scientific_eval_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_base_uncased_finetuned_scientific_eval_en.md new file mode 100644 index 00000000000000..26ffc694f14073 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_base_uncased_finetuned_scientific_eval_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_scientific_eval BertForTokenClassification from reyhanemyr +author: John Snow Labs +name: bert_base_uncased_finetuned_scientific_eval +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_uncased_finetuned_scientific_eval` is an English model originally trained by reyhanemyr.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_scientific_eval_en_5.2.0_3.0_1699384660893.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_scientific_eval_en_5.2.0_3.0_1699384660893.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
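+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.annotator import BertForTokenClassification, Tokenizer +from sparknlp.base import DocumentAssembler + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +# The classifier reads a "token" column, so tokenize the documents first +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_uncased_finetuned_scientific_eval","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# `data` is a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.annotator._ +import com.johnsnowlabs.nlp.base._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_base_uncased_finetuned_scientific_eval", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +</div>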
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_scientific_eval| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/reyhanemyr/bert-base-uncased-finetuned-scientific-eval \ No newline at end of file From 0a6241dc393ebbea51da91a99f290f8915294239 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:56:15 +0700 Subject: [PATCH 490/667] Add model 2023-11-07-bert_base_romanian_ner_ro --- .../2023-11-07-bert_base_romanian_ner_ro.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_base_romanian_ner_ro.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_base_romanian_ner_ro.md b/docs/_posts/ahmedlone127/2023-11-07-bert_base_romanian_ner_ro.md new file mode 100644 index 00000000000000..0fe870d7f606ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_base_romanian_ner_ro.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Moldavian, Moldovan, Romanian bert_base_romanian_ner BertForTokenClassification from dumitrescustefan +author: John Snow Labs +name: bert_base_romanian_ner +date: 2023-11-07 +tags: [bert, ro, open_source, token_classification, onnx] +task: Named Entity Recognition +language: ro +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_romanian_ner` is a Moldavian, Moldovan, Romanian model originally trained by dumitrescustefan.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_romanian_ner_ro_5.2.0_3.0_1699386817889.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_romanian_ner_ro_5.2.0_3.0_1699386817889.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
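+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.annotator import BertForTokenClassification, Tokenizer +from sparknlp.base import DocumentAssembler + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +# The classifier reads a "token" column, so tokenize the documents first +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_romanian_ner","ro") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# `data` is a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.annotator._ +import com.johnsnowlabs.nlp.base._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_base_romanian_ner", "ro") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +</div>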
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_romanian_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|ro| +|Size:|464.1 MB| + +## References + +https://huggingface.co/dumitrescustefan/bert-base-romanian-ner \ No newline at end of file From 17f522c2fc0ec3af9d9779bd48d86ca7b38658de Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:57:15 +0700 Subject: [PATCH 491/667] Add model 2023-11-07-bent_pubmedbert_ner_cell_component_en --- ...7-bent_pubmedbert_ner_cell_component_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bent_pubmedbert_ner_cell_component_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bent_pubmedbert_ner_cell_component_en.md b/docs/_posts/ahmedlone127/2023-11-07-bent_pubmedbert_ner_cell_component_en.md new file mode 100644 index 00000000000000..8c819dbc1f6c1e --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bent_pubmedbert_ner_cell_component_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bent_pubmedbert_ner_cell_component BertForTokenClassification from pruas +author: John Snow Labs +name: bent_pubmedbert_ner_cell_component +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bent_pubmedbert_ner_cell_component` is an English model originally trained by pruas.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bent_pubmedbert_ner_cell_component_en_5.2.0_3.0_1699385532471.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bent_pubmedbert_ner_cell_component_en_5.2.0_3.0_1699385532471.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
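+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.annotator import BertForTokenClassification, Tokenizer +from sparknlp.base import DocumentAssembler + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +# The classifier reads a "token" column, so tokenize the documents first +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bent_pubmedbert_ner_cell_component","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# `data` is a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.annotator._ +import com.johnsnowlabs.nlp.base._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bent_pubmedbert_ner_cell_component", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +</div>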
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bent_pubmedbert_ner_cell_component| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.1 MB| + +## References + +https://huggingface.co/pruas/BENT-PubMedBERT-NER-Cell-Component \ No newline at end of file From fe51d44cba2ef0dbc1249ae7671be54250dc45e3 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:58:15 +0700 Subject: [PATCH 492/667] Add model 2023-11-07-idrisi_lmr_en_random_typebased_en --- ...11-07-idrisi_lmr_en_random_typebased_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-idrisi_lmr_en_random_typebased_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-idrisi_lmr_en_random_typebased_en.md b/docs/_posts/ahmedlone127/2023-11-07-idrisi_lmr_en_random_typebased_en.md new file mode 100644 index 00000000000000..89eaffa4873ff9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-idrisi_lmr_en_random_typebased_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English idrisi_lmr_en_random_typebased BertForTokenClassification from rsuwaileh +author: John Snow Labs +name: idrisi_lmr_en_random_typebased +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `idrisi_lmr_en_random_typebased` is an English model originally trained by rsuwaileh.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/idrisi_lmr_en_random_typebased_en_5.2.0_3.0_1699385564570.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/idrisi_lmr_en_random_typebased_en_5.2.0_3.0_1699385564570.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
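+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.annotator import BertForTokenClassification, Tokenizer +from sparknlp.base import DocumentAssembler + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +# The classifier reads a "token" column, so tokenize the documents first +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("idrisi_lmr_en_random_typebased","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# `data` is a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.annotator._ +import com.johnsnowlabs.nlp.base._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("idrisi_lmr_en_random_typebased", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +</div>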
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|idrisi_lmr_en_random_typebased| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/rsuwaileh/IDRISI-LMR-EN-random-typebased \ No newline at end of file From 118f2f0a3b1810fe5e3372eee77011163a6923c8 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 02:59:15 +0700 Subject: [PATCH 493/667] Add model 2023-11-07-bert_base_cased_literary_ner_en --- ...3-11-07-bert_base_cased_literary_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_base_cased_literary_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_base_cased_literary_ner_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_base_cased_literary_ner_en.md new file mode 100644 index 00000000000000..cfd427dae0fee8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_base_cased_literary_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_base_cased_literary_ner BertForTokenClassification from compnet-renard +author: John Snow Labs +name: bert_base_cased_literary_ner +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_cased_literary_ner` is an English model originally trained by compnet-renard.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_literary_ner_en_5.2.0_3.0_1699387100983.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_literary_ner_en_5.2.0_3.0_1699387100983.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
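+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.annotator import BertForTokenClassification, Tokenizer +from sparknlp.base import DocumentAssembler + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +# The classifier reads a "token" column, so tokenize the documents first +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_cased_literary_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# `data` is a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.annotator._ +import com.johnsnowlabs.nlp.base._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_base_cased_literary_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +</div>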
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_literary_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/compnet-renard/bert-base-cased-literary-NER \ No newline at end of file From 841cfc1aaa85da9f982e002598303fa94277cfa5 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:00:15 +0700 Subject: [PATCH 494/667] Add model 2023-11-07-scibert_ner_en --- .../ahmedlone127/2023-11-07-scibert_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-scibert_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-scibert_ner_en.md b/docs/_posts/ahmedlone127/2023-11-07-scibert_ner_en.md new file mode 100644 index 00000000000000..ef26b4f5402bed --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-scibert_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English scibert_ner BertForTokenClassification from devanshrj +author: John Snow Labs +name: scibert_ner +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `scibert_ner` is an English model originally trained by devanshrj.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scibert_ner_en_5.2.0_3.0_1699387144110.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scibert_ner_en_5.2.0_3.0_1699387144110.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
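+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.annotator import BertForTokenClassification, Tokenizer +from sparknlp.base import DocumentAssembler + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +# The classifier reads a "token" column, so tokenize the documents first +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("scibert_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# `data` is a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.annotator._ +import com.johnsnowlabs.nlp.base._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("scibert_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +</div>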
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scibert_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/devanshrj/scibert-ner \ No newline at end of file From f61bc36724a51deca54c0dbbee88d6aa33fbf178 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:01:16 +0700 Subject: [PATCH 495/667] Add model 2023-11-07-ade_bio_clinicalbert_ner_en --- .../2023-11-07-ade_bio_clinicalbert_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-ade_bio_clinicalbert_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-ade_bio_clinicalbert_ner_en.md b/docs/_posts/ahmedlone127/2023-11-07-ade_bio_clinicalbert_ner_en.md new file mode 100644 index 00000000000000..37686dd7ee6452 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-ade_bio_clinicalbert_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English ade_bio_clinicalbert_ner BertForTokenClassification from commanderstrife +author: John Snow Labs +name: ade_bio_clinicalbert_ner +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `ade_bio_clinicalbert_ner` is an English model originally trained by commanderstrife.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ade_bio_clinicalbert_ner_en_5.2.0_3.0_1699386038757.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ade_bio_clinicalbert_ner_en_5.2.0_3.0_1699386038757.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
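+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.annotator import BertForTokenClassification, Tokenizer +from sparknlp.base import DocumentAssembler + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +# The classifier reads a "token" column, so tokenize the documents first +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("ade_bio_clinicalbert_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# `data` is a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.annotator._ +import com.johnsnowlabs.nlp.base._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("ade_bio_clinicalbert_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +</div>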
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|ade_bio_clinicalbert_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.3 MB|
+
+## References
+
+https://huggingface.co/commanderstrife/ADE-Bio_ClinicalBERT-NER
\ No newline at end of file

From 31690b3784dae3cd7e061fec87240fd8ffb4f5bc Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:02:16 +0700
Subject: [PATCH 496/667] Add model 2023-11-07-sindhi_ner_v2_en

---
 .../2023-11-07-sindhi_ner_v2_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-sindhi_ner_v2_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-sindhi_ner_v2_en.md b/docs/_posts/ahmedlone127/2023-11-07-sindhi_ner_v2_en.md
new file mode 100644
index 00000000000000..2afd4a307976cc
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-sindhi_ner_v2_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English sindhi_ner_v2 BertForTokenClassification from EMBO
+author: John Snow Labs
+name: sindhi_ner_v2
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `sindhi_ner_v2` is an English model originally trained by EMBO.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sindhi_ner_v2_en_5.2.0_3.0_1699387293106.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sindhi_ner_v2_en_5.2.0_3.0_1699387293106.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+# Example input: any DataFrame with a string column named "text" works.
+data = spark.createDataFrame([["Replace this sentence with your own text."]]).toDF("text")
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification needs a "token" column, so tokenize the documents first.
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("sindhi_ner_v2","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark`, as in spark-shell.
+import spark.implicits._
+
+// Example input: any DataFrame with a string column named "text" works.
+val data = Seq("Replace this sentence with your own text.").toDF("text")
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+
+// BertForTokenClassification needs a "token" column, so tokenize the documents first.
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+  .pretrained("sindhi_ner_v2", "en")
+  .setInputCols(Array("documents","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|sindhi_ner_v2|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|408.1 MB|
+
+## References
+
+https://huggingface.co/EMBO/sd-ner-v2
\ No newline at end of file

From c885dfc823fed2d4e81105526d8efb4df3a01060 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:03:17 +0700
Subject: [PATCH 497/667] Add model 2023-11-07-medical_condition_annotator_en

---
 ...23-11-07-medical_condition_annotator_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-medical_condition_annotator_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-medical_condition_annotator_en.md b/docs/_posts/ahmedlone127/2023-11-07-medical_condition_annotator_en.md
new file mode 100644
index 00000000000000..a2e266d6a6eb25
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-medical_condition_annotator_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English medical_condition_annotator BertForTokenClassification from cp500
+author: John Snow Labs
+name: medical_condition_annotator
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `medical_condition_annotator` is an English model originally trained by cp500.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/medical_condition_annotator_en_5.2.0_3.0_1699387343846.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/medical_condition_annotator_en_5.2.0_3.0_1699387343846.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+# Example input: any DataFrame with a string column named "text" works.
+data = spark.createDataFrame([["Replace this sentence with your own text."]]).toDF("text")
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification needs a "token" column, so tokenize the documents first.
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("medical_condition_annotator","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark`, as in spark-shell.
+import spark.implicits._
+
+// Example input: any DataFrame with a string column named "text" works.
+val data = Seq("Replace this sentence with your own text.").toDF("text")
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+
+// BertForTokenClassification needs a "token" column, so tokenize the documents first.
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+  .pretrained("medical_condition_annotator", "en")
+  .setInputCols(Array("documents","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|medical_condition_annotator|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|690.5 MB|
+
+## References
+
+https://huggingface.co/cp500/Medical_condition_annotator
\ No newline at end of file

From f146cac45bbcf299d853a4a1b9f0512419963706 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:04:17 +0700
Subject: [PATCH 498/667] Add model 2023-11-07-clinicalnerpt_laboratory_pt

---
 .../2023-11-07-clinicalnerpt_laboratory_pt.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_laboratory_pt.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_laboratory_pt.md b/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_laboratory_pt.md
new file mode 100644
index 00000000000000..c11901f8700ba7
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_laboratory_pt.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Portuguese clinicalnerpt_laboratory BertForTokenClassification from pucpr
+author: John Snow Labs
+name: clinicalnerpt_laboratory
+date: 2023-11-07
+tags: [bert, pt, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: pt
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `clinicalnerpt_laboratory` is a Portuguese model originally trained by pucpr.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinicalnerpt_laboratory_pt_5.2.0_3.0_1699384818446.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinicalnerpt_laboratory_pt_5.2.0_3.0_1699384818446.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+# Example input: any DataFrame with a string column named "text" works.
+data = spark.createDataFrame([["Substitua esta frase pelo seu próprio texto."]]).toDF("text")
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification needs a "token" column, so tokenize the documents first.
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("clinicalnerpt_laboratory","pt") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark`, as in spark-shell.
+import spark.implicits._
+
+// Example input: any DataFrame with a string column named "text" works.
+val data = Seq("Substitua esta frase pelo seu próprio texto.").toDF("text")
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+
+// BertForTokenClassification needs a "token" column, so tokenize the documents first.
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+  .pretrained("clinicalnerpt_laboratory", "pt")
+  .setInputCols(Array("documents","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|clinicalnerpt_laboratory|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|pt|
+|Size:|664.8 MB|
+
+## References
+
+https://huggingface.co/pucpr/clinicalnerpt-laboratory
\ No newline at end of file

From d93dee86e33edc4424045589769b07a1dfa57a07 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:05:17 +0700
Subject: [PATCH 499/667] Add model 2023-11-07-biobert_diseases_ner_sschet_en

---
 ...23-11-07-biobert_diseases_ner_sschet_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-biobert_diseases_ner_sschet_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-biobert_diseases_ner_sschet_en.md b/docs/_posts/ahmedlone127/2023-11-07-biobert_diseases_ner_sschet_en.md
new file mode 100644
index 00000000000000..58b0eab7410256
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-biobert_diseases_ner_sschet_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English biobert_diseases_ner_sschet BertForTokenClassification from sschet
+author: John Snow Labs
+name: biobert_diseases_ner_sschet
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `biobert_diseases_ner_sschet` is an English model originally trained by sschet.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biobert_diseases_ner_sschet_en_5.2.0_3.0_1699387452087.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biobert_diseases_ner_sschet_en_5.2.0_3.0_1699387452087.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+# Example input: any DataFrame with a string column named "text" works.
+data = spark.createDataFrame([["Replace this sentence with your own text."]]).toDF("text")
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification needs a "token" column, so tokenize the documents first.
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("biobert_diseases_ner_sschet","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark`, as in spark-shell.
+import spark.implicits._
+
+// Example input: any DataFrame with a string column named "text" works.
+val data = Seq("Replace this sentence with your own text.").toDF("text")
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+
+// BertForTokenClassification needs a "token" column, so tokenize the documents first.
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+  .pretrained("biobert_diseases_ner_sschet", "en")
+  .setInputCols(Array("documents","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|biobert_diseases_ner_sschet|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.1 MB|
+
+## References
+
+https://huggingface.co/sschet/biobert_diseases_ner
\ No newline at end of file

From f80fde94ac190b035321fae0ed98b841277c6e1a Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:06:17 +0700
Subject: [PATCH 500/667] Add model 2023-11-07-unicausal_tok_baseline_en

---
 .../2023-11-07-unicausal_tok_baseline_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-unicausal_tok_baseline_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-unicausal_tok_baseline_en.md b/docs/_posts/ahmedlone127/2023-11-07-unicausal_tok_baseline_en.md
new file mode 100644
index 00000000000000..6aa83ed26f94d1
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-unicausal_tok_baseline_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English unicausal_tok_baseline BertForTokenClassification from tanfiona
+author: John Snow Labs
+name: unicausal_tok_baseline
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `unicausal_tok_baseline` is an English model originally trained by tanfiona.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/unicausal_tok_baseline_en_5.2.0_3.0_1699387468133.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/unicausal_tok_baseline_en_5.2.0_3.0_1699387468133.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+# Example input: any DataFrame with a string column named "text" works.
+data = spark.createDataFrame([["Replace this sentence with your own text."]]).toDF("text")
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification needs a "token" column, so tokenize the documents first.
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("unicausal_tok_baseline","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark`, as in spark-shell.
+import spark.implicits._
+
+// Example input: any DataFrame with a string column named "text" works.
+val data = Seq("Replace this sentence with your own text.").toDF("text")
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+
+// BertForTokenClassification needs a "token" column, so tokenize the documents first.
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+  .pretrained("unicausal_tok_baseline", "en")
+  .setInputCols(Array("documents","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|unicausal_tok_baseline|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+
+## References
+
+https://huggingface.co/tanfiona/unicausal-tok-baseline
\ No newline at end of file

From b40b8a4a35bfb40bf2a8607ce3871d93fb2794ba Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:07:17 +0700
Subject: [PATCH 501/667] Add model 2023-11-07-chinese_address_ner_en

---
 .../2023-11-07-chinese_address_ner_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-chinese_address_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-chinese_address_ner_en.md b/docs/_posts/ahmedlone127/2023-11-07-chinese_address_ner_en.md
new file mode 100644
index 00000000000000..7c9f6dccbb8d76
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-chinese_address_ner_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English chinese_address_ner BertForTokenClassification from jiaqianjing
+author: John Snow Labs
+name: chinese_address_ner
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `chinese_address_ner` is an English model originally trained by jiaqianjing.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/chinese_address_ner_en_5.2.0_3.0_1699386241645.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/chinese_address_ner_en_5.2.0_3.0_1699386241645.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+# Example input: any DataFrame with a string column named "text" works.
+data = spark.createDataFrame([["Replace this sentence with your own text."]]).toDF("text")
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification needs a "token" column, so tokenize the documents first.
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("chinese_address_ner","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark`, as in spark-shell.
+import spark.implicits._
+
+// Example input: any DataFrame with a string column named "text" works.
+val data = Seq("Replace this sentence with your own text.").toDF("text")
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+
+// BertForTokenClassification needs a "token" column, so tokenize the documents first.
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+  .pretrained("chinese_address_ner", "en")
+  .setInputCols(Array("documents","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|chinese_address_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|381.0 MB|
+
+## References
+
+https://huggingface.co/jiaqianjing/chinese-address-ner
\ No newline at end of file

From 146c86339c0d29f8a1d4c858794500af65762053 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:08:18 +0700
Subject: [PATCH 502/667] Add model 2023-11-07-dbbert_pos_en

---
 .../ahmedlone127/2023-11-07-dbbert_pos_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-dbbert_pos_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-dbbert_pos_en.md b/docs/_posts/ahmedlone127/2023-11-07-dbbert_pos_en.md
new file mode 100644
index 00000000000000..209eb8c7f19cf4
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-dbbert_pos_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English dbbert_pos BertForTokenClassification from colinswaelens
+author: John Snow Labs
+name: dbbert_pos
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `dbbert_pos` is an English model originally trained by colinswaelens.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dbbert_pos_en_5.2.0_3.0_1699387552772.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dbbert_pos_en_5.2.0_3.0_1699387552772.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+# Example input: any DataFrame with a string column named "text" works.
+data = spark.createDataFrame([["Replace this sentence with your own text."]]).toDF("text")
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification needs a "token" column, so tokenize the documents first.
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("dbbert_pos","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark`, as in spark-shell.
+import spark.implicits._
+
+// Example input: any DataFrame with a string column named "text" works.
+val data = Seq("Replace this sentence with your own text.").toDF("text")
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+
+// BertForTokenClassification needs a "token" column, so tokenize the documents first.
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+  .pretrained("dbbert_pos", "en")
+  .setInputCols(Array("documents","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|dbbert_pos|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|408.4 MB|
+
+## References
+
+https://huggingface.co/colinswaelens/DBBErt_POS
\ No newline at end of file

From b2651f3bfdcccaf7645e35fc3ac68be8d06b6f84 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:09:18 +0700
Subject: [PATCH 503/667] Add model 2023-11-07-gbert_legal_ner_de

---
 .../2023-11-07-gbert_legal_ner_de.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-gbert_legal_ner_de.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-gbert_legal_ner_de.md b/docs/_posts/ahmedlone127/2023-11-07-gbert_legal_ner_de.md
new file mode 100644
index 00000000000000..78c089ec9674d6
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-gbert_legal_ner_de.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: German gbert_legal_ner BertForTokenClassification from PaDaS-Lab
+author: John Snow Labs
+name: gbert_legal_ner
+date: 2023-11-07
+tags: [bert, de, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: de
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `gbert_legal_ner` is a German model originally trained by PaDaS-Lab.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gbert_legal_ner_de_5.2.0_3.0_1699387731402.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gbert_legal_ner_de_5.2.0_3.0_1699387731402.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+# Example input: any DataFrame with a string column named "text" works.
+data = spark.createDataFrame([["Ersetzen Sie diesen Satz durch Ihren eigenen Text."]]).toDF("text")
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification needs a "token" column, so tokenize the documents first.
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("gbert_legal_ner","de") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark`, as in spark-shell.
+import spark.implicits._
+
+// Example input: any DataFrame with a string column named "text" works.
+val data = Seq("Ersetzen Sie diesen Satz durch Ihren eigenen Text.").toDF("text")
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+
+// BertForTokenClassification needs a "token" column, so tokenize the documents first.
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+  .pretrained("gbert_legal_ner", "de")
+  .setInputCols(Array("documents","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|gbert_legal_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|de|
+|Size:|407.0 MB|
+
+## References
+
+https://huggingface.co/PaDaS-Lab/gbert-legal-ner
\ No newline at end of file

From 202a8e0591012f638039d37687ff57b41f4f0682 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:10:19 +0700
Subject: [PATCH 504/667] Add model 2023-11-07-ner_test_bert_base_uncased_finetuned_500k_adamw_3_epoch_locations_en

---
 ...netuned_500k_adamw_3_epoch_locations_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-ner_test_bert_base_uncased_finetuned_500k_adamw_3_epoch_locations_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-ner_test_bert_base_uncased_finetuned_500k_adamw_3_epoch_locations_en.md b/docs/_posts/ahmedlone127/2023-11-07-ner_test_bert_base_uncased_finetuned_500k_adamw_3_epoch_locations_en.md
new file mode 100644
index 00000000000000..9cd05f97be7880
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-ner_test_bert_base_uncased_finetuned_500k_adamw_3_epoch_locations_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English ner_test_bert_base_uncased_finetuned_500k_adamw_3_epoch_locations BertForTokenClassification from poodledude
+author: John Snow Labs
+name: ner_test_bert_base_uncased_finetuned_500k_adamw_3_epoch_locations
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `ner_test_bert_base_uncased_finetuned_500k_adamw_3_epoch_locations` is an English model originally trained by poodledude.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_test_bert_base_uncased_finetuned_500k_adamw_3_epoch_locations_en_5.2.0_3.0_1699386946731.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_test_bert_base_uncased_finetuned_500k_adamw_3_epoch_locations_en_5.2.0_3.0_1699386946731.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+# Example input: any DataFrame with a string column named "text" works.
+data = spark.createDataFrame([["Replace this sentence with your own text."]]).toDF("text")
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification needs a "token" column, so tokenize the documents first.
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("ner_test_bert_base_uncased_finetuned_500k_adamw_3_epoch_locations","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark`, as in spark-shell.
+import spark.implicits._
+
+// Example input: any DataFrame with a string column named "text" works.
+val data = Seq("Replace this sentence with your own text.").toDF("text")
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+
+// BertForTokenClassification needs a "token" column, so tokenize the documents first.
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+  .pretrained("ner_test_bert_base_uncased_finetuned_500k_adamw_3_epoch_locations", "en")
+  .setInputCols(Array("documents","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|ner_test_bert_base_uncased_finetuned_500k_adamw_3_epoch_locations|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|407.2 MB|
+
+## References
+
+https://huggingface.co/poodledude/ner-test-bert-base-uncased-finetuned-500K-AdamW-3-epoch-locations
\ No newline at end of file

From 1a2f376cd3136fe706c7e9f4a0ddfb427d9e65c4 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:11:19 +0700
Subject: [PATCH 505/667] Add model 2023-11-07-bert_finetuned_history_ner_en

---
 ...023-11-07-bert_finetuned_history_ner_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_history_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_history_ner_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_history_ner_en.md
new file mode 100644
index 00000000000000..b6c96110e4e419
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_history_ner_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_finetuned_history_ner BertForTokenClassification from QuanAI
+author: John Snow Labs
+name: bert_finetuned_history_ner
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_history_ner` is an English model originally trained by QuanAI.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_history_ner_en_5.2.0_3.0_1699387810524.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_history_ner_en_5.2.0_3.0_1699387810524.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+
+# Example input: any DataFrame with a string column named "text" works.
+data = spark.createDataFrame([["Replace this sentence with your own text."]]).toDF("text")
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# BertForTokenClassification needs a "token" column, so tokenize the documents first.
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_history_ner","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+// Assumes an active SparkSession named `spark`, as in spark-shell.
+import spark.implicits._
+
+// Example input: any DataFrame with a string column named "text" works.
+val data = Seq("Replace this sentence with your own text.").toDF("text")
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+
+// BertForTokenClassification needs a "token" column, so tokenize the documents first.
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+  .pretrained("bert_finetuned_history_ner", "en")
+  .setInputCols(Array("documents","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+
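The Download and Copy S3 URI links in these cards appear to share one naming pattern: `<name>_<lang>_<Spark NLP version>_<Spark version>_<number>.zip`. One example is `bert_finetuned_history_ner_en_5.2.0_3.0_1699387810524.zip`.

The following sketch splits such a file name into its parts. It assumes two things:

- The pattern above holds.
- The trailing number is a Unix timestamp in milliseconds. This is an inference: `1699387810524` decodes to 2023-11-07 20:10:10 UTC, about a minute before this card's commit time.

`ModelArtifact` and `parse_artifact_name` are hypothetical names, not part of Spark NLP.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ModelArtifact:
    name: str
    lang: str
    sparknlp_version: str
    spark_version: str
    created_utc: datetime


def parse_artifact_name(file_name: str) -> ModelArtifact:
    """Split '<name>_<lang>_<sparknlp>_<spark>_<millis>.zip' into its parts.

    The model name may itself contain underscores, so split from the right.
    """
    if not file_name.endswith(".zip"):
        raise ValueError(f"not a .zip artifact: {file_name}")
    stem = file_name[: -len(".zip")]
    parts = stem.rsplit("_", 4)
    if len(parts) != 5:
        raise ValueError(f"unexpected artifact name: {file_name}")
    name, lang, sparknlp_version, spark_version, millis = parts
    # Assumption: the trailing number is milliseconds since the Unix epoch.
    created = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=int(millis))
    return ModelArtifact(name, lang, sparknlp_version, spark_version, created)


artifact = parse_artifact_name("bert_finetuned_history_ner_en_5.2.0_3.0_1699387810524.zip")
print(artifact.name, artifact.lang, artifact.sparknlp_version, artifact.created_utc.isoformat())
# → bert_finetuned_history_ner en 5.2.0 2023-11-07T20:10:10.524000+00:00
```

Splitting from the right with `rsplit("_", 4)` keeps underscore-heavy names intact, such as `ner_test_bert_base_uncased_finetuned_500k_adamw_3_epoch_locations`.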
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_history_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/QuanAI/bert-finetuned-history-ner \ No newline at end of file From 0e1398bd14be621b38a66a83e700bea4a2a779d1 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:12:19 +0700 Subject: [PATCH 506/667] Add model 2023-11-07-comp_seqlab_dslim_bert_en --- .../2023-11-07-comp_seqlab_dslim_bert_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-comp_seqlab_dslim_bert_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-comp_seqlab_dslim_bert_en.md b/docs/_posts/ahmedlone127/2023-11-07-comp_seqlab_dslim_bert_en.md new file mode 100644 index 00000000000000..91b72c2ee0633e --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-comp_seqlab_dslim_bert_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English comp_seqlab_dslim_bert BertForTokenClassification from uhhlt +author: John Snow Labs +name: comp_seqlab_dslim_bert +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `comp_seqlab_dslim_bert` is an English model originally trained by uhhlt.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/comp_seqlab_dslim_bert_en_5.2.0_3.0_1699387870385.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/comp_seqlab_dslim_bert_en_5.2.0_3.0_1699387870385.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
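+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("comp_seqlab_dslim_bert","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) # data: a Spark DataFrame with a "text" column +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("comp_seqlab_dslim_bert", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) // data: a DataFrame with a "text" column +val pipelineDF = pipelineModel.transform(data) + +``` +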
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|comp_seqlab_dslim_bert| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/uhhlt/comp-seqlab-dslim-bert \ No newline at end of file From dd10a8647feb242dbbc1fcec0233456e3da307f5 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:13:19 +0700 Subject: [PATCH 507/667] Add model 2023-11-07-bert_finetuned_ner_lamthanhtin2811_en --- ...7-bert_finetuned_ner_lamthanhtin2811_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_lamthanhtin2811_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_lamthanhtin2811_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_lamthanhtin2811_en.md new file mode 100644 index 00000000000000..65e9b1c62b2c99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_lamthanhtin2811_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_finetuned_ner_lamthanhtin2811 BertForTokenClassification from lamthanhtin2811 +author: John Snow Labs +name: bert_finetuned_ner_lamthanhtin2811 +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_ner_lamthanhtin2811` is an English model originally trained by lamthanhtin2811.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_lamthanhtin2811_en_5.2.0_3.0_1699387480751.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_lamthanhtin2811_en_5.2.0_3.0_1699387480751.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
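+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_lamthanhtin2811","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) # data: a Spark DataFrame with a "text" column +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_finetuned_ner_lamthanhtin2811", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) // data: a DataFrame with a "text" column +val pipelineDF = pipelineModel.transform(data) + +``` +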
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_lamthanhtin2811| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/lamthanhtin2811/bert-finetuned-ner \ No newline at end of file From 3e87641b144f52a3b103dd372f5c7e8728d5286d Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:14:20 +0700 Subject: [PATCH 508/667] Add model 2023-11-07-pico_ner_adapter_en --- .../2023-11-07-pico_ner_adapter_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-pico_ner_adapter_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-pico_ner_adapter_en.md b/docs/_posts/ahmedlone127/2023-11-07-pico_ner_adapter_en.md new file mode 100644 index 00000000000000..ade310de24639d --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-pico_ner_adapter_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English pico_ner_adapter BertForTokenClassification from reginaboateng +author: John Snow Labs +name: pico_ner_adapter +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `pico_ner_adapter` is an English model originally trained by reginaboateng.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pico_ner_adapter_en_5.2.0_3.0_1699388021718.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pico_ner_adapter_en_5.2.0_3.0_1699388021718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
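+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("pico_ner_adapter","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) # data: a Spark DataFrame with a "text" column +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("pico_ner_adapter", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) // data: a DataFrame with a "text" column +val pipelineDF = pipelineModel.transform(data) + +``` +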
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pico_ner_adapter| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/reginaboateng/pico_ner_adapter \ No newline at end of file From 160ea2d4a9420604d284c351e0c176a3d538b8f2 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:15:20 +0700 Subject: [PATCH 509/667] Add model 2023-11-07-personal_noun_detection_german_bert_de --- ...-personal_noun_detection_german_bert_de.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-personal_noun_detection_german_bert_de.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-personal_noun_detection_german_bert_de.md b/docs/_posts/ahmedlone127/2023-11-07-personal_noun_detection_german_bert_de.md new file mode 100644 index 00000000000000..9cd79e25c8da67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-personal_noun_detection_german_bert_de.md @@ -0,0 +1,93 @@ +--- +layout: model +title: German personal_noun_detection_german_bert BertForTokenClassification from CarlaSoe +author: John Snow Labs +name: personal_noun_detection_german_bert +date: 2023-11-07 +tags: [bert, de, open_source, token_classification, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `personal_noun_detection_german_bert` is a German model originally trained by CarlaSoe.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/personal_noun_detection_german_bert_de_5.2.0_3.0_1699388076431.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/personal_noun_detection_german_bert_de_5.2.0_3.0_1699388076431.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
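+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("personal_noun_detection_german_bert","de") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) # data: a Spark DataFrame with a "text" column +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("personal_noun_detection_german_bert", "de") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) // data: a DataFrame with a "text" column +val pipelineDF = pipelineModel.transform(data) + +``` +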
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|personal_noun_detection_german_bert| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|406.9 MB| + +## References + +https://huggingface.co/CarlaSoe/personal-noun-detection-german-bert \ No newline at end of file From 9e03ef245b1e600a13c91a9cde10bf4c264c7be8 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:16:20 +0700 Subject: [PATCH 510/667] Add model 2023-11-07-indobertweet_finetuned_ijelid_en --- ...-11-07-indobertweet_finetuned_ijelid_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-indobertweet_finetuned_ijelid_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-indobertweet_finetuned_ijelid_en.md b/docs/_posts/ahmedlone127/2023-11-07-indobertweet_finetuned_ijelid_en.md new file mode 100644 index 00000000000000..69e68ac318b9c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-indobertweet_finetuned_ijelid_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English indobertweet_finetuned_ijelid BertForTokenClassification from fathan +author: John Snow Labs +name: indobertweet_finetuned_ijelid +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `indobertweet_finetuned_ijelid` is an English model originally trained by fathan.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/indobertweet_finetuned_ijelid_en_5.2.0_3.0_1699387609403.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/indobertweet_finetuned_ijelid_en_5.2.0_3.0_1699387609403.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
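+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("indobertweet_finetuned_ijelid","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) # data: a Spark DataFrame with a "text" column +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("indobertweet_finetuned_ijelid", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) // data: a DataFrame with a "text" column +val pipelineDF = pipelineModel.transform(data) + +``` +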
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|indobertweet_finetuned_ijelid| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|411.8 MB| + +## References + +https://huggingface.co/fathan/indobertweet-finetuned-ijelid \ No newline at end of file From 699ad20859fee7f895428e20018024242872fd9a Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:17:20 +0700 Subject: [PATCH 511/667] Add model 2023-11-07-german_english_code_switching_identification_en --- ...nglish_code_switching_identification_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-german_english_code_switching_identification_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-german_english_code_switching_identification_en.md b/docs/_posts/ahmedlone127/2023-11-07-german_english_code_switching_identification_en.md new file mode 100644 index 00000000000000..134bdfd852c490 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-german_english_code_switching_identification_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English german_english_code_switching_identification BertForTokenClassification from igorsterner +author: John Snow Labs +name: german_english_code_switching_identification +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `german_english_code_switching_identification` is an English model originally trained by igorsterner.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/german_english_code_switching_identification_en_5.2.0_3.0_1699388184147.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/german_english_code_switching_identification_en_5.2.0_3.0_1699388184147.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
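+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("german_english_code_switching_identification","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) # data: a Spark DataFrame with a "text" column +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("german_english_code_switching_identification", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) // data: a DataFrame with a "text" column +val pipelineDF = pipelineModel.transform(data) + +``` +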
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|german_english_code_switching_identification| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|664.7 MB| + +## References + +https://huggingface.co/igorsterner/german-english-code-switching-identification \ No newline at end of file From 983cdd71283e1e84d4bc99a4aab193b26a1a0ad9 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:18:20 +0700 Subject: [PATCH 512/667] Add model 2023-11-07-sindhi_panelization_v2_en --- .../2023-11-07-sindhi_panelization_v2_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-sindhi_panelization_v2_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-sindhi_panelization_v2_en.md b/docs/_posts/ahmedlone127/2023-11-07-sindhi_panelization_v2_en.md new file mode 100644 index 00000000000000..e691e5e2eb55aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-sindhi_panelization_v2_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English sindhi_panelization_v2 BertForTokenClassification from EMBO +author: John Snow Labs +name: sindhi_panelization_v2 +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `sindhi_panelization_v2` is an English model originally trained by EMBO.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sindhi_panelization_v2_en_5.2.0_3.0_1699388220382.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sindhi_panelization_v2_en_5.2.0_3.0_1699388220382.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
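+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("sindhi_panelization_v2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) # data: a Spark DataFrame with a "text" column +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("sindhi_panelization_v2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) // data: a DataFrame with a "text" column +val pipelineDF = pipelineModel.transform(data) + +``` +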
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sindhi_panelization_v2| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/EMBO/sd-panelization-v2 \ No newline at end of file From 6080d51d63b4f87c042a263193c5b3445aef655b Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:19:21 +0700 Subject: [PATCH 513/667] Add model 2023-11-07-indobert_large_p2_finetuned_chunking_id --- ...indobert_large_p2_finetuned_chunking_id.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-indobert_large_p2_finetuned_chunking_id.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-indobert_large_p2_finetuned_chunking_id.md b/docs/_posts/ahmedlone127/2023-11-07-indobert_large_p2_finetuned_chunking_id.md new file mode 100644 index 00000000000000..3ee6fa71b62b7d --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-indobert_large_p2_finetuned_chunking_id.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Indonesian indobert_large_p2_finetuned_chunking BertForTokenClassification from ageng-anugrah +author: John Snow Labs +name: indobert_large_p2_finetuned_chunking +date: 2023-11-07 +tags: [bert, id, open_source, token_classification, onnx] +task: Named Entity Recognition +language: id +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `indobert_large_p2_finetuned_chunking` is an Indonesian model originally trained by ageng-anugrah.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/indobert_large_p2_finetuned_chunking_id_5.2.0_3.0_1699385766100.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/indobert_large_p2_finetuned_chunking_id_5.2.0_3.0_1699385766100.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
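+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("indobert_large_p2_finetuned_chunking","id") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) # data: a Spark DataFrame with a "text" column +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("indobert_large_p2_finetuned_chunking", "id") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) // data: a DataFrame with a "text" column +val pipelineDF = pipelineModel.transform(data) + +``` +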
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|indobert_large_p2_finetuned_chunking| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|id| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ageng-anugrah/indobert-large-p2-finetuned-chunking \ No newline at end of file From 2d92b6dfa84a1c234e99b8d1d78b15a5771d44ee Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:20:21 +0700 Subject: [PATCH 514/667] Add model 2023-11-07-biomednlp_pubmedbert_base_uncased_abstract_fulltext_ft_ncbi_disease_en --- ...ed_abstract_fulltext_ft_ncbi_disease_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-biomednlp_pubmedbert_base_uncased_abstract_fulltext_ft_ncbi_disease_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-biomednlp_pubmedbert_base_uncased_abstract_fulltext_ft_ncbi_disease_en.md b/docs/_posts/ahmedlone127/2023-11-07-biomednlp_pubmedbert_base_uncased_abstract_fulltext_ft_ncbi_disease_en.md new file mode 100644 index 00000000000000..f43fa7a4dc255f --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-biomednlp_pubmedbert_base_uncased_abstract_fulltext_ft_ncbi_disease_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English biomednlp_pubmedbert_base_uncased_abstract_fulltext_ft_ncbi_disease BertForTokenClassification from sarahmiller137 +author: John Snow Labs +name: biomednlp_pubmedbert_base_uncased_abstract_fulltext_ft_ncbi_disease +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face 
and curated to provide scalability and production-readiness using Spark NLP. `biomednlp_pubmedbert_base_uncased_abstract_fulltext_ft_ncbi_disease` is an English model originally trained by sarahmiller137. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biomednlp_pubmedbert_base_uncased_abstract_fulltext_ft_ncbi_disease_en_5.2.0_3.0_1699388250102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biomednlp_pubmedbert_base_uncased_abstract_fulltext_ft_ncbi_disease_en_5.2.0_3.0_1699388250102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("biomednlp_pubmedbert_base_uncased_abstract_fulltext_ft_ncbi_disease","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) # data: a Spark DataFrame with a "text" column +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("biomednlp_pubmedbert_base_uncased_abstract_fulltext_ft_ncbi_disease", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) // data: a DataFrame with a "text" column +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biomednlp_pubmedbert_base_uncased_abstract_fulltext_ft_ncbi_disease| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.2 MB| + +## References + +https://huggingface.co/sarahmiller137/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-ft-ncbi-disease \ No newline at end of file From 7ba4fe4386d4f8299ada9e11cb9a16d06cbad3a2 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:21:21 +0700 Subject: [PATCH 515/667] Add model 2023-11-07-bpmn_information_extraction_v2_en --- ...11-07-bpmn_information_extraction_v2_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bpmn_information_extraction_v2_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bpmn_information_extraction_v2_en.md b/docs/_posts/ahmedlone127/2023-11-07-bpmn_information_extraction_v2_en.md new file mode 100644 index 00000000000000..276260efedf2c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bpmn_information_extraction_v2_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bpmn_information_extraction_v2 BertForTokenClassification from jtlicardo +author: John Snow Labs +name: bpmn_information_extraction_v2 +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bpmn_information_extraction_v2` is an English model originally trained by jtlicardo.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bpmn_information_extraction_v2_en_5.2.0_3.0_1699387855385.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bpmn_information_extraction_v2_en_5.2.0_3.0_1699387855385.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
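+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bpmn_information_extraction_v2","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `data` is a Spark DataFrame with a string column named "text"
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bpmn_information_extraction_v2", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `data` is a Spark DataFrame with a string column named "text"
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+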
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bpmn_information_extraction_v2|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+
+## References
+
+https://huggingface.co/jtlicardo/bpmn-information-extraction-v2
\ No newline at end of file
From 1e4df0acd5e7d8d478ff19795ed2093982a4867d Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:22:21 +0700
Subject: [PATCH 516/667] Add model 2023-11-07-bert_tagalog_base_uncased_sayula_popoluca_tagger_tl
---
 ..._base_uncased_sayula_popoluca_tagger_tl.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_tagalog_base_uncased_sayula_popoluca_tagger_tl.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_tagalog_base_uncased_sayula_popoluca_tagger_tl.md b/docs/_posts/ahmedlone127/2023-11-07-bert_tagalog_base_uncased_sayula_popoluca_tagger_tl.md
new file mode 100644
index 00000000000000..6fb4cf87d53476
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-bert_tagalog_base_uncased_sayula_popoluca_tagger_tl.md
+---
+layout: model
+title: Tagalog bert_tagalog_base_uncased_sayula_popoluca_tagger BertForTokenClassification from syke9p3
+author: John Snow Labs
+name: bert_tagalog_base_uncased_sayula_popoluca_tagger
+date: 2023-11-07
+tags: [bert, tl, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: tl
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_tagalog_base_uncased_sayula_popoluca_tagger` is a Tagalog model originally trained by syke9p3.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tagalog_base_uncased_sayula_popoluca_tagger_tl_5.2.0_3.0_1699388445151.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tagalog_base_uncased_sayula_popoluca_tagger_tl_5.2.0_3.0_1699388445151.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
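+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bert_tagalog_base_uncased_sayula_popoluca_tagger","tl") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `data` is a Spark DataFrame with a string column named "text"
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_tagalog_base_uncased_sayula_popoluca_tagger", "tl")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `data` is a Spark DataFrame with a string column named "text"
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+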
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_tagalog_base_uncased_sayula_popoluca_tagger|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|tl|
+|Size:|470.3 MB|
+
+## References
+
+https://huggingface.co/syke9p3/bert-tagalog-base-uncased-pos-tagger
\ No newline at end of file
From 4e4a287f7b9ec091f2068c912398abc154f3a402 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:23:22 +0700
Subject: [PATCH 517/667] Add model 2023-11-07-bert_large_portuguese_ner_enamex_pt
---
 ...-07-bert_large_portuguese_ner_enamex_pt.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_large_portuguese_ner_enamex_pt.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_large_portuguese_ner_enamex_pt.md b/docs/_posts/ahmedlone127/2023-11-07-bert_large_portuguese_ner_enamex_pt.md
new file mode 100644
index 00000000000000..9be2e45f68881a
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-bert_large_portuguese_ner_enamex_pt.md
+---
+layout: model
+title: Portuguese bert_large_portuguese_ner_enamex BertForTokenClassification from marcosgg
+author: John Snow Labs
+name: bert_large_portuguese_ner_enamex
+date: 2023-11-07
+tags: [bert, pt, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: pt
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_large_portuguese_ner_enamex` is a Portuguese model originally trained by marcosgg.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_portuguese_ner_enamex_pt_5.2.0_3.0_1699388439394.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_portuguese_ner_enamex_pt_5.2.0_3.0_1699388439394.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
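+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bert_large_portuguese_ner_enamex","pt") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `data` is a Spark DataFrame with a string column named "text"
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_large_portuguese_ner_enamex", "pt")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `data` is a Spark DataFrame with a string column named "text"
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+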
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_large_portuguese_ner_enamex|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|pt|
+|Size:|1.2 GB|
+
+## References
+
+https://huggingface.co/marcosgg/bert-large-pt-ner-enamex
\ No newline at end of file
From e20972bf2b7e9eea8d948372ba737f5b7defd4a2 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:24:22 +0700
Subject: [PATCH 518/667] Add model 2023-11-07-biolinkbert_base_finetuned_n2c2_ner_en
---
 ...-biolinkbert_base_finetuned_n2c2_ner_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-biolinkbert_base_finetuned_n2c2_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-biolinkbert_base_finetuned_n2c2_ner_en.md b/docs/_posts/ahmedlone127/2023-11-07-biolinkbert_base_finetuned_n2c2_ner_en.md
new file mode 100644
index 00000000000000..77e71bde34a3c2
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-biolinkbert_base_finetuned_n2c2_ner_en.md
+---
+layout: model
+title: English biolinkbert_base_finetuned_n2c2_ner BertForTokenClassification from georgeleung30
+author: John Snow Labs
+name: biolinkbert_base_finetuned_n2c2_ner
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `biolinkbert_base_finetuned_n2c2_ner` is an English model originally trained by georgeleung30.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biolinkbert_base_finetuned_n2c2_ner_en_5.2.0_3.0_1699387642688.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biolinkbert_base_finetuned_n2c2_ner_en_5.2.0_3.0_1699387642688.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
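+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("biolinkbert_base_finetuned_n2c2_ner","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `data` is a Spark DataFrame with a string column named "text"
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("biolinkbert_base_finetuned_n2c2_ner", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `data` is a Spark DataFrame with a string column named "text"
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+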
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|biolinkbert_base_finetuned_n2c2_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.5 MB|
+
+## References
+
+https://huggingface.co/georgeleung30/BioLinkBERT-base-finetuned-n2c2-ner
\ No newline at end of file
From 5a958af34f5e544ecb83bd440b939877a09e2215 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:25:22 +0700
Subject: [PATCH 519/667] Add model 2023-11-07-toponym_19thc_english_en
---
 .../2023-11-07-toponym_19thc_english_en.md    | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-toponym_19thc_english_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-toponym_19thc_english_en.md b/docs/_posts/ahmedlone127/2023-11-07-toponym_19thc_english_en.md
new file mode 100644
index 00000000000000..d11bd7d01f134f
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-toponym_19thc_english_en.md
+---
+layout: model
+title: English toponym_19thc_english BertForTokenClassification from Livingwithmachines
+author: John Snow Labs
+name: toponym_19thc_english
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `toponym_19thc_english` is an English model originally trained by Livingwithmachines.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/toponym_19thc_english_en_5.2.0_3.0_1699388663159.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/toponym_19thc_english_en_5.2.0_3.0_1699388663159.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
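+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("toponym_19thc_english","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `data` is a Spark DataFrame with a string column named "text"
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("toponym_19thc_english", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `data` is a Spark DataFrame with a string column named "text"
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+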
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|toponym_19thc_english|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|407.0 MB|
+
+## References
+
+https://huggingface.co/Livingwithmachines/toponym-19thC-en
\ No newline at end of file
From f078df74dbd77514319013e7f2cb344b1b396604 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:26:23 +0700
Subject: [PATCH 520/667] Add model 2023-11-07-legal_bert_ner_base_cased_ptbr_pt
---
 ...11-07-legal_bert_ner_base_cased_ptbr_pt.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-legal_bert_ner_base_cased_ptbr_pt.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-legal_bert_ner_base_cased_ptbr_pt.md b/docs/_posts/ahmedlone127/2023-11-07-legal_bert_ner_base_cased_ptbr_pt.md
new file mode 100644
index 00000000000000..4c385926fa641f
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-legal_bert_ner_base_cased_ptbr_pt.md
+---
+layout: model
+title: Portuguese legal_bert_ner_base_cased_ptbr BertForTokenClassification from dominguesm
+author: John Snow Labs
+name: legal_bert_ner_base_cased_ptbr
+date: 2023-11-07
+tags: [bert, pt, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: pt
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `legal_bert_ner_base_cased_ptbr` is a Portuguese model originally trained by dominguesm.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legal_bert_ner_base_cased_ptbr_pt_5.2.0_3.0_1699388720293.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legal_bert_ner_base_cased_ptbr_pt_5.2.0_3.0_1699388720293.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
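+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("legal_bert_ner_base_cased_ptbr","pt") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `data` is a Spark DataFrame with a string column named "text"
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("legal_bert_ner_base_cased_ptbr", "pt")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `data` is a Spark DataFrame with a string column named "text"
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+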
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|legal_bert_ner_base_cased_ptbr|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|pt|
+|Size:|405.9 MB|
+
+## References
+
+https://huggingface.co/dominguesm/legal-bert-ner-base-cased-ptbr
\ No newline at end of file
From e80a4809bcadcfe452a6ca22635b296c67b5b260 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:27:23 +0700
Subject: [PATCH 521/667] Add model 2023-11-07-sindhi_geneprod_roles_v2_en
---
 .../2023-11-07-sindhi_geneprod_roles_v2_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-sindhi_geneprod_roles_v2_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-sindhi_geneprod_roles_v2_en.md b/docs/_posts/ahmedlone127/2023-11-07-sindhi_geneprod_roles_v2_en.md
new file mode 100644
index 00000000000000..6fbbf2293c047c
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-sindhi_geneprod_roles_v2_en.md
+---
+layout: model
+title: English sindhi_geneprod_roles_v2 BertForTokenClassification from EMBO
+author: John Snow Labs
+name: sindhi_geneprod_roles_v2
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `sindhi_geneprod_roles_v2` is an English model originally trained by EMBO.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sindhi_geneprod_roles_v2_en_5.2.0_3.0_1699388297204.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sindhi_geneprod_roles_v2_en_5.2.0_3.0_1699388297204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
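+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("sindhi_geneprod_roles_v2","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `data` is a Spark DataFrame with a string column named "text"
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("sindhi_geneprod_roles_v2", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `data` is a Spark DataFrame with a string column named "text"
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+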
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|sindhi_geneprod_roles_v2|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|1.2 GB|
+
+## References
+
+https://huggingface.co/EMBO/sd-geneprod-roles-v2
\ No newline at end of file
From 86f6993901cded70a925923e5b890cc643b78056 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:28:23 +0700
Subject: [PATCH 522/667] Add model 2023-11-07-bert_large_cased_ft_ner_maplestory_en
---
 ...7-bert_large_cased_ft_ner_maplestory_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_large_cased_ft_ner_maplestory_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_large_cased_ft_ner_maplestory_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_large_cased_ft_ner_maplestory_en.md
new file mode 100644
index 00000000000000..cce057309575bb
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-bert_large_cased_ft_ner_maplestory_en.md
+---
+layout: model
+title: English bert_large_cased_ft_ner_maplestory BertForTokenClassification from nxaliao
+author: John Snow Labs
+name: bert_large_cased_ft_ner_maplestory
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_large_cased_ft_ner_maplestory` is an English model originally trained by nxaliao.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_cased_ft_ner_maplestory_en_5.2.0_3.0_1699388820984.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_cased_ft_ner_maplestory_en_5.2.0_3.0_1699388820984.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
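+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bert_large_cased_ft_ner_maplestory","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `data` is a Spark DataFrame with a string column named "text"
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_large_cased_ft_ner_maplestory", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `data` is a Spark DataFrame with a string column named "text"
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+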
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_large_cased_ft_ner_maplestory|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|1.2 GB|
+
+## References
+
+https://huggingface.co/nxaliao/bert-large-cased-ft-ner-maplestory
\ No newline at end of file
From 4272581117ba78b36544d7dfd4ab96c67cccc517 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:29:23 +0700
Subject: [PATCH 523/667] Add model 2023-11-07-bert_base_portuguese_ner_enamex_pt
---
 ...1-07-bert_base_portuguese_ner_enamex_pt.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_base_portuguese_ner_enamex_pt.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_base_portuguese_ner_enamex_pt.md b/docs/_posts/ahmedlone127/2023-11-07-bert_base_portuguese_ner_enamex_pt.md
new file mode 100644
index 00000000000000..9569d5d453035f
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-bert_base_portuguese_ner_enamex_pt.md
+---
+layout: model
+title: Portuguese bert_base_portuguese_ner_enamex BertForTokenClassification from marcosgg
+author: John Snow Labs
+name: bert_base_portuguese_ner_enamex
+date: 2023-11-07
+tags: [bert, pt, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: pt
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_portuguese_ner_enamex` is a Portuguese model originally trained by marcosgg.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_ner_enamex_pt_5.2.0_3.0_1699388924890.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_ner_enamex_pt_5.2.0_3.0_1699388924890.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
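+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bert_base_portuguese_ner_enamex","pt") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `data` is a Spark DataFrame with a string column named "text"
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_base_portuguese_ner_enamex", "pt")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `data` is a Spark DataFrame with a string column named "text"
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+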
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_base_portuguese_ner_enamex|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|pt|
+|Size:|405.9 MB|
+
+## References
+
+https://huggingface.co/marcosgg/bert-base-pt-ner-enamex
\ No newline at end of file
From f09dcd59525a7be0dc2f8b14ac6248d3674f1cfe Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:30:24 +0700
Subject: [PATCH 524/667] Add model 2023-11-07-hebert_medical_ner_fixed_labels_v1_en
---
 ...7-hebert_medical_ner_fixed_labels_v1_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-hebert_medical_ner_fixed_labels_v1_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-hebert_medical_ner_fixed_labels_v1_en.md b/docs/_posts/ahmedlone127/2023-11-07-hebert_medical_ner_fixed_labels_v1_en.md
new file mode 100644
index 00000000000000..9def0193375fa6
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-hebert_medical_ner_fixed_labels_v1_en.md
+---
+layout: model
+title: English hebert_medical_ner_fixed_labels_v1 BertForTokenClassification from cp500
+author: John Snow Labs
+name: hebert_medical_ner_fixed_labels_v1
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `hebert_medical_ner_fixed_labels_v1` is an English model originally trained by cp500.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hebert_medical_ner_fixed_labels_v1_en_5.2.0_3.0_1699388943686.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hebert_medical_ner_fixed_labels_v1_en_5.2.0_3.0_1699388943686.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
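+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("hebert_medical_ner_fixed_labels_v1","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `data` is a Spark DataFrame with a string column named "text"
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("hebert_medical_ner_fixed_labels_v1", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `data` is a Spark DataFrame with a string column named "text"
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+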
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|hebert_medical_ner_fixed_labels_v1|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|690.5 MB|
+
+## References
+
+https://huggingface.co/cp500/hebert_medical_ner_fixed_labels_v1
\ No newline at end of file
From 8986ab5c6ddb015db2935130123d081cc64ad4b9 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:31:24 +0700
Subject: [PATCH 525/667] Add model 2023-11-07-multilingual_arabic_token_classification_model_xx
---
 ...al_arabic_token_classification_model_xx.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-multilingual_arabic_token_classification_model_xx.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-multilingual_arabic_token_classification_model_xx.md b/docs/_posts/ahmedlone127/2023-11-07-multilingual_arabic_token_classification_model_xx.md
new file mode 100644
index 00000000000000..bd8df3c0f44011
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-multilingual_arabic_token_classification_model_xx.md
+---
+layout: model
+title: Multilingual multilingual_arabic_token_classification_model BertForTokenClassification from Cabooose
+author: John Snow Labs
+name: multilingual_arabic_token_classification_model
+date: 2023-11-07
+tags: [bert, xx, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: xx
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `multilingual_arabic_token_classification_model` is a Multilingual model originally trained by Cabooose.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multilingual_arabic_token_classification_model_xx_5.2.0_3.0_1699388993826.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multilingual_arabic_token_classification_model_xx_5.2.0_3.0_1699388993826.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("multilingual_arabic_token_classification_model","xx") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("multilingual_arabic_token_classification_model", "xx") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multilingual_arabic_token_classification_model| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/Cabooose/multilingual_arabic_token_classification_model \ No newline at end of file From ee733ed2e610a878f0b4222f42dfe5d8b0c396a7 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:32:24 +0700 Subject: [PATCH 526/667] Add model 2023-11-07-clinicalnerpt_sign_pt --- .../2023-11-07-clinicalnerpt_sign_pt.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_sign_pt.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_sign_pt.md b/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_sign_pt.md new file mode 100644 index 00000000000000..fb184275143c31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_sign_pt.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Portuguese clinicalnerpt_sign BertForTokenClassification from pucpr +author: John Snow Labs +name: clinicalnerpt_sign +date: 2023-11-07 +tags: [bert, pt, open_source, token_classification, onnx] +task: Named Entity Recognition +language: pt +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `clinicalnerpt_sign` is a Portuguese model originally trained by pucpr.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinicalnerpt_sign_pt_5.2.0_3.0_1699388993710.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinicalnerpt_sign_pt_5.2.0_3.0_1699388993710.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("clinicalnerpt_sign","pt") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("clinicalnerpt_sign", "pt") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinicalnerpt_sign| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|pt| +|Size:|664.8 MB| + +## References + +https://huggingface.co/pucpr/clinicalnerpt-sign \ No newline at end of file From 4ce0c9ac701560f128cf6c8bf9a19df04d419fc8 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:33:24 +0700 Subject: [PATCH 527/667] Add model 2023-11-07-bert_base_chinese_finetuned_ner_danielwei0214_zh --- ..._chinese_finetuned_ner_danielwei0214_zh.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_base_chinese_finetuned_ner_danielwei0214_zh.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_base_chinese_finetuned_ner_danielwei0214_zh.md b/docs/_posts/ahmedlone127/2023-11-07-bert_base_chinese_finetuned_ner_danielwei0214_zh.md new file mode 100644 index 00000000000000..19019210b5ed47 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_base_chinese_finetuned_ner_danielwei0214_zh.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Chinese bert_base_chinese_finetuned_ner_danielwei0214 BertForTokenClassification from Danielwei0214 +author: John Snow Labs +name: bert_base_chinese_finetuned_ner_danielwei0214 +date: 2023-11-07 +tags: [bert, zh, open_source, token_classification, onnx] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_chinese_finetuned_ner_danielwei0214` is a Chinese model originally trained by Danielwei0214.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_ner_danielwei0214_zh_5.2.0_3.0_1699389183060.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_ner_danielwei0214_zh_5.2.0_3.0_1699389183060.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_chinese_finetuned_ner_danielwei0214","zh") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_base_chinese_finetuned_ner_danielwei0214", "zh") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_chinese_finetuned_ner_danielwei0214| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|381.1 MB| + +## References + +https://huggingface.co/Danielwei0214/bert-base-chinese-finetuned-ner \ No newline at end of file From f67f37a5d6d1bb8ce21ba664ea1fc95ab0b31038 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:34:25 +0700 Subject: [PATCH 528/667] Add model 2023-11-07-hindi_bert_ner_en --- .../2023-11-07-hindi_bert_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-hindi_bert_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-hindi_bert_ner_en.md b/docs/_posts/ahmedlone127/2023-11-07-hindi_bert_ner_en.md new file mode 100644 index 00000000000000..5c426a50c0efbd --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-hindi_bert_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English hindi_bert_ner BertForTokenClassification from mirfan899 +author: John Snow Labs +name: hindi_bert_ner +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `hindi_bert_ner` is an English model originally trained by mirfan899.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hindi_bert_ner_en_5.2.0_3.0_1699389197767.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hindi_bert_ner_en_5.2.0_3.0_1699389197767.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("hindi_bert_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("hindi_bert_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hindi_bert_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|665.1 MB| + +## References + +https://huggingface.co/mirfan899/hindi-bert-ner \ No newline at end of file From 7cf827a3505e66bf481b4855449581b964e6d590 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:35:25 +0700 Subject: [PATCH 529/667] Add model 2023-11-07-clinicalnerpt_finding_pt --- .../2023-11-07-clinicalnerpt_finding_pt.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_finding_pt.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_finding_pt.md b/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_finding_pt.md new file mode 100644 index 00000000000000..a79fb357e325d4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_finding_pt.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Portuguese clinicalnerpt_finding BertForTokenClassification from pucpr +author: John Snow Labs +name: clinicalnerpt_finding +date: 2023-11-07 +tags: [bert, pt, open_source, token_classification, onnx] +task: Named Entity Recognition +language: pt +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `clinicalnerpt_finding` is a Portuguese model originally trained by pucpr.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinicalnerpt_finding_pt_5.2.0_3.0_1699389281791.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinicalnerpt_finding_pt_5.2.0_3.0_1699389281791.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("clinicalnerpt_finding","pt") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("clinicalnerpt_finding", "pt") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinicalnerpt_finding| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|pt| +|Size:|664.8 MB| + +## References + +https://huggingface.co/pucpr/clinicalnerpt-finding \ No newline at end of file From dde4731072ed20ece40a6689a8afa22e5a04c876 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:36:25 +0700 Subject: [PATCH 530/667] Add model 2023-11-07-mbert_finetuned_ner_en --- .../2023-11-07-mbert_finetuned_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-mbert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-mbert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-07-mbert_finetuned_ner_en.md new file mode 100644 index 00000000000000..81e650df1f473e --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-mbert_finetuned_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English mbert_finetuned_ner BertForTokenClassification from Andrey1989 +author: John Snow Labs +name: mbert_finetuned_ner +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `mbert_finetuned_ner` is an English model originally trained by Andrey1989.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mbert_finetuned_ner_en_5.2.0_3.0_1699386433257.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mbert_finetuned_ner_en_5.2.0_3.0_1699386433257.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("mbert_finetuned_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("mbert_finetuned_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mbert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|665.1 MB| + +## References + +https://huggingface.co/Andrey1989/mbert-finetuned-ner \ No newline at end of file From a9b84f97b7b0db63202299231acf50bd60531f91 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:37:26 +0700 Subject: [PATCH 531/667] Add model 2023-11-07-sindhi_smallmol_roles_v2_en --- .../2023-11-07-sindhi_smallmol_roles_v2_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-sindhi_smallmol_roles_v2_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-sindhi_smallmol_roles_v2_en.md b/docs/_posts/ahmedlone127/2023-11-07-sindhi_smallmol_roles_v2_en.md new file mode 100644 index 00000000000000..b49b0b26f4d9d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-sindhi_smallmol_roles_v2_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English sindhi_smallmol_roles_v2 BertForTokenClassification from EMBO +author: John Snow Labs +name: sindhi_smallmol_roles_v2 +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `sindhi_smallmol_roles_v2` is an English model originally trained by EMBO.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sindhi_smallmol_roles_v2_en_5.2.0_3.0_1699387916066.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sindhi_smallmol_roles_v2_en_5.2.0_3.0_1699387916066.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("sindhi_smallmol_roles_v2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("sindhi_smallmol_roles_v2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sindhi_smallmol_roles_v2| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/EMBO/sd-smallmol-roles-v2 \ No newline at end of file From 5e7583773f451d76891717803d23c650e407097b Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:38:26 +0700 Subject: [PATCH 532/667] Add model 2023-11-07-idrisi_lmr_en_timebased_typebased_en --- ...07-idrisi_lmr_en_timebased_typebased_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-idrisi_lmr_en_timebased_typebased_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-idrisi_lmr_en_timebased_typebased_en.md b/docs/_posts/ahmedlone127/2023-11-07-idrisi_lmr_en_timebased_typebased_en.md new file mode 100644 index 00000000000000..e209f099cf9564 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-idrisi_lmr_en_timebased_typebased_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English idrisi_lmr_en_timebased_typebased BertForTokenClassification from rsuwaileh +author: John Snow Labs +name: idrisi_lmr_en_timebased_typebased +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `idrisi_lmr_en_timebased_typebased` is an English model originally trained by rsuwaileh.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/idrisi_lmr_en_timebased_typebased_en_5.2.0_3.0_1699387252805.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/idrisi_lmr_en_timebased_typebased_en_5.2.0_3.0_1699387252805.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("idrisi_lmr_en_timebased_typebased","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("idrisi_lmr_en_timebased_typebased", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|idrisi_lmr_en_timebased_typebased| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/rsuwaileh/IDRISI-LMR-EN-timebased-typebased \ No newline at end of file From 1413034b25c624c8cf9ad2a7789e753db3a4011e Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:39:26 +0700 Subject: [PATCH 533/667] Add model 2023-11-07-nominal_groups_recognition_medical_disease_competencia2_bert_medical_ner_es --- ...isease_competencia2_bert_medical_ner_es.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-nominal_groups_recognition_medical_disease_competencia2_bert_medical_ner_es.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-nominal_groups_recognition_medical_disease_competencia2_bert_medical_ner_es.md b/docs/_posts/ahmedlone127/2023-11-07-nominal_groups_recognition_medical_disease_competencia2_bert_medical_ner_es.md new file mode 100644 index 00000000000000..2247efd28bdd45 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-nominal_groups_recognition_medical_disease_competencia2_bert_medical_ner_es.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Castilian, Spanish nominal_groups_recognition_medical_disease_competencia2_bert_medical_ner BertForTokenClassification from pineiden +author: John Snow Labs +name: nominal_groups_recognition_medical_disease_competencia2_bert_medical_ner +date: 2023-11-07 +tags: [bert, es, open_source, token_classification, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification 
model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `nominal_groups_recognition_medical_disease_competencia2_bert_medical_ner` is a Castilian, Spanish model originally trained by pineiden. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nominal_groups_recognition_medical_disease_competencia2_bert_medical_ner_es_5.2.0_3.0_1699389473204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nominal_groups_recognition_medical_disease_competencia2_bert_medical_ner_es_5.2.0_3.0_1699389473204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("nominal_groups_recognition_medical_disease_competencia2_bert_medical_ner","es") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("nominal_groups_recognition_medical_disease_competencia2_bert_medical_ner", "es") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nominal_groups_recognition_medical_disease_competencia2_bert_medical_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|407.2 MB| + +## References + +https://huggingface.co/pineiden/nominal-groups-recognition-medical-disease-competencia2-bert-medical-ner \ No newline at end of file From f57583288068c9f83df3a2b7f2cb7cc4eab1a909 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:40:26 +0700 Subject: [PATCH 534/667] Add model 2023-11-07-species_identification_mbert_fine_tuned_train_test_en --- ...fication_mbert_fine_tuned_train_test_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-species_identification_mbert_fine_tuned_train_test_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-species_identification_mbert_fine_tuned_train_test_en.md b/docs/_posts/ahmedlone127/2023-11-07-species_identification_mbert_fine_tuned_train_test_en.md new file mode 100644 index 00000000000000..9b4ada8ef994aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-species_identification_mbert_fine_tuned_train_test_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English species_identification_mbert_fine_tuned_train_test BertForTokenClassification from ajtamayoh +author: John Snow Labs +name: species_identification_mbert_fine_tuned_train_test +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and 
production-readiness using Spark NLP. `species_identification_mbert_fine_tuned_train_test` is an English model originally trained by ajtamayoh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/species_identification_mbert_fine_tuned_train_test_en_5.2.0_3.0_1699389436904.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/species_identification_mbert_fine_tuned_train_test_en_5.2.0_3.0_1699389436904.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("species_identification_mbert_fine_tuned_train_test","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("species_identification_mbert_fine_tuned_train_test", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|species_identification_mbert_fine_tuned_train_test| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|665.2 MB| + +## References + +https://huggingface.co/ajtamayoh/Species_Identification_mBERT_fine_tuned_Train_Test \ No newline at end of file From 6b044dfcf219fd1911cd923b1f85b2e0ba63fce2 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:41:27 +0700 Subject: [PATCH 535/667] Add model 2023-11-07-postagger_portuguese_pt --- .../2023-11-07-postagger_portuguese_pt.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-postagger_portuguese_pt.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-postagger_portuguese_pt.md b/docs/_posts/ahmedlone127/2023-11-07-postagger_portuguese_pt.md new file mode 100644 index 00000000000000..86394a95fd0825 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-postagger_portuguese_pt.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Portuguese postagger_portuguese BertForTokenClassification from lisaterumi +author: John Snow Labs +name: postagger_portuguese +date: 2023-11-07 +tags: [bert, pt, open_source, token_classification, onnx] +task: Named Entity Recognition +language: pt +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `postagger_portuguese` is a Portuguese model originally trained by lisaterumi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/postagger_portuguese_pt_5.2.0_3.0_1699386278787.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/postagger_portuguese_pt_5.2.0_3.0_1699386278787.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
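+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+# Start (or attach to) a Spark session with Spark NLP loaded
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer stage must produce it
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("postagger_portuguese","pt") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+# Placeholder input: any DataFrame with a "text" column works
+data = spark.createDataFrame([["Your text goes here."]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+// Assumes a SparkSession named `spark` (as in spark-shell) with Spark NLP on the classpath
+import com.johnsnowlabs.nlp.base.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// Output column must be "documents" to match the inputs of the later stages
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// The classifier reads a "token" column, so a Tokenizer stage must produce it
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("postagger_portuguese", "pt")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+// Placeholder input: any DataFrame with a "text" column works
+val data = Seq("Your text goes here.").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+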
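For a part-of-speech tagger like this one, the labels in the `ner` column are plain tags rather than IOB chunks. A minimal sketch, assuming one tag per token and that a row's `token.result` and `ner.result` arrays have already been collected into Python lists: pair each token with its tag and count the tag distribution. The helper names and the tag values in the example are hypothetical; the model's real tag set is not listed on this card.

```python
from collections import Counter
from typing import Dict, List, Tuple


def pair_tokens_with_tags(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Zip tokens with their predicted tags, failing loudly on a length mismatch."""
    if len(tokens) != len(tags):
        raise ValueError(f"got {len(tokens)} tokens but {len(tags)} tags")
    return list(zip(tokens, tags))


def tag_counts(pairs: List[Tuple[str, str]]) -> Dict[str, int]:
    """Count how often each tag was predicted (first-seen order is kept)."""
    return dict(Counter(tag for _, tag in pairs))


# Hypothetical example values, not real model output
pairs = pair_tokens_with_tags(["O", "gato", "dorme", "."], ["DET", "NOUN", "VERB", "PUNCT"])
print(pairs)              # [('O', 'DET'), ('gato', 'NOUN'), ('dorme', 'VERB'), ('.', 'PUNCT')]
print(tag_counts(pairs))  # {'DET': 1, 'NOUN': 1, 'VERB': 1, 'PUNCT': 1}
```

The explicit length check is there so that a tokenization mismatch surfaces as an error instead of silently truncating the shorter list, which is what a bare `zip` would do.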
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|postagger_portuguese|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|pt|
+|Size:|406.0 MB|
+
+## References
+
+https://huggingface.co/lisaterumi/postagger-portuguese
\ No newline at end of file

From 5a4da886d8b5fd63e2bfa6031f6622efd2a4fcef Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:42:27 +0700
Subject: [PATCH 536/667] Add model 2023-11-07-bert_base_chinese_finetuned_ner_leonadase_en
---
 ...base_chinese_finetuned_ner_leonadase_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_base_chinese_finetuned_ner_leonadase_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_base_chinese_finetuned_ner_leonadase_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_base_chinese_finetuned_ner_leonadase_en.md
new file mode 100644
index 00000000000000..c80998ad774e3c
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-bert_base_chinese_finetuned_ner_leonadase_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_base_chinese_finetuned_ner_leonadase BertForTokenClassification from leonadase
+author: John Snow Labs
+name: bert_base_chinese_finetuned_ner_leonadase
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_chinese_finetuned_ner_leonadase` is an English model originally trained by leonadase.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_ner_leonadase_en_5.2.0_3.0_1699389681080.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_ner_leonadase_en_5.2.0_3.0_1699389681080.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
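+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+# Start (or attach to) a Spark session with Spark NLP loaded
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer stage must produce it
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_base_chinese_finetuned_ner_leonadase","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+# Placeholder input: any DataFrame with a "text" column works
+data = spark.createDataFrame([["Your text goes here."]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+// Assumes a SparkSession named `spark` (as in spark-shell) with Spark NLP on the classpath
+import com.johnsnowlabs.nlp.base.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// Output column must be "documents" to match the inputs of the later stages
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// The classifier reads a "token" column, so a Tokenizer stage must produce it
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_base_chinese_finetuned_ner_leonadase", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+// Placeholder input: any DataFrame with a "text" column works
+val data = Seq("Your text goes here.").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+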
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_base_chinese_finetuned_ner_leonadase|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|381.1 MB|
+
+## References
+
+https://huggingface.co/leonadase/bert-base-chinese-finetuned-ner
\ No newline at end of file

From 0e3289a84ed47e0faaadf02c2bd8f738ef3a85d7 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:43:27 +0700
Subject: [PATCH 537/667] Add model 2023-11-07-nyt_ingredient_tagger_gte_small_en
---
 ...1-07-nyt_ingredient_tagger_gte_small_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-nyt_ingredient_tagger_gte_small_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-nyt_ingredient_tagger_gte_small_en.md b/docs/_posts/ahmedlone127/2023-11-07-nyt_ingredient_tagger_gte_small_en.md
new file mode 100644
index 00000000000000..4026a220a70ec6
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-nyt_ingredient_tagger_gte_small_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English nyt_ingredient_tagger_gte_small BertForTokenClassification from napsternxg
+author: John Snow Labs
+name: nyt_ingredient_tagger_gte_small
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `nyt_ingredient_tagger_gte_small` is an English model originally trained by napsternxg.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nyt_ingredient_tagger_gte_small_en_5.2.0_3.0_1699389758527.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nyt_ingredient_tagger_gte_small_en_5.2.0_3.0_1699389758527.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
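+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+# Start (or attach to) a Spark session with Spark NLP loaded
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer stage must produce it
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("nyt_ingredient_tagger_gte_small","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+# Placeholder input: any DataFrame with a "text" column works
+data = spark.createDataFrame([["Your text goes here."]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+// Assumes a SparkSession named `spark` (as in spark-shell) with Spark NLP on the classpath
+import com.johnsnowlabs.nlp.base.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// Output column must be "documents" to match the inputs of the later stages
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// The classifier reads a "token" column, so a Tokenizer stage must produce it
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("nyt_ingredient_tagger_gte_small", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+// Placeholder input: any DataFrame with a "text" column works
+val data = Seq("Your text goes here.").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+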
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nyt_ingredient_tagger_gte_small| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|113.1 MB| + +## References + +https://huggingface.co/napsternxg/nyt-ingredient-tagger-gte-small \ No newline at end of file From 708c56d2c393cd94114b21091e835bca670b1e48 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:44:27 +0700 Subject: [PATCH 538/667] Add model 2023-11-07-nlp_cic_wfu_distemist_fine_tuned_bert_base_multilingual_cased_xx --- ...e_tuned_bert_base_multilingual_cased_xx.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-nlp_cic_wfu_distemist_fine_tuned_bert_base_multilingual_cased_xx.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-nlp_cic_wfu_distemist_fine_tuned_bert_base_multilingual_cased_xx.md b/docs/_posts/ahmedlone127/2023-11-07-nlp_cic_wfu_distemist_fine_tuned_bert_base_multilingual_cased_xx.md new file mode 100644 index 00000000000000..1070f399d5e9d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-nlp_cic_wfu_distemist_fine_tuned_bert_base_multilingual_cased_xx.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Multilingual nlp_cic_wfu_distemist_fine_tuned_bert_base_multilingual_cased BertForTokenClassification from GuCuChiara +author: John Snow Labs +name: nlp_cic_wfu_distemist_fine_tuned_bert_base_multilingual_cased +date: 2023-11-07 +tags: [bert, xx, open_source, token_classification, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and 
production-readiness using Spark NLP. `nlp_cic_wfu_distemist_fine_tuned_bert_base_multilingual_cased` is a Multilingual model originally trained by GuCuChiara.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_cic_wfu_distemist_fine_tuned_bert_base_multilingual_cased_xx_5.2.0_3.0_1699389484578.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_cic_wfu_distemist_fine_tuned_bert_base_multilingual_cased_xx_5.2.0_3.0_1699389484578.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
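+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+# Start (or attach to) a Spark session with Spark NLP loaded
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer stage must produce it
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("nlp_cic_wfu_distemist_fine_tuned_bert_base_multilingual_cased","xx") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+# Placeholder input: any DataFrame with a "text" column works
+data = spark.createDataFrame([["Your text goes here."]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+// Assumes a SparkSession named `spark` (as in spark-shell) with Spark NLP on the classpath
+import com.johnsnowlabs.nlp.base.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// Output column must be "documents" to match the inputs of the later stages
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// The classifier reads a "token" column, so a Tokenizer stage must produce it
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("nlp_cic_wfu_distemist_fine_tuned_bert_base_multilingual_cased", "xx")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+// Placeholder input: any DataFrame with a "text" column works
+val data = Seq("Your text goes here.").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+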
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|nlp_cic_wfu_distemist_fine_tuned_bert_base_multilingual_cased|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|xx|
+|Size:|665.1 MB|
+
+## References
+
+https://huggingface.co/GuCuChiara/NLP-CIC-WFU_DisTEMIST_fine_tuned_bert-base-multilingual-cased
\ No newline at end of file

From 933901efebc29335a9e13234f896d3c7a47a1347 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:45:28 +0700
Subject: [PATCH 539/667] Add model 2023-11-07-finance_ner_v0_0_9_finetuned_ner_en
---
 ...-07-finance_ner_v0_0_9_finetuned_ner_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-finance_ner_v0_0_9_finetuned_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-finance_ner_v0_0_9_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-07-finance_ner_v0_0_9_finetuned_ner_en.md
new file mode 100644
index 00000000000000..a91647e8b1bbdb
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-finance_ner_v0_0_9_finetuned_ner_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English finance_ner_v0_0_9_finetuned_ner BertForTokenClassification from AhmedTaha012
+author: John Snow Labs
+name: finance_ner_v0_0_9_finetuned_ner
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `finance_ner_v0_0_9_finetuned_ner` is an English model originally trained by AhmedTaha012.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finance_ner_v0_0_9_finetuned_ner_en_5.2.0_3.0_1699385195204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finance_ner_v0_0_9_finetuned_ner_en_5.2.0_3.0_1699385195204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
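+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+# Start (or attach to) a Spark session with Spark NLP loaded
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer stage must produce it
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("finance_ner_v0_0_9_finetuned_ner","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+# Placeholder input: any DataFrame with a "text" column works
+data = spark.createDataFrame([["Your text goes here."]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+// Assumes a SparkSession named `spark` (as in spark-shell) with Spark NLP on the classpath
+import com.johnsnowlabs.nlp.base.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// Output column must be "documents" to match the inputs of the later stages
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// The classifier reads a "token" column, so a Tokenizer stage must produce it
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("finance_ner_v0_0_9_finetuned_ner", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+// Placeholder input: any DataFrame with a "text" column works
+val data = Seq("Your text goes here.").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+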
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|finance_ner_v0_0_9_finetuned_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+
+## References
+
+https://huggingface.co/AhmedTaha012/finance-ner-v0.0.9-finetuned-ner
\ No newline at end of file

From 759d262ee9b73765b9576d02a1f8d97e039b8c53 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:46:28 +0700
Subject: [PATCH 540/667] Add model 2023-11-07-bert_base_chinese_finetuned_ner_gyr66_zh
---
 ...ert_base_chinese_finetuned_ner_gyr66_zh.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_base_chinese_finetuned_ner_gyr66_zh.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_base_chinese_finetuned_ner_gyr66_zh.md b/docs/_posts/ahmedlone127/2023-11-07-bert_base_chinese_finetuned_ner_gyr66_zh.md
new file mode 100644
index 00000000000000..b3ce4d4f3c12e1
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-bert_base_chinese_finetuned_ner_gyr66_zh.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Chinese bert_base_chinese_finetuned_ner_gyr66 BertForTokenClassification from gyr66
+author: John Snow Labs
+name: bert_base_chinese_finetuned_ner_gyr66
+date: 2023-11-07
+tags: [bert, zh, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: zh
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_chinese_finetuned_ner_gyr66` is a Chinese model originally trained by gyr66.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_ner_gyr66_zh_5.2.0_3.0_1699386946416.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_ner_gyr66_zh_5.2.0_3.0_1699386946416.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
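+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+# Start (or attach to) a Spark session with Spark NLP loaded
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer stage must produce it
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_base_chinese_finetuned_ner_gyr66","zh") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+# Placeholder input: any DataFrame with a "text" column works
+data = spark.createDataFrame([["Your text goes here."]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+// Assumes a SparkSession named `spark` (as in spark-shell) with Spark NLP on the classpath
+import com.johnsnowlabs.nlp.base.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// Output column must be "documents" to match the inputs of the later stages
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// The classifier reads a "token" column, so a Tokenizer stage must produce it
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_base_chinese_finetuned_ner_gyr66", "zh")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+// Placeholder input: any DataFrame with a "text" column works
+val data = Seq("Your text goes here.").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+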
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_base_chinese_finetuned_ner_gyr66|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|zh|
+|Size:|381.1 MB|
+
+## References
+
+https://huggingface.co/gyr66/bert-base-chinese-finetuned-ner
\ No newline at end of file

From d00cca22b6db63329bb9c00420cad367affd2dc5 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:47:28 +0700
Subject: [PATCH 541/667] Add model 2023-11-07-bert_finetuned_ner_vbhasin_en
---
 ...023-11-07-bert_finetuned_ner_vbhasin_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_vbhasin_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_vbhasin_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_vbhasin_en.md
new file mode 100644
index 00000000000000..c9462e658f0704
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_vbhasin_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_finetuned_ner_vbhasin BertForTokenClassification from vbhasin
+author: John Snow Labs
+name: bert_finetuned_ner_vbhasin
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_ner_vbhasin` is an English model originally trained by vbhasin.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_vbhasin_en_5.2.0_3.0_1699389493005.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_vbhasin_en_5.2.0_3.0_1699389493005.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
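+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+# Start (or attach to) a Spark session with Spark NLP loaded
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer stage must produce it
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_vbhasin","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+# Placeholder input: any DataFrame with a "text" column works
+data = spark.createDataFrame([["Your text goes here."]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+// Assumes a SparkSession named `spark` (as in spark-shell) with Spark NLP on the classpath
+import com.johnsnowlabs.nlp.base.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// Output column must be "documents" to match the inputs of the later stages
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// The classifier reads a "token" column, so a Tokenizer stage must produce it
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_finetuned_ner_vbhasin", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+// Placeholder input: any DataFrame with a "text" column works
+val data = Seq("Your text goes here.").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+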
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_finetuned_ner_vbhasin|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+
+## References
+
+https://huggingface.co/vbhasin/bert-finetuned-ner
\ No newline at end of file

From 2116d29ef7d7443b5be10a711257ac186a1f4158 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:48:29 +0700
Subject: [PATCH 542/667] Add model 2023-11-07-bert_finetuned_animacy_en
---
 .../2023-11-07-bert_finetuned_animacy_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_animacy_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_animacy_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_animacy_en.md
new file mode 100644
index 00000000000000..d69b40b07758b0
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_animacy_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_finetuned_animacy BertForTokenClassification from andrewt-cam
+author: John Snow Labs
+name: bert_finetuned_animacy
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_animacy` is an English model originally trained by andrewt-cam.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_animacy_en_5.2.0_3.0_1699390063073.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_animacy_en_5.2.0_3.0_1699390063073.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
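+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+# Start (or attach to) a Spark session with Spark NLP loaded
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer stage must produce it
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_animacy","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+# Placeholder input: any DataFrame with a "text" column works
+data = spark.createDataFrame([["Your text goes here."]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+// Assumes a SparkSession named `spark` (as in spark-shell) with Spark NLP on the classpath
+import com.johnsnowlabs.nlp.base.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// Output column must be "documents" to match the inputs of the later stages
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// The classifier reads a "token" column, so a Tokenizer stage must produce it
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_finetuned_animacy", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+// Placeholder input: any DataFrame with a "text" column works
+val data = Seq("Your text goes here.").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+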
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_finetuned_animacy|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+
+## References
+
+https://huggingface.co/andrewt-cam/bert-finetuned-animacy
\ No newline at end of file

From eac718384827eab1cc3f448d07d0777e0d44268f Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:49:29 +0700
Subject: [PATCH 543/667] Add model 2023-11-07-skill_role_mapper_en
---
 .../2023-11-07-skill_role_mapper_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-skill_role_mapper_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-skill_role_mapper_en.md b/docs/_posts/ahmedlone127/2023-11-07-skill_role_mapper_en.md
new file mode 100644
index 00000000000000..ea3fb5ec6c5734
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-skill_role_mapper_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English skill_role_mapper BertForTokenClassification from MehdiHosseiniMoghadam
+author: John Snow Labs
+name: skill_role_mapper
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `skill_role_mapper` is an English model originally trained by MehdiHosseiniMoghadam.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/skill_role_mapper_en_5.2.0_3.0_1699386711647.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/skill_role_mapper_en_5.2.0_3.0_1699386711647.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
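+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+# Start (or attach to) a Spark session with Spark NLP loaded
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer stage must produce it
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("skill_role_mapper","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+# Placeholder input: any DataFrame with a "text" column works
+data = spark.createDataFrame([["Your text goes here."]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+// Assumes a SparkSession named `spark` (as in spark-shell) with Spark NLP on the classpath
+import com.johnsnowlabs.nlp.base.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// Output column must be "documents" to match the inputs of the later stages
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// The classifier reads a "token" column, so a Tokenizer stage must produce it
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("skill_role_mapper", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+// Placeholder input: any DataFrame with a "text" column works
+val data = Seq("Your text goes here.").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+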
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|skill_role_mapper|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|407.8 MB|
+
+## References
+
+https://huggingface.co/MehdiHosseiniMoghadam/skill-role-mapper
\ No newline at end of file

From f6e0a2e2fddf64b5bb45bed865a6ec4a13ce91fd Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:50:29 +0700
Subject: [PATCH 544/667] Add model 2023-11-07-bert_base_ner_058_en
---
 .../2023-11-07-bert_base_ner_058_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_base_ner_058_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_base_ner_058_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_base_ner_058_en.md
new file mode 100644
index 00000000000000..6ae751705349aa
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-bert_base_ner_058_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_base_ner_058 BertForTokenClassification from NguyenVanHieu1605
+author: John Snow Labs
+name: bert_base_ner_058
+date: 2023-11-07
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_ner_058` is an English model originally trained by NguyenVanHieu1605.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_ner_058_en_5.2.0_3.0_1699385797170.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_ner_058_en_5.2.0_3.0_1699385797170.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
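+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+# Start (or attach to) a Spark session with Spark NLP loaded
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+# The classifier reads a "token" column, so a Tokenizer stage must produce it
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_base_ner_058","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+# Placeholder input: any DataFrame with a "text" column works
+data = spark.createDataFrame([["Your text goes here."]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+// Assumes a SparkSession named `spark` (as in spark-shell) with Spark NLP on the classpath
+import com.johnsnowlabs.nlp.base.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// Output column must be "documents" to match the inputs of the later stages
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+// The classifier reads a "token" column, so a Tokenizer stage must produce it
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_base_ner_058", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+// Placeholder input: any DataFrame with a "text" column works
+val data = Seq("Your text goes here.").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+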
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_base_ner_058|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+
+## References
+
+https://huggingface.co/NguyenVanHieu1605/bert-base-ner-058
\ No newline at end of file

From cf27f7fc78de32e1c1228c7005dd5ce3f09aeeac Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 03:51:29 +0700
Subject: [PATCH 545/667] Add model 2023-11-07-multilingual_english_token_classification_model_xx
---
 ...l_english_token_classification_model_xx.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-07-multilingual_english_token_classification_model_xx.md

diff --git a/docs/_posts/ahmedlone127/2023-11-07-multilingual_english_token_classification_model_xx.md b/docs/_posts/ahmedlone127/2023-11-07-multilingual_english_token_classification_model_xx.md
new file mode 100644
index 00000000000000..8cdfd4a47dd3e8
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-07-multilingual_english_token_classification_model_xx.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Multilingual multilingual_english_token_classification_model BertForTokenClassification from Cabooose
+author: John Snow Labs
+name: multilingual_english_token_classification_model
+date: 2023-11-07
+tags: [bert, xx, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: xx
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `multilingual_english_token_classification_model` is a Multilingual model originally trained by
Cabooose. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multilingual_english_token_classification_model_xx_5.2.0_3.0_1699388733526.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multilingual_english_token_classification_model_xx_5.2.0_3.0_1699388733526.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
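The Download and Copy S3 URI links on these cards appear to share one naming pattern: `<name>_<language>_<Spark NLP version>_<Spark version>_<timestamp>.zip`. The last field looks like a Unix timestamp in milliseconds. This reading is inferred from the URLs themselves, not from a documented specification. The sketch below splits such a file name back into its parts; the names `ModelArchive` and `parse_archive_name` are hypothetical. It splits from the right because model names contain underscores.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class ModelArchive(NamedTuple):
    name: str
    language: str
    sparknlp_version: str
    spark_version: str
    uploaded_at: datetime


def parse_archive_name(url: str) -> ModelArchive:
    """Split '<name>_<lang>_<sparknlp>_<spark>_<epoch-ms>.zip' into fields.

    The pattern is inferred from the links on these model cards; it is an
    assumption, not a documented contract.
    """
    filename = url.rsplit("/", 1)[-1]
    if not filename.endswith(".zip"):
        raise ValueError(f"expected a .zip archive, got {filename!r}")
    stem = filename[: -len(".zip")]
    # Model names may contain underscores, so split only the last four fields off
    name, language, sparknlp_version, spark_version, millis = stem.rsplit("_", 4)
    uploaded_at = datetime.fromtimestamp(int(millis) / 1000, tz=timezone.utc)
    return ModelArchive(name, language, sparknlp_version, spark_version, uploaded_at)


archive = parse_archive_name(
    "s3://auxdata.johnsnowlabs.com/public/models/"
    "multilingual_english_token_classification_model_xx_5.2.0_3.0_1699388733526.zip"
)
print(archive.name, archive.language, archive.sparknlp_version, archive.spark_version)
# multilingual_english_token_classification_model xx 5.2.0 3.0
print(archive.uploaded_at.isoformat())
```

The `language`, Spark NLP version and Spark version fields decoded this way match the `language`, `edition` and `spark_version` front-matter entries on the same card. That agreement is some support for the inferred pattern.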
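+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads a "token" column, so a Tokenizer has to run first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("multilingual_english_token_classification_model","xx") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["My name is Clara and I live in Berkeley, California."]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +# One predicted label per token +pipelineDF.select("token.result", "ner.result").show(truncate=False) + +``` +```scala +import com.johnsnowlabs.nlp.SparkNLP +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val spark = SparkNLP.start() +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification reads a "token" column, so a Tokenizer has to run first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("multilingual_english_token_classification_model", "xx") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("My name is Clara and I live in Berkeley, California.").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +pipelineDF.select("token.result", "ner.result").show(false) + +``` +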
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multilingual_english_token_classification_model| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/Cabooose/multilingual_english_token_classification_model \ No newline at end of file From 6d221d1c10d950924965c821af826ba4936c74b5 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:52:30 +0700 Subject: [PATCH 546/667] Add model 2023-11-07-sayula_popoluca_thai_th --- .../2023-11-07-sayula_popoluca_thai_th.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-sayula_popoluca_thai_th.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-sayula_popoluca_thai_th.md b/docs/_posts/ahmedlone127/2023-11-07-sayula_popoluca_thai_th.md new file mode 100644 index 00000000000000..26c9aad8e99af4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-sayula_popoluca_thai_th.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Thai sayula_popoluca_thai BertForTokenClassification from lunarlist +author: John Snow Labs +name: sayula_popoluca_thai +date: 2023-11-07 +tags: [bert, th, open_source, token_classification, onnx] +task: Named Entity Recognition +language: th +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `sayula_popoluca_thai` is a Thai model originally trained by lunarlist.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sayula_popoluca_thai_th_5.2.0_3.0_1699388742737.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sayula_popoluca_thai_th_5.2.0_3.0_1699388742737.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
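+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads a "token" column, so a Tokenizer has to run first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("sayula_popoluca_thai","th") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# Thai is written without spaces between words and the default Tokenizer splits mainly on whitespace, so this sample is pre-segmented +data = spark.createDataFrame([["ฉัน รัก ภาษา ไทย"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +# One predicted label per token +pipelineDF.select("token.result", "ner.result").show(truncate=False) + +``` +```scala +import com.johnsnowlabs.nlp.SparkNLP +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val spark = SparkNLP.start() +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification reads a "token" column, so a Tokenizer has to run first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("sayula_popoluca_thai", "th") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// Thai is written without spaces between words and the default Tokenizer splits mainly on whitespace, so this sample is pre-segmented +val data = Seq("ฉัน รัก ภาษา ไทย").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +pipelineDF.select("token.result", "ner.result").show(false) + +``` +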
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sayula_popoluca_thai| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|th| +|Size:|344.8 MB| + +## References + +https://huggingface.co/lunarlist/pos_thai \ No newline at end of file From e2f8e90e2bae4b1241a1764a13f54f7375d50457 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:53:30 +0700 Subject: [PATCH 547/667] Add model 2023-11-07-idrisi_lmr_en_timebased_typeless_en --- ...-07-idrisi_lmr_en_timebased_typeless_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-idrisi_lmr_en_timebased_typeless_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-idrisi_lmr_en_timebased_typeless_en.md b/docs/_posts/ahmedlone127/2023-11-07-idrisi_lmr_en_timebased_typeless_en.md new file mode 100644 index 00000000000000..ec286ecb76984b --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-idrisi_lmr_en_timebased_typeless_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English idrisi_lmr_en_timebased_typeless BertForTokenClassification from rsuwaileh +author: John Snow Labs +name: idrisi_lmr_en_timebased_typeless +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `idrisi_lmr_en_timebased_typeless` is an English model originally trained by rsuwaileh.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/idrisi_lmr_en_timebased_typeless_en_5.2.0_3.0_1699390384142.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/idrisi_lmr_en_timebased_typeless_en_5.2.0_3.0_1699390384142.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
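+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads a "token" column, so a Tokenizer has to run first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("idrisi_lmr_en_timebased_typeless","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["My name is Clara and I live in Berkeley, California."]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +# One predicted label per token +pipelineDF.select("token.result", "ner.result").show(truncate=False) + +``` +```scala +import com.johnsnowlabs.nlp.SparkNLP +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val spark = SparkNLP.start() +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification reads a "token" column, so a Tokenizer has to run first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("idrisi_lmr_en_timebased_typeless", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("My name is Clara and I live in Berkeley, California.").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +pipelineDF.select("token.result", "ner.result").show(false) + +``` +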
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|idrisi_lmr_en_timebased_typeless| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/rsuwaileh/IDRISI-LMR-EN-timebased-typeless \ No newline at end of file From eb1a214b6a44302b010cdde2ffece271ac22bc08 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:54:30 +0700 Subject: [PATCH 548/667] Add model 2023-11-07-jobbert_base_cased_ner_en --- .../2023-11-07-jobbert_base_cased_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-jobbert_base_cased_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-jobbert_base_cased_ner_en.md b/docs/_posts/ahmedlone127/2023-11-07-jobbert_base_cased_ner_en.md new file mode 100644 index 00000000000000..3b8b5577a5c4bb --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-jobbert_base_cased_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English jobbert_base_cased_ner BertForTokenClassification from itsmeboris +author: John Snow Labs +name: jobbert_base_cased_ner +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `jobbert_base_cased_ner` is an English model originally trained by itsmeboris.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jobbert_base_cased_ner_en_5.2.0_3.0_1699389113523.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jobbert_base_cased_ner_en_5.2.0_3.0_1699389113523.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
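+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads a "token" column, so a Tokenizer has to run first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("jobbert_base_cased_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["My name is Clara and I live in Berkeley, California."]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +# One predicted label per token +pipelineDF.select("token.result", "ner.result").show(truncate=False) + +``` +```scala +import com.johnsnowlabs.nlp.SparkNLP +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val spark = SparkNLP.start() +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification reads a "token" column, so a Tokenizer has to run first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("jobbert_base_cased_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("My name is Clara and I live in Berkeley, California.").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +pipelineDF.select("token.result", "ner.result").show(false) + +``` +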
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jobbert_base_cased_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|402.2 MB| + +## References + +https://huggingface.co/itsmeboris/jobbert-base-cased-ner \ No newline at end of file From 30cb0361b572b8611645e3ac2c84b680277b9e50 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:55:31 +0700 Subject: [PATCH 549/667] Add model 2023-11-07-chinese_wiki_punctuation_restore_zh --- ...-07-chinese_wiki_punctuation_restore_zh.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-chinese_wiki_punctuation_restore_zh.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-chinese_wiki_punctuation_restore_zh.md b/docs/_posts/ahmedlone127/2023-11-07-chinese_wiki_punctuation_restore_zh.md new file mode 100644 index 00000000000000..64ab390dd3e06e --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-chinese_wiki_punctuation_restore_zh.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Chinese chinese_wiki_punctuation_restore BertForTokenClassification from p208p2002 +author: John Snow Labs +name: chinese_wiki_punctuation_restore +date: 2023-11-07 +tags: [bert, zh, open_source, token_classification, onnx] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `chinese_wiki_punctuation_restore` is a Chinese model originally trained by p208p2002.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/chinese_wiki_punctuation_restore_zh_5.2.0_3.0_1699384662197.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/chinese_wiki_punctuation_restore_zh_5.2.0_3.0_1699384662197.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
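+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads a "token" column, so a Tokenizer has to run first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("chinese_wiki_punctuation_restore","zh") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# Chinese is written without spaces and the default Tokenizer splits mainly on whitespace, so this sample is split into characters +data = spark.createDataFrame([["我 住 在 北 京 你 呢"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +# One predicted label per token +pipelineDF.select("token.result", "ner.result").show(truncate=False) + +``` +```scala +import com.johnsnowlabs.nlp.SparkNLP +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val spark = SparkNLP.start() +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification reads a "token" column, so a Tokenizer has to run first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("chinese_wiki_punctuation_restore", "zh") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// Chinese is written without spaces and the default Tokenizer splits mainly on whitespace, so this sample is split into characters +val data = Seq("我 住 在 北 京 你 呢").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +pipelineDF.select("token.result", "ner.result").show(false) + +``` +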
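For `chinese_wiki_punctuation_restore`, the per-token labels are presumably punctuation marks to insert rather than entity types. This card does not document the label set, so the sketch below assumes a hypothetical scheme. `O` means no mark, and any other label names the mark to append after the token, optionally with an `S-` prefix. Check the label set of the original Hugging Face checkpoint before relying on this. The sketch also assumes one token per character, which the default `Tokenizer` only produces when the input is pre-split as in the sample above. The function name `restore_punctuation` is hypothetical.

```python
from typing import List


def restore_punctuation(tokens: List[str], labels: List[str]) -> str:
    """Rebuild text from per-token punctuation labels.

    Hypothetical label scheme: "O" means no punctuation; any other label is the
    mark to append after the token, optionally prefixed with "S-".
    Tokens are concatenated without spaces, as Chinese text normally is.
    """
    pieces: List[str] = []
    for token, label in zip(tokens, labels):
        pieces.append(token)
        if label != "O":
            # Strip the assumed "S-" prefix to get the bare punctuation mark
            pieces.append(label[2:] if label.startswith("S-") else label)
    return "".join(pieces)


tokens = ["我", "住", "在", "北", "京", "你", "呢"]
labels = ["O", "O", "O", "O", "S-，", "O", "S-？"]
print(restore_punctuation(tokens, labels))  # 我住在北京，你呢？
```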
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|chinese_wiki_punctuation_restore| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|381.0 MB| + +## References + +https://huggingface.co/p208p2002/zh-wiki-punctuation-restore \ No newline at end of file From 98f1d5df32cd585b15dcf566546ccb4933476af9 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:56:31 +0700 Subject: [PATCH 550/667] Add model 2023-11-07-wikiser_bert_large_en --- .../2023-11-07-wikiser_bert_large_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-wikiser_bert_large_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-wikiser_bert_large_en.md b/docs/_posts/ahmedlone127/2023-11-07-wikiser_bert_large_en.md new file mode 100644 index 00000000000000..7d10346d6b3533 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-wikiser_bert_large_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English wikiser_bert_large BertForTokenClassification from taidng +author: John Snow Labs +name: wikiser_bert_large +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `wikiser_bert_large` is an English model originally trained by taidng.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wikiser_bert_large_en_5.2.0_3.0_1699387224834.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wikiser_bert_large_en_5.2.0_3.0_1699387224834.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
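+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads a "token" column, so a Tokenizer has to run first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("wikiser_bert_large","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["My name is Clara and I live in Berkeley, California."]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +# One predicted label per token +pipelineDF.select("token.result", "ner.result").show(truncate=False) + +``` +```scala +import com.johnsnowlabs.nlp.SparkNLP +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val spark = SparkNLP.start() +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification reads a "token" column, so a Tokenizer has to run first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("wikiser_bert_large", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("My name is Clara and I live in Berkeley, California.").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +pipelineDF.select("token.result", "ner.result").show(false) + +``` +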
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wikiser_bert_large| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/taidng/wikiser-bert-large \ No newline at end of file From b282d4c5ddf9cf11dc2276b1e0132e2557bc1495 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:57:31 +0700 Subject: [PATCH 551/667] Add model 2023-11-07-bert_finetuned_ner_applemoon_en --- ...3-11-07-bert_finetuned_ner_applemoon_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_applemoon_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_applemoon_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_applemoon_en.md new file mode 100644 index 00000000000000..08d3c0f7adcf3a --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_applemoon_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_finetuned_ner_applemoon BertForTokenClassification from Applemoon +author: John Snow Labs +name: bert_finetuned_ner_applemoon +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_ner_applemoon` is an English model originally trained by Applemoon.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_applemoon_en_5.2.0_3.0_1699389684774.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_applemoon_en_5.2.0_3.0_1699389684774.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
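+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads a "token" column, so a Tokenizer has to run first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_applemoon","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["My name is Clara and I live in Berkeley, California."]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +# One predicted label per token +pipelineDF.select("token.result", "ner.result").show(truncate=False) + +``` +```scala +import com.johnsnowlabs.nlp.SparkNLP +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val spark = SparkNLP.start() +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification reads a "token" column, so a Tokenizer has to run first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_finetuned_ner_applemoon", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("My name is Clara and I live in Berkeley, California.").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +pipelineDF.select("token.result", "ner.result").show(false) + +``` +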
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_applemoon| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Applemoon/bert-finetuned-ner \ No newline at end of file From a4e74cdc95b2323241575475ddaa3c7383ffabde Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:58:32 +0700 Subject: [PATCH 552/667] Add model 2023-11-07-bert_base_uncased_conll2003_hfeng_en --- ...07-bert_base_uncased_conll2003_hfeng_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_base_uncased_conll2003_hfeng_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_base_uncased_conll2003_hfeng_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_base_uncased_conll2003_hfeng_en.md new file mode 100644 index 00000000000000..0e625a4165d22c --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_base_uncased_conll2003_hfeng_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_base_uncased_conll2003_hfeng BertForTokenClassification from hfeng +author: John Snow Labs +name: bert_base_uncased_conll2003_hfeng +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_uncased_conll2003_hfeng` is an English model originally trained by hfeng.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_conll2003_hfeng_en_5.2.0_3.0_1699389315261.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_conll2003_hfeng_en_5.2.0_3.0_1699389315261.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
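+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads a "token" column, so a Tokenizer has to run first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_uncased_conll2003_hfeng","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["My name is Clara and I live in Berkeley, California."]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +# One predicted label per token +pipelineDF.select("token.result", "ner.result").show(truncate=False) + +``` +```scala +import com.johnsnowlabs.nlp.SparkNLP +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val spark = SparkNLP.start() +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification reads a "token" column, so a Tokenizer has to run first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_base_uncased_conll2003_hfeng", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("My name is Clara and I live in Berkeley, California.").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +pipelineDF.select("token.result", "ner.result").show(false) + +``` +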
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_conll2003_hfeng| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/hfeng/bert_base_uncased_conll2003 \ No newline at end of file From 076d9b2100a7fb99af12e2e0da4bd768c0af8367 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 03:59:32 +0700 Subject: [PATCH 553/667] Add model 2023-11-07-products_ner8_en --- .../2023-11-07-products_ner8_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-products_ner8_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-products_ner8_en.md b/docs/_posts/ahmedlone127/2023-11-07-products_ner8_en.md new file mode 100644 index 00000000000000..64c56a3451d8e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-products_ner8_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English products_ner8 BertForTokenClassification from Atheer174 +author: John Snow Labs +name: products_ner8 +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `products_ner8` is an English model originally trained by Atheer174.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/products_ner8_en_5.2.0_3.0_1699386899051.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/products_ner8_en_5.2.0_3.0_1699386899051.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
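+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads a "token" column, so a Tokenizer has to run first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("products_ner8","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["My name is Clara and I live in Berkeley, California."]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +# One predicted label per token +pipelineDF.select("token.result", "ner.result").show(truncate=False) + +``` +```scala +import com.johnsnowlabs.nlp.SparkNLP +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val spark = SparkNLP.start() +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification reads a "token" column, so a Tokenizer has to run first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("products_ner8", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("My name is Clara and I live in Berkeley, California.").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +pipelineDF.select("token.result", "ner.result").show(false) + +``` +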
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|products_ner8| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Atheer174/Products_NER8 \ No newline at end of file From 8ce03fbaa27bc66e47007ebf259b455e4dcce5dd Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 04:00:32 +0700 Subject: [PATCH 554/667] Add model 2023-11-07-bde_abbrev_batteryonlybert_cased_base_en --- ...de_abbrev_batteryonlybert_cased_base_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bde_abbrev_batteryonlybert_cased_base_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bde_abbrev_batteryonlybert_cased_base_en.md b/docs/_posts/ahmedlone127/2023-11-07-bde_abbrev_batteryonlybert_cased_base_en.md new file mode 100644 index 00000000000000..ff454574f92ef5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bde_abbrev_batteryonlybert_cased_base_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bde_abbrev_batteryonlybert_cased_base BertForTokenClassification from batterydata +author: John Snow Labs +name: bde_abbrev_batteryonlybert_cased_base +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bde_abbrev_batteryonlybert_cased_base` is an English model originally trained by batterydata.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bde_abbrev_batteryonlybert_cased_base_en_5.2.0_3.0_1699388506500.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bde_abbrev_batteryonlybert_cased_base_en_5.2.0_3.0_1699388506500.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
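+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads a "token" column, so a Tokenizer has to run first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bde_abbrev_batteryonlybert_cased_base","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["My name is Clara and I live in Berkeley, California."]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +# One predicted label per token +pipelineDF.select("token.result", "ner.result").show(truncate=False) + +``` +```scala +import com.johnsnowlabs.nlp.SparkNLP +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val spark = SparkNLP.start() +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification reads a "token" column, so a Tokenizer has to run first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bde_abbrev_batteryonlybert_cased_base", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("My name is Clara and I live in Berkeley, California.").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +pipelineDF.select("token.result", "ner.result").show(false) + +``` +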
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bde_abbrev_batteryonlybert_cased_base| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.5 MB| + +## References + +https://huggingface.co/batterydata/bde-abbrev-batteryonlybert-cased-base \ No newline at end of file From c5bc6ce1a77eb9182dee4d476f9db24a0d04941b Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 04:01:32 +0700 Subject: [PATCH 555/667] Add model 2023-11-07-bert4ner_base_chinese_zh --- .../2023-11-07-bert4ner_base_chinese_zh.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert4ner_base_chinese_zh.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert4ner_base_chinese_zh.md b/docs/_posts/ahmedlone127/2023-11-07-bert4ner_base_chinese_zh.md new file mode 100644 index 00000000000000..dd4cf208c321ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert4ner_base_chinese_zh.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Chinese bert4ner_base_chinese BertForTokenClassification from shibing624 +author: John Snow Labs +name: bert4ner_base_chinese +date: 2023-11-07 +tags: [bert, zh, open_source, token_classification, onnx] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert4ner_base_chinese` is a Chinese model originally trained by shibing624.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert4ner_base_chinese_zh_5.2.0_3.0_1699386449688.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert4ner_base_chinese_zh_5.2.0_3.0_1699386449688.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
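+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification needs a "token" column, so tokenize the documents first +# Tokenizer splits on whitespace, so unsegmented Chinese text may need a segmentation step first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert4ner_base_chinese","zh") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# `data` is a Spark DataFrame with a string column named "text" +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// Tokenizer splits on whitespace, so unsegmented Chinese text may need a segmentation step first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert4ner_base_chinese", "zh") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// `data` is a DataFrame with a string column named "text" +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +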
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert4ner_base_chinese| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|381.1 MB| + +## References + +https://huggingface.co/shibing624/bert4ner-base-chinese \ No newline at end of file From 4c8aecc6d0e42c985d90c17321ca2c21a4a9a82c Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 04:02:33 +0700 Subject: [PATCH 556/667] Add model 2023-11-07-emscad_skill_extraction_token_classification_en --- ...kill_extraction_token_classification_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-emscad_skill_extraction_token_classification_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-emscad_skill_extraction_token_classification_en.md b/docs/_posts/ahmedlone127/2023-11-07-emscad_skill_extraction_token_classification_en.md new file mode 100644 index 00000000000000..de4fac875af098 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-emscad_skill_extraction_token_classification_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English emscad_skill_extraction_token_classification BertForTokenClassification from Ivo +author: John Snow Labs +name: emscad_skill_extraction_token_classification +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `emscad_skill_extraction_token_classification` is an English model originally trained by Ivo.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emscad_skill_extraction_token_classification_en_5.2.0_3.0_1699389758974.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emscad_skill_extraction_token_classification_en_5.2.0_3.0_1699389758974.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
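+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification needs a "token" column, so tokenize the documents first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("emscad_skill_extraction_token_classification","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# `data` is a Spark DataFrame with a string column named "text" +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("emscad_skill_extraction_token_classification", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// `data` is a DataFrame with a string column named "text" +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +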
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emscad_skill_extraction_token_classification| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Ivo/emscad-skill-extraction-token-classification \ No newline at end of file From 0e16bb811561f553c16a47b04f0308e566b511fd Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 04:03:33 +0700 Subject: [PATCH 557/667] Add model 2023-11-07-bert_finetuned_n2c2_ner_en --- .../2023-11-07-bert_finetuned_n2c2_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_n2c2_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_n2c2_ner_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_n2c2_ner_en.md new file mode 100644 index 00000000000000..b4373409136122 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_n2c2_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_finetuned_n2c2_ner BertForTokenClassification from georgeleung30 +author: John Snow Labs +name: bert_finetuned_n2c2_ner +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_n2c2_ner` is an English model originally trained by georgeleung30.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_n2c2_ner_en_5.2.0_3.0_1699389048043.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_n2c2_ner_en_5.2.0_3.0_1699389048043.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
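+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification needs a "token" column, so tokenize the documents first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_n2c2_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# `data` is a Spark DataFrame with a string column named "text" +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_finetuned_n2c2_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// `data` is a DataFrame with a string column named "text" +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +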
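The `ner` column produced by the pipeline above holds one predicted tag per token. In Spark NLP you would normally merge these tags into entity chunks with the `NerConverter` annotator. The grouping step itself can be sketched in plain Python, assuming the model emits IOB2-style tags (`B-LABEL`, `I-LABEL`, `O`). This card does not list the label set of `bert_finetuned_n2c2_ner`, so the tags below are made up for illustration. `iob_to_spans` is a hypothetical helper, not a Spark NLP API.

```python
from typing import List, Optional, Tuple


def iob_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group per-token IOB2 tags into (entity_text, label) pairs.

    Assumption: tags look like "B-LABEL", "I-LABEL" or "O". An "I-" tag that
    does not continue a span with the same label starts a new span.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Emit the span being built, if any, then reset the buffer.
        nonlocal current_tokens, current_label
        if current_tokens:
            spans.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, _, label = tag.partition("-")
        if prefix == "B" or label != current_label:
            flush()
            current_label = label
        current_tokens.append(token)
    flush()
    return spans


# Illustrative tokens and tags only, not real model output.
tokens = ["Patient", "was", "given", "aspirin", "81", "mg", "daily"]
tags = ["O", "O", "O", "B-DRUG", "B-STRENGTH", "I-STRENGTH", "B-FREQUENCY"]
print(iob_to_spans(tokens, tags))
# [('aspirin', 'DRUG'), ('81 mg', 'STRENGTH'), ('daily', 'FREQUENCY')]
```

Joining tokens with a space suits English text. For languages written without spaces, join with an empty string instead.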
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_n2c2_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/georgeleung30/bert-finetuned-n2c2-ner \ No newline at end of file From e1912835869d261e999f14df2feb41bae3c8f907 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 04:04:33 +0700 Subject: [PATCH 558/667] Add model 2023-11-07-pii_annotator_en --- .../2023-11-07-pii_annotator_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-pii_annotator_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-pii_annotator_en.md b/docs/_posts/ahmedlone127/2023-11-07-pii_annotator_en.md new file mode 100644 index 00000000000000..1b7a565156fdb4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-pii_annotator_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English pii_annotator BertForTokenClassification from cp500 +author: John Snow Labs +name: pii_annotator +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `pii_annotator` is an English model originally trained by cp500.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pii_annotator_en_5.2.0_3.0_1699386518686.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pii_annotator_en_5.2.0_3.0_1699386518686.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
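+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification needs a "token" column, so tokenize the documents first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("pii_annotator","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# `data` is a Spark DataFrame with a string column named "text" +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("pii_annotator", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// `data` is a DataFrame with a string column named "text" +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +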
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pii_annotator| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|690.5 MB| + +## References + +https://huggingface.co/cp500/PII_annotator \ No newline at end of file From 5203d528198cd4c727524ff099bde6ab57d81330 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 04:05:34 +0700 Subject: [PATCH 559/667] Add model 2023-11-07-bert_finetuned_tech_product_name_ner_en --- ...bert_finetuned_tech_product_name_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_tech_product_name_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_tech_product_name_ner_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_tech_product_name_ner_en.md new file mode 100644 index 00000000000000..bc4df13b32f4ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_tech_product_name_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_finetuned_tech_product_name_ner BertForTokenClassification from ashleyliu31 +author: John Snow Labs +name: bert_finetuned_tech_product_name_ner +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_tech_product_name_ner` is an English model originally trained by ashleyliu31.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_tech_product_name_ner_en_5.2.0_3.0_1699383979008.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_tech_product_name_ner_en_5.2.0_3.0_1699383979008.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
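+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification needs a "token" column, so tokenize the documents first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_tech_product_name_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# `data` is a Spark DataFrame with a string column named "text" +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_finetuned_tech_product_name_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// `data` is a DataFrame with a string column named "text" +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +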
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_tech_product_name_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ashleyliu31/bert-finetuned-tech-product-name-ner \ No newline at end of file From da871c6c4ff0702ab968fe640743b08a52a0cb05 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 04:06:34 +0700 Subject: [PATCH 560/667] Add model 2023-11-07-classical_chinese_punctuation_guwen_biaodian_zh --- ...l_chinese_punctuation_guwen_biaodian_zh.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-classical_chinese_punctuation_guwen_biaodian_zh.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-classical_chinese_punctuation_guwen_biaodian_zh.md b/docs/_posts/ahmedlone127/2023-11-07-classical_chinese_punctuation_guwen_biaodian_zh.md new file mode 100644 index 00000000000000..d9877268df554d --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-classical_chinese_punctuation_guwen_biaodian_zh.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Chinese classical_chinese_punctuation_guwen_biaodian BertForTokenClassification from raynardj +author: John Snow Labs +name: classical_chinese_punctuation_guwen_biaodian +date: 2023-11-07 +tags: [bert, zh, open_source, token_classification, onnx] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `classical_chinese_punctuation_guwen_biaodian` is a Chinese model originally trained by
raynardj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classical_chinese_punctuation_guwen_biaodian_zh_5.2.0_3.0_1699386868878.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classical_chinese_punctuation_guwen_biaodian_zh_5.2.0_3.0_1699386868878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
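+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification needs a "token" column, so tokenize the documents first +# Tokenizer splits on whitespace, so unsegmented Chinese text may need a segmentation step first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("classical_chinese_punctuation_guwen_biaodian","zh") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# `data` is a Spark DataFrame with a string column named "text" +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// Tokenizer splits on whitespace, so unsegmented Chinese text may need a segmentation step first +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("classical_chinese_punctuation_guwen_biaodian", "zh") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// `data` is a DataFrame with a string column named "text" +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +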
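This card lists the task as Named Entity Recognition, but the model name suggests that it restores punctuation in unpunctuated Classical Chinese. The sketch below turns such predictions back into punctuated text. It rests on two assumptions that this card does not confirm: each character is its own token, and each label is either `O` or the punctuation mark to place after that character. Check the upstream Hugging Face model page for the real label set. `restore_punctuation` is a hypothetical helper.

```python
from typing import List


def restore_punctuation(chars: List[str], labels: List[str]) -> str:
    """Rebuild text from per-character labels.

    Assumption: each label is either "O" (nothing follows this character)
    or the punctuation mark that should follow it, e.g. "，" or "。".
    """
    if len(chars) != len(labels):
        raise ValueError("expected exactly one label per character")
    pieces = []
    for char, label in zip(chars, labels):
        pieces.append(char)
        if label != "O":
            pieces.append(label)
    return "".join(pieces)


# Illustrative labels for a line from the Analects, not real model output.
chars = list("學而時習之不亦說乎")
labels = ["O", "O", "O", "O", "，", "O", "O", "O", "？"]
print(restore_punctuation(chars, labels))  # 學而時習之，不亦說乎？
```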
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classical_chinese_punctuation_guwen_biaodian| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|381.1 MB| + +## References + +https://huggingface.co/raynardj/classical-chinese-punctuation-guwen-biaodian \ No newline at end of file From d779ade3594be4cf5edae329bfec2a86c3fcdcc8 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 04:07:34 +0700 Subject: [PATCH 561/667] Add model 2023-11-07-rubert_base_massive_ner_ru --- .../2023-11-07-rubert_base_massive_ner_ru.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-rubert_base_massive_ner_ru.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-rubert_base_massive_ner_ru.md b/docs/_posts/ahmedlone127/2023-11-07-rubert_base_massive_ner_ru.md new file mode 100644 index 00000000000000..90ecab1e70c2fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-rubert_base_massive_ner_ru.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Russian rubert_base_massive_ner BertForTokenClassification from 0x7194633 +author: John Snow Labs +name: rubert_base_massive_ner +date: 2023-11-07 +tags: [bert, ru, open_source, token_classification, onnx] +task: Named Entity Recognition +language: ru +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `rubert_base_massive_ner` is a Russian model originally trained by 0x7194633.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_base_massive_ner_ru_5.2.0_3.0_1699389487868.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_base_massive_ner_ru_5.2.0_3.0_1699389487868.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
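+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification needs a "token" column, so tokenize the documents first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("rubert_base_massive_ner","ru") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# `data` is a Spark DataFrame with a string column named "text" +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("rubert_base_massive_ner", "ru") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// `data` is a DataFrame with a string column named "text" +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +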
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_base_massive_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|ru| +|Size:|664.6 MB| + +## References + +https://huggingface.co/0x7194633/rubert-base-massive-ner \ No newline at end of file From 99a935a92e148def92108017c2054692db4ca0a0 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 04:08:35 +0700 Subject: [PATCH 562/667] Add model 2023-11-07-pashto_sayula_popoluca_en --- .../2023-11-07-pashto_sayula_popoluca_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-pashto_sayula_popoluca_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-pashto_sayula_popoluca_en.md b/docs/_posts/ahmedlone127/2023-11-07-pashto_sayula_popoluca_en.md new file mode 100644 index 00000000000000..3c87a81d68aa1b --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-pashto_sayula_popoluca_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English pashto_sayula_popoluca BertForTokenClassification from ijazulhaq +author: John Snow Labs +name: pashto_sayula_popoluca +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `pashto_sayula_popoluca` is an English model originally trained by ijazulhaq.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pashto_sayula_popoluca_en_5.2.0_3.0_1699386046291.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pashto_sayula_popoluca_en_5.2.0_3.0_1699386046291.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
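+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification needs a "token" column, so tokenize the documents first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("pashto_sayula_popoluca","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# `data` is a Spark DataFrame with a string column named "text" +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("pashto_sayula_popoluca", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// `data` is a DataFrame with a string column named "text" +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +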
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pashto_sayula_popoluca| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|406.6 MB| + +## References + +https://huggingface.co/ijazulhaq/pashto-pos \ No newline at end of file From 7dda4d516a9f1677891aedb6f4d89e2f5c0f1c5d Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 04:09:35 +0700 Subject: [PATCH 563/667] Add model 2023-11-07-bert_base_multilingual_cased_sayula_popoluca_english_xx --- ...ingual_cased_sayula_popoluca_english_xx.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_base_multilingual_cased_sayula_popoluca_english_xx.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_base_multilingual_cased_sayula_popoluca_english_xx.md b/docs/_posts/ahmedlone127/2023-11-07-bert_base_multilingual_cased_sayula_popoluca_english_xx.md new file mode 100644 index 00000000000000..b32efb90bdc64a --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_base_multilingual_cased_sayula_popoluca_english_xx.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_sayula_popoluca_english BertForTokenClassification from gbwsolutions +author: John Snow Labs +name: bert_base_multilingual_cased_sayula_popoluca_english +date: 2023-11-07 +tags: [bert, xx, open_source, token_classification, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_multilingual_cased_sayula_popoluca_english` is a
Multilingual model originally trained by gbwsolutions. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_sayula_popoluca_english_xx_5.2.0_3.0_1699389224592.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_sayula_popoluca_english_xx_5.2.0_3.0_1699389224592.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
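+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification needs a "token" column, so tokenize the documents first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_multilingual_cased_sayula_popoluca_english","xx") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# `data` is a Spark DataFrame with a string column named "text" +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_base_multilingual_cased_sayula_popoluca_english", "xx") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// `data` is a DataFrame with a string column named "text" +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +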
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_sayula_popoluca_english| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|665.2 MB| + +## References + +https://huggingface.co/gbwsolutions/bert-base-multilingual-cased-pos-english \ No newline at end of file From f720fc55bf40caf14c607c434e533b22aa5775a2 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 04:10:35 +0700 Subject: [PATCH 564/667] Add model 2023-11-07-clinicalnerpt_pharmacologic_pt --- ...23-11-07-clinicalnerpt_pharmacologic_pt.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_pharmacologic_pt.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_pharmacologic_pt.md b/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_pharmacologic_pt.md new file mode 100644 index 00000000000000..256430894d0fb0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_pharmacologic_pt.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Portuguese clinicalnerpt_pharmacologic BertForTokenClassification from pucpr +author: John Snow Labs +name: clinicalnerpt_pharmacologic +date: 2023-11-07 +tags: [bert, pt, open_source, token_classification, onnx] +task: Named Entity Recognition +language: pt +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `clinicalnerpt_pharmacologic` is a Portuguese model originally trained by pucpr.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinicalnerpt_pharmacologic_pt_5.2.0_3.0_1699388490264.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinicalnerpt_pharmacologic_pt_5.2.0_3.0_1699388490264.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
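+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification needs a "token" column, so tokenize the documents first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("clinicalnerpt_pharmacologic","pt") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# `data` is a Spark DataFrame with a string column named "text" +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("clinicalnerpt_pharmacologic", "pt") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +// `data` is a DataFrame with a string column named "text" +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +``` +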
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinicalnerpt_pharmacologic| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|pt| +|Size:|664.8 MB| + +## References + +https://huggingface.co/pucpr/clinicalnerpt-pharmacologic \ No newline at end of file From 72944691e9f01e9d0d21771139b08d79eafb1d20 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 04:11:35 +0700 Subject: [PATCH 565/667] Add model 2023-11-07-dark_bert_finetuned_ner_en --- .../2023-11-07-dark_bert_finetuned_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-dark_bert_finetuned_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-dark_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-07-dark_bert_finetuned_ner_en.md new file mode 100644 index 00000000000000..ce8c56c2b1286b --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-dark_bert_finetuned_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English dark_bert_finetuned_ner BertForTokenClassification from pulkitkumar13 +author: John Snow Labs +name: dark_bert_finetuned_ner +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `dark_bert_finetuned_ner` is an English model originally trained by pulkitkumar13.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dark_bert_finetuned_ner_en_5.2.0_3.0_1699387910801.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dark_bert_finetuned_ner_en_5.2.0_3.0_1699387910801.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
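+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("dark_bert_finetuned_ner", "en") \
+    .setInputCols(["documents", "token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `data` is assumed to be a Spark DataFrame with a "text" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+  .pretrained("dark_bert_finetuned_ner", "en")
+  .setInputCols(Array("documents", "token"))
+  .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `data` is assumed to be a DataFrame with a "text" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
</div>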
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dark_bert_finetuned_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/pulkitkumar13/dark-bert-finetuned-ner \ No newline at end of file From c8bbcc4a9c08b2dbed58d3496d858241643f285b Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 04:12:36 +0700 Subject: [PATCH 566/667] Add model 2023-11-07-bert_base_chinese_medical_ner_zh --- ...-11-07-bert_base_chinese_medical_ner_zh.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_base_chinese_medical_ner_zh.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_base_chinese_medical_ner_zh.md b/docs/_posts/ahmedlone127/2023-11-07-bert_base_chinese_medical_ner_zh.md new file mode 100644 index 00000000000000..4acf5c60b0aa9f --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_base_chinese_medical_ner_zh.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Chinese bert_base_chinese_medical_ner BertForTokenClassification from iioSnail +author: John Snow Labs +name: bert_base_chinese_medical_ner +date: 2023-11-07 +tags: [bert, zh, open_source, token_classification, onnx] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_chinese_medical_ner` is a Chinese model originally trained by iioSnail.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_chinese_medical_ner_zh_5.2.0_3.0_1699386242094.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_chinese_medical_ner_zh_5.2.0_3.0_1699386242094.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
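+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bert_base_chinese_medical_ner", "zh") \
+    .setInputCols(["documents", "token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `data` is assumed to be a Spark DataFrame with a "text" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+  .pretrained("bert_base_chinese_medical_ner", "zh")
+  .setInputCols(Array("documents", "token"))
+  .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `data` is assumed to be a DataFrame with a "text" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
</div>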
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_chinese_medical_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|381.1 MB| + +## References + +https://huggingface.co/iioSnail/bert-base-chinese-medical-ner \ No newline at end of file From e71a65f787d791518bf0c0af76e0eee677c71b39 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 04:13:36 +0700 Subject: [PATCH 567/667] Add model 2023-11-07-fullstop_indonesian_punctuation_prediction_id --- ...op_indonesian_punctuation_prediction_id.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-fullstop_indonesian_punctuation_prediction_id.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-fullstop_indonesian_punctuation_prediction_id.md b/docs/_posts/ahmedlone127/2023-11-07-fullstop_indonesian_punctuation_prediction_id.md new file mode 100644 index 00000000000000..015e64df521823 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-fullstop_indonesian_punctuation_prediction_id.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Indonesian fullstop_indonesian_punctuation_prediction BertForTokenClassification from Rizkinoor16 +author: John Snow Labs +name: fullstop_indonesian_punctuation_prediction +date: 2023-11-07 +tags: [bert, id, open_source, token_classification, onnx] +task: Named Entity Recognition +language: id +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `fullstop_indonesian_punctuation_prediction` is an Indonesian model originally trained by Rizkinoor16.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fullstop_indonesian_punctuation_prediction_id_5.2.0_3.0_1699391589605.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fullstop_indonesian_punctuation_prediction_id_5.2.0_3.0_1699391589605.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
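+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("fullstop_indonesian_punctuation_prediction", "id") \
+    .setInputCols(["documents", "token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `data` is assumed to be a Spark DataFrame with a "text" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+  .pretrained("fullstop_indonesian_punctuation_prediction", "id")
+  .setInputCols(Array("documents", "token"))
+  .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `data` is assumed to be a DataFrame with a "text" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
</div>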
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fullstop_indonesian_punctuation_prediction| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|id| +|Size:|625.5 MB| + +## References + +https://huggingface.co/Rizkinoor16/fullstop-indonesian-punctuation-prediction \ No newline at end of file From 9b25bb0b87781a964c9cf74b9b0ea3f85d6227f2 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 04:14:36 +0700 Subject: [PATCH 568/667] Add model 2023-11-07-russian_damage_trigger_effect_4_en --- ...1-07-russian_damage_trigger_effect_4_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-russian_damage_trigger_effect_4_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-russian_damage_trigger_effect_4_en.md b/docs/_posts/ahmedlone127/2023-11-07-russian_damage_trigger_effect_4_en.md new file mode 100644 index 00000000000000..9e53b52b69aae1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-russian_damage_trigger_effect_4_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English russian_damage_trigger_effect_4 BertForTokenClassification from Lolimorimorf +author: John Snow Labs +name: russian_damage_trigger_effect_4 +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `russian_damage_trigger_effect_4` is an English model originally trained by Lolimorimorf.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/russian_damage_trigger_effect_4_en_5.2.0_3.0_1699387304257.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/russian_damage_trigger_effect_4_en_5.2.0_3.0_1699387304257.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
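+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("russian_damage_trigger_effect_4", "en") \
+    .setInputCols(["documents", "token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `data` is assumed to be a Spark DataFrame with a "text" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+  .pretrained("russian_damage_trigger_effect_4", "en")
+  .setInputCols(Array("documents", "token"))
+  .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `data` is assumed to be a DataFrame with a "text" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
</div>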
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|russian_damage_trigger_effect_4| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|664.3 MB| + +## References + +https://huggingface.co/Lolimorimorf/russian_damage_trigger_effect_4 \ No newline at end of file From 58717a2a06c58b5cfd4298ca4972d708e0840d3a Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 04:15:37 +0700 Subject: [PATCH 569/667] Add model 2023-11-07-hotel_reviews_en --- .../2023-11-07-hotel_reviews_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-hotel_reviews_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-hotel_reviews_en.md b/docs/_posts/ahmedlone127/2023-11-07-hotel_reviews_en.md new file mode 100644 index 00000000000000..5d01884dae5ea1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-hotel_reviews_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English hotel_reviews BertForTokenClassification from MutazYoune +author: John Snow Labs +name: hotel_reviews +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `hotel_reviews` is an English model originally trained by MutazYoune.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hotel_reviews_en_5.2.0_3.0_1699387666666.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hotel_reviews_en_5.2.0_3.0_1699387666666.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
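+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("hotel_reviews", "en") \
+    .setInputCols(["documents", "token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `data` is assumed to be a Spark DataFrame with a "text" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+  .pretrained("hotel_reviews", "en")
+  .setInputCols(Array("documents", "token"))
+  .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `data` is assumed to be a DataFrame with a "text" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
</div>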
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hotel_reviews| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|406.4 MB| + +## References + +https://huggingface.co/MutazYoune/hotel_reviews \ No newline at end of file From f8a58a0add056348d873e27b3b20c4b08e9aaae1 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 04:21:22 +0700 Subject: [PATCH 570/667] Add model 2023-11-07-macbert_base_chinese_medical_collation_zh --- ...cbert_base_chinese_medical_collation_zh.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-macbert_base_chinese_medical_collation_zh.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-macbert_base_chinese_medical_collation_zh.md b/docs/_posts/ahmedlone127/2023-11-07-macbert_base_chinese_medical_collation_zh.md new file mode 100644 index 00000000000000..a0aa6d3b3cda3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-macbert_base_chinese_medical_collation_zh.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Chinese macbert_base_chinese_medical_collation BertForTokenClassification from 9pinus +author: John Snow Labs +name: macbert_base_chinese_medical_collation +date: 2023-11-07 +tags: [bert, zh, open_source, token_classification, onnx] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `macbert_base_chinese_medical_collation` is a Chinese model originally trained by 9pinus.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/macbert_base_chinese_medical_collation_zh_5.2.0_3.0_1699392069704.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/macbert_base_chinese_medical_collation_zh_5.2.0_3.0_1699392069704.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
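+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("macbert_base_chinese_medical_collation", "zh") \
+    .setInputCols(["documents", "token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `data` is assumed to be a Spark DataFrame with a "text" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+  .pretrained("macbert_base_chinese_medical_collation", "zh")
+  .setInputCols(Array("documents", "token"))
+  .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `data` is assumed to be a DataFrame with a "text" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
</div>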
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|macbert_base_chinese_medical_collation| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|381.0 MB| + +## References + +https://huggingface.co/9pinus/macbert-base-chinese-medical-collation \ No newline at end of file From 911f7c9005c363b48e3ccaea353c9e5d76438865 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 04:22:22 +0700 Subject: [PATCH 571/667] Add model 2023-11-07-biomednlp_pubmedbert_base_uncased_abstract_ft_ncbi_disease_en --- ...ase_uncased_abstract_ft_ncbi_disease_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-biomednlp_pubmedbert_base_uncased_abstract_ft_ncbi_disease_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-biomednlp_pubmedbert_base_uncased_abstract_ft_ncbi_disease_en.md b/docs/_posts/ahmedlone127/2023-11-07-biomednlp_pubmedbert_base_uncased_abstract_ft_ncbi_disease_en.md new file mode 100644 index 00000000000000..74f5cf4654d392 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-biomednlp_pubmedbert_base_uncased_abstract_ft_ncbi_disease_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English biomednlp_pubmedbert_base_uncased_abstract_ft_ncbi_disease BertForTokenClassification from sarahmiller137 +author: John Snow Labs +name: biomednlp_pubmedbert_base_uncased_abstract_ft_ncbi_disease +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness 
using Spark NLP. `biomednlp_pubmedbert_base_uncased_abstract_ft_ncbi_disease` is an English model originally trained by sarahmiller137. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biomednlp_pubmedbert_base_uncased_abstract_ft_ncbi_disease_en_5.2.0_3.0_1699392082410.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biomednlp_pubmedbert_base_uncased_abstract_ft_ncbi_disease_en_5.2.0_3.0_1699392082410.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + +
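+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("biomednlp_pubmedbert_base_uncased_abstract_ft_ncbi_disease", "en") \
+    .setInputCols(["documents", "token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `data` is assumed to be a Spark DataFrame with a "text" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+  .pretrained("biomednlp_pubmedbert_base_uncased_abstract_ft_ncbi_disease", "en")
+  .setInputCols(Array("documents", "token"))
+  .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `data` is assumed to be a DataFrame with a "text" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
</div>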
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biomednlp_pubmedbert_base_uncased_abstract_ft_ncbi_disease| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.1 MB| + +## References + +https://huggingface.co/sarahmiller137/BiomedNLP-PubMedBERT-base-uncased-abstract-ft-ncbi-disease \ No newline at end of file From 967c0afcc8c8b693df5a2638f0c19ef36c7f86c2 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 04:46:18 +0700 Subject: [PATCH 572/667] Add model 2023-11-07-bert_uncased_keyword_extractor_en --- ...11-07-bert_uncased_keyword_extractor_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_uncased_keyword_extractor_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_uncased_keyword_extractor_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_uncased_keyword_extractor_en.md new file mode 100644 index 00000000000000..1bd5aa890572d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_uncased_keyword_extractor_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_uncased_keyword_extractor BertForTokenClassification from Azma-AI +author: John Snow Labs +name: bert_uncased_keyword_extractor +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_uncased_keyword_extractor` is an English model originally trained by Azma-AI.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_uncased_keyword_extractor_en_5.2.0_3.0_1699393569738.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_uncased_keyword_extractor_en_5.2.0_3.0_1699393569738.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
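+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bert_uncased_keyword_extractor", "en") \
+    .setInputCols(["documents", "token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `data` is assumed to be a Spark DataFrame with a "text" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+  .pretrained("bert_uncased_keyword_extractor", "en")
+  .setInputCols(Array("documents", "token"))
+  .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `data` is assumed to be a DataFrame with a "text" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
</div>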
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_uncased_keyword_extractor| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Azma-AI/bert-uncased-keyword-extractor \ No newline at end of file From 06eac71975d0cf4ae13789f26b987a6d366f7ca2 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 04:47:18 +0700 Subject: [PATCH 573/667] Add model 2023-11-07-berturk_cased_ner_tr --- .../2023-11-07-berturk_cased_ner_tr.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-berturk_cased_ner_tr.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-berturk_cased_ner_tr.md b/docs/_posts/ahmedlone127/2023-11-07-berturk_cased_ner_tr.md new file mode 100644 index 00000000000000..0f3702c9bccd2c --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-berturk_cased_ner_tr.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Turkish berturk_cased_ner BertForTokenClassification from alierenak +author: John Snow Labs +name: berturk_cased_ner +date: 2023-11-07 +tags: [bert, tr, open_source, token_classification, onnx] +task: Named Entity Recognition +language: tr +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `berturk_cased_ner` is a Turkish model originally trained by alierenak.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/berturk_cased_ner_tr_5.2.0_3.0_1699393575749.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/berturk_cased_ner_tr_5.2.0_3.0_1699393575749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
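+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("berturk_cased_ner", "tr") \
+    .setInputCols(["documents", "token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `data` is assumed to be a Spark DataFrame with a "text" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+  .pretrained("berturk_cased_ner", "tr")
+  .setInputCols(Array("documents", "token"))
+  .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `data` is assumed to be a DataFrame with a "text" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
</div>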
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|berturk_cased_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|tr| +|Size:|412.3 MB| + +## References + +https://huggingface.co/alierenak/berturk-cased-ner \ No newline at end of file From 64631663faf22da5b1310b1c8e072e9b164bfe7d Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 04:48:18 +0700 Subject: [PATCH 574/667] Add model 2023-11-07-autotrain_medicaltokenclassification_1279048948_en --- ...edicaltokenclassification_1279048948_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-autotrain_medicaltokenclassification_1279048948_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-autotrain_medicaltokenclassification_1279048948_en.md b/docs/_posts/ahmedlone127/2023-11-07-autotrain_medicaltokenclassification_1279048948_en.md new file mode 100644 index 00000000000000..49cb5cfdcb28c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-autotrain_medicaltokenclassification_1279048948_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English autotrain_medicaltokenclassification_1279048948 BertForTokenClassification from shreyas-singh +author: John Snow Labs +name: autotrain_medicaltokenclassification_1279048948 +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `autotrain_medicaltokenclassification_1279048948` is an English model originally trained by shreyas-singh.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_medicaltokenclassification_1279048948_en_5.2.0_3.0_1699393608127.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_medicaltokenclassification_1279048948_en_5.2.0_3.0_1699393608127.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
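+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("autotrain_medicaltokenclassification_1279048948", "en") \
+    .setInputCols(["documents", "token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `data` is assumed to be a Spark DataFrame with a "text" column
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("documents"))
+  .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+  .pretrained("autotrain_medicaltokenclassification_1279048948", "en")
+  .setInputCols(Array("documents", "token"))
+  .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `data` is assumed to be a DataFrame with a "text" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
</div>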
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_medicaltokenclassification_1279048948| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.1 MB| + +## References + +https://huggingface.co/shreyas-singh/autotrain-MedicalTokenClassification-1279048948 \ No newline at end of file From fd25ef3b91bee48e3768e711a7a4dc43b8cc91b2 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 05:05:55 +0700 Subject: [PATCH 575/667] Add model 2023-11-07-bert_tiny_finetuned_finer_139_full_intel_cpu_en --- ...y_finetuned_finer_139_full_intel_cpu_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_tiny_finetuned_finer_139_full_intel_cpu_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_tiny_finetuned_finer_139_full_intel_cpu_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_tiny_finetuned_finer_139_full_intel_cpu_en.md new file mode 100644 index 00000000000000..8b8340ad36ef8e --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_tiny_finetuned_finer_139_full_intel_cpu_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_tiny_finetuned_finer_139_full_intel_cpu BertForTokenClassification from muhtasham +author: John Snow Labs +name: bert_tiny_finetuned_finer_139_full_intel_cpu +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_tiny_finetuned_finer_139_full_intel_cpu` is an English model
originally trained by muhtasham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tiny_finetuned_finer_139_full_intel_cpu_en_5.2.0_3.0_1699394753224.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tiny_finetuned_finer_139_full_intel_cpu_en_5.2.0_3.0_1699394753224.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
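+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads the "token" column produced by this stage +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_tiny_finetuned_finer_139_full_intel_cpu","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# Any DataFrame with a string column named "text" works as input +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +// Assumes a SparkSession named `spark` is in scope, as in spark-shell +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_tiny_finetuned_finer_139_full_intel_cpu", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("I love spark-nlp").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +```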
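The `ner` column produced by this pipeline holds one label per token. Turning such labels into entity spans is usually done by merging them under an IOB2 convention. The helper below is a minimal pure-Python sketch of that step, assuming the model's labels look like `B-LABEL`, `I-LABEL`, and `O` (the card does not list this model's label set), and `group_iob_tags` is a hypothetical name. Inside Spark NLP, the `NerConverter` annotator performs a comparable conversion as an extra pipeline stage.

```python
from typing import List, Optional, Tuple


def group_iob_tags(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge per-token IOB2 tags into (entity_text, label) pairs.

    Assumption: tags are "B-<LABEL>", "I-<LABEL>", or "O". An "I-" tag whose
    label differs from the open chunk starts a new chunk instead of failing.
    """
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Close the open chunk, if any, and reset the running state.
        nonlocal current_tokens, current_label
        if current_tokens and current_label is not None:
            chunks.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, _, label = tag.partition("-")
        if prefix == "B" or label != current_label:
            flush()
            current_label = label
        current_tokens.append(token)
    flush()
    return chunks


tokens = ["John", "Smith", "works", "at", "Acme", "Corp", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG", "O"]
print(group_iob_tags(tokens, tags))  # [('John Smith', 'PER'), ('Acme Corp', 'ORG')]
```

Joining tokens with single spaces is a simplification; the begin/end offsets carried by Spark NLP token annotations would let a real implementation recover the original spacing.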
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tiny_finetuned_finer_139_full_intel_cpu| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|16.8 MB| + +## References + +https://huggingface.co/muhtasham/bert-tiny-finetuned-finer-139-full-intel-cpu \ No newline at end of file From fb43b05eb1832e09aab5d004f0a3f714da266fe8 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 05:09:21 +0700 Subject: [PATCH 576/667] Add model 2023-11-07-bde_sayula_popoluca_bert_cased_base_en --- ...-bde_sayula_popoluca_bert_cased_base_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bde_sayula_popoluca_bert_cased_base_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bde_sayula_popoluca_bert_cased_base_en.md b/docs/_posts/ahmedlone127/2023-11-07-bde_sayula_popoluca_bert_cased_base_en.md new file mode 100644 index 00000000000000..d77d445f080c7a --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bde_sayula_popoluca_bert_cased_base_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bde_sayula_popoluca_bert_cased_base BertForTokenClassification from batterydata +author: John Snow Labs +name: bde_sayula_popoluca_bert_cased_base +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bde_sayula_popoluca_bert_cased_base` is an English model originally trained by batterydata.
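The Download and Copy S3 URI links in these cards appear to share one layout: model name, language code, Spark NLP version, Spark version, and a trailing number, joined by underscores under `auxdata.johnsnowlabs.com/public/models/`. The sketch below rebuilds such links from their parts. The layout is inferred only from the links shown in these cards, the trailing number is assumed to be a millisecond Unix timestamp, and `model_archive_url` is a hypothetical helper, not part of Spark NLP.

```python
def model_archive_url(name: str, lang: str, nlp_version: str,
                      spark_version: str, stamp: int, s3_uri: bool = False) -> str:
    """Rebuild a model archive link from its parts (layout inferred from the cards).

    s3_uri=False gives the HTTPS "Download" link; True gives the "Copy S3 URI" form.
    """
    host = "s3://" if s3_uri else "https://s3.amazonaws.com/"
    filename = f"{name}_{lang}_{nlp_version}_{spark_version}_{stamp}.zip"
    return f"{host}auxdata.johnsnowlabs.com/public/models/{filename}"


# Reproduces the Download link of the bde_sayula_popoluca_bert_cased_base card
print(model_archive_url("bde_sayula_popoluca_bert_cased_base", "en", "5.2.0", "3.0", 1699394949156))
```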
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bde_sayula_popoluca_bert_cased_base_en_5.2.0_3.0_1699394949156.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bde_sayula_popoluca_bert_cased_base_en_5.2.0_3.0_1699394949156.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
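+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads the "token" column produced by this stage +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bde_sayula_popoluca_bert_cased_base","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# Any DataFrame with a string column named "text" works as input +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +// Assumes a SparkSession named `spark` is in scope, as in spark-shell +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bde_sayula_popoluca_bert_cased_base", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("I love spark-nlp").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +```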
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bde_sayula_popoluca_bert_cased_base| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.8 MB| + +## References + +https://huggingface.co/batterydata/bde-pos-bert-cased-base \ No newline at end of file From 1750be7134ee05895fcb2d31a58501922aad26c0 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 05:12:36 +0700 Subject: [PATCH 577/667] Add model 2023-11-07-clinicalnerpt_healthcare_pt --- .../2023-11-07-clinicalnerpt_healthcare_pt.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_healthcare_pt.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_healthcare_pt.md b/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_healthcare_pt.md new file mode 100644 index 00000000000000..8b26d9ad3a80c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-clinicalnerpt_healthcare_pt.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Portuguese clinicalnerpt_healthcare BertForTokenClassification from pucpr +author: John Snow Labs +name: clinicalnerpt_healthcare +date: 2023-11-07 +tags: [bert, pt, open_source, token_classification, onnx] +task: Named Entity Recognition +language: pt +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `clinicalnerpt_healthcare` is a Portuguese model originally trained by pucpr.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinicalnerpt_healthcare_pt_5.2.0_3.0_1699395140561.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinicalnerpt_healthcare_pt_5.2.0_3.0_1699395140561.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
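+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads the "token" column produced by this stage +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("clinicalnerpt_healthcare","pt") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# Any DataFrame with a string column named "text" works as input +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +// Assumes a SparkSession named `spark` is in scope, as in spark-shell +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("clinicalnerpt_healthcare", "pt") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("I love spark-nlp").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +```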
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinicalnerpt_healthcare| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|pt| +|Size:|664.8 MB| + +## References + +https://huggingface.co/pucpr/clinicalnerpt-healthcare \ No newline at end of file From 91926b590cb980a4ae9113e09008c715eb2bef58 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 05:26:55 +0700 Subject: [PATCH 578/667] Add model 2023-11-07-arabert_arabic_ner_en --- .../2023-11-07-arabert_arabic_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-arabert_arabic_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-arabert_arabic_ner_en.md b/docs/_posts/ahmedlone127/2023-11-07-arabert_arabic_ner_en.md new file mode 100644 index 00000000000000..2e0904f5914eee --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-arabert_arabic_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English arabert_arabic_ner BertForTokenClassification from PRAli22 +author: John Snow Labs +name: arabert_arabic_ner +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `arabert_arabic_ner` is an English model originally trained by PRAli22.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/arabert_arabic_ner_en_5.2.0_3.0_1699396006785.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/arabert_arabic_ner_en_5.2.0_3.0_1699396006785.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
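+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads the "token" column produced by this stage +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("arabert_arabic_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# Any DataFrame with a string column named "text" works as input +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +// Assumes a SparkSession named `spark` is in scope, as in spark-shell +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("arabert_arabic_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("I love spark-nlp").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +```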
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|arabert_arabic_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|505.1 MB| + +## References + +https://huggingface.co/PRAli22/arabert_arabic_ner \ No newline at end of file From bf23e25f15795dd60fac231e8f668b9b2b5a6162 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 05:36:45 +0700 Subject: [PATCH 579/667] Add model 2023-11-07-med_ner_2_en --- .../ahmedlone127/2023-11-07-med_ner_2_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-med_ner_2_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-med_ner_2_en.md b/docs/_posts/ahmedlone127/2023-11-07-med_ner_2_en.md new file mode 100644 index 00000000000000..2579568a57108e --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-med_ner_2_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English med_ner_2 BertForTokenClassification from m-aliabbas1 +author: John Snow Labs +name: med_ner_2 +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `med_ner_2` is an English model originally trained by m-aliabbas1.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/med_ner_2_en_5.2.0_3.0_1699396604225.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/med_ner_2_en_5.2.0_3.0_1699396604225.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
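+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads the "token" column produced by this stage +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("med_ner_2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# Any DataFrame with a string column named "text" works as input +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +// Assumes a SparkSession named `spark` is in scope, as in spark-shell +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("med_ner_2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("I love spark-nlp").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +```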
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|med_ner_2| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|16.7 MB| + +## References + +https://huggingface.co/m-aliabbas1/med_ner_2 \ No newline at end of file From 7640abbed9567a08e2fd802ba572b276dbf1ad24 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 05:37:46 +0700 Subject: [PATCH 580/667] Add model 2023-11-07-berttest2_rtwc_en --- .../2023-11-07-berttest2_rtwc_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-berttest2_rtwc_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-berttest2_rtwc_en.md b/docs/_posts/ahmedlone127/2023-11-07-berttest2_rtwc_en.md new file mode 100644 index 00000000000000..bb33b17d93b9ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-berttest2_rtwc_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English berttest2_rtwc BertForTokenClassification from RtwC +author: John Snow Labs +name: berttest2_rtwc +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `berttest2_rtwc` is an English model originally trained by RtwC.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/berttest2_rtwc_en_5.2.0_3.0_1699396604379.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/berttest2_rtwc_en_5.2.0_3.0_1699396604379.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
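+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads the "token" column produced by this stage +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("berttest2_rtwc","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# Any DataFrame with a string column named "text" works as input +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +// Assumes a SparkSession named `spark` is in scope, as in spark-shell +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("berttest2_rtwc", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("I love spark-nlp").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +```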
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|berttest2_rtwc| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|381.1 MB| + +## References + +https://huggingface.co/RtwC/berttest2 \ No newline at end of file From 939eb98d23269fb6225e51ca70eddc37fa460338 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 05:38:46 +0700 Subject: [PATCH 581/667] Add model 2023-11-07-bert_base_multilingual_cased_finetuned_conll03_spanish_xx --- ...gual_cased_finetuned_conll03_spanish_xx.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_base_multilingual_cased_finetuned_conll03_spanish_xx.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_base_multilingual_cased_finetuned_conll03_spanish_xx.md b/docs/_posts/ahmedlone127/2023-11-07-bert_base_multilingual_cased_finetuned_conll03_spanish_xx.md new file mode 100644 index 00000000000000..a8a8f26dbceb08 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_base_multilingual_cased_finetuned_conll03_spanish_xx.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_finetuned_conll03_spanish BertForTokenClassification from dbmdz +author: John Snow Labs +name: bert_base_multilingual_cased_finetuned_conll03_spanish +date: 2023-11-07 +tags: [bert, xx, open_source, token_classification, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_multilingual_cased_finetuned_conll03_spanish` is a multilingual
model originally trained by dbmdz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_finetuned_conll03_spanish_xx_5.2.0_3.0_1699396605070.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_finetuned_conll03_spanish_xx_5.2.0_3.0_1699396605070.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
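+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads the "token" column produced by this stage +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_multilingual_cased_finetuned_conll03_spanish","xx") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# Any DataFrame with a string column named "text" works as input +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +// Assumes a SparkSession named `spark` is in scope, as in spark-shell +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_base_multilingual_cased_finetuned_conll03_spanish", "xx") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("I love spark-nlp").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +```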
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_finetuned_conll03_spanish| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/dbmdz/bert-base-multilingual-cased-finetuned-conll03-spanish \ No newline at end of file From fb19f434156cf5c4edf225ad027843a90acab53e Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 05:47:28 +0700 Subject: [PATCH 582/667] Add model 2023-11-07-scibert_finetuned_ner_eeshclusive_en --- ...07-scibert_finetuned_ner_eeshclusive_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-scibert_finetuned_ner_eeshclusive_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-scibert_finetuned_ner_eeshclusive_en.md b/docs/_posts/ahmedlone127/2023-11-07-scibert_finetuned_ner_eeshclusive_en.md new file mode 100644 index 00000000000000..edea0f9ea5de58 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-scibert_finetuned_ner_eeshclusive_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English scibert_finetuned_ner_eeshclusive BertForTokenClassification from eeshclusive +author: John Snow Labs +name: scibert_finetuned_ner_eeshclusive +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `scibert_finetuned_ner_eeshclusive` is an English model originally trained by eeshclusive.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scibert_finetuned_ner_eeshclusive_en_5.2.0_3.0_1699397236557.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scibert_finetuned_ner_eeshclusive_en_5.2.0_3.0_1699397236557.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
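+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads the "token" column produced by this stage +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("scibert_finetuned_ner_eeshclusive","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# Any DataFrame with a string column named "text" works as input +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +// Assumes a SparkSession named `spark` is in scope, as in spark-shell +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("scibert_finetuned_ner_eeshclusive", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("I love spark-nlp").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +```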
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scibert_finetuned_ner_eeshclusive| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/eeshclusive/scibert-finetuned-ner \ No newline at end of file From e413d135e8ebf684467ef9e73937cdc6d468ca57 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 05:50:09 +0700 Subject: [PATCH 583/667] Add model 2023-11-07-bert_ner_4_en --- .../ahmedlone127/2023-11-07-bert_ner_4_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_ner_4_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_ner_4_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_ner_4_en.md new file mode 100644 index 00000000000000..27e12b48ecdbd5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_ner_4_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_ner_4 BertForTokenClassification from mpalaval +author: John Snow Labs +name: bert_ner_4 +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_ner_4` is an English model originally trained by mpalaval.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ner_4_en_5.2.0_3.0_1699397402535.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ner_4_en_5.2.0_3.0_1699397402535.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
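+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads the "token" column produced by this stage +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_ner_4","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# Any DataFrame with a string column named "text" works as input +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +// Assumes a SparkSession named `spark` is in scope, as in spark-shell +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_ner_4", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("I love spark-nlp").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +```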
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ner_4| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/mpalaval/bert-ner-4 \ No newline at end of file From 84949d76ffecefcb651d9cba033c8969b6e23448 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 06:02:53 +0700 Subject: [PATCH 584/667] Add model 2023-11-07-bert_german_ler_de --- .../2023-11-07-bert_german_ler_de.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_german_ler_de.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_german_ler_de.md b/docs/_posts/ahmedlone127/2023-11-07-bert_german_ler_de.md new file mode 100644 index 00000000000000..b4d5f58ed87864 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_german_ler_de.md @@ -0,0 +1,93 @@ +--- +layout: model +title: German bert_german_ler BertForTokenClassification from elenanereiss +author: John Snow Labs +name: bert_german_ler +date: 2023-11-07 +tags: [bert, de, open_source, token_classification, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_german_ler` is a German model originally trained by elenanereiss.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_german_ler_de_5.2.0_3.0_1699398163265.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_german_ler_de_5.2.0_3.0_1699398163265.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
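+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification reads the "token" column produced by this stage +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_german_ler","de") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +# Any DataFrame with a string column named "text" works as input +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline +// Assumes a SparkSession named `spark` is in scope, as in spark-shell +import spark.implicits._ + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_german_ler", "de") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("I love spark-nlp").toDF("text") + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + +```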
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_german_ler| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|407.0 MB| + +## References + +https://huggingface.co/elenanereiss/bert-german-ler \ No newline at end of file From e4cd8e235e7d57179bebc33cf51c2786760edca9 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 06:03:53 +0700 Subject: [PATCH 585/667] Add model 2023-11-07-bert_finetuned_ner_default_parameters_en --- ...ert_finetuned_ner_default_parameters_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_default_parameters_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_default_parameters_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_default_parameters_en.md new file mode 100644 index 00000000000000..aa8d99a23ceb72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_default_parameters_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_finetuned_ner_default_parameters BertForTokenClassification from Mabel465 +author: John Snow Labs +name: bert_finetuned_ner_default_parameters +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_ner_default_parameters` is an English model originally trained by Mabel465.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_default_parameters_en_5.2.0_3.0_1699398163108.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_default_parameters_en_5.2.0_3.0_1699398163108.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
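+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +# Start (or reuse) a Spark session with Spark NLP and build a sample input DataFrame +spark = sparknlp.start() +data = spark.createDataFrame([["PUT YOUR TEXT HERE"]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification consumes "token" annotations, so a Tokenizer stage is required +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_default_parameters","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +// Assumes an active SparkSession named `spark` (e.g. spark-shell or SparkNLP.start()) +import spark.implicits._ + +val data = Seq("PUT YOUR TEXT HERE").toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification consumes "token" annotations, so a Tokenizer stage is required +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_finetuned_ner_default_parameters", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +</div>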
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_default_parameters| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Mabel465/bert-finetuned-ner.default_parameters \ No newline at end of file From bbc5c91e23296a07c477ee28e1612607365c3875 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 06:18:22 +0700 Subject: [PATCH 586/667] Add model 2023-11-07-urdu_bert_ner_en --- .../2023-11-07-urdu_bert_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-urdu_bert_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-urdu_bert_ner_en.md b/docs/_posts/ahmedlone127/2023-11-07-urdu_bert_ner_en.md new file mode 100644 index 00000000000000..0c1c2af896f07f --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-urdu_bert_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English urdu_bert_ner BertForTokenClassification from mirfan899 +author: John Snow Labs +name: urdu_bert_ner +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`urdu_bert_ner` is a English model originally trained by mirfan899. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/urdu_bert_ner_en_5.2.0_3.0_1699399089364.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/urdu_bert_ner_en_5.2.0_3.0_1699399089364.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
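+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +# Start (or reuse) a Spark session with Spark NLP and build a sample input DataFrame +spark = sparknlp.start() +data = spark.createDataFrame([["PUT YOUR TEXT HERE"]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification consumes "token" annotations, so a Tokenizer stage is required +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("urdu_bert_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +// Assumes an active SparkSession named `spark` (e.g. spark-shell or SparkNLP.start()) +import spark.implicits._ + +val data = Seq("PUT YOUR TEXT HERE").toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification consumes "token" annotations, so a Tokenizer stage is required +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("urdu_bert_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +</div>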
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|urdu_bert_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|665.1 MB| + +## References + +https://huggingface.co/mirfan899/urdu-bert-ner \ No newline at end of file From ebd9ccf4f00d868ddbe765d0c64804e523bbf069 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 06:21:10 +0700 Subject: [PATCH 587/667] Add model 2023-11-07-gp3_medical_token_classification_en --- ...-07-gp3_medical_token_classification_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-gp3_medical_token_classification_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-gp3_medical_token_classification_en.md b/docs/_posts/ahmedlone127/2023-11-07-gp3_medical_token_classification_en.md new file mode 100644 index 00000000000000..4958b7a0911023 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-gp3_medical_token_classification_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English gp3_medical_token_classification BertForTokenClassification from parsi-ai-nlpclass +author: John Snow Labs +name: gp3_medical_token_classification +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gp3_medical_token_classification` is a English model originally trained by parsi-ai-nlpclass. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gp3_medical_token_classification_en_5.2.0_3.0_1699399263292.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gp3_medical_token_classification_en_5.2.0_3.0_1699399263292.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
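+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +# Start (or reuse) a Spark session with Spark NLP and build a sample input DataFrame +spark = sparknlp.start() +data = spark.createDataFrame([["PUT YOUR TEXT HERE"]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification consumes "token" annotations, so a Tokenizer stage is required +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("gp3_medical_token_classification","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +// Assumes an active SparkSession named `spark` (e.g. spark-shell or SparkNLP.start()) +import spark.implicits._ + +val data = Seq("PUT YOUR TEXT HERE").toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification consumes "token" annotations, so a Tokenizer stage is required +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("gp3_medical_token_classification", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +</div>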
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gp3_medical_token_classification| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/parsi-ai-nlpclass/Gp3_medical_token_classification \ No newline at end of file From c2ddc6c32329610a8d922f9af8ff051c766189ba Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 06:27:31 +0700 Subject: [PATCH 588/667] Add model 2023-11-07-bert_finetuned_ner_rahulmukherji_en --- ...-07-bert_finetuned_ner_rahulmukherji_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_rahulmukherji_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_rahulmukherji_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_rahulmukherji_en.md new file mode 100644 index 00000000000000..2c4c94fa38f3fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_rahulmukherji_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_finetuned_ner_rahulmukherji BertForTokenClassification from rahulmukherji +author: John Snow Labs +name: bert_finetuned_ner_rahulmukherji +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_rahulmukherji` is a English model originally trained by rahulmukherji. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_rahulmukherji_en_5.2.0_3.0_1699399643914.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_rahulmukherji_en_5.2.0_3.0_1699399643914.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
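+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +# Start (or reuse) a Spark session with Spark NLP and build a sample input DataFrame +spark = sparknlp.start() +data = spark.createDataFrame([["PUT YOUR TEXT HERE"]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification consumes "token" annotations, so a Tokenizer stage is required +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_rahulmukherji","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +// Assumes an active SparkSession named `spark` (e.g. spark-shell or SparkNLP.start()) +import spark.implicits._ + +val data = Seq("PUT YOUR TEXT HERE").toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification consumes "token" annotations, so a Tokenizer stage is required +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_finetuned_ner_rahulmukherji", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +</div>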
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_rahulmukherji| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/rahulmukherji/bert-finetuned-ner \ No newline at end of file From 771458e694e9b9edc824962393fed543ae62cae5 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 06:31:21 +0700 Subject: [PATCH 589/667] Add model 2023-11-07-ner_bio_annotated_7_1_en --- .../2023-11-07-ner_bio_annotated_7_1_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-ner_bio_annotated_7_1_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-ner_bio_annotated_7_1_en.md b/docs/_posts/ahmedlone127/2023-11-07-ner_bio_annotated_7_1_en.md new file mode 100644 index 00000000000000..6d88251d4dcb75 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-ner_bio_annotated_7_1_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English ner_bio_annotated_7_1 BertForTokenClassification from urbija +author: John Snow Labs +name: ner_bio_annotated_7_1 +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_bio_annotated_7_1` is a English model originally trained by urbija. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_bio_annotated_7_1_en_5.2.0_3.0_1699399873485.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_bio_annotated_7_1_en_5.2.0_3.0_1699399873485.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
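+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +# Start (or reuse) a Spark session with Spark NLP and build a sample input DataFrame +spark = sparknlp.start() +data = spark.createDataFrame([["PUT YOUR TEXT HERE"]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification consumes "token" annotations, so a Tokenizer stage is required +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("ner_bio_annotated_7_1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +// Assumes an active SparkSession named `spark` (e.g. spark-shell or SparkNLP.start()) +import spark.implicits._ + +val data = Seq("PUT YOUR TEXT HERE").toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification consumes "token" annotations, so a Tokenizer stage is required +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("ner_bio_annotated_7_1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +</div>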
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_bio_annotated_7_1| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/urbija/ner-bio-annotated-7-1 \ No newline at end of file From 1c51c71bbb6b4d350ac2c9f0c32a7c673a03f2ff Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 06:40:53 +0700 Subject: [PATCH 590/667] Add model 2023-11-07-bert_finetuned_ner_accelerate_sanjay7178_en --- ..._finetuned_ner_accelerate_sanjay7178_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_accelerate_sanjay7178_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_accelerate_sanjay7178_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_accelerate_sanjay7178_en.md new file mode 100644 index 00000000000000..ef7d8472502e4d --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_finetuned_ner_accelerate_sanjay7178_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_finetuned_ner_accelerate_sanjay7178 BertForTokenClassification from sanjay7178 +author: John Snow Labs +name: bert_finetuned_ner_accelerate_sanjay7178 +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_accelerate_sanjay7178` is a English model originally trained by sanjay7178. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_accelerate_sanjay7178_en_5.2.0_3.0_1699400446204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_accelerate_sanjay7178_en_5.2.0_3.0_1699400446204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
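+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +# Start (or reuse) a Spark session with Spark NLP and build a sample input DataFrame +spark = sparknlp.start() +data = spark.createDataFrame([["PUT YOUR TEXT HERE"]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification consumes "token" annotations, so a Tokenizer stage is required +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_accelerate_sanjay7178","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +// Assumes an active SparkSession named `spark` (e.g. spark-shell or SparkNLP.start()) +import spark.implicits._ + +val data = Seq("PUT YOUR TEXT HERE").toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification consumes "token" annotations, so a Tokenizer stage is required +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_finetuned_ner_accelerate_sanjay7178", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +</div>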
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_accelerate_sanjay7178| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/sanjay7178/bert-finetuned-ner-accelerate \ No newline at end of file From 368cac5070ff85ded43eac66918b93b190816c1f Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 06:46:55 +0700 Subject: [PATCH 591/667] Add model 2023-11-07-macbert_base_chinese_medicine_recognition_zh --- ...rt_base_chinese_medicine_recognition_zh.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-macbert_base_chinese_medicine_recognition_zh.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-macbert_base_chinese_medicine_recognition_zh.md b/docs/_posts/ahmedlone127/2023-11-07-macbert_base_chinese_medicine_recognition_zh.md new file mode 100644 index 00000000000000..c783a642b83789 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-macbert_base_chinese_medicine_recognition_zh.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Chinese macbert_base_chinese_medicine_recognition BertForTokenClassification from 9pinus +author: John Snow Labs +name: macbert_base_chinese_medicine_recognition +date: 2023-11-07 +tags: [bert, zh, open_source, token_classification, onnx] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`macbert_base_chinese_medicine_recognition` is a Chinese model originally trained by 9pinus. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/macbert_base_chinese_medicine_recognition_zh_5.2.0_3.0_1699400808366.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/macbert_base_chinese_medicine_recognition_zh_5.2.0_3.0_1699400808366.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
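+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +# Start (or reuse) a Spark session with Spark NLP and build a sample input DataFrame +spark = sparknlp.start() +data = spark.createDataFrame([["PUT YOUR TEXT HERE"]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification consumes "token" annotations, so a Tokenizer stage is required +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("macbert_base_chinese_medicine_recognition","zh") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +// Assumes an active SparkSession named `spark` (e.g. spark-shell or SparkNLP.start()) +import spark.implicits._ + +val data = Seq("PUT YOUR TEXT HERE").toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification consumes "token" annotations, so a Tokenizer stage is required +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("macbert_base_chinese_medicine_recognition", "zh") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +</div>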
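Spark NLP's default `Tokenizer` splits primarily on whitespace, so unsegmented Chinese text may reach this classifier as a few very long tokens, each receiving a single label. One possible workaround, sketched here as an assumption rather than the model author's recommended preprocessing, is to insert spaces around each Chinese character before the `DocumentAssembler`, which produces roughly character-level tokens (Chinese BERT vocabularies are largely character-based). Spark NLP also provides word segmentation models (`WordSegmenterModel`) that may be a better fit. Validate either approach on your own data. The helper name `space_cjk` is hypothetical.

```python
def space_cjk(text: str) -> str:
    """Put a space around every CJK Unified Ideograph so that a whitespace-based
    tokenizer yields one token per Chinese character.

    Only the basic block U+4E00..U+9FFF is handled; extend the range check if your
    text uses the extension blocks. Non-CJK runs (Latin words, digits) stay intact,
    and any existing whitespace (including newlines) collapses to single spaces.
    """
    pieces = []
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff":
            pieces.append(f" {ch} ")
        else:
            pieces.append(ch)
    # Collapse the repeated spaces introduced above and trim both ends
    return " ".join("".join(pieces).split())


print(space_cjk("服用阿司匹林100mg"))  # 服 用 阿 司 匹 林 100mg
```

Apply the helper to the `text` column before building the pipeline input. Keep in mind that the `begin`/`end` offsets in the resulting annotations refer to the spaced text, not the original string.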
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|macbert_base_chinese_medicine_recognition| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|381.1 MB| + +## References + +https://huggingface.co/9pinus/macbert-base-chinese-medicine-recognition \ No newline at end of file From 0cdfae70da08ad083661480dd2350579a8a815a9 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 06:51:40 +0700 Subject: [PATCH 592/667] Add model 2023-11-07-bert_base_ner_reptile_5_datasets_en --- ...-07-bert_base_ner_reptile_5_datasets_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-bert_base_ner_reptile_5_datasets_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-bert_base_ner_reptile_5_datasets_en.md b/docs/_posts/ahmedlone127/2023-11-07-bert_base_ner_reptile_5_datasets_en.md new file mode 100644 index 00000000000000..08ce5a910f52c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-bert_base_ner_reptile_5_datasets_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_base_ner_reptile_5_datasets BertForTokenClassification from ai-forever +author: John Snow Labs +name: bert_base_ner_reptile_5_datasets +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_ner_reptile_5_datasets` is a English model originally trained by ai-forever. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_ner_reptile_5_datasets_en_5.2.0_3.0_1699401088372.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_ner_reptile_5_datasets_en_5.2.0_3.0_1699401088372.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
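+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +# Start (or reuse) a Spark session with Spark NLP and build a sample input DataFrame +spark = sparknlp.start() +data = spark.createDataFrame([["PUT YOUR TEXT HERE"]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification consumes "token" annotations, so a Tokenizer stage is required +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_ner_reptile_5_datasets","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +// Assumes an active SparkSession named `spark` (e.g. spark-shell or SparkNLP.start()) +import spark.implicits._ + +val data = Seq("PUT YOUR TEXT HERE").toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification consumes "token" annotations, so a Tokenizer stage is required +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_base_ner_reptile_5_datasets", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +</div>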
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_ner_reptile_5_datasets| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/ai-forever/bert-base-NER-reptile-5-datasets \ No newline at end of file From b11bd645d3a35462d3fa50121190612815623695 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 06:52:41 +0700 Subject: [PATCH 593/667] Add model 2023-11-07-ner_fine_tune_bert_ner_en --- .../2023-11-07-ner_fine_tune_bert_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-ner_fine_tune_bert_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-ner_fine_tune_bert_ner_en.md b/docs/_posts/ahmedlone127/2023-11-07-ner_fine_tune_bert_ner_en.md new file mode 100644 index 00000000000000..62068cfd7d7fd3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-ner_fine_tune_bert_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English ner_fine_tune_bert_ner BertForTokenClassification from cehongw +author: John Snow Labs +name: ner_fine_tune_bert_ner +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_fine_tune_bert_ner` is a English model originally trained by cehongw. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_fine_tune_bert_ner_en_5.2.0_3.0_1699401114697.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_fine_tune_bert_ner_en_5.2.0_3.0_1699401114697.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
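+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +# Start (or reuse) a Spark session with Spark NLP and build a sample input DataFrame +spark = sparknlp.start() +data = spark.createDataFrame([["PUT YOUR TEXT HERE"]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification consumes "token" annotations, so a Tokenizer stage is required +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("ner_fine_tune_bert_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +// Assumes an active SparkSession named `spark` (e.g. spark-shell or SparkNLP.start()) +import spark.implicits._ + +val data = Seq("PUT YOUR TEXT HERE").toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification consumes "token" annotations, so a Tokenizer stage is required +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("ner_fine_tune_bert_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +</div>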
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_fine_tune_bert_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/cehongw/ner-fine-tune-bert-ner \ No newline at end of file From 99a6d3569ad73010dc9d59221e527ebead691036 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 06:58:11 +0700 Subject: [PATCH 594/667] Add model 2023-11-07-scibert_scivocab_uncased_ner_visbank_en --- ...scibert_scivocab_uncased_ner_visbank_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-07-scibert_scivocab_uncased_ner_visbank_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-07-scibert_scivocab_uncased_ner_visbank_en.md b/docs/_posts/ahmedlone127/2023-11-07-scibert_scivocab_uncased_ner_visbank_en.md new file mode 100644 index 00000000000000..f3222e01561b8a --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-07-scibert_scivocab_uncased_ner_visbank_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English scibert_scivocab_uncased_ner_visbank BertForTokenClassification from Yamei +author: John Snow Labs +name: scibert_scivocab_uncased_ner_visbank +date: 2023-11-07 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scibert_scivocab_uncased_ner_visbank` is a English model originally trained by Yamei. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scibert_scivocab_uncased_ner_visbank_en_5.2.0_3.0_1699401483817.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scibert_scivocab_uncased_ner_visbank_en_5.2.0_3.0_1699401483817.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
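+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +# Start (or reuse) a Spark session with Spark NLP and build a sample input DataFrame +spark = sparknlp.start() +data = spark.createDataFrame([["PUT YOUR TEXT HERE"]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# BertForTokenClassification consumes "token" annotations, so a Tokenizer stage is required +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("scibert_scivocab_uncased_ner_visbank","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +// Assumes an active SparkSession named `spark` (e.g. spark-shell or SparkNLP.start()) +import spark.implicits._ + +val data = Seq("PUT YOUR TEXT HERE").toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +// BertForTokenClassification consumes "token" annotations, so a Tokenizer stage is required +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("scibert_scivocab_uncased_ner_visbank", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +</div>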
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scibert_scivocab_uncased_ner_visbank| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/Yamei/scibert_scivocab_uncased_NER_VISBank \ No newline at end of file From 6e541cc077ef545553ff6c19ade7072a29be613e Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 07:11:22 +0700 Subject: [PATCH 595/667] Add model 2023-11-08-bulbert_ner_wikiann_en --- .../2023-11-08-bulbert_ner_wikiann_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bulbert_ner_wikiann_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-bulbert_ner_wikiann_en.md b/docs/_posts/ahmedlone127/2023-11-08-bulbert_ner_wikiann_en.md new file mode 100644 index 00000000000000..83c39c7211fc72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-bulbert_ner_wikiann_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bulbert_ner_wikiann BertForTokenClassification from mor40 +author: John Snow Labs +name: bulbert_ner_wikiann +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bulbert_ner_wikiann` is a English model originally trained by mor40. 
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bulbert_ner_wikiann_en_5.2.0_3.0_1699402275572.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bulbert_ner_wikiann_en_5.2.0_3.0_1699402275572.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
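+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("documents")
+# The classifier reads a "token" column, so a Tokenizer must run before it
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bulbert_ner_wikiann", "en") \
+    .setInputCols(["documents", "token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+pipelineDF.select("ner.result").show(truncate=False)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bulbert_ner_wikiann", "en")
+    .setInputCols(Array("documents", "token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+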
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bulbert_ner_wikiann|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|306.1 MB|
+
+## References
+
+https://huggingface.co/mor40/BulBERT-ner-wikiann
\ No newline at end of file
From af02919b3a28f6e4db56319324aafdcf371cd56c Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 07:13:23 +0700
Subject: [PATCH 596/667] Add model 2023-11-08-bert_finetuned_ner_louislian2341_en

---
 ...-08-bert_finetuned_ner_louislian2341_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_louislian2341_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_louislian2341_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_louislian2341_en.md
new file mode 100644
index 00000000000000..9262b72cad8e9f
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_louislian2341_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_finetuned_ner_louislian2341 BertForTokenClassification from louislian2341
+author: John Snow Labs
+name: bert_finetuned_ner_louislian2341
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_ner_louislian2341` is an English model originally trained by louislian2341.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_louislian2341_en_5.2.0_3.0_1699402395910.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_louislian2341_en_5.2.0_3.0_1699402395910.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
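+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("documents")
+# The classifier reads a "token" column, so a Tokenizer must run before it
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_louislian2341", "en") \
+    .setInputCols(["documents", "token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+pipelineDF.select("ner.result").show(truncate=False)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_finetuned_ner_louislian2341", "en")
+    .setInputCols(Array("documents", "token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+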
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_finetuned_ner_louislian2341|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+
+## References
+
+https://huggingface.co/louislian2341/bert-finetuned-ner
\ No newline at end of file
From fd1182b2094206c91e29560158747bbea1cb159b Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 07:34:13 +0700
Subject: [PATCH 597/667] Add model 2023-11-08-guj_sayula_popoluca_tagging_v2_en

---
 ...11-08-guj_sayula_popoluca_tagging_v2_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-guj_sayula_popoluca_tagging_v2_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-guj_sayula_popoluca_tagging_v2_en.md b/docs/_posts/ahmedlone127/2023-11-08-guj_sayula_popoluca_tagging_v2_en.md
new file mode 100644
index 00000000000000..bf3453716cecd4
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-guj_sayula_popoluca_tagging_v2_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English guj_sayula_popoluca_tagging_v2 BertForTokenClassification from om-ashish-soni
+author: John Snow Labs
+name: guj_sayula_popoluca_tagging_v2
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `guj_sayula_popoluca_tagging_v2` is an English model originally trained by om-ashish-soni.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/guj_sayula_popoluca_tagging_v2_en_5.2.0_3.0_1699403641206.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/guj_sayula_popoluca_tagging_v2_en_5.2.0_3.0_1699403641206.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
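+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("documents")
+# The classifier reads a "token" column, so a Tokenizer must run before it
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("guj_sayula_popoluca_tagging_v2", "en") \
+    .setInputCols(["documents", "token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+pipelineDF.select("ner.result").show(truncate=False)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("guj_sayula_popoluca_tagging_v2", "en")
+    .setInputCols(Array("documents", "token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+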
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|guj_sayula_popoluca_tagging_v2|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|665.2 MB|
+
+## References
+
+https://huggingface.co/om-ashish-soni/guj-pos-tagging-v2
\ No newline at end of file
From 97d20b93eeb5c9bd01f7600c0227a9ff0071e7b7 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 07:48:52 +0700
Subject: [PATCH 598/667] Add model 2023-11-08-mongolian_bert_base_demo_named_entity_mn

---
 ...ongolian_bert_base_demo_named_entity_mn.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-mongolian_bert_base_demo_named_entity_mn.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-mongolian_bert_base_demo_named_entity_mn.md b/docs/_posts/ahmedlone127/2023-11-08-mongolian_bert_base_demo_named_entity_mn.md
new file mode 100644
index 00000000000000..227535a9d67fd0
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-mongolian_bert_base_demo_named_entity_mn.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Mongolian mongolian_bert_base_demo_named_entity BertForTokenClassification from 2rtl3
+author: John Snow Labs
+name: mongolian_bert_base_demo_named_entity
+date: 2023-11-08
+tags: [bert, mn, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: mn
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `mongolian_bert_base_demo_named_entity` is a Mongolian model originally trained by 2rtl3.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mongolian_bert_base_demo_named_entity_mn_5.2.0_3.0_1699404521755.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mongolian_bert_base_demo_named_entity_mn_5.2.0_3.0_1699404521755.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
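+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("documents")
+# The classifier reads a "token" column, so a Tokenizer must run before it
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("mongolian_bert_base_demo_named_entity", "mn") \
+    .setInputCols(["documents", "token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+pipelineDF.select("ner.result").show(truncate=False)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("mongolian_bert_base_demo_named_entity", "mn")
+    .setInputCols(Array("documents", "token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+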
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|mongolian_bert_base_demo_named_entity|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|mn|
+|Size:|665.1 MB|
+
+## References
+
+https://huggingface.co/2rtl3/mn-bert-base-demo-named-entity
\ No newline at end of file
From e755802e6d35c440e25c9e60250e7ec99b2917de Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 07:53:19 +0700
Subject: [PATCH 599/667] Add model 2023-11-08-postagger_bio_english_en

---
 .../2023-11-08-postagger_bio_english_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-postagger_bio_english_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-postagger_bio_english_en.md b/docs/_posts/ahmedlone127/2023-11-08-postagger_bio_english_en.md
new file mode 100644
index 00000000000000..df9c0408d58105
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-postagger_bio_english_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English postagger_bio_english BertForTokenClassification from pucpr-br
+author: John Snow Labs
+name: postagger_bio_english
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `postagger_bio_english` is an English model originally trained by pucpr-br.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/postagger_bio_english_en_5.2.0_3.0_1699404792031.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/postagger_bio_english_en_5.2.0_3.0_1699404792031.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
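+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("documents")
+# The classifier reads a "token" column, so a Tokenizer must run before it
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("postagger_bio_english", "en") \
+    .setInputCols(["documents", "token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+pipelineDF.select("ner.result").show(truncate=False)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("postagger_bio_english", "en")
+    .setInputCols(Array("documents", "token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+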
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|postagger_bio_english|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.8 MB|
+
+## References
+
+https://huggingface.co/pucpr-br/postagger-bio-english
\ No newline at end of file
From c0f00244bbd1a9ad52bc9daf4bc6f52d91ea8f40 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 07:54:22 +0700
Subject: [PATCH 600/667] Add model 2023-11-08-finer_139_xtremedistil_l12_h384_en

---
 ...1-08-finer_139_xtremedistil_l12_h384_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-finer_139_xtremedistil_l12_h384_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-finer_139_xtremedistil_l12_h384_en.md b/docs/_posts/ahmedlone127/2023-11-08-finer_139_xtremedistil_l12_h384_en.md
new file mode 100644
index 00000000000000..0bc38cabf43dd9
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-finer_139_xtremedistil_l12_h384_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English finer_139_xtremedistil_l12_h384 BertForTokenClassification from nbroad
+author: John Snow Labs
+name: finer_139_xtremedistil_l12_h384
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `finer_139_xtremedistil_l12_h384` is an English model originally trained by nbroad.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finer_139_xtremedistil_l12_h384_en_5.2.0_3.0_1699404859134.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finer_139_xtremedistil_l12_h384_en_5.2.0_3.0_1699404859134.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
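+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("documents")
+# The classifier reads a "token" column, so a Tokenizer must run before it
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("finer_139_xtremedistil_l12_h384", "en") \
+    .setInputCols(["documents", "token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+pipelineDF.select("ner.result").show(truncate=False)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("finer_139_xtremedistil_l12_h384", "en")
+    .setInputCols(Array("documents", "token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+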
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|finer_139_xtremedistil_l12_h384|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|124.4 MB|
+
+## References
+
+https://huggingface.co/nbroad/finer-139-xtremedistil-l12-h384
\ No newline at end of file
From f1d82f2c146ea59863befcb08d6113c1a0d9d096 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 08:20:04 +0700
Subject: [PATCH 601/667] Add model 2023-11-08-biobert_protein_ner_en

---
 .../2023-11-08-biobert_protein_ner_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-biobert_protein_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-biobert_protein_ner_en.md b/docs/_posts/ahmedlone127/2023-11-08-biobert_protein_ner_en.md
new file mode 100644
index 00000000000000..b17906b91c640d
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-biobert_protein_ner_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English biobert_protein_ner BertForTokenClassification from avishvj
+author: John Snow Labs
+name: biobert_protein_ner
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `biobert_protein_ner` is an English model originally trained by avishvj.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biobert_protein_ner_en_5.2.0_3.0_1699406360113.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biobert_protein_ner_en_5.2.0_3.0_1699406360113.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
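+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("documents")
+# The classifier reads a "token" column, so a Tokenizer must run before it
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("biobert_protein_ner", "en") \
+    .setInputCols(["documents", "token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+pipelineDF.select("ner.result").show(truncate=False)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("biobert_protein_ner", "en")
+    .setInputCols(Array("documents", "token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+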
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|biobert_protein_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.8 MB|
+
+## References
+
+https://huggingface.co/avishvj/biobert-protein-ner
\ No newline at end of file
From 26acd70de7d169519e2171989ebf4f7774f7a6fa Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 08:21:03 +0700
Subject: [PATCH 602/667] Add model 2023-11-08-bert_base_finetuned_ner_en

---
 .../2023-11-08-bert_base_finetuned_ner_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_base_finetuned_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_base_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_base_finetuned_ner_en.md
new file mode 100644
index 00000000000000..06a05e3c82722e
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-bert_base_finetuned_ner_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_base_finetuned_ner BertForTokenClassification from eeshclusive
+author: John Snow Labs
+name: bert_base_finetuned_ner
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_finetuned_ner` is an English model originally trained by eeshclusive.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_ner_en_5.2.0_3.0_1699406360106.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_ner_en_5.2.0_3.0_1699406360106.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
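+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("documents")
+# The classifier reads a "token" column, so a Tokenizer must run before it
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bert_base_finetuned_ner", "en") \
+    .setInputCols(["documents", "token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+pipelineDF.select("ner.result").show(truncate=False)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_base_finetuned_ner", "en")
+    .setInputCols(Array("documents", "token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+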
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_base_finetuned_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+
+## References
+
+https://huggingface.co/eeshclusive/bert-base-finetuned-ner
\ No newline at end of file
From 4df93134be5a8952ea82e8431a7b5d39b18a91ad Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 08:26:21 +0700
Subject: [PATCH 603/667] Add model 2023-11-08-heb_medical_baseline_en

---
 .../2023-11-08-heb_medical_baseline_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-heb_medical_baseline_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-heb_medical_baseline_en.md b/docs/_posts/ahmedlone127/2023-11-08-heb_medical_baseline_en.md
new file mode 100644
index 00000000000000..a4d77d9503d6b0
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-heb_medical_baseline_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English heb_medical_baseline BertForTokenClassification from cp500
+author: John Snow Labs
+name: heb_medical_baseline
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `heb_medical_baseline` is an English model originally trained by cp500.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/heb_medical_baseline_en_5.2.0_3.0_1699406769716.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/heb_medical_baseline_en_5.2.0_3.0_1699406769716.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
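+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("documents")
+# The classifier reads a "token" column, so a Tokenizer must run before it
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("heb_medical_baseline", "en") \
+    .setInputCols(["documents", "token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+pipelineDF.select("ner.result").show(truncate=False)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("heb_medical_baseline", "en")
+    .setInputCols(Array("documents", "token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+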
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|heb_medical_baseline|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|690.5 MB|
+
+## References
+
+https://huggingface.co/cp500/heb_medical_baseline
\ No newline at end of file
From 86fdcdffa5306a96ac14158cb1c444d81c24cb47 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 08:28:35 +0700
Subject: [PATCH 604/667] Add model 2023-11-08-bert_base_uncased_finetuned_ner_sohamtiwari3120_en

---
 ...ncased_finetuned_ner_sohamtiwari3120_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_base_uncased_finetuned_ner_sohamtiwari3120_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_base_uncased_finetuned_ner_sohamtiwari3120_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_base_uncased_finetuned_ner_sohamtiwari3120_en.md
new file mode 100644
index 00000000000000..07ffa5b8fbe22e
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-bert_base_uncased_finetuned_ner_sohamtiwari3120_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_base_uncased_finetuned_ner_sohamtiwari3120 BertForTokenClassification from sohamtiwari3120
+author: John Snow Labs
+name: bert_base_uncased_finetuned_ner_sohamtiwari3120
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_uncased_finetuned_ner_sohamtiwari3120` is an English model originally trained by sohamtiwari3120.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_ner_sohamtiwari3120_en_5.2.0_3.0_1699406908805.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_ner_sohamtiwari3120_en_5.2.0_3.0_1699406908805.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
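+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+spark = sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("documents")
+# The classifier reads a "token" column, so a Tokenizer must run before it
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bert_base_uncased_finetuned_ner_sohamtiwari3120", "en") \
+    .setInputCols(["documents", "token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+pipelineDF.select("ner.result").show(truncate=False)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell
+
+val data = Seq("PUT YOUR STRING HERE").toDF("text")
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_base_uncased_finetuned_ner_sohamtiwari3120", "en")
+    .setInputCols(Array("documents", "token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+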
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_base_uncased_finetuned_ner_sohamtiwari3120|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|407.2 MB|
+
+## References
+
+https://huggingface.co/sohamtiwari3120/bert-base-uncased-finetuned-ner
\ No newline at end of file
From fb5d5d381c5ecc13be24af0374385a6d67102d05 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 08:43:13 +0700
Subject: [PATCH 605/667] Add model 2023-11-08-bert_base_chinese_finetuned_split_en

---
 ...08-bert_base_chinese_finetuned_split_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_base_chinese_finetuned_split_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_base_chinese_finetuned_split_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_base_chinese_finetuned_split_en.md
new file mode 100644
index 00000000000000..2fdfcab9c9c9dc
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-bert_base_chinese_finetuned_split_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_base_chinese_finetuned_split BertForTokenClassification from zhiguoxu
+author: John Snow Labs
+name: bert_base_chinese_finetuned_split
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_chinese_finetuned_split` is an English model originally trained by zhiguoxu.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_split_en_5.2.0_3.0_1699407784326.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_split_en_5.2.0_3.0_1699407784326.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_base_chinese_finetuned_split","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_base_chinese_finetuned_split", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
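The classifier writes one label per input token into the `ner` column. Assuming the model emits IOB2-style labels such as `B-PER`, `I-PER` and `O` (the label set is not listed in these cards, so check the original Hugging Face model), the token and label arrays returned by something like `pipelineDF.select("token.result", "ner.result")` can be merged into entity chunks. Inside a pipeline, Spark NLP's `NerConverter` annotator does this job; `iob_to_chunks` below is a hypothetical plain-Python helper that only illustrates the grouping logic.

```python
from typing import List, Optional, Tuple

Chunk = Tuple[str, str, int, int]  # (entity type, chunk text, first token index, last token index)


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Chunk]:
    """Merge token-level IOB2 labels ("B-X", "I-X", "O") into entity chunks.

    Assumes one label per token and that every non-"O" label carries a
    two-character "B-" or "I-" prefix. An "I-X" label that does not continue
    an open chunk of type X starts a new chunk (a common lenient reading).
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    chunks: List[Chunk] = []
    current: Optional[str] = None  # entity type of the chunk being built
    start = 0
    for i, tag in enumerate(tags):
        if tag.startswith("I-") and tag[2:] == current:
            continue  # this token extends the open chunk
        # Close the open chunk, if any, before handling this token.
        if current is not None:
            chunks.append((current, " ".join(tokens[start:i]), start, i - 1))
            current = None
        if tag != "O":
            current = tag[2:]  # strip the "B-" / "I-" prefix
            start = i
    if current is not None:
        chunks.append((current, " ".join(tokens[start:]), start, len(tags) - 1))
    return chunks


tokens = ["John", "Smith", "lives", "in", "New", "York", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "I-LOC", "O"]
print(iob_to_chunks(tokens, tags))
# prints [('PER', 'John Smith', 0, 1), ('LOC', 'New York', 4, 5)]
```

Joining tokens with single spaces only approximates the original text; the `begin` and `end` character offsets stored on each Spark NLP annotation are the more reliable way to recover exact spans.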
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_base_chinese_finetuned_split|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|381.2 MB|
+
+## References
+
+https://huggingface.co/zhiguoxu/bert-base-chinese-finetuned-split
\ No newline at end of file
From 4cee6a2c157aeca91b9db14be7d9b5a4c82e30d5 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 08:44:14 +0700
Subject: [PATCH 606/667] Add model 2023-11-08-bert_base_spanish_wwm_uncased_finetuned_ner_en

---
 ...se_spanish_wwm_uncased_finetuned_ner_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_base_spanish_wwm_uncased_finetuned_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_base_spanish_wwm_uncased_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_base_spanish_wwm_uncased_finetuned_ner_en.md
new file mode 100644
index 00000000000000..e48428c15828aa
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-bert_base_spanish_wwm_uncased_finetuned_ner_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_base_spanish_wwm_uncased_finetuned_ner BertForTokenClassification from dccuchile
+author: John Snow Labs
+name: bert_base_spanish_wwm_uncased_finetuned_ner
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_spanish_wwm_uncased_finetuned_ner` is an English model originally trained by dccuchile.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_uncased_finetuned_ner_en_5.2.0_3.0_1699407784218.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_uncased_finetuned_ner_en_5.2.0_3.0_1699407784218.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_base_spanish_wwm_uncased_finetuned_ner","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_base_spanish_wwm_uncased_finetuned_ner", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
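The Download and Copy S3 URI links in these cards share a visible naming pattern: model name, language code, Spark NLP version, Spark version and a long trailing number, for example `bert_base_chinese_finetuned_split_en_5.2.0_3.0_1699407784326.zip`. That number appears to be a Unix timestamp in milliseconds: read that way, it decodes to 2023-11-08 01:43:04 UTC, a few seconds before the `Date: Wed, 8 Nov 2023 08:43:13 +0700` header of the commit that added that card. The sketch below splits such a name into its parts; the regular expression and the `parse_archive_name` helper are inferred from the URLs in this document, not taken from a documented Spark NLP naming convention.

```python
import re
from datetime import datetime, timedelta, timezone

# Pattern inferred from the links in these cards (an assumption, not a spec):
# <name>_<lang>_<spark-nlp-version>_<spark-version>_<number>.zip
ARCHIVE_RE = re.compile(
    r"^(?P<name>.+)_(?P<lang>[a-z]{2,3})"
    r"_(?P<nlp_version>\d+\.\d+\.\d+)"
    r"_(?P<spark_version>\d+\.\d+)"
    r"_(?P<timestamp_ms>\d+)\.zip$"
)

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_archive_name(url: str) -> dict:
    """Split a model archive URL (https:// or s3://) into its named parts."""
    filename = url.rsplit("/", 1)[-1]
    match = ARCHIVE_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized archive name: {filename}")
    info = match.groupdict()
    # Assumption: the trailing number is a Unix epoch timestamp in milliseconds.
    # Integer milliseconds keep the conversion exact (no float rounding).
    info["timestamp_utc"] = EPOCH + timedelta(milliseconds=int(info["timestamp_ms"]))
    return info


info = parse_archive_name(
    "s3://auxdata.johnsnowlabs.com/public/models/"
    "bert_base_chinese_finetuned_split_en_5.2.0_3.0_1699407784326.zip"
)
print(info["name"], info["lang"], info["nlp_version"], info["spark_version"])
print(info["timestamp_utc"].isoformat())
```

The greedy `name` group lets model names contain underscores; the language code is anchored by the version numbers that must follow it.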
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_base_spanish_wwm_uncased_finetuned_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|409.7 MB|
+
+## References
+
+https://huggingface.co/dccuchile/bert-base-spanish-wwm-uncased-finetuned-ner
\ No newline at end of file
From 5dcc281173ef6ce1b6bec0f5ce76413e179a994c Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 08:50:36 +0700
Subject: [PATCH 607/667] Add model 2023-11-08-bert_finetuned_ner_heenamir_en

---
 ...23-11-08-bert_finetuned_ner_heenamir_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_heenamir_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_heenamir_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_heenamir_en.md
new file mode 100644
index 00000000000000..8c64a365272bdc
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_heenamir_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_finetuned_ner_heenamir BertForTokenClassification from heenamir
+author: John Snow Labs
+name: bert_finetuned_ner_heenamir
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_ner_heenamir` is an English model originally trained by heenamir.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_heenamir_en_5.2.0_3.0_1699408228945.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_heenamir_en_5.2.0_3.0_1699408228945.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_heenamir","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_finetuned_ner_heenamir", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_finetuned_ner_heenamir|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+
+## References
+
+https://huggingface.co/heenamir/bert-finetuned-ner
\ No newline at end of file
From 37a138e2686fa8e1d64c6733a3e19031e04d6100 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 08:56:23 +0700
Subject: [PATCH 608/667] Add model 2023-11-08-multilingual_bengali_token_classification_model_xx

---
 ...l_bengali_token_classification_model_xx.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-multilingual_bengali_token_classification_model_xx.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-multilingual_bengali_token_classification_model_xx.md b/docs/_posts/ahmedlone127/2023-11-08-multilingual_bengali_token_classification_model_xx.md
new file mode 100644
index 00000000000000..5d96c75df5cb76
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-multilingual_bengali_token_classification_model_xx.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Multilingual multilingual_bengali_token_classification_model BertForTokenClassification from Cabooose
+author: John Snow Labs
+name: multilingual_bengali_token_classification_model
+date: 2023-11-08
+tags: [bert, xx, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: xx
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `multilingual_bengali_token_classification_model` is a multilingual model originally trained by Cabooose.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multilingual_bengali_token_classification_model_xx_5.2.0_3.0_1699408573039.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multilingual_bengali_token_classification_model_xx_5.2.0_3.0_1699408573039.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("multilingual_bengali_token_classification_model","xx") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("multilingual_bengali_token_classification_model", "xx")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|multilingual_bengali_token_classification_model|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|xx|
+|Size:|665.1 MB|
+
+## References
+
+https://huggingface.co/Cabooose/multilingual_bengali_token_classification_model
\ No newline at end of file
From a18d8ecfdde4b757a7578c6c7c3a6f69ee7a965c Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 08:59:41 +0700
Subject: [PATCH 609/667] Add model 2023-11-08-bert_small_finetuned_xglue_ner_en

---
 ...11-08-bert_small_finetuned_xglue_ner_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_small_finetuned_xglue_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_small_finetuned_xglue_ner_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_small_finetuned_xglue_ner_en.md
new file mode 100644
index 00000000000000..116e4499c8528a
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-bert_small_finetuned_xglue_ner_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_small_finetuned_xglue_ner BertForTokenClassification from muhtasham
+author: John Snow Labs
+name: bert_small_finetuned_xglue_ner
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_small_finetuned_xglue_ner` is an English model originally trained by muhtasham.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_small_finetuned_xglue_ner_en_5.2.0_3.0_1699408776412.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_small_finetuned_xglue_ner_en_5.2.0_3.0_1699408776412.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_small_finetuned_xglue_ner","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_small_finetuned_xglue_ner", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_small_finetuned_xglue_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|107.0 MB|
+
+## References
+
+https://huggingface.co/muhtasham/bert-small-finetuned-xglue-ner
\ No newline at end of file
From 712aee32b825014de7154a83bb05d6dd5ebbc583 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 09:02:00 +0700
Subject: [PATCH 610/667] Add model 2023-11-08-bert_finetuned_ner_happy_ditto_en

---
 ...11-08-bert_finetuned_ner_happy_ditto_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_happy_ditto_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_happy_ditto_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_happy_ditto_en.md
new file mode 100644
index 00000000000000..86528e4b03eba0
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_happy_ditto_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_finetuned_ner_happy_ditto BertForTokenClassification from happy-ditto
+author: John Snow Labs
+name: bert_finetuned_ner_happy_ditto
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_ner_happy_ditto` is an English model originally trained by happy-ditto.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_happy_ditto_en_5.2.0_3.0_1699408911205.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_happy_ditto_en_5.2.0_3.0_1699408911205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_happy_ditto","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_finetuned_ner_happy_ditto", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_finetuned_ner_happy_ditto|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+
+## References
+
+https://huggingface.co/happy-ditto/bert-finetuned-ner
\ No newline at end of file
From a84f74433075bcfe139c07d864432ff9b98b3228 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 09:04:46 +0700
Subject: [PATCH 611/667] Add model 2023-11-08-bert_mini_finetuned_ner_chinese_en

---
 ...1-08-bert_mini_finetuned_ner_chinese_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_mini_finetuned_ner_chinese_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_mini_finetuned_ner_chinese_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_mini_finetuned_ner_chinese_en.md
new file mode 100644
index 00000000000000..766ab308686938
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-bert_mini_finetuned_ner_chinese_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_mini_finetuned_ner_chinese BertForTokenClassification from IcyKallen
+author: John Snow Labs
+name: bert_mini_finetuned_ner_chinese
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_mini_finetuned_ner_chinese` is an English model originally trained by IcyKallen.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_mini_finetuned_ner_chinese_en_5.2.0_3.0_1699409083770.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_mini_finetuned_ner_chinese_en_5.2.0_3.0_1699409083770.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_mini_finetuned_ner_chinese","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_mini_finetuned_ner_chinese", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_mini_finetuned_ner_chinese|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|46.0 MB|
+
+## References
+
+https://huggingface.co/IcyKallen/bert-mini-finetuned-ner-chinese
\ No newline at end of file
From b160c0e3c5bcbcfd9f6644189ebd937b2051ea22 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 09:11:08 +0700
Subject: [PATCH 612/667] Add model 2023-11-08-nlp_hiba_distemist_fine_tuned_bert_base_multilingual_cased_xx

---
 ...e_tuned_bert_base_multilingual_cased_xx.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-nlp_hiba_distemist_fine_tuned_bert_base_multilingual_cased_xx.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-nlp_hiba_distemist_fine_tuned_bert_base_multilingual_cased_xx.md b/docs/_posts/ahmedlone127/2023-11-08-nlp_hiba_distemist_fine_tuned_bert_base_multilingual_cased_xx.md
new file mode 100644
index 00000000000000..85be49bd608e96
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-nlp_hiba_distemist_fine_tuned_bert_base_multilingual_cased_xx.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Multilingual nlp_hiba_distemist_fine_tuned_bert_base_multilingual_cased BertForTokenClassification from GuCuChiara
+author: John Snow Labs
+name: nlp_hiba_distemist_fine_tuned_bert_base_multilingual_cased
+date: 2023-11-08
+tags: [bert, xx, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: xx
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `nlp_hiba_distemist_fine_tuned_bert_base_multilingual_cased` is a multilingual model originally trained by GuCuChiara.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_hiba_distemist_fine_tuned_bert_base_multilingual_cased_xx_5.2.0_3.0_1699409457516.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_hiba_distemist_fine_tuned_bert_base_multilingual_cased_xx_5.2.0_3.0_1699409457516.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("nlp_hiba_distemist_fine_tuned_bert_base_multilingual_cased","xx") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("nlp_hiba_distemist_fine_tuned_bert_base_multilingual_cased", "xx")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|nlp_hiba_distemist_fine_tuned_bert_base_multilingual_cased|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|xx|
+|Size:|665.1 MB|
+
+## References
+
+https://huggingface.co/GuCuChiara/NLP-HIBA_DisTEMIST_fine_tuned_bert-base-multilingual-cased
\ No newline at end of file
From d1b036f73c0d0d134b34dad2d9ea66887f666bcf Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 09:18:53 +0700
Subject: [PATCH 613/667] Add model 2023-11-08-bert_tiny_finetuned_finer_en

---
 ...2023-11-08-bert_tiny_finetuned_finer_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_tiny_finetuned_finer_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_tiny_finetuned_finer_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_tiny_finetuned_finer_en.md
new file mode 100644
index 00000000000000..c1806f511a0498
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-bert_tiny_finetuned_finer_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_tiny_finetuned_finer BertForTokenClassification from muhtasham
+author: John Snow Labs
+name: bert_tiny_finetuned_finer
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_tiny_finetuned_finer` is an English model originally trained by muhtasham.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tiny_finetuned_finer_en_5.2.0_3.0_1699409931982.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tiny_finetuned_finer_en_5.2.0_3.0_1699409931982.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("bert_tiny_finetuned_finer","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_tiny_finetuned_finer", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_tiny_finetuned_finer|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|16.8 MB|
+
+## References
+
+https://huggingface.co/muhtasham/bert-tiny-finetuned-finer
\ No newline at end of file
From bce2d4ef8cb4330e535ee06c7aa8ef9c647d3453 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 09:27:49 +0700
Subject: [PATCH 614/667] Add model 2023-11-08-multilingual_indonesian_token_classification_model_xx

---
 ...ndonesian_token_classification_model_xx.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-multilingual_indonesian_token_classification_model_xx.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-multilingual_indonesian_token_classification_model_xx.md b/docs/_posts/ahmedlone127/2023-11-08-multilingual_indonesian_token_classification_model_xx.md
new file mode 100644
index 00000000000000..11e6cba2648241
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-multilingual_indonesian_token_classification_model_xx.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Multilingual multilingual_indonesian_token_classification_model BertForTokenClassification from Cabooose
+author: John Snow Labs
+name: multilingual_indonesian_token_classification_model
+date: 2023-11-08
+tags: [bert, xx, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: xx
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `multilingual_indonesian_token_classification_model` is a multilingual model originally trained by Cabooose.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multilingual_indonesian_token_classification_model_xx_5.2.0_3.0_1699410457762.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multilingual_indonesian_token_classification_model_xx_5.2.0_3.0_1699410457762.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+
+tokenClassifier = BertForTokenClassification.pretrained("multilingual_indonesian_token_classification_model","xx") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification
+    .pretrained("multilingual_indonesian_token_classification_model", "xx")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|multilingual_indonesian_token_classification_model|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|xx|
+|Size:|665.1 MB|
+
+## References
+
+https://huggingface.co/Cabooose/multilingual_indonesian_token_classification_model
\ No newline at end of file

From 223145ca8bea2d2acf3a940eb40ffabf27e9e4c6 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 09:28:50 +0700
Subject: [PATCH 615/667] Add model 2023-11-08-rhenus_v1_0_bert_base_multilingual_uncased_xx

---
 ..._v1_0_bert_base_multilingual_uncased_xx.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-rhenus_v1_0_bert_base_multilingual_uncased_xx.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-rhenus_v1_0_bert_base_multilingual_uncased_xx.md b/docs/_posts/ahmedlone127/2023-11-08-rhenus_v1_0_bert_base_multilingual_uncased_xx.md
new file mode 100644
index 00000000000000..7d95272f000254
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-rhenus_v1_0_bert_base_multilingual_uncased_xx.md
+---
+layout: model
+title: Multilingual rhenus_v1_0_bert_base_multilingual_uncased BertForTokenClassification from DataIntelligenceTeam
+author: John Snow Labs
+name: rhenus_v1_0_bert_base_multilingual_uncased
+date: 2023-11-08
+tags: [bert, xx, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: xx
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `rhenus_v1_0_bert_base_multilingual_uncased` is a multilingual model originally trained by DataIntelligenceTeam.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rhenus_v1_0_bert_base_multilingual_uncased_xx_5.2.0_3.0_1699410469287.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rhenus_v1_0_bert_base_multilingual_uncased_xx_5.2.0_3.0_1699410469287.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
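+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("rhenus_v1_0_bert_base_multilingual_uncased","xx") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `spark` is an active SparkSession, e.g. spark = sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("rhenus_v1_0_bert_base_multilingual_uncased", "xx")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `spark` is an active SparkSession, e.g. from SparkNLP.start()
+val data = spark.createDataFrame(Seq(Tuple1("PUT YOUR STRING HERE"))).toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+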
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|rhenus_v1_0_bert_base_multilingual_uncased|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|xx|
+|Size:|625.7 MB|
+
+## References
+
+https://huggingface.co/DataIntelligenceTeam/rhenus_v1.0_bert-base-multilingual-uncased
\ No newline at end of file

From f350caa01bfd2aca5c355c61f1e5e6f1692e31d9 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 09:32:22 +0700
Subject: [PATCH 616/667] Add model 2023-11-08-resume_ner_1_en

---
 .../2023-11-08-resume_ner_1_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-resume_ner_1_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-resume_ner_1_en.md b/docs/_posts/ahmedlone127/2023-11-08-resume_ner_1_en.md
new file mode 100644
index 00000000000000..d14296942e75a1
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-resume_ner_1_en.md
+---
+layout: model
+title: English resume_ner_1 BertForTokenClassification from QuanjieHan
+author: John Snow Labs
+name: resume_ner_1
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `resume_ner_1` is an English model originally trained by QuanjieHan.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/resume_ner_1_en_5.2.0_3.0_1699410735265.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/resume_ner_1_en_5.2.0_3.0_1699410735265.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
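+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("resume_ner_1","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `spark` is an active SparkSession, e.g. spark = sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("resume_ner_1", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `spark` is an active SparkSession, e.g. from SparkNLP.start()
+val data = spark.createDataFrame(Seq(Tuple1("PUT YOUR STRING HERE"))).toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+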
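According to the *Input Labels* row of the Model Information table, `BertForTokenClassification` reads two columns: `documents` and `token`. A Spark ML pipeline only works if every input column is either present in the source DataFrame or written by an earlier stage. That is why a tokenizer stage has to produce `token` before the classifier runs. Below is a minimal pure-Python sketch of that wiring check. The `(name, inputs, output)` tuples and the `missing_inputs` helper are hypothetical stand-ins for real stages, not part of the Spark NLP API.

```python
from typing import List, Sequence, Tuple

# (stage name, input columns, output column): a hypothetical stage description
Stage = Tuple[str, Sequence[str], str]


def missing_inputs(stages: List[Stage], source_cols: Sequence[str]) -> List[Tuple[str, str]]:
    """Return (stage, column) pairs whose input column no earlier step provides."""
    available = set(source_cols)
    problems = []
    for name, inputs, output in stages:
        for col in inputs:
            if col not in available:
                problems.append((name, col))
        available.add(output)  # later stages may consume this stage's output
    return problems


without_tokenizer = [
    ("DocumentAssembler", ["text"], "documents"),
    ("BertForTokenClassification", ["documents", "token"], "ner"),
]
with_tokenizer = [
    ("DocumentAssembler", ["text"], "documents"),
    ("Tokenizer", ["documents"], "token"),
    ("BertForTokenClassification", ["documents", "token"], "ner"),
]

print(missing_inputs(without_tokenizer, ["text"]))  # [('BertForTokenClassification', 'token')]
print(missing_inputs(with_tokenizer, ["text"]))     # []
```

The check walks the stages in order, just as `Pipeline.fit` applies them. Column names must match exactly, so an assembler writing to any column other than `documents` would also be flagged.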
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|resume_ner_1|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|381.1 MB|
+
+## References
+
+https://huggingface.co/QuanjieHan/resume_ner_1
\ No newline at end of file

From 294de0a5dffaf9b64fa011dd996c8c3948b8ad2f Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 09:39:22 +0700
Subject: [PATCH 617/667] Add model 2023-11-08-bert_finetuned_ner_joannaandrews_en

---
 ...-08-bert_finetuned_ner_joannaandrews_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_joannaandrews_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_joannaandrews_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_joannaandrews_en.md
new file mode 100644
index 00000000000000..3a79a2031fd549
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_joannaandrews_en.md
+---
+layout: model
+title: English bert_finetuned_ner_joannaandrews BertForTokenClassification from JoannaAndrews
+author: John Snow Labs
+name: bert_finetuned_ner_joannaandrews
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_ner_joannaandrews` is an English model originally trained by JoannaAndrews.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_joannaandrews_en_5.2.0_3.0_1699411150741.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_joannaandrews_en_5.2.0_3.0_1699411150741.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
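+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_joannaandrews","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `spark` is an active SparkSession, e.g. spark = sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_finetuned_ner_joannaandrews", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `spark` is an active SparkSession, e.g. from SparkNLP.start()
+val data = spark.createDataFrame(Seq(Tuple1("PUT YOUR STRING HERE"))).toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+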
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_finetuned_ner_joannaandrews|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|665.1 MB|
+
+## References
+
+https://huggingface.co/JoannaAndrews/bert-finetuned-ner
\ No newline at end of file

From 2e957209b285221c926035eb6aa18ce9948a2ec4 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 09:43:59 +0700
Subject: [PATCH 618/667] Add model 2023-11-08-all_15_bert_finetuned_ner_en

---
 ...2023-11-08-all_15_bert_finetuned_ner_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-all_15_bert_finetuned_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-all_15_bert_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-08-all_15_bert_finetuned_ner_en.md
new file mode 100644
index 00000000000000..db05f7e24ab58f
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-all_15_bert_finetuned_ner_en.md
+---
+layout: model
+title: English all_15_bert_finetuned_ner BertForTokenClassification from leo93
+author: John Snow Labs
+name: all_15_bert_finetuned_ner
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `all_15_bert_finetuned_ner` is an English model originally trained by leo93.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_15_bert_finetuned_ner_en_5.2.0_3.0_1699411430446.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_15_bert_finetuned_ner_en_5.2.0_3.0_1699411430446.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
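+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("all_15_bert_finetuned_ner","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `spark` is an active SparkSession, e.g. spark = sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("all_15_bert_finetuned_ner", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `spark` is an active SparkSession, e.g. from SparkNLP.start()
+val data = spark.createDataFrame(Seq(Tuple1("PUT YOUR STRING HERE"))).toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+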
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|all_15_bert_finetuned_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+
+## References
+
+https://huggingface.co/leo93/all-15-bert-finetuned-ner
\ No newline at end of file

From 602732e9944c69cb4240a8f03c767321814807a6 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 09:52:09 +0700
Subject: [PATCH 619/667] Add model 2023-11-08-biobert_ner_diseases_model_en

---
 ...023-11-08-biobert_ner_diseases_model_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-biobert_ner_diseases_model_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-biobert_ner_diseases_model_en.md b/docs/_posts/ahmedlone127/2023-11-08-biobert_ner_diseases_model_en.md
new file mode 100644
index 00000000000000..1ec12dc2c4b015
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-biobert_ner_diseases_model_en.md
+---
+layout: model
+title: English biobert_ner_diseases_model BertForTokenClassification from rjac
+author: John Snow Labs
+name: biobert_ner_diseases_model
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `biobert_ner_diseases_model` is an English model originally trained by rjac.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biobert_ner_diseases_model_en_5.2.0_3.0_1699411920789.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biobert_ner_diseases_model_en_5.2.0_3.0_1699411920789.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
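+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("biobert_ner_diseases_model","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `spark` is an active SparkSession, e.g. spark = sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("biobert_ner_diseases_model", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `spark` is an active SparkSession, e.g. from SparkNLP.start()
+val data = spark.createDataFrame(Seq(Tuple1("PUT YOUR STRING HERE"))).toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+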
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|biobert_ner_diseases_model|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.1 MB|
+
+## References
+
+https://huggingface.co/rjac/biobert-ner-diseases-model
\ No newline at end of file

From 3c3524da3952ed1a49daa20e7b83b2289a310422 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 09:54:49 +0700
Subject: [PATCH 620/667] Add model 2023-11-08-bert_finetuned_ner_na20b039_en

---
 ...23-11-08-bert_finetuned_ner_na20b039_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_na20b039_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_na20b039_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_na20b039_en.md
new file mode 100644
index 00000000000000..473b2494ca5e0c
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_na20b039_en.md
+---
+layout: model
+title: English bert_finetuned_ner_na20b039 BertForTokenClassification from na20b039
+author: John Snow Labs
+name: bert_finetuned_ner_na20b039
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_ner_na20b039` is an English model originally trained by na20b039.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_na20b039_en_5.2.0_3.0_1699412078538.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_na20b039_en_5.2.0_3.0_1699412078538.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
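+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_na20b039","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `spark` is an active SparkSession, e.g. spark = sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_finetuned_ner_na20b039", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `spark` is an active SparkSession, e.g. from SparkNLP.start()
+val data = spark.createDataFrame(Seq(Tuple1("PUT YOUR STRING HERE"))).toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+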
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_finetuned_ner_na20b039|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+
+## References
+
+https://huggingface.co/na20b039/bert-finetuned-ner
\ No newline at end of file

From 6d13a05b59cd217d7002cca0d4bfe501b9732240 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 09:56:05 +0700
Subject: [PATCH 621/667] Add model 2023-11-08-pubmedbert_base_finetuned_n2c2_ner_en

---
 ...8-pubmedbert_base_finetuned_n2c2_ner_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-pubmedbert_base_finetuned_n2c2_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-pubmedbert_base_finetuned_n2c2_ner_en.md b/docs/_posts/ahmedlone127/2023-11-08-pubmedbert_base_finetuned_n2c2_ner_en.md
new file mode 100644
index 00000000000000..44531185a15f4e
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-pubmedbert_base_finetuned_n2c2_ner_en.md
+---
+layout: model
+title: English pubmedbert_base_finetuned_n2c2_ner BertForTokenClassification from georgeleung30
+author: John Snow Labs
+name: pubmedbert_base_finetuned_n2c2_ner
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `pubmedbert_base_finetuned_n2c2_ner` is an English model originally trained by georgeleung30.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pubmedbert_base_finetuned_n2c2_ner_en_5.2.0_3.0_1699412158375.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pubmedbert_base_finetuned_n2c2_ner_en_5.2.0_3.0_1699412158375.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
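+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("pubmedbert_base_finetuned_n2c2_ner","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `spark` is an active SparkSession, e.g. spark = sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("pubmedbert_base_finetuned_n2c2_ner", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `spark` is an active SparkSession, e.g. from SparkNLP.start()
+val data = spark.createDataFrame(Seq(Tuple1("PUT YOUR STRING HERE"))).toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+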
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|pubmedbert_base_finetuned_n2c2_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|408.2 MB|
+
+## References
+
+https://huggingface.co/georgeleung30/PubMedBERT-base-finetuned-n2c2-ner
\ No newline at end of file

From 78634304401ae59391d23be408be315462f8c2ee Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 10:11:24 +0700
Subject: [PATCH 622/667] Add model 2023-11-08-bert_base_portuguese_cased_harem_selective_samoan_first_ner_en

---
 ...sed_harem_selective_samoan_first_ner_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_base_portuguese_cased_harem_selective_samoan_first_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_base_portuguese_cased_harem_selective_samoan_first_ner_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_base_portuguese_cased_harem_selective_samoan_first_ner_en.md
new file mode 100644
index 00000000000000..d671ad9c6491e3
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-bert_base_portuguese_cased_harem_selective_samoan_first_ner_en.md
+---
+layout: model
+title: English bert_base_portuguese_cased_harem_selective_samoan_first_ner BertForTokenClassification from jordyvl
+author: John Snow Labs
+name: bert_base_portuguese_cased_harem_selective_samoan_first_ner
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_portuguese_cased_harem_selective_samoan_first_ner` is an English model originally trained by jordyvl.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_cased_harem_selective_samoan_first_ner_en_5.2.0_3.0_1699413077831.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_cased_harem_selective_samoan_first_ner_en_5.2.0_3.0_1699413077831.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
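+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bert_base_portuguese_cased_harem_selective_samoan_first_ner","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `spark` is an active SparkSession, e.g. spark = sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_base_portuguese_cased_harem_selective_samoan_first_ner", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `spark` is an active SparkSession, e.g. from SparkNLP.start()
+val data = spark.createDataFrame(Seq(Tuple1("PUT YOUR STRING HERE"))).toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+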
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_base_portuguese_cased_harem_selective_samoan_first_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|406.0 MB|
+
+## References
+
+https://huggingface.co/jordyvl/bert-base-portuguese-cased_harem-selective-sm-first-ner
\ No newline at end of file

From 0dce81618642400e6ad5cb6b0068c87b25d13018 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 10:21:32 +0700
Subject: [PATCH 623/667] Add model 2023-11-08-bert_restore_punctuation_st1992_en

---
 ...1-08-bert_restore_punctuation_st1992_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_restore_punctuation_st1992_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_restore_punctuation_st1992_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_restore_punctuation_st1992_en.md
new file mode 100644
index 00000000000000..052bebcc84cb29
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-bert_restore_punctuation_st1992_en.md
+---
+layout: model
+title: English bert_restore_punctuation_st1992 BertForTokenClassification from st1992
+author: John Snow Labs
+name: bert_restore_punctuation_st1992
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_restore_punctuation_st1992` is an English model originally trained by st1992.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_restore_punctuation_st1992_en_5.2.0_3.0_1699413684970.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_restore_punctuation_st1992_en_5.2.0_3.0_1699413684970.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
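+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bert_restore_punctuation_st1992","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+# `spark` is an active SparkSession, e.g. spark = sparknlp.start()
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_restore_punctuation_st1992", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+// `spark` is an active SparkSession, e.g. from SparkNLP.start()
+val data = spark.createDataFrame(Seq(Tuple1("PUT YOUR STRING HERE"))).toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+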
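The card files `bert_restore_punctuation_st1992` under Named Entity Recognition, but its name suggests that its per-token labels describe punctuation and casing rather than entities. Below is a minimal sketch for rebuilding text from `(token, label)` pairs. It assumes a hypothetical two-character label scheme: the first character is the punctuation mark to append after the token (`O` for none), and the second is `U` to upper-case the token's first letter or `O` to leave it unchanged. Confirm the real label set against the `id2label` map in the referenced Hugging Face model's `config.json` before relying on this. The `restore` helper is illustrative only.

```python
from typing import List, Tuple


def restore(tokens: List[Tuple[str, str]]) -> str:
    """Rebuild text from (token, label) pairs under an assumed two-character label scheme."""
    words = []
    for token, label in tokens:
        if len(label) != 2:  # unknown label format: keep the token unchanged
            words.append(token)
            continue
        punct, case = label[0], label[1]
        # Second character "U" means: capitalise the first letter of the token.
        word = token[:1].upper() + token[1:] if case == "U" else token
        # First character is the punctuation to append, "O" meaning none.
        if punct != "O":
            word += punct
        words.append(word)
    return " ".join(words)


print(restore([("hello", "OU"), ("world", ".O"), ("how", "OU"), ("are", "OO"), ("you", "?O")]))
# Hello world. How are you?
```

Joining the words with single spaces is a simplification. Text that originally contained line breaks or other spacing would need the token offsets to reconstruct its layout.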
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_restore_punctuation_st1992|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|407.2 MB|
+
+## References
+
+https://huggingface.co/st1992/bert-restore-punctuation
\ No newline at end of file

From 93d814d7d86db9d4cb60ea4a67246d5d0bde869b Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 10:24:30 +0700
Subject: [PATCH 624/667] Add model 2023-11-08-biomedical_ner_maccrobat_bert_en

---
 ...-11-08-biomedical_ner_maccrobat_bert_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-biomedical_ner_maccrobat_bert_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-biomedical_ner_maccrobat_bert_en.md b/docs/_posts/ahmedlone127/2023-11-08-biomedical_ner_maccrobat_bert_en.md
new file mode 100644
index 00000000000000..edaebeaa02b42d
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-biomedical_ner_maccrobat_bert_en.md
+---
+layout: model
+title: English biomedical_ner_maccrobat_bert BertForTokenClassification from vineetsharma
+author: John Snow Labs
+name: biomedical_ner_maccrobat_bert
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `biomedical_ner_maccrobat_bert` is an English model originally trained by vineetsharma.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biomedical_ner_maccrobat_bert_en_5.2.0_3.0_1699413863398.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biomedical_ner_maccrobat_bert_en_5.2.0_3.0_1699413863398.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
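+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("biomedical_ner_maccrobat_bert","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # illustrative input
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes an active SparkSession named `spark`
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("biomedical_ner_maccrobat_bert", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text") // illustrative input
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+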
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biomedical_ner_maccrobat_bert| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.4 MB| + +## References + +https://huggingface.co/vineetsharma/BioMedical_NER-maccrobat-bert \ No newline at end of file From 212c7e1623c6ee2c4e0b4e115f70b79530bf0b4d Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 10:36:13 +0700 Subject: [PATCH 625/667] Add model 2023-11-08-bio_clinicalbert_2e5_top10_20testset_en --- ...bio_clinicalbert_2e5_top10_20testset_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bio_clinicalbert_2e5_top10_20testset_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-bio_clinicalbert_2e5_top10_20testset_en.md b/docs/_posts/ahmedlone127/2023-11-08-bio_clinicalbert_2e5_top10_20testset_en.md new file mode 100644 index 00000000000000..1c90c9a04bf643 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-bio_clinicalbert_2e5_top10_20testset_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bio_clinicalbert_2e5_top10_20testset BertForTokenClassification from alecocc +author: John Snow Labs +name: bio_clinicalbert_2e5_top10_20testset +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bio_clinicalbert_2e5_top10_20testset` is an English model originally trained by alecocc.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bio_clinicalbert_2e5_top10_20testset_en_5.2.0_3.0_1699414566133.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bio_clinicalbert_2e5_top10_20testset_en_5.2.0_3.0_1699414566133.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
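+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bio_clinicalbert_2e5_top10_20testset","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # illustrative input
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes an active SparkSession named `spark`
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bio_clinicalbert_2e5_top10_20testset", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text") // illustrative input
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+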
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bio_clinicalbert_2e5_top10_20testset| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.4 MB| + +## References + +https://huggingface.co/alecocc/Bio_ClinicalBERT_2e5_top10_20testset \ No newline at end of file From dd8f7175213175d22895fd3af34485ba88cff860 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 10:52:51 +0700 Subject: [PATCH 626/667] Add model 2023-11-08-bert_finetuned_ner_chinese_people_daily_en --- ...t_finetuned_ner_chinese_people_daily_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_chinese_people_daily_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_chinese_people_daily_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_chinese_people_daily_en.md new file mode 100644 index 00000000000000..6ab925d7f7388d --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_chinese_people_daily_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_finetuned_ner_chinese_people_daily BertForTokenClassification from johnyyhk +author: John Snow Labs +name: bert_finetuned_ner_chinese_people_daily +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_ner_chinese_people_daily` is an English model originally trained by johnyyhk.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_chinese_people_daily_en_5.2.0_3.0_1699415561401.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_chinese_people_daily_en_5.2.0_3.0_1699415561401.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
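+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_chinese_people_daily","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # illustrative input
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes an active SparkSession named `spark`
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_finetuned_ner_chinese_people_daily", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text") // illustrative input
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+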
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_chinese_people_daily| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|381.1 MB| + +## References + +https://huggingface.co/johnyyhk/bert-finetuned-ner-chinese-people-daily \ No newline at end of file From 953e356ce6d787f83df26689bb95aceaef8177e2 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 10:53:51 +0700 Subject: [PATCH 627/667] Add model 2023-11-08-tamil_ner_model_en --- .../2023-11-08-tamil_ner_model_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-tamil_ner_model_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-tamil_ner_model_en.md b/docs/_posts/ahmedlone127/2023-11-08-tamil_ner_model_en.md new file mode 100644 index 00000000000000..b45557c820efec --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-tamil_ner_model_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English tamil_ner_model BertForTokenClassification from sathishmahi +author: John Snow Labs +name: tamil_ner_model +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `tamil_ner_model` is an English model originally trained by sathishmahi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tamil_ner_model_en_5.2.0_3.0_1699415561393.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tamil_ner_model_en_5.2.0_3.0_1699415561393.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
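+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("tamil_ner_model","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # illustrative input
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes an active SparkSession named `spark`
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("tamil_ner_model", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text") // illustrative input
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+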
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tamil_ner_model| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/sathishmahi/tamil-ner-model \ No newline at end of file From 3541d38e3287db7186f5966fb3c06bf0d61505d9 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 11:06:19 +0700 Subject: [PATCH 628/667] Add model 2023-11-08-autotrain_re_syn_cleanedtext_bert_55272128958_en --- ..._re_syn_cleanedtext_bert_55272128958_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-autotrain_re_syn_cleanedtext_bert_55272128958_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-autotrain_re_syn_cleanedtext_bert_55272128958_en.md b/docs/_posts/ahmedlone127/2023-11-08-autotrain_re_syn_cleanedtext_bert_55272128958_en.md new file mode 100644 index 00000000000000..915982ba6be71b --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-autotrain_re_syn_cleanedtext_bert_55272128958_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English autotrain_re_syn_cleanedtext_bert_55272128958 BertForTokenClassification from sxandie +author: John Snow Labs +name: autotrain_re_syn_cleanedtext_bert_55272128958 +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `autotrain_re_syn_cleanedtext_bert_55272128958` is an English model originally trained by sxandie.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_re_syn_cleanedtext_bert_55272128958_en_5.2.0_3.0_1699416369016.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_re_syn_cleanedtext_bert_55272128958_en_5.2.0_3.0_1699416369016.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
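+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("autotrain_re_syn_cleanedtext_bert_55272128958","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # illustrative input
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes an active SparkSession named `spark`
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("autotrain_re_syn_cleanedtext_bert_55272128958", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text") // illustrative input
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+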
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_re_syn_cleanedtext_bert_55272128958| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|665.1 MB| + +## References + +https://huggingface.co/sxandie/autotrain-re_syn_cleanedtext_bert-55272128958 \ No newline at end of file From a771d6d4a9eb9bd0f34ae92c6670e8419be55bad Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 11:07:44 +0700 Subject: [PATCH 629/667] Add model 2023-11-08-bert_finetuned_sst2_en --- .../2023-11-08-bert_finetuned_sst2_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_sst2_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_sst2_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_sst2_en.md new file mode 100644 index 00000000000000..55a295ac682525 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_sst2_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_finetuned_sst2 BertForTokenClassification from asimokby +author: John Snow Labs +name: bert_finetuned_sst2 +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_sst2` is an English model originally trained by asimokby.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_sst2_en_5.2.0_3.0_1699416456457.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_sst2_en_5.2.0_3.0_1699416456457.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
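+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_sst2","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # illustrative input
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes an active SparkSession named `spark`
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_finetuned_sst2", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text") // illustrative input
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+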
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_sst2| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.8 MB| + +## References + +https://huggingface.co/asimokby/bert-finetuned-sst2 \ No newline at end of file From b54487e3a0cd30d28dfdc147029b588c9721c948 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 11:17:36 +0700 Subject: [PATCH 630/667] Add model 2023-11-08-bert_base_multilingual_cased_finetuned_ner_mayagalvez_xx --- ...ngual_cased_finetuned_ner_mayagalvez_xx.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_base_multilingual_cased_finetuned_ner_mayagalvez_xx.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_base_multilingual_cased_finetuned_ner_mayagalvez_xx.md b/docs/_posts/ahmedlone127/2023-11-08-bert_base_multilingual_cased_finetuned_ner_mayagalvez_xx.md new file mode 100644 index 00000000000000..17dd7bc3a10764 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-bert_base_multilingual_cased_finetuned_ner_mayagalvez_xx.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_finetuned_ner_mayagalvez BertForTokenClassification from MayaGalvez +author: John Snow Labs +name: bert_base_multilingual_cased_finetuned_ner_mayagalvez +date: 2023-11-08 +tags: [bert, xx, open_source, token_classification, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_base_multilingual_cased_finetuned_ner_mayagalvez` is a Multilingual model originally trained by MayaGalvez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_finetuned_ner_mayagalvez_xx_5.2.0_3.0_1699417045903.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_finetuned_ner_mayagalvez_xx_5.2.0_3.0_1699417045903.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
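+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bert_base_multilingual_cased_finetuned_ner_mayagalvez","xx") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # illustrative input
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes an active SparkSession named `spark`
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_base_multilingual_cased_finetuned_ner_mayagalvez", "xx")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text") // illustrative input
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+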
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_finetuned_ner_mayagalvez| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/MayaGalvez/bert-base-multilingual-cased-finetuned-ner \ No newline at end of file From 84ce522026104b7c5c70d229d3ee17aee2da005e Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 11:25:50 +0700 Subject: [PATCH 631/667] Add model 2023-11-08-greek_legal_bert_v2_finetuned_ner_v3_en --- ...greek_legal_bert_v2_finetuned_ner_v3_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-greek_legal_bert_v2_finetuned_ner_v3_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-greek_legal_bert_v2_finetuned_ner_v3_en.md b/docs/_posts/ahmedlone127/2023-11-08-greek_legal_bert_v2_finetuned_ner_v3_en.md new file mode 100644 index 00000000000000..82dbd46202cf3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-greek_legal_bert_v2_finetuned_ner_v3_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English greek_legal_bert_v2_finetuned_ner_v3 BertForTokenClassification from amichailidis +author: John Snow Labs +name: greek_legal_bert_v2_finetuned_ner_v3 +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `greek_legal_bert_v2_finetuned_ner_v3` is an English model originally trained by amichailidis.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/greek_legal_bert_v2_finetuned_ner_v3_en_5.2.0_3.0_1699417541056.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/greek_legal_bert_v2_finetuned_ner_v3_en_5.2.0_3.0_1699417541056.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
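+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("greek_legal_bert_v2_finetuned_ner_v3","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # illustrative input
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes an active SparkSession named `spark`
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("greek_legal_bert_v2_finetuned_ner_v3", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text") // illustrative input
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+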
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|greek_legal_bert_v2_finetuned_ner_v3| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|421.0 MB| + +## References + +https://huggingface.co/amichailidis/greek_legal_bert_v2-finetuned-ner-V3 \ No newline at end of file From fe1a974e7fca4aec9d1c5634ef29c173a15da0c6 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 11:36:45 +0700 Subject: [PATCH 632/667] Add model 2023-11-08-assignment2_attempt10_en --- .../2023-11-08-assignment2_attempt10_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-assignment2_attempt10_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-assignment2_attempt10_en.md b/docs/_posts/ahmedlone127/2023-11-08-assignment2_attempt10_en.md new file mode 100644 index 00000000000000..fb6213c446de21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-assignment2_attempt10_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English assignment2_attempt10 BertForTokenClassification from mpalaval +author: John Snow Labs +name: assignment2_attempt10 +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `assignment2_attempt10` is an English model originally trained by mpalaval.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/assignment2_attempt10_en_5.2.0_3.0_1699418198251.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/assignment2_attempt10_en_5.2.0_3.0_1699418198251.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
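+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("assignment2_attempt10","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # illustrative input
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes an active SparkSession named `spark`
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("assignment2_attempt10", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text") // illustrative input
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+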
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|assignment2_attempt10| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/mpalaval/assignment2_attempt10 \ No newline at end of file From dc8e168f72fcc038656a42046ce178981d013b4a Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 11:47:51 +0700 Subject: [PATCH 633/667] Add model 2023-11-08-bert_finetuned_ner_suraj_yadav_en --- ...11-08-bert_finetuned_ner_suraj_yadav_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_suraj_yadav_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_suraj_yadav_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_suraj_yadav_en.md new file mode 100644 index 00000000000000..006950e17bea61 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_suraj_yadav_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_finetuned_ner_suraj_yadav BertForTokenClassification from Suraj-Yadav +author: John Snow Labs +name: bert_finetuned_ner_suraj_yadav +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_ner_suraj_yadav` is an English model originally trained by Suraj-Yadav.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_suraj_yadav_en_5.2.0_3.0_1699418864847.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_suraj_yadav_en_5.2.0_3.0_1699418864847.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
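+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_suraj_yadav","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # illustrative input
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.DocumentAssembler
+import com.johnsnowlabs.nlp.annotators.Tokenizer
+import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // assumes an active SparkSession named `spark`
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_finetuned_ner_suraj_yadav", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDF("text") // illustrative input
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+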
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_suraj_yadav| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Suraj-Yadav/bert-finetuned-ner \ No newline at end of file From 5bd3231e2258f7545850d9896d5adab052686f03 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 11:48:52 +0700 Subject: [PATCH 634/667] Add model 2023-11-08-bert_finetuned_ner_tw5n14_en --- ...2023-11-08-bert_finetuned_ner_tw5n14_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_tw5n14_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_tw5n14_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_tw5n14_en.md new file mode 100644 index 00000000000000..169fbb4f0187e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_tw5n14_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_finetuned_ner_tw5n14 BertForTokenClassification from tw5n14 +author: John Snow Labs +name: bert_finetuned_ner_tw5n14 +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_ner_tw5n14` is an English model originally trained by tw5n14.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_tw5n14_en_5.2.0_3.0_1699418917060.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_tw5n14_en_5.2.0_3.0_1699418917060.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_tw5n14","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols("documents").setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_finetuned_ner_tw5n14", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_tw5n14| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/tw5n14/bert-finetuned-ner \ No newline at end of file From f7268689965baa2566eefe0d97682c41d42074f3 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 12:02:29 +0700 Subject: [PATCH 635/667] Add model 2023-11-08-shingazidja_sayula_popoluca_en --- ...23-11-08-shingazidja_sayula_popoluca_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-shingazidja_sayula_popoluca_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-shingazidja_sayula_popoluca_en.md b/docs/_posts/ahmedlone127/2023-11-08-shingazidja_sayula_popoluca_en.md new file mode 100644 index 00000000000000..e46775a46ef6a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-shingazidja_sayula_popoluca_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English shingazidja_sayula_popoluca BertForTokenClassification from nairaxo +author: John Snow Labs +name: shingazidja_sayula_popoluca +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `shingazidja_sayula_popoluca` is an English model originally trained by nairaxo.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/shingazidja_sayula_popoluca_en_5.2.0_3.0_1699419736974.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/shingazidja_sayula_popoluca_en_5.2.0_3.0_1699419736974.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("shingazidja_sayula_popoluca","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols("documents").setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("shingazidja_sayula_popoluca", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|shingazidja_sayula_popoluca| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|752.9 MB| + +## References + +https://huggingface.co/nairaxo/shingazidja-pos \ No newline at end of file From 3ba1f8c0cfdd1438ff8ab72cccb14c8a9fa2e7ca Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 12:05:00 +0700 Subject: [PATCH 636/667] Add model 2023-11-08-archaeobert_ner_en --- .../2023-11-08-archaeobert_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-archaeobert_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-archaeobert_ner_en.md b/docs/_posts/ahmedlone127/2023-11-08-archaeobert_ner_en.md new file mode 100644 index 00000000000000..943d00e0c708dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-archaeobert_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English archaeobert_ner BertForTokenClassification from alexbrandsen +author: John Snow Labs +name: archaeobert_ner +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `archaeobert_ner` is an English model originally trained by alexbrandsen.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/archaeobert_ner_en_5.2.0_3.0_1699419891519.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/archaeobert_ner_en_5.2.0_3.0_1699419891519.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("archaeobert_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols("documents").setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("archaeobert_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|archaeobert_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/alexbrandsen/ArchaeoBERT-NER \ No newline at end of file From 82af89a8d4c618c67ef3e7099f0e453aeecbe9df Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 12:09:09 +0700 Subject: [PATCH 637/667] Add model 2023-11-08-postagger_south_azerbaijani_az --- ...23-11-08-postagger_south_azerbaijani_az.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-postagger_south_azerbaijani_az.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-postagger_south_azerbaijani_az.md b/docs/_posts/ahmedlone127/2023-11-08-postagger_south_azerbaijani_az.md new file mode 100644 index 00000000000000..d8e1b896c54ce3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-postagger_south_azerbaijani_az.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Azerbaijani postagger_south_azerbaijani BertForTokenClassification from language-ml-lab +author: John Snow Labs +name: postagger_south_azerbaijani +date: 2023-11-08 +tags: [bert, az, open_source, token_classification, onnx] +task: Named Entity Recognition +language: az +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `postagger_south_azerbaijani` is an Azerbaijani model originally trained by language-ml-lab.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/postagger_south_azerbaijani_az_5.2.0_3.0_1699420138102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/postagger_south_azerbaijani_az_5.2.0_3.0_1699420138102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("postagger_south_azerbaijani","az") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols("documents").setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("postagger_south_azerbaijani", "az") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|postagger_south_azerbaijani| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|az| +|Size:|347.5 MB| + +## References + +https://huggingface.co/language-ml-lab/postagger-azb \ No newline at end of file From 124449756ee6f0ac258d872342d66d9b757ab769 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 12:11:59 +0700 Subject: [PATCH 638/667] Add model 2023-11-08-bert_portuguese_event_trigger_en --- ...-11-08-bert_portuguese_event_trigger_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_portuguese_event_trigger_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_portuguese_event_trigger_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_portuguese_event_trigger_en.md new file mode 100644 index 00000000000000..07896a2b3bdc43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-bert_portuguese_event_trigger_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_portuguese_event_trigger BertForTokenClassification from lfcc +author: John Snow Labs +name: bert_portuguese_event_trigger +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_portuguese_event_trigger` is an English model originally trained by lfcc.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_portuguese_event_trigger_en_5.2.0_3.0_1699420311858.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_portuguese_event_trigger_en_5.2.0_3.0_1699420311858.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_portuguese_event_trigger","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols("documents").setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_portuguese_event_trigger", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_portuguese_event_trigger| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|406.0 MB| + +## References + +https://huggingface.co/lfcc/bert-portuguese-event-trigger \ No newline at end of file From 3c3bdbf53935699a2a4bb6c77100793f9ea5e543 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 12:13:20 +0700 Subject: [PATCH 639/667] Add model 2023-11-08-bert_finetuned_ner_erickrribeiro_en --- ...-08-bert_finetuned_ner_erickrribeiro_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_erickrribeiro_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_erickrribeiro_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_erickrribeiro_en.md new file mode 100644 index 00000000000000..b6a5631994cb86 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_erickrribeiro_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_finetuned_ner_erickrribeiro BertForTokenClassification from erickrribeiro +author: John Snow Labs +name: bert_finetuned_ner_erickrribeiro +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_ner_erickrribeiro` is an English model originally trained by erickrribeiro.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_erickrribeiro_en_5.2.0_3.0_1699420390755.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_erickrribeiro_en_5.2.0_3.0_1699420390755.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_erickrribeiro","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols("documents").setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_finetuned_ner_erickrribeiro", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_erickrribeiro| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/erickrribeiro/bert-finetuned-ner \ No newline at end of file From e73393c9e7bae50463f6a86019dd1bac41c6181b Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 12:27:56 +0700 Subject: [PATCH 640/667] Add model 2023-11-08-ner_resume_en --- .../ahmedlone127/2023-11-08-ner_resume_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-ner_resume_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-ner_resume_en.md b/docs/_posts/ahmedlone127/2023-11-08-ner_resume_en.md new file mode 100644 index 00000000000000..7f606c07586f4d --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-ner_resume_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English ner_resume BertForTokenClassification from momo22 +author: John Snow Labs +name: ner_resume +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `ner_resume` is an English model originally trained by momo22.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_resume_en_5.2.0_3.0_1699421268824.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_resume_en_5.2.0_3.0_1699421268824.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("ner_resume","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols("documents").setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("ner_resume", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_resume| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/momo22/ner_resume \ No newline at end of file From 5e38c95478386b1a3391bef0d39f8a8134d7c38a Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 12:31:56 +0700 Subject: [PATCH 641/667] Add model 2023-11-08-bert_finetuned_ner_roverandom95_en --- ...1-08-bert_finetuned_ner_roverandom95_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_roverandom95_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_roverandom95_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_roverandom95_en.md new file mode 100644 index 00000000000000..62bab92b7c9111 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_roverandom95_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_finetuned_ner_roverandom95 BertForTokenClassification from Roverandom95 +author: John Snow Labs +name: bert_finetuned_ner_roverandom95 +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_ner_roverandom95` is an English model originally trained by Roverandom95.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_roverandom95_en_5.2.0_3.0_1699421456125.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_roverandom95_en_5.2.0_3.0_1699421456125.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_roverandom95","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols("documents").setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_finetuned_ner_roverandom95", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_roverandom95| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|664.3 MB| + +## References + +https://huggingface.co/Roverandom95/bert-finetuned-ner \ No newline at end of file From 0f329b1379f8b90d5f5bbf429490e2e43792f844 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 12:33:44 +0700 Subject: [PATCH 642/667] Add model 2023-11-08-vietnamese_ner_v1_4_0a2_en --- .../2023-11-08-vietnamese_ner_v1_4_0a2_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-vietnamese_ner_v1_4_0a2_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-vietnamese_ner_v1_4_0a2_en.md b/docs/_posts/ahmedlone127/2023-11-08-vietnamese_ner_v1_4_0a2_en.md new file mode 100644 index 00000000000000..50a2ef9235f8d9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-vietnamese_ner_v1_4_0a2_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English vietnamese_ner_v1_4_0a2 BertForTokenClassification from rain1024 +author: John Snow Labs +name: vietnamese_ner_v1_4_0a2 +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `vietnamese_ner_v1_4_0a2` is an English model originally trained by rain1024.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/vietnamese_ner_v1_4_0a2_en_5.2.0_3.0_1699421616560.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/vietnamese_ner_v1_4_0a2_en_5.2.0_3.0_1699421616560.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("vietnamese_ner_v1_4_0a2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols("documents").setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("vietnamese_ner_v1_4_0a2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|vietnamese_ner_v1_4_0a2| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|428.8 MB| + +## References + +https://huggingface.co/rain1024/vietnamese-ner-v1.4.0a2 \ No newline at end of file From 98f58cd89845e66563fcd8fa880ceeaf2c69653d Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 12:40:46 +0700 Subject: [PATCH 643/667] Add model 2023-11-08-scibert_scivocab_uncased_finetuned_ner_sschet_en --- ...civocab_uncased_finetuned_ner_sschet_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-scibert_scivocab_uncased_finetuned_ner_sschet_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-scibert_scivocab_uncased_finetuned_ner_sschet_en.md b/docs/_posts/ahmedlone127/2023-11-08-scibert_scivocab_uncased_finetuned_ner_sschet_en.md new file mode 100644 index 00000000000000..e6a26e516b8259 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-scibert_scivocab_uncased_finetuned_ner_sschet_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English scibert_scivocab_uncased_finetuned_ner_sschet BertForTokenClassification from sschet +author: John Snow Labs +name: scibert_scivocab_uncased_finetuned_ner_sschet +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `scibert_scivocab_uncased_finetuned_ner_sschet` is an English model originally trained by sschet.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scibert_scivocab_uncased_finetuned_ner_sschet_en_5.2.0_3.0_1699422037617.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scibert_scivocab_uncased_finetuned_ner_sschet_en_5.2.0_3.0_1699422037617.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("scibert_scivocab_uncased_finetuned_ner_sschet","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer().setInputCols("documents").setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("scibert_scivocab_uncased_finetuned_ner_sschet", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scibert_scivocab_uncased_finetuned_ner_sschet| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|410.0 MB| +
+## References + +https://huggingface.co/sschet/scibert_scivocab_uncased-finetuned-ner \ No newline at end of file From 90f8efbab13b285c214b80cad89e8abb1c955c59 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 12:46:29 +0700 Subject: [PATCH 644/667] Add model 2023-11-08-bert_small_finetuned_wnut17_ner_en --- ...1-08-bert_small_finetuned_wnut17_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_small_finetuned_wnut17_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_small_finetuned_wnut17_ner_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_small_finetuned_wnut17_ner_en.md new file mode 100644 index 00000000000000..7c1e04120c620d --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-bert_small_finetuned_wnut17_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_small_finetuned_wnut17_ner BertForTokenClassification from muhtasham +author: John Snow Labs +name: bert_small_finetuned_wnut17_ner +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_small_finetuned_wnut17_ner` is an English model originally trained by muhtasham.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_small_finetuned_wnut17_ner_en_5.2.0_3.0_1699422386690.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_small_finetuned_wnut17_ner_en_5.2.0_3.0_1699422386690.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
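+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bert_small_finetuned_wnut17_ner","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+pipelineModel = pipeline.fit(data)  # data: a Spark DataFrame with a "text" column
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_small_finetuned_wnut17_ner", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val pipelineModel = pipeline.fit(data) // data: a DataFrame with a "text" column
+val pipelineDF = pipelineModel.transform(data)
+```
+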
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_small_finetuned_wnut17_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|107.0 MB|
+
+## References
+
+https://huggingface.co/muhtasham/bert-small-finetuned-wnut17-ner
\ No newline at end of file
From f0cd2030ad0b5c0f62ba5bb1a18118cfe27e6940 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 13:02:03 +0700
Subject: [PATCH 645/667] Add model 2023-11-08-clinicalnerpt_quantitative_pt

---
 ...023-11-08-clinicalnerpt_quantitative_pt.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-clinicalnerpt_quantitative_pt.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-clinicalnerpt_quantitative_pt.md b/docs/_posts/ahmedlone127/2023-11-08-clinicalnerpt_quantitative_pt.md
new file mode 100644
index 00000000000000..3efab3787f39e3
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-clinicalnerpt_quantitative_pt.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Portuguese clinicalnerpt_quantitative BertForTokenClassification from pucpr
+author: John Snow Labs
+name: clinicalnerpt_quantitative
+date: 2023-11-08
+tags: [bert, pt, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: pt
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `clinicalnerpt_quantitative` is a Portuguese model originally trained by pucpr.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinicalnerpt_quantitative_pt_5.2.0_3.0_1699423311702.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinicalnerpt_quantitative_pt_5.2.0_3.0_1699423311702.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
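+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("clinicalnerpt_quantitative","pt") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+pipelineModel = pipeline.fit(data)  # data: a Spark DataFrame with a "text" column
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("clinicalnerpt_quantitative", "pt")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val pipelineModel = pipeline.fit(data) // data: a DataFrame with a "text" column
+val pipelineDF = pipelineModel.transform(data)
+```
+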
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|clinicalnerpt_quantitative|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|pt|
+|Size:|664.8 MB|
+
+## References
+
+https://huggingface.co/pucpr/clinicalnerpt-quantitative
\ No newline at end of file
From 505c72be8d97072899b2037436f029aa32b3b85b Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 13:08:47 +0700
Subject: [PATCH 646/667] Add model 2023-11-08-bert_finetuned_ner_mie_zhz_en

---
 ...023-11-08-bert_finetuned_ner_mie_zhz_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_mie_zhz_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_mie_zhz_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_mie_zhz_en.md
new file mode 100644
index 00000000000000..b2d4c603a3dcce
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_mie_zhz_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_finetuned_ner_mie_zhz BertForTokenClassification from mie-zhz
+author: John Snow Labs
+name: bert_finetuned_ner_mie_zhz
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_ner_mie_zhz` is an English model originally trained by mie-zhz.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_mie_zhz_en_5.2.0_3.0_1699423718558.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_mie_zhz_en_5.2.0_3.0_1699423718558.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
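+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_mie_zhz","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+pipelineModel = pipeline.fit(data)  # data: a Spark DataFrame with a "text" column
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_finetuned_ner_mie_zhz", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val pipelineModel = pipeline.fit(data) // data: a DataFrame with a "text" column
+val pipelineDF = pipelineModel.transform(data)
+```
+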
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_finetuned_ner_mie_zhz|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+
+## References
+
+https://huggingface.co/mie-zhz/bert-finetuned-ner
\ No newline at end of file
From 7ffdda5f3ac34bd415d45e59c7016cde870b54e9 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 13:16:03 +0700
Subject: [PATCH 647/667] Add model 2023-11-08-bert_multilingual_finetuned_history_ner_sub_ontology_xx

---
 ...l_finetuned_history_ner_sub_ontology_xx.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_multilingual_finetuned_history_ner_sub_ontology_xx.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_multilingual_finetuned_history_ner_sub_ontology_xx.md b/docs/_posts/ahmedlone127/2023-11-08-bert_multilingual_finetuned_history_ner_sub_ontology_xx.md
new file mode 100644
index 00000000000000..9c4768d91d544e
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-bert_multilingual_finetuned_history_ner_sub_ontology_xx.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Multilingual bert_multilingual_finetuned_history_ner_sub_ontology BertForTokenClassification from QuanAI
+author: John Snow Labs
+name: bert_multilingual_finetuned_history_ner_sub_ontology
+date: 2023-11-08
+tags: [bert, xx, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: xx
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_multilingual_finetuned_history_ner_sub_ontology` is a multilingual model originally trained by QuanAI.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_multilingual_finetuned_history_ner_sub_ontology_xx_5.2.0_3.0_1699424152414.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_multilingual_finetuned_history_ner_sub_ontology_xx_5.2.0_3.0_1699424152414.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
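+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bert_multilingual_finetuned_history_ner_sub_ontology","xx") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+pipelineModel = pipeline.fit(data)  # data: a Spark DataFrame with a "text" column
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_multilingual_finetuned_history_ner_sub_ontology", "xx")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val pipelineModel = pipeline.fit(data) // data: a DataFrame with a "text" column
+val pipelineDF = pipelineModel.transform(data)
+```
+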
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_multilingual_finetuned_history_ner_sub_ontology|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|xx|
+|Size:|665.1 MB|
+
+## References
+
+https://huggingface.co/QuanAI/bert-multilingual-finetuned-history-ner-sub-ontology
\ No newline at end of file
From 5c5c332004ff652fce8b045ca020654e659b255e Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 13:27:34 +0700
Subject: [PATCH 648/667] Add model 2023-11-08-bert_finetuned_ner_accelerate_atajti_en

---
 ...bert_finetuned_ner_accelerate_atajti_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_accelerate_atajti_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_accelerate_atajti_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_accelerate_atajti_en.md
new file mode 100644
index 00000000000000..a09156078a89c4
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_accelerate_atajti_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_finetuned_ner_accelerate_atajti BertForTokenClassification from atajti
+author: John Snow Labs
+name: bert_finetuned_ner_accelerate_atajti
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_ner_accelerate_atajti` is an English model originally trained by atajti.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_accelerate_atajti_en_5.2.0_3.0_1699424845393.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_accelerate_atajti_en_5.2.0_3.0_1699424845393.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
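+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_accelerate_atajti","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+pipelineModel = pipeline.fit(data)  # data: a Spark DataFrame with a "text" column
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_finetuned_ner_accelerate_atajti", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val pipelineModel = pipeline.fit(data) // data: a DataFrame with a "text" column
+val pipelineDF = pipelineModel.transform(data)
+```
+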
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_finetuned_ner_accelerate_atajti|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|403.7 MB|
+
+## References
+
+https://huggingface.co/atajti/bert-finetuned-ner-accelerate
\ No newline at end of file
From effbea9838c1d63a89945484475847d23b93282e Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 13:36:54 +0700
Subject: [PATCH 649/667] Add model 2023-11-08-porttagger_news_base_en

---
 .../2023-11-08-porttagger_news_base_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-porttagger_news_base_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-porttagger_news_base_en.md b/docs/_posts/ahmedlone127/2023-11-08-porttagger_news_base_en.md
new file mode 100644
index 00000000000000..08d3f887d65bcb
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-porttagger_news_base_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English porttagger_news_base BertForTokenClassification from Emanuel
+author: John Snow Labs
+name: porttagger_news_base
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `porttagger_news_base` is an English model originally trained by Emanuel.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/porttagger_news_base_en_5.2.0_3.0_1699425406664.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/porttagger_news_base_en_5.2.0_3.0_1699425406664.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
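+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("porttagger_news_base","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+pipelineModel = pipeline.fit(data)  # data: a Spark DataFrame with a "text" column
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("porttagger_news_base", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val pipelineModel = pipeline.fit(data) // data: a DataFrame with a "text" column
+val pipelineDF = pipelineModel.transform(data)
+```
+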
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|porttagger_news_base|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|406.0 MB|
+
+## References
+
+https://huggingface.co/Emanuel/porttagger-news-base
\ No newline at end of file
From e5120a32c9e748ffc8bfa0aaa4449075508e7738 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 13:42:41 +0700
Subject: [PATCH 650/667] Add model 2023-11-08-klue_bert_base_ner_kluedata_en

---
 ...23-11-08-klue_bert_base_ner_kluedata_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-klue_bert_base_ner_kluedata_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-klue_bert_base_ner_kluedata_en.md b/docs/_posts/ahmedlone127/2023-11-08-klue_bert_base_ner_kluedata_en.md
new file mode 100644
index 00000000000000..7fc0ad81cb0772
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-klue_bert_base_ner_kluedata_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English klue_bert_base_ner_kluedata BertForTokenClassification from datasciathlete
+author: John Snow Labs
+name: klue_bert_base_ner_kluedata
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `klue_bert_base_ner_kluedata` is an English model originally trained by datasciathlete.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/klue_bert_base_ner_kluedata_en_5.2.0_3.0_1699425754476.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/klue_bert_base_ner_kluedata_en_5.2.0_3.0_1699425754476.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
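+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("klue_bert_base_ner_kluedata","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+pipelineModel = pipeline.fit(data)  # data: a Spark DataFrame with a "text" column
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("klue_bert_base_ner_kluedata", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val pipelineModel = pipeline.fit(data) // data: a DataFrame with a "text" column
+val pipelineDF = pipelineModel.transform(data)
+```
+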
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|klue_bert_base_ner_kluedata|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|412.4 MB|
+
+## References
+
+https://huggingface.co/datasciathlete/KLUE-BERT-BASE-NER-kluedata
\ No newline at end of file
From 077e0d2968ffa91d4c3c20d6ecb5184d514804f8 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 13:44:58 +0700
Subject: [PATCH 651/667] Add model 2023-11-08-darija_ner_ar

---
 .../ahmedlone127/2023-11-08-darija_ner_ar.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-darija_ner_ar.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-darija_ner_ar.md b/docs/_posts/ahmedlone127/2023-11-08-darija_ner_ar.md
new file mode 100644
index 00000000000000..d44606f73e1d05
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-darija_ner_ar.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: Arabic darija_ner BertForTokenClassification from hananour
+author: John Snow Labs
+name: darija_ner
+date: 2023-11-08
+tags: [bert, ar, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: ar
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `darija_ner` is an Arabic model originally trained by hananour.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/darija_ner_ar_5.2.0_3.0_1699425887867.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/darija_ner_ar_5.2.0_3.0_1699425887867.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
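+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("darija_ner","ar") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+pipelineModel = pipeline.fit(data)  # data: a Spark DataFrame with a "text" column
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("darija_ner", "ar")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val pipelineModel = pipeline.fit(data) // data: a DataFrame with a "text" column
+val pipelineDF = pipelineModel.transform(data)
+```
+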
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|darija_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|ar|
+|Size:|505.1 MB|
+
+## References
+
+https://huggingface.co/hananour/darija-ner
\ No newline at end of file
From ed28433107fdf1c10f57385a5b243149ca3a5818 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 14:13:20 +0700
Subject: [PATCH 652/667] Add model 2023-11-08-bert_for_job_descr_parsing_en

---
 ...023-11-08-bert_for_job_descr_parsing_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_for_job_descr_parsing_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_for_job_descr_parsing_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_for_job_descr_parsing_en.md
new file mode 100644
index 00000000000000..fa3f0ae65999a3
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-bert_for_job_descr_parsing_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English bert_for_job_descr_parsing BertForTokenClassification from jfriduss
+author: John Snow Labs
+name: bert_for_job_descr_parsing
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_for_job_descr_parsing` is an English model originally trained by jfriduss.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_for_job_descr_parsing_en_5.2.0_3.0_1699427593459.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_for_job_descr_parsing_en_5.2.0_3.0_1699427593459.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
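+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("bert_for_job_descr_parsing","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+pipelineModel = pipeline.fit(data)  # data: a Spark DataFrame with a "text" column
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("bert_for_job_descr_parsing", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val pipelineModel = pipeline.fit(data) // data: a DataFrame with a "text" column
+val pipelineDF = pipelineModel.transform(data)
+```
+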
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bert_for_job_descr_parsing|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|402.2 MB|
+
+## References
+
+https://huggingface.co/jfriduss/bert_for_job_descr_parsing
\ No newline at end of file
From 091e486c267d27b88bc2b9463abb27958f2c99f6 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 14:19:41 +0700
Subject: [PATCH 653/667] Add model 2023-11-08-rubert_tiny2_finetuned_ner_en

---
 ...023-11-08-rubert_tiny2_finetuned_ner_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-rubert_tiny2_finetuned_ner_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-rubert_tiny2_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2023-11-08-rubert_tiny2_finetuned_ner_en.md
new file mode 100644
index 00000000000000..7b45bacbf50925
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-rubert_tiny2_finetuned_ner_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English rubert_tiny2_finetuned_ner BertForTokenClassification from Evolett
+author: John Snow Labs
+name: rubert_tiny2_finetuned_ner
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `rubert_tiny2_finetuned_ner` is an English model originally trained by Evolett.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_tiny2_finetuned_ner_en_5.2.0_3.0_1699427978197.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_tiny2_finetuned_ner_en_5.2.0_3.0_1699427978197.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
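+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("documents")
+tokenizer = Tokenizer() \
+    .setInputCols(["documents"]) \
+    .setOutputCol("token")
+tokenClassifier = BertForTokenClassification.pretrained("rubert_tiny2_finetuned_ner","en") \
+    .setInputCols(["documents","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+pipelineModel = pipeline.fit(data)  # data: a Spark DataFrame with a "text" column
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("documents")
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("documents"))
+    .setOutputCol("token")
+val tokenClassifier = BertForTokenClassification
+    .pretrained("rubert_tiny2_finetuned_ner", "en")
+    .setInputCols(Array("documents","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val pipelineModel = pipeline.fit(data) // data: a DataFrame with a "text" column
+val pipelineDF = pipelineModel.transform(data)
+```
+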
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|rubert_tiny2_finetuned_ner|
+|Compatibility:|Spark NLP 5.2.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[documents, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|109.1 MB|
+
+## References
+
+https://huggingface.co/Evolett/rubert-tiny2-finetuned-ner
\ No newline at end of file
From 96ff4009bd091555b2380e68e9800e50f654f483 Mon Sep 17 00:00:00 2001
From: ahmedlone127
Date: Wed, 8 Nov 2023 14:29:40 +0700
Subject: [PATCH 654/667] Add model 2023-11-08-assignment2_attempt11_en

---
 .../2023-11-08-assignment2_attempt11_en.md | 93 +++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/_posts/ahmedlone127/2023-11-08-assignment2_attempt11_en.md

diff --git a/docs/_posts/ahmedlone127/2023-11-08-assignment2_attempt11_en.md b/docs/_posts/ahmedlone127/2023-11-08-assignment2_attempt11_en.md
new file mode 100644
index 00000000000000..eef01be57d98b0
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2023-11-08-assignment2_attempt11_en.md
@@ -0,0 +1,93 @@
+---
+layout: model
+title: English assignment2_attempt11 BertForTokenClassification from mpalaval
+author: John Snow Labs
+name: assignment2_attempt11
+date: 2023-11-08
+tags: [bert, en, open_source, token_classification, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.2.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BertForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `assignment2_attempt11` is an English model originally trained by mpalaval.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/assignment2_attempt11_en_5.2.0_3.0_1699428569679.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/assignment2_attempt11_en_5.2.0_3.0_1699428569679.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
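+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("assignment2_attempt11","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# data: a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("assignment2_attempt11", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +// data: a Spark DataFrame with a "text" column +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +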
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|assignment2_attempt11| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/mpalaval/assignment2_attempt11 \ No newline at end of file From 3c6375a3eab0c7b4cc08197ed530bd39ff4389ea Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 14:30:40 +0700 Subject: [PATCH 655/667] Add model 2023-11-08-political_entity_recognizer_en --- ...23-11-08-political_entity_recognizer_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-political_entity_recognizer_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-political_entity_recognizer_en.md b/docs/_posts/ahmedlone127/2023-11-08-political_entity_recognizer_en.md new file mode 100644 index 00000000000000..7e5f3fddc9e755 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-political_entity_recognizer_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English political_entity_recognizer BertForTokenClassification from nlplab +author: John Snow Labs +name: political_entity_recognizer +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `political_entity_recognizer` is an English model originally trained by nlplab.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/political_entity_recognizer_en_5.2.0_3.0_1699428569605.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/political_entity_recognizer_en_5.2.0_3.0_1699428569605.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
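+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("political_entity_recognizer","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# data: a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("political_entity_recognizer", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +// data: a Spark DataFrame with a "text" column +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +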
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|political_entity_recognizer| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|381.0 MB| + +## References + +https://huggingface.co/nlplab/political-entity-recognizer \ No newline at end of file From 90299351a26677ceb69ab5942dfb90d04ed2086d Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 14:40:15 +0700 Subject: [PATCH 656/667] Add model 2023-11-08-adres_ner_v2_bert_128k_tr --- .../2023-11-08-adres_ner_v2_bert_128k_tr.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-adres_ner_v2_bert_128k_tr.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-adres_ner_v2_bert_128k_tr.md b/docs/_posts/ahmedlone127/2023-11-08-adres_ner_v2_bert_128k_tr.md new file mode 100644 index 00000000000000..5eb6db6fde19ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-adres_ner_v2_bert_128k_tr.md @@ -0,0 +1,93 @@ +--- +layout: model +title: Turkish adres_ner_v2_bert_128k BertForTokenClassification from deprem-ml +author: John Snow Labs +name: adres_ner_v2_bert_128k +date: 2023-11-08 +tags: [bert, tr, open_source, token_classification, onnx] +task: Named Entity Recognition +language: tr +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `adres_ner_v2_bert_128k` is a Turkish model originally trained by deprem-ml.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/adres_ner_v2_bert_128k_tr_5.2.0_3.0_1699429202686.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/adres_ner_v2_bert_128k_tr_5.2.0_3.0_1699429202686.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
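+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("adres_ner_v2_bert_128k","tr") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# data: a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("adres_ner_v2_bert_128k", "tr") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +// data: a Spark DataFrame with a "text" column +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +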
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|adres_ner_v2_bert_128k| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|tr| +|Size:|689.0 MB| + +## References + +https://huggingface.co/deprem-ml/adres_ner_v2_bert_128k \ No newline at end of file From 3d3a326edbbc17f2856633422bceb38f36f9bf5a Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 14:44:32 +0700 Subject: [PATCH 657/667] Add model 2023-11-08-nyt_ingredient_tagger_gte_small_l3_ingredient_v2_en --- ...nt_tagger_gte_small_l3_ingredient_v2_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-nyt_ingredient_tagger_gte_small_l3_ingredient_v2_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-nyt_ingredient_tagger_gte_small_l3_ingredient_v2_en.md b/docs/_posts/ahmedlone127/2023-11-08-nyt_ingredient_tagger_gte_small_l3_ingredient_v2_en.md new file mode 100644 index 00000000000000..ba1354c8e2296b --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-nyt_ingredient_tagger_gte_small_l3_ingredient_v2_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English nyt_ingredient_tagger_gte_small_l3_ingredient_v2 BertForTokenClassification from napsternxg +author: John Snow Labs +name: nyt_ingredient_tagger_gte_small_l3_ingredient_v2 +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `nyt_ingredient_tagger_gte_small_l3_ingredient_v2` is an English model originally trained by
napsternxg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nyt_ingredient_tagger_gte_small_l3_ingredient_v2_en_5.2.0_3.0_1699429469946.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nyt_ingredient_tagger_gte_small_l3_ingredient_v2_en_5.2.0_3.0_1699429469946.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
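+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("nyt_ingredient_tagger_gte_small_l3_ingredient_v2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# data: a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("nyt_ingredient_tagger_gte_small_l3_ingredient_v2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +// data: a Spark DataFrame with a "text" column +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +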
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nyt_ingredient_tagger_gte_small_l3_ingredient_v2| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|64.7 MB| + +## References + +https://huggingface.co/napsternxg/nyt-ingredient-tagger-gte-small-L3-ingredient-v2 \ No newline at end of file From 3fee7a462c9d86eabb5ae04f543a8fa2f3e37c2d Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 14:45:32 +0700 Subject: [PATCH 658/667] Add model 2023-11-08-bert_base_portuguese_cased_harem_selective_lowc_samoan_first_ner_en --- ...arem_selective_lowc_samoan_first_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_base_portuguese_cased_harem_selective_lowc_samoan_first_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_base_portuguese_cased_harem_selective_lowc_samoan_first_ner_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_base_portuguese_cased_harem_selective_lowc_samoan_first_ner_en.md new file mode 100644 index 00000000000000..34d314af8445a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-bert_base_portuguese_cased_harem_selective_lowc_samoan_first_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_base_portuguese_cased_harem_selective_lowc_samoan_first_ner BertForTokenClassification from jordyvl +author: John Snow Labs +name: bert_base_portuguese_cased_harem_selective_lowc_samoan_first_ner +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and 
curated to provide scalability and production-readiness using Spark NLP. `bert_base_portuguese_cased_harem_selective_lowc_samoan_first_ner` is an English model originally trained by jordyvl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_cased_harem_selective_lowc_samoan_first_ner_en_5.2.0_3.0_1699429469981.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_cased_harem_selective_lowc_samoan_first_ner_en_5.2.0_3.0_1699429469981.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_portuguese_cased_harem_selective_lowc_samoan_first_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# data: a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_base_portuguese_cased_harem_selective_lowc_samoan_first_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +// data: a Spark DataFrame with a "text" column +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_portuguese_cased_harem_selective_lowc_samoan_first_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/jordyvl/bert-base-portuguese-cased_harem-selective-lowC-sm-first-ner \ No newline at end of file From 6e765f8f2c2961193a623a61217d398b9464e9f7 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 14:46:32 +0700 Subject: [PATCH 659/667] Add model 2023-11-08-bert4ner_base_uncased_en --- .../2023-11-08-bert4ner_base_uncased_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert4ner_base_uncased_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert4ner_base_uncased_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert4ner_base_uncased_en.md new file mode 100644 index 00000000000000..4a7200f0817d0c --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-bert4ner_base_uncased_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert4ner_base_uncased BertForTokenClassification from shibing624 +author: John Snow Labs +name: bert4ner_base_uncased +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert4ner_base_uncased` is an English model originally trained by shibing624.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert4ner_base_uncased_en_5.2.0_3.0_1699429469899.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert4ner_base_uncased_en_5.2.0_3.0_1699429469899.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
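+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert4ner_base_uncased","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# data: a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert4ner_base_uncased", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +// data: a Spark DataFrame with a "text" column +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +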
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert4ner_base_uncased| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/shibing624/bert4ner-base-uncased \ No newline at end of file From a1022976fdb879f29da2856ff1a06d25dd5602ff Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 14:49:52 +0700 Subject: [PATCH 660/667] Add model 2023-11-08-biomednlp_pubmedbert_proteinstructure_ner_v3_1_pdbeurope_en --- ..._proteinstructure_ner_v3_1_pdbeurope_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-biomednlp_pubmedbert_proteinstructure_ner_v3_1_pdbeurope_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-biomednlp_pubmedbert_proteinstructure_ner_v3_1_pdbeurope_en.md b/docs/_posts/ahmedlone127/2023-11-08-biomednlp_pubmedbert_proteinstructure_ner_v3_1_pdbeurope_en.md new file mode 100644 index 00000000000000..69b3af490c4bca --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-biomednlp_pubmedbert_proteinstructure_ner_v3_1_pdbeurope_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English biomednlp_pubmedbert_proteinstructure_ner_v3_1_pdbeurope BertForTokenClassification from PDBEurope +author: John Snow Labs +name: biomednlp_pubmedbert_proteinstructure_ner_v3_1_pdbeurope +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark 
NLP. `biomednlp_pubmedbert_proteinstructure_ner_v3_1_pdbeurope` is an English model originally trained by PDBEurope. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biomednlp_pubmedbert_proteinstructure_ner_v3_1_pdbeurope_en_5.2.0_3.0_1699429783769.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biomednlp_pubmedbert_proteinstructure_ner_v3_1_pdbeurope_en_5.2.0_3.0_1699429783769.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
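+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("biomednlp_pubmedbert_proteinstructure_ner_v3_1_pdbeurope","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# data: a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("biomednlp_pubmedbert_proteinstructure_ner_v3_1_pdbeurope", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +// data: a Spark DataFrame with a "text" column +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +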
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biomednlp_pubmedbert_proteinstructure_ner_v3_1_pdbeurope| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.3 MB| + +## References + +https://huggingface.co/PDBEurope/BiomedNLP-PubMedBERT-ProteinStructure-NER-v3.1 \ No newline at end of file From e469ede6c688e07731d2760a47c8d7b494483d28 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 15:07:54 +0700 Subject: [PATCH 661/667] Add model 2023-11-08-jobbert_en --- .../ahmedlone127/2023-11-08-jobbert_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-jobbert_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-jobbert_en.md b/docs/_posts/ahmedlone127/2023-11-08-jobbert_en.md new file mode 100644 index 00000000000000..085fcd7a2df1c8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-jobbert_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English jobbert BertForTokenClassification from Andrei95 +author: John Snow Labs +name: jobbert +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `jobbert` is an English model originally trained by Andrei95.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jobbert_en_5.2.0_3.0_1699430866914.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jobbert_en_5.2.0_3.0_1699430866914.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
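+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("jobbert","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# data: a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("jobbert", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +// data: a Spark DataFrame with a "text" column +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +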
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jobbert| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Andrei95/jobbert \ No newline at end of file From 0803bf85d66b05373393f47df6b129a448f34b89 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 15:11:43 +0700 Subject: [PATCH 662/667] Add model 2023-11-08-bert_finetuned_ner_accelerate_loganathanspr_en --- ...netuned_ner_accelerate_loganathanspr_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_accelerate_loganathanspr_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_accelerate_loganathanspr_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_accelerate_loganathanspr_en.md new file mode 100644 index 00000000000000..87a6b0e78664ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-bert_finetuned_ner_accelerate_loganathanspr_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_finetuned_ner_accelerate_loganathanspr BertForTokenClassification from loganathanspr +author: John Snow Labs +name: bert_finetuned_ner_accelerate_loganathanspr +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_finetuned_ner_accelerate_loganathanspr` is an English model originally trained by loganathanspr.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_accelerate_loganathanspr_en_5.2.0_3.0_1699431095812.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_accelerate_loganathanspr_en_5.2.0_3.0_1699431095812.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
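+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_accelerate_loganathanspr","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# data: a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_finetuned_ner_accelerate_loganathanspr", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +// data: a Spark DataFrame with a "text" column +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +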
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_accelerate_loganathanspr| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/loganathanspr/bert-finetuned-ner-accelerate \ No newline at end of file From 1267849efbc227a48d48ddecaecce5fe6bbc0b40 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 15:16:54 +0700 Subject: [PATCH 663/667] Add model 2023-11-08-11_711_project_2_en --- .../2023-11-08-11_711_project_2_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-11_711_project_2_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-11_711_project_2_en.md b/docs/_posts/ahmedlone127/2023-11-08-11_711_project_2_en.md new file mode 100644 index 00000000000000..5e34a0960c0fd5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-11_711_project_2_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English 11_711_project_2 BertForTokenClassification from yitengm +author: John Snow Labs +name: 11_711_project_2 +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `11_711_project_2` is an English model originally trained by yitengm.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/11_711_project_2_en_5.2.0_3.0_1699431406724.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/11_711_project_2_en_5.2.0_3.0_1699431406724.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
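+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") +tokenizer = Tokenizer().setInputCols(["documents"]).setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("11_711_project_2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +# data: a Spark DataFrame with a "text" column +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +import com.johnsnowlabs.nlp.base._ +import com.johnsnowlabs.nlp.annotator._ +import org.apache.spark.ml.Pipeline + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") +val tokenizer = new Tokenizer().setInputCols(Array("documents")).setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("11_711_project_2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +// data: a Spark DataFrame with a "text" column +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +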
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|11_711_project_2| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| +
+## References + +https://huggingface.co/yitengm/11-711-project-2 \ No newline at end of file From 4e5b4b524c1ada97167ad3c898d615eaf34b0d73 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 15:47:57 +0700 Subject: [PATCH 664/667] Add model 2023-11-08-tagged_one_100v7_ner_model_3epochs_augmented_en --- ...ne_100v7_ner_model_3epochs_augmented_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-tagged_one_100v7_ner_model_3epochs_augmented_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-tagged_one_100v7_ner_model_3epochs_augmented_en.md b/docs/_posts/ahmedlone127/2023-11-08-tagged_one_100v7_ner_model_3epochs_augmented_en.md new file mode 100644 index 00000000000000..b69dada0dc7ec2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-tagged_one_100v7_ner_model_3epochs_augmented_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English tagged_one_100v7_ner_model_3epochs_augmented BertForTokenClassification from DOOGLAK +author: John Snow Labs +name: tagged_one_100v7_ner_model_3epochs_augmented +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `tagged_one_100v7_ner_model_3epochs_augmented` is an English model originally trained by DOOGLAK.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tagged_one_100v7_ner_model_3epochs_augmented_en_5.2.0_3.0_1699433269132.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tagged_one_100v7_ner_model_3epochs_augmented_en_5.2.0_3.0_1699433269132.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
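+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +# Sample input: a DataFrame with a "text" column +data = spark.createDataFrame([["Put your text here."]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# The classifier also reads a "token" column, so tokenize the documents first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("tagged_one_100v7_ner_model_3epochs_augmented","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base.DocumentAssembler +import com.johnsnowlabs.nlp.annotators.Tokenizer +import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +// Sample input; assumes an active SparkSession named `spark` +val data = Seq("Put your text here.").toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("tagged_one_100v7_ner_model_3epochs_augmented", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + + +``` +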
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tagged_one_100v7_ner_model_3epochs_augmented| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/DOOGLAK/Tagged_One_100v7_NER_Model_3Epochs_AUGMENTED \ No newline at end of file From 3298a9ceb026111ad8c9d51555a54ffd976ffe02 Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 15:54:34 +0700 Subject: [PATCH 665/667] Add model 2023-11-08-bert_german_ner_en --- .../2023-11-08-bert_german_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-bert_german_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-bert_german_ner_en.md b/docs/_posts/ahmedlone127/2023-11-08-bert_german_ner_en.md new file mode 100644 index 00000000000000..1fa29a312b4bfd --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-bert_german_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English bert_german_ner BertForTokenClassification from lunesco +author: John Snow Labs +name: bert_german_ner +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bert_german_ner` is an English model originally trained by lunesco.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_german_ner_en_5.2.0_3.0_1699433664820.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_german_ner_en_5.2.0_3.0_1699433664820.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
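+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +# Sample input: a DataFrame with a "text" column +data = spark.createDataFrame([["Put your text here."]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# The classifier also reads a "token" column, so tokenize the documents first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("bert_german_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base.DocumentAssembler +import com.johnsnowlabs.nlp.annotators.Tokenizer +import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +// Sample input; assumes an active SparkSession named `spark` +val data = Seq("Put your text here.").toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("bert_german_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + + +``` +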
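The Model Information table lists the classifier's input labels (`[documents, token]`) and its output label (`[ner]`). Each input column has to come from an earlier stage or from the input DataFrame. A document column named `embeddings`, or a pipeline with no tokenizer, therefore leaves the classifier without its inputs.

The sketch below checks that ordering in plain Python. It makes two simplifying assumptions:
- Stages are modeled as hypothetical `(name, input_cols, output_col)` tuples, not real Spark NLP objects.
- `missing_inputs` is an illustrative helper, not a library function.

```python
from typing import List, Sequence, Tuple

# (stage name, input columns, output column): a simplified, hypothetical model of a stage
Stage = Tuple[str, Sequence[str], str]


def missing_inputs(stages: Sequence[Stage], source_cols: Sequence[str]) -> List[Tuple[str, str]]:
    """Return (stage name, column) pairs for inputs that no earlier stage produces."""
    available = set(source_cols)  # columns already present in the input DataFrame
    problems: List[Tuple[str, str]] = []
    for name, input_cols, output_col in stages:
        for col in input_cols:
            if col not in available:
                problems.append((name, col))
        available.add(output_col)  # later stages may consume this output
    return problems


# Column wiring of a DocumentAssembler -> Tokenizer -> BertForTokenClassification pipeline
stages = [
    ("DocumentAssembler", ["text"], "documents"),
    ("Tokenizer", ["documents"], "token"),
    ("BertForTokenClassification", ["documents", "token"], "ner"),
]

print(missing_inputs(stages, ["text"]))
# []
print(missing_inputs([stages[0], stages[2]], ["text"]))
# [('BertForTokenClassification', 'token')]
```

Each stage's own output is added only after its inputs are checked, so a stage cannot satisfy its own requirements.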
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_german_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/lunesco/bert-german-ner \ No newline at end of file From 42c76656747533a292e43b7b0b2064d4f503f48c Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 15:55:35 +0700 Subject: [PATCH 666/667] Add model 2023-11-08-multibertbestmodeloct11_en --- .../2023-11-08-multibertbestmodeloct11_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-multibertbestmodeloct11_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-multibertbestmodeloct11_en.md b/docs/_posts/ahmedlone127/2023-11-08-multibertbestmodeloct11_en.md new file mode 100644 index 00000000000000..1d8bab19bac297 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-multibertbestmodeloct11_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English multibertbestmodeloct11 BertForTokenClassification from Tommert25 +author: John Snow Labs +name: multibertbestmodeloct11 +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `multibertbestmodeloct11` is an English model originally trained by Tommert25.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multibertbestmodeloct11_en_5.2.0_3.0_1699433704627.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multibertbestmodeloct11_en_5.2.0_3.0_1699433704627.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
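+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +# Sample input: a DataFrame with a "text" column +data = spark.createDataFrame([["Put your text here."]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# The classifier also reads a "token" column, so tokenize the documents first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("multibertbestmodeloct11","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base.DocumentAssembler +import com.johnsnowlabs.nlp.annotators.Tokenizer +import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +// Sample input; assumes an active SparkSession named `spark` +val data = Seq("Put your text here.").toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("multibertbestmodeloct11", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + + +``` +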
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multibertbestmodeloct11| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|625.5 MB| + +## References + +https://huggingface.co/Tommert25/MultiBERTBestModelOct11 \ No newline at end of file From 3ef1b011f985dd9c75b6661adfb9b47d2ae4a21f Mon Sep 17 00:00:00 2001 From: ahmedlone127 Date: Wed, 8 Nov 2023 15:56:35 +0700 Subject: [PATCH 667/667] Add model 2023-11-08-v4_combined_ner_en --- .../2023-11-08-v4_combined_ner_en.md | 93 +++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 docs/_posts/ahmedlone127/2023-11-08-v4_combined_ner_en.md diff --git a/docs/_posts/ahmedlone127/2023-11-08-v4_combined_ner_en.md b/docs/_posts/ahmedlone127/2023-11-08-v4_combined_ner_en.md new file mode 100644 index 00000000000000..4c360708a8a913 --- /dev/null +++ b/docs/_posts/ahmedlone127/2023-11-08-v4_combined_ner_en.md @@ -0,0 +1,93 @@ +--- +layout: model +title: English v4_combined_ner BertForTokenClassification from cp500 +author: John Snow Labs +name: v4_combined_ner +date: 2023-11-08 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.2.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `v4_combined_ner` is an English model originally trained by cp500.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/v4_combined_ner_en_5.2.0_3.0_1699433727244.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/v4_combined_ner_en_5.2.0_3.0_1699433727244.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
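+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, BertForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +# Sample input: a DataFrame with a "text" column +data = spark.createDataFrame([["Put your text here."]]).toDF("text") + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + +# The classifier also reads a "token" column, so tokenize the documents first +tokenizer = Tokenizer() \ + .setInputCols(["documents"]) \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("v4_combined_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) + +``` +```scala +import com.johnsnowlabs.nlp.base.DocumentAssembler +import com.johnsnowlabs.nlp.annotators.Tokenizer +import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForTokenClassification +import org.apache.spark.ml.Pipeline +import spark.implicits._ + +// Sample input; assumes an active SparkSession named `spark` +val data = Seq("Put your text here.").toDF("text") + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val tokenizer = new Tokenizer() + .setInputCols(Array("documents")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification + .pretrained("v4_combined_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) + + +``` +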
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|v4_combined_ner| +|Compatibility:|Spark NLP 5.2.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|690.5 MB| + +## References + +https://huggingface.co/cp500/v4_combined_ner \ No newline at end of file