diff --git a/docs/_posts/gokhanturer/2023-03-01-roberta_token_classifier_base_ner_demo_mn.md b/docs/_posts/gokhanturer/2023-03-01-roberta_token_classifier_base_ner_demo_mn.md new file mode 100644 index 00000000000000..6969d699d5524f --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-01-roberta_token_classifier_base_ner_demo_mn.md @@ -0,0 +1,98 @@ +--- +layout: model +title: Mongolian RobertaForTokenClassification Base Cased model (from onon214) +author: John Snow Labs +name: roberta_token_classifier_base_ner_demo +date: 2023-03-01 +tags: [mn, open_source, roberta, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: mn +edition: Spark NLP 4.3.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `roberta-base-ner-demo` is a Mongolian model originally trained by `onon214`. + +## Predicted Entities + +`MISC`, `LOC`, `PER`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_token_classifier_base_ner_demo_mn_4.3.0_3.0_1677703536380.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_token_classifier_base_ner_demo_mn_4.3.0_3.0_1677703536380.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_token_classifier_base_ner_demo","mn") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_token_classifier_base_ner_demo","mn")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_token_classifier_base_ner_demo| +|Compatibility:|Spark NLP 4.3.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|mn| +|Size:|466.3 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/onon214/roberta-base-ner-demo \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-01-roberta_token_classifier_bertin_base_ner_conll2002_es.md b/docs/_posts/gokhanturer/2023-03-01-roberta_token_classifier_bertin_base_ner_conll2002_es.md new file mode 100644 index 00000000000000..9d7c2ee82acf6f --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-01-roberta_token_classifier_bertin_base_ner_conll2002_es.md @@ -0,0 +1,98 @@ +--- +layout: model +title: Spanish RobertaForTokenClassification Base Cased model (from bertin-project) +author: John Snow Labs +name: roberta_token_classifier_bertin_base_ner_conll2002 +date: 2023-03-01 +tags: [es, open_source, roberta, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: es +edition: Spark NLP 4.3.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bertin-base-ner-conll2002-es` is a Spanish model originally trained by `bertin-project`. 
+ +## Predicted Entities + +`MISC`, `LOC`, `PER`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_token_classifier_bertin_base_ner_conll2002_es_4.3.0_3.0_1677703750308.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_token_classifier_bertin_base_ner_conll2002_es_4.3.0_3.0_1677703750308.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_token_classifier_bertin_base_ner_conll2002","es") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_token_classifier_bertin_base_ner_conll2002","es")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_token_classifier_bertin_base_ner_conll2002| +|Compatibility:|Spark NLP 4.3.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|426.2 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/bertin-project/bertin-base-ner-conll2002-es \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-01-roberta_token_classifier_bertin_base_pos_conll2002_es.md b/docs/_posts/gokhanturer/2023-03-01-roberta_token_classifier_bertin_base_pos_conll2002_es.md new file mode 100644 index 00000000000000..89d0b248c61bae --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-01-roberta_token_classifier_bertin_base_pos_conll2002_es.md @@ -0,0 +1,98 @@ +--- +layout: model +title: Spanish RobertaForTokenClassification Base Cased model (from bertin-project) +author: John Snow Labs +name: roberta_token_classifier_bertin_base_pos_conll2002 +date: 2023-03-01 +tags: [es, open_source, roberta, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: es +edition: Spark NLP 4.3.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bertin-base-pos-conll2002-es` is a Spanish model originally trained by `bertin-project`. 
+ +## Predicted Entities + +`DA`, `VAM`, `I`, `VSM`, `PP`, `VSS`, `DI`, `AQ`, `Y`, `VMN`, `Fit`, `Fg`, `Fia`, `Fpa`, `Fat`, `VSN`, `Fpt`, `DD`, `VAP`, `SP`, `NP`, `Fh`, `VAI`, `CC`, `Fd`, `VMG`, `NC`, `PX`, `DE`, `Fz`, `PN`, `Fx`, `Faa`, `Fs`, `Fe`, `VSP`, `DP`, `VAS`, `VSG`, `PT`, `Ft`, `VAN`, `PI`, `P0`, `RG`, `RN`, `CS`, `DN`, `VMI`, `Fp`, `Fc`, `PR`, `VSI`, `AO`, `VMM`, `PD`, `VMS`, `DT`, `Z`, `VMP` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_token_classifier_bertin_base_pos_conll2002_es_4.3.0_3.0_1677703697571.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_token_classifier_bertin_base_pos_conll2002_es_4.3.0_3.0_1677703697571.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_token_classifier_bertin_base_pos_conll2002","es") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_token_classifier_bertin_base_pos_conll2002","es")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_token_classifier_bertin_base_pos_conll2002| +|Compatibility:|Spark NLP 4.3.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|426.4 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/bertin-project/bertin-base-pos-conll2002-es \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-01-roberta_token_classifier_fullstop_catalan_punctuation_prediction_ca.md b/docs/_posts/gokhanturer/2023-03-01-roberta_token_classifier_fullstop_catalan_punctuation_prediction_ca.md new file mode 100644 index 00000000000000..8fad7e95812988 --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-01-roberta_token_classifier_fullstop_catalan_punctuation_prediction_ca.md @@ -0,0 +1,99 @@ +--- +layout: model +title: Catalan RobertaForTokenClassification Cased model (from softcatala) +author: John Snow Labs +name: roberta_token_classifier_fullstop_catalan_punctuation_prediction +date: 2023-03-01 +tags: [ca, open_source, roberta, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: ca +edition: Spark NLP 4.3.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `fullstop-catalan-punctuation-prediction` is a Catalan model originally trained by `softcatala`. 
+ +## Predicted Entities + +`.`, `?`, `-`, `:`, `,`, `0` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_token_classifier_fullstop_catalan_punctuation_prediction_ca_4.3.0_3.0_1677703587592.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_token_classifier_fullstop_catalan_punctuation_prediction_ca_4.3.0_3.0_1677703587592.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_token_classifier_fullstop_catalan_punctuation_prediction","ca") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_token_classifier_fullstop_catalan_punctuation_prediction","ca")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_token_classifier_fullstop_catalan_punctuation_prediction| +|Compatibility:|Spark NLP 4.3.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|ca| +|Size:|457.5 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/softcatala/fullstop-catalan-punctuation-prediction +- https://github.com/oliverguhr/fullstop-deep-punctuation-prediction \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-01-roberta_token_classifier_large_ontonotes5_la.md b/docs/_posts/gokhanturer/2023-03-01-roberta_token_classifier_large_ontonotes5_la.md new file mode 100644 index 00000000000000..ec808727ef17d0 --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-01-roberta_token_classifier_large_ontonotes5_la.md @@ -0,0 +1,102 @@ +--- +layout: model +title: Latin RobertaForTokenClassification Large Cased model (from tner) +author: John Snow Labs +name: roberta_token_classifier_large_ontonotes5 +date: 2023-03-01 +tags: [la, open_source, roberta, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: la +edition: Spark NLP 4.3.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `roberta-large-ontonotes5` is a Latin model originally trained by `tner`. 
+ +## Predicted Entities + +`NORP`, `FAC`, `QUANTITY`, `LOC`, `EVENT`, `CARDINAL`, `LANGUAGE`, `GPE`, `ORG`, `TIME`, `PERSON`, `WORK_OF_ART`, `DATE`, `PRODUCT`, `PERCENT`, `LAW`, `ORDINAL`, `MONEY` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_token_classifier_large_ontonotes5_la_4.3.0_3.0_1677703467254.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_token_classifier_large_ontonotes5_la_4.3.0_3.0_1677703467254.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_token_classifier_large_ontonotes5","la") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_token_classifier_large_ontonotes5","la")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_token_classifier_large_ontonotes5| +|Compatibility:|Spark NLP 4.3.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|la| +|Size:|1.3 GB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/tner/roberta-large-ontonotes5 +- https://github.com/asahi417/tner +- https://github.com/asahi417/tner +- https://aclanthology.org/2021.eacl-demos.7/ +- https://paperswithcode.com/sota?task=Token+Classification&dataset=tner%2Fontonotes5 \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-01-roberta_token_classifier_slovakbert_ner_sk.md b/docs/_posts/gokhanturer/2023-03-01-roberta_token_classifier_slovakbert_ner_sk.md new file mode 100644 index 00000000000000..5248488114fb7c --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-01-roberta_token_classifier_slovakbert_ner_sk.md @@ -0,0 +1,99 @@ +--- +layout: model +title: Slovak RobertaForTokenClassification Cased model (from crabz) +author: John Snow Labs +name: roberta_token_classifier_slovakbert_ner +date: 2023-03-01 +tags: [sk, open_source, roberta, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: sk +edition: Spark NLP 4.3.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `slovakbert-ner` is a Slovak model originally trained by `crabz`. 
+ +## Predicted Entities + +`4`, `2`, `6`, `1`, `0`, `5`, `3` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_token_classifier_slovakbert_ner_sk_4.3.0_3.0_1677703644531.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_token_classifier_slovakbert_ner_sk_4.3.0_3.0_1677703644531.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_token_classifier_slovakbert_ner","sk") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_token_classifier_slovakbert_ner","sk")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_token_classifier_slovakbert_ner| +|Compatibility:|Spark NLP 4.3.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|sk| +|Size:|439.1 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/crabz/slovakbert-ner +- https://paperswithcode.com/sota?task=Token+Classification&dataset=wikiann \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-01-roberta_token_classifier_ticker_en.md b/docs/_posts/gokhanturer/2023-03-01-roberta_token_classifier_ticker_en.md new file mode 100644 index 00000000000000..08f33aa7298bf4 --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-01-roberta_token_classifier_ticker_en.md @@ -0,0 +1,99 @@ +--- +layout: model +title: English RobertaForTokenClassification Cased model (from Jean-Baptiste) +author: John Snow Labs +name: roberta_token_classifier_ticker +date: 2023-03-01 +tags: [en, open_source, roberta, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: en +edition: Spark NLP 4.3.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `roberta-ticker` is an English model originally trained by `Jean-Baptiste`. 
+ +## Predicted Entities + +`TICKER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_token_classifier_ticker_en_4.3.0_3.0_1677703811345.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_token_classifier_ticker_en_4.3.0_3.0_1677703811345.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_token_classifier_ticker","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_token_classifier_ticker","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_token_classifier_ticker| +|Compatibility:|Spark NLP 4.3.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|465.3 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/Jean-Baptiste/roberta-ticker +- https://www.kaggle.com/omermetinn/tweets-about-the-top-companies-from-2015-to-2020 \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-03-distilbert_tok_classifier_typo_detector_en.md b/docs/_posts/gokhanturer/2023-03-03-distilbert_tok_classifier_typo_detector_en.md new file mode 100644 index 00000000000000..75f60372aea163 --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-03-distilbert_tok_classifier_typo_detector_en.md @@ -0,0 +1,100 @@ +--- +layout: model +title: English DistilBertForTokenClassification Cased model (from m3hrdadfi) +author: John Snow Labs +name: distilbert_tok_classifier_typo_detector +date: 2023-03-03 +tags: [en, open_source, distilbert, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: en +edition: Spark NLP 4.3.1 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `typo-detector-distilbert-en` is an English model originally trained by `m3hrdadfi`. 
+ +## Predicted Entities + +`TYPO` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_tok_classifier_typo_detector_en_4.3.1_3.0_1677881945749.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_tok_classifier_typo_detector_en_4.3.1_3.0_1677881945749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_tok_classifier_typo_detector","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_tok_classifier_typo_detector","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_tok_classifier_typo_detector| +|Compatibility:|Spark NLP 4.3.1+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|244.1 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/m3hrdadfi/typo-detector-distilbert-en +- https://github.com/neuspell/neuspell +- https://github.com/m3hrdadfi/typo-detector/issues \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_company_all_903429540_en.md b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_company_all_903429540_en.md new file mode 100644 index 00000000000000..6e1d616d4da9c4 --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_company_all_903429540_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English DistilBertForTokenClassification Cased model (from ismail-lucifer011) +author: John Snow Labs +name: distilbert_token_classifier_autotrain_company_all_903429540 +date: 2023-03-03 +tags: [en, open_source, distilbert, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: en +edition: Spark NLP 4.3.1 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `autotrain-company_all-903429540` is an English model originally trained by `ismail-lucifer011`. 
+ +## Predicted Entities + +`Company`, `OOV` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_autotrain_company_all_903429540_en_4.3.1_3.0_1677881697961.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_autotrain_company_all_903429540_en_4.3.1_3.0_1677881697961.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_autotrain_company_all_903429540","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_autotrain_company_all_903429540","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_token_classifier_autotrain_company_all_903429540| +|Compatibility:|Spark NLP 4.3.1+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.5 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/ismail-lucifer011/autotrain-company_all-903429540 \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_company_all_903429548_en.md b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_company_all_903429548_en.md new file mode 100644 index 00000000000000..e94525a4e80546 --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_company_all_903429548_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English DistilBertForTokenClassification Cased model (from ismail-lucifer011) +author: John Snow Labs +name: distilbert_token_classifier_autotrain_company_all_903429548 +date: 2023-03-03 +tags: [en, open_source, distilbert, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: en +edition: Spark NLP 4.3.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `autotrain-company_all-903429548` is an English model originally trained by `ismail-lucifer011`. 
+ +## Predicted Entities + +`Company`, `OOV` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_autotrain_company_all_903429548_en_4.3.0_3.0_1677881042384.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_autotrain_company_all_903429548_en_4.3.0_3.0_1677881042384.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_autotrain_company_all_903429548","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_autotrain_company_all_903429548","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_token_classifier_autotrain_company_all_903429548| +|Compatibility:|Spark NLP 4.3.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.5 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/ismail-lucifer011/autotrain-company_all-903429548 \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_company_vs_all_902129475_en.md b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_company_vs_all_902129475_en.md new file mode 100644 index 00000000000000..b77016890e8dc0 --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_company_vs_all_902129475_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English DistilBertForTokenClassification Cased model (from ismail-lucifer011) +author: John Snow Labs +name: distilbert_token_classifier_autotrain_company_vs_all_902129475 +date: 2023-03-03 +tags: [en, open_source, distilbert, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: en +edition: Spark NLP 4.3.1 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `autotrain-company_vs_all-902129475` is an English model originally trained by `ismail-lucifer011`.
+ +## Predicted Entities + +`Company`, `OOV` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_autotrain_company_vs_all_902129475_en_4.3.1_3.0_1677881724840.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_autotrain_company_vs_all_902129475_en_4.3.1_3.0_1677881724840.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_autotrain_company_vs_all_902129475","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_autotrain_company_vs_all_902129475","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_token_classifier_autotrain_company_vs_all_902129475| +|Compatibility:|Spark NLP 4.3.1+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.5 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/ismail-lucifer011/autotrain-company_vs_all-902129475 \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_final_784824209_en.md b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_final_784824209_en.md new file mode 100644 index 00000000000000..c50a7878f1ef57 --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_final_784824209_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English DistilBertForTokenClassification Cased model (from Lucifermorningstar011) +author: John Snow Labs +name: distilbert_token_classifier_autotrain_final_784824209 +date: 2023-03-03 +tags: [en, open_source, distilbert, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: en +edition: Spark NLP 4.3.1 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `autotrain-final-784824209` is an English model originally trained by `Lucifermorningstar011`.
+ +## Predicted Entities + +`9`, `0` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_autotrain_final_784824209_en_4.3.1_3.0_1677881842322.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_autotrain_final_784824209_en_4.3.1_3.0_1677881842322.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_autotrain_final_784824209","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_autotrain_final_784824209","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_token_classifier_autotrain_final_784824209| +|Compatibility:|Spark NLP 4.3.1+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.6 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/Lucifermorningstar011/autotrain-final-784824209 \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_final_784824211_en.md b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_final_784824211_en.md new file mode 100644 index 00000000000000..0b105e1eeffb7a --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_final_784824211_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English DistilBertForTokenClassification Cased model (from Lucifermorningstar011) +author: John Snow Labs +name: distilbert_token_classifier_autotrain_final_784824211 +date: 2023-03-03 +tags: [en, open_source, distilbert, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: en +edition: Spark NLP 4.3.1 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `autotrain-final-784824211` is an English model originally trained by `Lucifermorningstar011`.
+ +## Predicted Entities + +`9`, `0` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_autotrain_final_784824211_en_4.3.1_3.0_1677881778499.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_autotrain_final_784824211_en_4.3.1_3.0_1677881778499.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_autotrain_final_784824211","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_autotrain_final_784824211","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_token_classifier_autotrain_final_784824211| +|Compatibility:|Spark NLP 4.3.1+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|244.1 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/Lucifermorningstar011/autotrain-final-784824211 \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_final_784824218_en.md b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_final_784824218_en.md new file mode 100644 index 00000000000000..bdadf57ff1ce35 --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_final_784824218_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English DistilBertForTokenClassification Cased model (from Lucifermorningstar011) +author: John Snow Labs +name: distilbert_token_classifier_autotrain_final_784824218 +date: 2023-03-03 +tags: [en, open_source, distilbert, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: en +edition: Spark NLP 4.3.1 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `autotrain-final-784824218` is an English model originally trained by `Lucifermorningstar011`.
+ +## Predicted Entities + +`9`, `0` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_autotrain_final_784824218_en_4.3.1_3.0_1677881805603.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_autotrain_final_784824218_en_4.3.1_3.0_1677881805603.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_autotrain_final_784824218","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_autotrain_final_784824218","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_token_classifier_autotrain_final_784824218| +|Compatibility:|Spark NLP 4.3.1+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.5 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/Lucifermorningstar011/autotrain-final-784824218 \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_job_all_903929564_en.md b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_job_all_903929564_en.md new file mode 100644 index 00000000000000..3479e60f58ecaa --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_job_all_903929564_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English DistilBertForTokenClassification Cased model (from ismail-lucifer011) +author: John Snow Labs +name: distilbert_token_classifier_autotrain_job_all_903929564 +date: 2023-03-03 +tags: [en, open_source, distilbert, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: en +edition: Spark NLP 4.3.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `autotrain-job_all-903929564` is an English model originally trained by `ismail-lucifer011`.
+ +## Predicted Entities + +`Job`, `OOV` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_autotrain_job_all_903929564_en_4.3.0_3.0_1677880986885.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_autotrain_job_all_903929564_en_4.3.0_3.0_1677880986885.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_autotrain_job_all_903929564","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_autotrain_job_all_903929564","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_token_classifier_autotrain_job_all_903929564| +|Compatibility:|Spark NLP 4.3.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|244.1 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/ismail-lucifer011/autotrain-job_all-903929564 \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_name_all_904029569_en.md b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_name_all_904029569_en.md new file mode 100644 index 00000000000000..5b7fb68fbd4e4b --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_name_all_904029569_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English DistilBertForTokenClassification Cased model (from ismail-lucifer011) +author: John Snow Labs +name: distilbert_token_classifier_autotrain_name_all_904029569 +date: 2023-03-03 +tags: [en, open_source, distilbert, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: en +edition: Spark NLP 4.3.1 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `autotrain-name_all-904029569` is an English model originally trained by `ismail-lucifer011`.
+ +## Predicted Entities + +`OOV`, `Name` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_autotrain_name_all_904029569_en_4.3.1_3.0_1677881644846.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_autotrain_name_all_904029569_en_4.3.1_3.0_1677881644846.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_autotrain_name_all_904029569","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_autotrain_name_all_904029569","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_token_classifier_autotrain_name_all_904029569| +|Compatibility:|Spark NLP 4.3.1+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.6 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/ismail-lucifer011/autotrain-name_all-904029569 \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_name_all_904029577_en.md b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_name_all_904029577_en.md new file mode 100644 index 00000000000000..64e3cd3aeebc58 --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_name_all_904029577_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English DistilBertForTokenClassification Cased model (from ismail-lucifer011) +author: John Snow Labs +name: distilbert_token_classifier_autotrain_name_all_904029577 +date: 2023-03-03 +tags: [en, open_source, distilbert, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: en +edition: Spark NLP 4.3.1 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `autotrain-name_all-904029577` is an English model originally trained by `ismail-lucifer011`.
+ +## Predicted Entities + +`OOV`, `Name` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_autotrain_name_all_904029577_en_4.3.1_3.0_1677881580824.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_autotrain_name_all_904029577_en_4.3.1_3.0_1677881580824.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_autotrain_name_all_904029577","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_autotrain_name_all_904029577","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_token_classifier_autotrain_name_all_904029577| +|Compatibility:|Spark NLP 4.3.1+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.6 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/ismail-lucifer011/autotrain-name_all-904029577 \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_name_vsv_all_901529445_en.md b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_name_vsv_all_901529445_en.md new file mode 100644 index 00000000000000..2bebe97cdbe4d7 --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_name_vsv_all_901529445_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English DistilBertForTokenClassification Cased model (from ismail-lucifer011) +author: John Snow Labs +name: distilbert_token_classifier_autotrain_name_vsv_all_901529445 +date: 2023-03-03 +tags: [en, open_source, distilbert, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: en +edition: Spark NLP 4.3.1 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `autotrain-name_vsv_all-901529445` is an English model originally trained by `ismail-lucifer011`.
+ +## Predicted Entities + +`OOV`, `Name` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_autotrain_name_vsv_all_901529445_en_4.3.1_3.0_1677881751372.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_autotrain_name_vsv_all_901529445_en_4.3.1_3.0_1677881751372.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_autotrain_name_vsv_all_901529445","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_autotrain_name_vsv_all_901529445","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_token_classifier_autotrain_name_vsv_all_901529445| +|Compatibility:|Spark NLP 4.3.1+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.5 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/ismail-lucifer011/autotrain-name_vsv_all-901529445 \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_ner_778023879_en.md b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_ner_778023879_en.md new file mode 100644 index 00000000000000..ecc8b6342a454d --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_autotrain_ner_778023879_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English DistilBertForTokenClassification Cased model (from Lucifermorningstar011) +author: John Snow Labs +name: distilbert_token_classifier_autotrain_ner_778023879 +date: 2023-03-03 +tags: [en, open_source, distilbert, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: en +edition: Spark NLP 4.3.1 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `autotrain-ner-778023879` is an English model originally trained by `Lucifermorningstar011`.
+ +## Predicted Entities + +`9`, `0` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_autotrain_ner_778023879_en_4.3.1_3.0_1677881870073.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_autotrain_ner_778023879_en_4.3.1_3.0_1677881870073.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_autotrain_ner_778023879","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_autotrain_ner_778023879","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_token_classifier_autotrain_ner_778023879| +|Compatibility:|Spark NLP 4.3.1+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|244.1 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/Lucifermorningstar011/autotrain-ner-778023879 \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_base_ner_en.md b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_base_ner_en.md new file mode 100644 index 00000000000000..9bc15b04ba6f41 --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_base_ner_en.md @@ -0,0 +1,99 @@ +--- +layout: model +title: English DistilBertForTokenClassification Base Cased model (from 51la5) +author: John Snow Labs +name: distilbert_token_classifier_base_ner +date: 2023-03-03 +tags: [en, open_source, distilbert, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: en +edition: Spark NLP 4.3.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `distilbert-base-NER` is an English model originally trained by `51la5`.
+ +## Predicted Entities + +`LOC`, `ORG`, `PER`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_base_ner_en_4.3.0_3.0_1677881358175.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_base_ner_en_4.3.0_3.0_1677881358175.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_base_ner","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_base_ner","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_token_classifier_base_ner| +|Compatibility:|Spark NLP 4.3.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.6 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/51la5/distilbert-base-NER +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_base_uncased_finetuned_conll2003_en.md b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_base_uncased_finetuned_conll2003_en.md new file mode 100644 index 00000000000000..c912201b136a3a --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_base_uncased_finetuned_conll2003_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English DistilBertForTokenClassification Base Uncased model (from Datasaur) +author: John Snow Labs +name: distilbert_token_classifier_base_uncased_finetuned_conll2003 +date: 2023-03-03 +tags: [en, open_source, distilbert, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: en +edition: Spark NLP 4.3.1 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `distilbert-base-uncased-finetuned-conll2003` is an English model originally trained by `Datasaur`.
+ +## Predicted Entities + +`LOC`, `ORG`, `PER`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_base_uncased_finetuned_conll2003_en_4.3.1_3.0_1677881552803.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_base_uncased_finetuned_conll2003_en_4.3.1_3.0_1677881552803.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_base_uncased_finetuned_conll2003","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_base_uncased_finetuned_conll2003","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_token_classifier_base_uncased_finetuned_conll2003| +|Compatibility:|Spark NLP 4.3.1+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.6 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/Datasaur/distilbert-base-uncased-finetuned-conll2003 \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_base_uncased_ft_conll2003_en.md b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_base_uncased_ft_conll2003_en.md new file mode 100644 index 00000000000000..7a5d56c6c5e512 --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_base_uncased_ft_conll2003_en.md @@ -0,0 +1,100 @@ +--- +layout: model +title: English DistilBertForTokenClassification Base Uncased model (from sarahmiller137) +author: John Snow Labs +name: distilbert_token_classifier_base_uncased_ft_conll2003 +date: 2023-03-03 +tags: [en, open_source, distilbert, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: en +edition: Spark NLP 4.3.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `distilbert-base-uncased-ft-conll2003` is an English model originally trained by `sarahmiller137`.
+ +## Predicted Entities + +`LOC`, `ORG`, `PER`, `MISC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_base_uncased_ft_conll2003_en_4.3.0_3.0_1677881411947.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_base_uncased_ft_conll2003_en_4.3.0_3.0_1677881411947.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_base_uncased_ft_conll2003","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_base_uncased_ft_conll2003","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_token_classifier_base_uncased_ft_conll2003| +|Compatibility:|Spark NLP 4.3.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.6 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/sarahmiller137/distilbert-base-uncased-ft-conll2003 +- https://aclanthology.org/W03-0419 +- https://paperswithcode.com/sota?task=Token+Classification&dataset=conll2003 \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_cpener_test_en.md b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_cpener_test_en.md new file mode 100644 index 00000000000000..f5c4c03dfdb616 --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_cpener_test_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English DistilBertForTokenClassification Cased model (from Neurona) +author: John Snow Labs +name: distilbert_token_classifier_cpener_test +date: 2023-03-03 +tags: [en, open_source, distilbert, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: en +edition: Spark NLP 4.3.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `cpener-test` is an English model originally trained by `Neurona`.
+ +## Predicted Entities + +`cpe_version`, `cpe_product`, `cpe_vendor` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_cpener_test_en_4.3.0_3.0_1677881384855.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_cpener_test_en_4.3.0_3.0_1677881384855.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_cpener_test","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_cpener_test","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_token_classifier_cpener_test| +|Compatibility:|Spark NLP 4.3.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.6 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/Neurona/cpener-test \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_icelandic_ner_distilbert_is.md b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_icelandic_ner_distilbert_is.md new file mode 100644 index 00000000000000..58d0659def5f19 --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_icelandic_ner_distilbert_is.md @@ -0,0 +1,101 @@ +--- +layout: model +title: Icelandic DistilBertForTokenClassification Cased model (from m3hrdadfi) +author: John Snow Labs +name: distilbert_token_classifier_icelandic_ner_distilbert +date: 2023-03-03 +tags: [is, open_source, distilbert, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: is +edition: Spark NLP 4.3.1 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `icelandic-ner-distilbert` is an Icelandic model originally trained by `m3hrdadfi`.
+ +## Predicted Entities + +`Money`, `Date`, `Time`, `Percent`, `Miscellaneous`, `Location`, `Person`, `Organization` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_icelandic_ner_distilbert_is_4.3.1_3.0_1677881983874.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_icelandic_ner_distilbert_is_4.3.1_3.0_1677881983874.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_icelandic_ner_distilbert","is") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_icelandic_ner_distilbert","is") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_token_classifier_icelandic_ner_distilbert| +|Compatibility:|Spark NLP 4.3.1+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|is| +|Size:|505.8 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/m3hrdadfi/icelandic-ner-distilbert +- http://hdl.handle.net/20.500.12537/42 +- https://en.ru.is/ +- https://github.com/m3hrdadfi/icelandic-ner/issues \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_keyphrase_extraction_inspec_en.md b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_keyphrase_extraction_inspec_en.md new file mode 100644 index 00000000000000..c685b655bfaf1d --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_keyphrase_extraction_inspec_en.md @@ -0,0 +1,100 @@ +--- +layout: model +title: English DistilBertForTokenClassification Cased model (from ml6team) +author: John Snow Labs +name: distilbert_token_classifier_keyphrase_extraction_inspec +date: 2023-03-03 +tags: [en, open_source, distilbert, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: en +edition: Spark NLP 4.3.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `keyphrase-extraction-distilbert-inspec` is an English model originally trained by `ml6team`.
+ +## Predicted Entities + +`KEY` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_keyphrase_extraction_inspec_en_4.3.0_3.0_1677880864789.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_keyphrase_extraction_inspec_en_4.3.0_3.0_1677880864789.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_keyphrase_extraction_inspec","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_keyphrase_extraction_inspec","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_token_classifier_keyphrase_extraction_inspec| +|Compatibility:|Spark NLP 4.3.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.6 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/ml6team/keyphrase-extraction-distilbert-inspec +- https://dl.acm.org/doi/10.3115/1119355.1119383 +- https://paperswithcode.com/sota?task=Keyphrase+Extraction&dataset=inspec \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_keyphrase_extraction_kptimes_en.md b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_keyphrase_extraction_kptimes_en.md new file mode 100644 index 00000000000000..71c4413072f19d --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_keyphrase_extraction_kptimes_en.md @@ -0,0 +1,100 @@ +--- +layout: model +title: English DistilBertForTokenClassification Cased model (from ml6team) +author: John Snow Labs +name: distilbert_token_classifier_keyphrase_extraction_kptimes +date: 2023-03-03 +tags: [en, open_source, distilbert, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: en +edition: Spark NLP 4.3.1 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `keyphrase-extraction-distilbert-kptimes` is an English model originally trained by `ml6team`.
+ +## Predicted Entities + +`KEY` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_keyphrase_extraction_kptimes_en_4.3.1_3.0_1677881468043.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_keyphrase_extraction_kptimes_en_4.3.1_3.0_1677881468043.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_keyphrase_extraction_kptimes","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_keyphrase_extraction_kptimes","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_token_classifier_keyphrase_extraction_kptimes| +|Compatibility:|Spark NLP 4.3.1+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.6 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/ml6team/keyphrase-extraction-distilbert-kptimes +- https://arxiv.org/abs/1911.12559 +- https://paperswithcode.com/sota?task=Keyphrase+Extraction&dataset=kptimes \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_keyphrase_extraction_openkp_en.md b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_keyphrase_extraction_openkp_en.md new file mode 100644 index 00000000000000..b574553191b647 --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_keyphrase_extraction_openkp_en.md @@ -0,0 +1,101 @@ +--- +layout: model +title: English DistilBertForTokenClassification Cased model (from ml6team) +author: John Snow Labs +name: distilbert_token_classifier_keyphrase_extraction_openkp +date: 2023-03-03 +tags: [en, open_source, distilbert, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: en +edition: Spark NLP 4.3.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `keyphrase-extraction-distilbert-openkp` is an English model originally trained by `ml6team`.
+ +## Predicted Entities + +`KEY` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_keyphrase_extraction_openkp_en_4.3.0_3.0_1677880905122.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_keyphrase_extraction_openkp_en_4.3.0_3.0_1677880905122.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_keyphrase_extraction_openkp","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_keyphrase_extraction_openkp","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_token_classifier_keyphrase_extraction_openkp| +|Compatibility:|Spark NLP 4.3.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.6 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/ml6team/keyphrase-extraction-distilbert-openkp +- https://github.com/microsoft/OpenKP +- https://arxiv.org/abs/1911.02671 +- https://paperswithcode.com/sota?task=Keyphrase+Extraction&dataset=openkp \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_ner_roles_openapi_en.md b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_ner_roles_openapi_en.md new file mode 100644 index 00000000000000..afdf917e9c0d35 --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-03-distilbert_token_classifier_ner_roles_openapi_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English DistilBertForTokenClassification Cased model (from f2io) +author: John Snow Labs +name: distilbert_token_classifier_ner_roles_openapi +date: 2023-03-03 +tags: [en, open_source, distilbert, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: en +edition: Spark NLP 4.3.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `ner-roles-openapi` is an English model originally trained by `f2io`.
+ +## Predicted Entities + +``, `LOC`, `OR`, `PRG`, `ROLE`, `ORG`, `PER`, `ENTITY`, `MISC`, `ACTION` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_ner_roles_openapi_en_4.3.0_3.0_1677881330949.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_token_classifier_ner_roles_openapi_en_4.3.0_3.0_1677881330949.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_ner_roles_openapi","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_token_classifier_ner_roles_openapi","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_token_classifier_ner_roles_openapi| +|Compatibility:|Spark NLP 4.3.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.6 MB| +|Case sensitive:|true| +|Max sentence length:|128| + +## References + +- https://huggingface.co/f2io/ner-roles-openapi \ No newline at end of file diff --git a/docs/_posts/gokhanturer/2023-03-03-dtilbert_token_classifier_typo_detector_is.md b/docs/_posts/gokhanturer/2023-03-03-dtilbert_token_classifier_typo_detector_is.md new file mode 100644 index 00000000000000..65cdafd31ad780 --- /dev/null +++ b/docs/_posts/gokhanturer/2023-03-03-dtilbert_token_classifier_typo_detector_is.md @@ -0,0 +1,99 @@ +--- +layout: model +title: Icelandic DistilBertForTokenClassification Cased model (from m3hrdadfi) +author: John Snow Labs +name: dtilbert_token_classifier_typo_detector +date: 2023-03-03 +tags: [is, open_source, distilbert, token_classification, ner, tensorflow] +task: Named Entity Recognition +language: is +edition: Spark NLP 4.3.1 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `typo-detector-distilbert-is` is an Icelandic model originally trained by `m3hrdadfi`.
+
+## Predicted Entities
+
+`TYPO`
+
+{:.btn-box}
+<button class="button button-orange" disabled>Live Demo</button>
+<button class="button button-orange" disabled>Open in Colab</button>
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dtilbert_token_classifier_typo_detector_is_4.3.1_3.0_1677881909024.zip){:.button.button-orange}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dtilbert_token_classifier_typo_detector_is_4.3.1_3.0_1677881909024.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+tokenClassifier = DistilBertForTokenClassification.pretrained("dtilbert_token_classifier_typo_detector","is") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = DistilBertForTokenClassification.pretrained("dtilbert_token_classifier_typo_detector","is")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|dtilbert_token_classifier_typo_detector|
+|Compatibility:|Spark NLP 4.3.1+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|is|
+|Size:|505.7 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+- https://huggingface.co/m3hrdadfi/typo-detector-distilbert-is
+- https://github.com/m3hrdadfi/typo-detector/issues
\ No newline at end of file