* 2022-09-19-legre_indemnifications_en (#12758)
* Add model 2022-09-19-legre_indemnifications_en
* Add model 2022-09-19-legner_bert_indemnifications_en
Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>
* 2022-09-20-legclf_cuad_confidentiality_clause_en (#12770)
* Add model 2022-09-20-legclf_cuad_confidentiality_clause_en
* Add model 2022-09-20-legclf_cuad_indemnifications_clause_en
* Add model 2022-09-20-legclf_cuad_licenses_clause_en
* Add model 2022-09-20-legclf_cuad_obligations_clause_en
* Add model 2022-09-20-legclf_cuad_whereas_clause_en
Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>
* Add model 2022-09-27-legclf_cuad_licenses_clause_en (#12827)
Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>
* Add model 2022-09-27-legclf_cuad_indemnifications_clause_en (#12828)
Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>
Co-authored-by: jsl-models <74001263+jsl-models@users.noreply.github.com>
1 parent 8bf66a2, commit 34e0555
Showing 2 changed files with 227 additions and 0 deletions.
114 changes: 114 additions & 0 deletions
docs/_posts/josejuanmartinez/2022-09-27-legclf_cuad_indemnifications_clause_en.md
@@ -0,0 +1,114 @@
---
layout: model
title: Legal Indemnifications Clause Binary Classifier
author: John Snow Labs
name: legclf_cuad_indemnifications_clause
date: 2022-09-27
tags: [cuad, indemnifications, en, licensed]
task: Text Classification
language: en
edition: Spark NLP for Legal 1.0.0
spark_version: 3.0
supported: true
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This model is a binary classifier for the `indemnifications` clause type: it labels a text as `indemnifications` (True) or `other` (False). Make sure you provide enough context as input. Adding a sentence splitter to the pipeline makes the model see only individual sentences rather than the whole text, so skip it unless you want binary classification at sentence level.

If you have large legal documents and want to look for clauses, we recommend splitting the documents with any of the techniques available in our Spark NLP for Legal Workshop Tokenization & Splitting Tutorial (link [here](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Legal/1.Tokenization_Splitting.ipynb)), namely:
- Paragraph splitting (by multiline);
- Splitting by headers / subheaders;
- etc.

Keep in mind that the embeddings used by this model accept up to 512 tokens. If your text is longer, consider splitting it into smaller pieces (the tutorial linked above also covers this).
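
As a rough illustration of paragraph splitting combined with the 512-token limit, the sketch below uses plain Python rather than the Spark NLP splitters covered in the tutorial. The helper names (`split_paragraphs`, `cap_tokens`, `prepare_chunks`) are hypothetical, and tokens are approximated by whitespace-separated words, which undercounts BERT wordpieces, so a lower cap may be safer in practice.

```python
import re

MAX_TOKENS = 512  # limit stated above; counted here by whitespace, an approximation


def split_paragraphs(text):
    """Split on one or more blank lines and drop empty pieces."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def cap_tokens(paragraph, max_tokens=MAX_TOKENS):
    """Break a paragraph into pieces of at most max_tokens whitespace tokens."""
    words = paragraph.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]


def prepare_chunks(text, max_tokens=MAX_TOKENS):
    """Paragraph-split a document, then cap every paragraph at max_tokens."""
    chunks = []
    for paragraph in split_paragraphs(text):
        chunks.extend(cap_tokens(paragraph, max_tokens))
    return chunks
```

Each returned chunk could then be placed in its own row of the `clause_text` column before running the pipeline.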

This model can be combined with any of the other 200+ Legal Clauses Classifiers available in Models Hub, producing a series of True/False values, one for each legal clause model you have added.

## Predicted Entities

`other`, `indemnifications`

{:.btn-box}
[Live Demo](https://demo.johnsnowlabs.com/finance/CLASSIFY_LEGAL_CLAUSES/){:.button.button-orange}
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legclf_cuad_indemnifications_clause_en_1.0.0_3.0_1664272531526.zip){:.button.button-orange.button-orange-trans.arr.button-icon}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from johnsnowlabs import legal
from pyspark.ml import Pipeline
from sparknlp.annotator import BertSentenceEmbeddings
from sparknlp.base import DocumentAssembler

# Assumes `spark` is an active, licensed Spark NLP for Legal session

# Wrap the raw text column into Spark NLP documents
documentAssembler = DocumentAssembler() \
    .setInputCol("clause_text") \
    .setOutputCol("document")

# Embed each whole document (no sentence splitter, so the classifier sees the full text)
embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_cased", "en") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence_embeddings")

docClassifier = legal.ClassifierDLModel.pretrained("legclf_cuad_indemnifications_clause", "en", "legal/models") \
    .setInputCols(["sentence_embeddings"]) \
    .setOutputCol("category")

nlpPipeline = Pipeline(stages=[
    documentAssembler,
    embeddings,
    docClassifier])

df = spark.createDataFrame([["YOUR TEXT HERE"]]).toDF("clause_text")
model = nlpPipeline.fit(df)
result = model.transform(df)

# Predicted label per row: `indemnifications` or `other`
result.select("category.result").show()
```

</div>

## Results

```bash
+------------------+
|            result|
+------------------+
|[indemnifications]|
|           [other]|
|           [other]|
|[indemnifications]|
+------------------+
```

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|legclf_cuad_indemnifications_clause|
|Type:|legal|
|Compatibility:|Spark NLP for Legal 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Input Labels:|[sentence_embeddings]|
|Output Labels:|[category]|
|Language:|en|
|Size:|21.9 MB|

## References

In-house annotations on CUAD dataset

## Benchmarking

```bash
                  precision    recall  f1-score   support

indemnifications       1.00      0.83      0.91        12
           other       0.83      1.00      0.91        10

        accuracy                           0.91        22
       macro avg       0.92      0.92      0.91        22
    weighted avg       0.92      0.91      0.91        22
```
113 changes: 113 additions & 0 deletions
docs/_posts/josejuanmartinez/2022-09-27-legclf_cuad_licenses_clause_en.md
@@ -0,0 +1,113 @@
---
layout: model
title: Legal Licenses Clause Binary Classifier
author: John Snow Labs
name: legclf_cuad_licenses_clause
date: 2022-09-27
tags: [en, licensed]
task: Text Classification
language: en
edition: Spark NLP for Legal 1.0.0
spark_version: 3.0
supported: true
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This model is a binary classifier for the `licenses` clause type: it labels a text as `licenses` (True) or `other` (False). Make sure you provide enough context as input. Adding a sentence splitter to the pipeline makes the model see only individual sentences rather than the whole text, so skip it unless you want binary classification at sentence level.

If you have large legal documents and want to look for clauses, we recommend splitting the documents with any of the techniques available in our Spark NLP for Legal Workshop Tokenization & Splitting Tutorial (link [here](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Legal/1.Tokenization_Splitting.ipynb)), namely:
- Paragraph splitting (by multiline);
- Splitting by headers / subheaders;
- etc.

Keep in mind that the embeddings used by this model accept up to 512 tokens. If your text is longer, consider splitting it into smaller pieces (the tutorial linked above also covers this).

This model can be combined with any of the other 200+ Legal Clauses Classifiers available in Models Hub, producing a series of True/False values, one for each legal clause model you have added.
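
To make the "series of True/False values" concrete, here is a minimal sketch in plain Python. It assumes you have already collected one predicted label per clause classifier for a given text; the `to_flags` helper and the example predictions are hypothetical, not part of the library.

```python
def to_flags(predictions):
    """Map {clause_type: predicted_label} to {clause_type: bool}.

    Each binary classifier returns either its own clause type or `other`,
    so a prediction is True only when the label equals the clause type.
    """
    return {clause: label == clause for clause, label in predictions.items()}


# Hypothetical labels collected from two clause classifiers for one text
predictions = {"licenses": "licenses", "indemnifications": "other"}
flags = to_flags(predictions)
print(flags)  # {'licenses': True, 'indemnifications': False}
```

In a Spark pipeline, each label would come from the `result` field of that classifier's output column, so every classifier needs its own output column name.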

## Predicted Entities

`other`, `licenses`

{:.btn-box}
[Live Demo](https://demo.johnsnowlabs.com/finance/CLASSIFY_LEGAL_CLAUSES/){:.button.button-orange}
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legclf_cuad_licenses_clause_en_1.0.0_3.0_1664272270378.zip){:.button.button-orange.button-orange-trans.arr.button-icon}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from johnsnowlabs import legal
from pyspark.ml import Pipeline
from sparknlp.annotator import BertSentenceEmbeddings
from sparknlp.base import DocumentAssembler

# Assumes `spark` is an active, licensed Spark NLP for Legal session

# Wrap the raw text column into Spark NLP documents
documentAssembler = DocumentAssembler() \
    .setInputCol("clause_text") \
    .setOutputCol("document")

# Embed each whole document (no sentence splitter, so the classifier sees the full text)
embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_cased", "en") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence_embeddings")

docClassifier = legal.ClassifierDLModel.pretrained("legclf_cuad_licenses_clause", "en", "legal/models") \
    .setInputCols(["sentence_embeddings"]) \
    .setOutputCol("category")

nlpPipeline = Pipeline(stages=[
    documentAssembler,
    embeddings,
    docClassifier])

df = spark.createDataFrame([["YOUR TEXT HERE"]]).toDF("clause_text")
model = nlpPipeline.fit(df)
result = model.transform(df)

# Predicted label per row: `licenses` or `other`
result.select("category.result").show()
```

</div>

## Results

```bash
+----------+
|    result|
+----------+
|[licenses]|
|   [other]|
|   [other]|
|[licenses]|
+----------+
```

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|legclf_cuad_licenses_clause|
|Type:|legal|
|Compatibility:|Spark NLP for Legal 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Input Labels:|[sentence_embeddings]|
|Output Labels:|[category]|
|Language:|en|
|Size:|22.0 MB|

## References

In-house annotations on CUAD dataset

## Benchmarking

```bash
              precision    recall  f1-score   support

    licenses       1.00      0.60      0.75        10
       other       0.84      1.00      0.91        21

    accuracy                           0.87        31
   macro avg       0.92      0.80      0.83        31
weighted avg       0.89      0.87      0.86        31
```
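
For readers who want to see how the figures in this report relate to each other, the sketch below recomputes them from confusion counts. The counts are an assumption: they are inferred from the reported support, precision and recall (6 of 10 `licenses` examples found, no false positives), not published by the authors.

```python
def prf(tp, fp, fn):
    """Return (precision, recall, f1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Inferred counts (assumption): the 4 missed `licenses` examples were labelled `other`
licenses = prf(tp=6, fp=0, fn=4)
other = prf(tp=21, fp=4, fn=0)

accuracy = (6 + 21) / 31
macro_f1 = (licenses[2] + other[2]) / 2                # unweighted mean over classes
weighted_f1 = (10 * licenses[2] + 21 * other[2]) / 31  # mean weighted by support
```

Under these counts the recall of 0.60 on `licenses` means the model misses 4 of the 10 true license clauses in this small test set, while never flagging a non-license clause.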