* 2022-09-19-legre_indemnifications_en (#12758)
* Add model 2022-09-19-legre_indemnifications_en
* Add model 2022-09-19-legner_bert_indemnifications_en
Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>
* 2022-09-20-legclf_cuad_confidentiality_clause_en (#12770)
* Add model 2022-09-20-legclf_cuad_confidentiality_clause_en
* Add model 2022-09-20-legclf_cuad_indemnifications_clause_en
* Add model 2022-09-20-legclf_cuad_licenses_clause_en
* Add model 2022-09-20-legclf_cuad_obligations_clause_en
* Add model 2022-09-20-legclf_cuad_whereas_clause_en
Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>
* Add model 2022-09-27-legclf_cuad_licenses_clause_en (#12827)
Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>
* Add model 2022-09-27-legclf_cuad_indemnifications_clause_en (#12828)
Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>
* Add model 2022-09-27-legner_bert_indemnifications_en (#12831)
Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>
* Add model 2022-09-27-legassertion_time_en (#12832)
Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>
Co-authored-by: jsl-models <74001263+jsl-models@users.noreply.github.com>
1 parent 5d3024a, commit 9bd64ae
Showing 2 changed files with 258 additions and 0 deletions.
114 changes: 114 additions & 0 deletions
docs/_posts/josejuanmartinez/2022-09-27-legassertion_time_en.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,114 @@ | ||
--- | ||
layout: model | ||
title: Temporality / Certainty Assertion Status | ||
author: John Snow Labs | ||
name: legassertion_time | ||
date: 2022-09-27 | ||
tags: [en, licensed] | ||
task: Assertion Status | ||
language: en | ||
edition: Spark NLP for Legal 1.0.0 | ||
spark_version: 3.0 | ||
supported: true | ||
article_header: | ||
type: cover | ||
use_language_switcher: "Python-Scala-Java" | ||
--- | ||
|
||
## Description | ||
|
||
This is an Assertion Status model designed to detect temporality (PRESENT, PAST, FUTURE) or certainty (POSSIBLE) in your legal documents. | ||
|
||
## Predicted Entities | ||
|
||
`PRESENT`, `PAST`, `FUTURE`, `POSSIBLE` | ||
|
||
{:.btn-box} | ||
[Live Demo](https://demo.johnsnowlabs.com/legal/LEGASSERTION_TEMPORALITY){:.button.button-orange} | ||
<button class="button button-orange" disabled>Open in Colab</button> | ||
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legassertion_time_en_1.0.0_3.0_1664274039847.zip){:.button.button-orange.button-orange-trans.arr.button-icon} | ||
|
||
## How to use | ||
|
||
|
||
|
||
<div class="tabs-box" markdown="1"> | ||
{% include programmingLanguageSelectScalaPythonNLU.html %} | ||
```python | ||
# YOUR NER HERE | ||
# ... | ||
embeddings = BertEmbeddings.pretrained("bert_embeddings_sec_bert_base","en") \ | ||
.setInputCols(["sentence", "token"]) \ | ||
.setOutputCol("embeddings") | ||
|
||
chunk_converter = ChunkConverter() \ | ||
.setInputCols(["entity"]) \ | ||
.setOutputCol("ner_chunk") | ||
|
||
assertion = legal.AssertionDLModel.pretrained("legassertion_time", "en", "legal/models")\ | ||
.setInputCols(["sentence", "ner_chunk", "embeddings"]) \ | ||
.setOutputCol("assertion") | ||
|
||
nlpPipeline = Pipeline(stages=[ | ||
documentAssembler, | ||
tokenizer, | ||
embeddings, | ||
ner, | ||
chunk_converter, | ||
assertion | ||
]) | ||
|
||
empty_data = spark.createDataFrame([[""]]).toDF("text") | ||
|
||
model = nlpPipeline.fit(empty_data) | ||
|
||
lp = LightPipeline(model) | ||
|
||
texts = ["The subsidiaries of Atlantic Inc will participate in a merging operation", | ||
"The Conditions and Warranties of this agreement might be modified"] | ||
|
||
lp.annotate(texts) | ||
``` | ||
|
||
</div> | ||
|
||
## Results | ||
|
||
```bash | ||
chunk,begin,end,entity_type,assertion | ||
Atlantic Inc,20,31,ORG,FUTURE | ||
|
||
chunk,begin,end,entity_type,assertion | ||
Conditions and Warranties,4,28,DOC,POSSIBLE | ||
``` | ||
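Judging from the two result rows above, `begin` and `end` look like character offsets into the input sentence, with `end` inclusive. The following is a minimal sketch that checks this reading in plain Python. The helper name `slice_chunk` is hypothetical and is not part of Spark NLP.

```python
# Sentences and result rows copied from the example above:
# (text, chunk, begin, end)
rows = [
    ("The subsidiaries of Atlantic Inc will participate in a merging operation",
     "Atlantic Inc", 20, 31),
    ("The Conditions and Warranties of this agreement might be modified",
     "Conditions and Warranties", 4, 28),
]


def slice_chunk(text, begin, end):
    """Return the substring for an inclusive [begin, end] character span.

    Hypothetical helper for illustration only.
    """
    return text[begin:end + 1]


# Recover each chunk from its offsets and compare with the reported chunk.
recovered = [slice_chunk(text, begin, end) for text, _, begin, end in rows]
for (text, chunk, begin, end), got in zip(rows, recovered):
    print(f"{begin}-{end}: {got!r} (matches reported chunk: {got == chunk})")
```

Because the end offset is inclusive, Python slicing needs `end + 1` to recover the full chunk.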
|
||
{:.model-param} | ||
## Model Information | ||
|
||
{:.table-model} | ||
|---|---| | ||
|Model Name:|legassertion_time| | ||
|Compatibility:|Spark NLP for Legal 1.0.0+| | ||
|License:|Licensed| | ||
|Edition:|Official| | ||
|Input Labels:|[document, doc_chunk, embeddings]| | ||
|Output Labels:|[assertion]| | ||
|Language:|en| | ||
|Size:|2.2 MB| | ||
|
||
## References | ||
|
||
In-house annotations on financial and legal corpora | ||
|
||
## Benchmarking | ||
|
||
```bash | ||
label tp fp fn prec rec f1 | ||
PRESENT 201 11 16 0.9481132 0.92626727 0.937063 | ||
POSSIBLE 171 3 6 0.98275864 0.9661017 0.9743589 | ||
FUTURE 119 6 4 0.952 0.96747965 0.95967746 | ||
PAST 270 16 10 0.9440559 0.96428573 0.9540636 | ||
tp: 761 fp: 36 fn: 36 labels: 4 | ||
Macro-average prec: 0.9567319, rec: 0.9560336, f1: 0.95638263 | ||
Micro-average prec: 0.9548306, rec: 0.9548306, f1: 0.9548306 | ||
``` |
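The benchmark figures above can be reproduced from the raw `tp`, `fp`, and `fn` counts. The sketch below is plain Python and makes one assumption worth flagging: the reported macro-average F1 appears to be the harmonic mean of the macro precision and macro recall, not the mean of the per-label F1 scores. The function name `prf` is illustrative only.

```python
# (tp, fp, fn) per label, copied from the benchmark table above.
counts = {
    "PRESENT": (201, 11, 16),
    "POSSIBLE": (171, 3, 6),
    "FUTURE": (119, 6, 4),
    "PAST": (270, 16, 10),
}


def prf(tp, fp, fn):
    """Return (precision, recall, f1) for one set of counts."""
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    return prec, rec, f1


per_label = {label: prf(*c) for label, c in counts.items()}

# Macro average: unweighted mean of per-label precision and recall.
macro_prec = sum(p for p, _, _ in per_label.values()) / len(per_label)
macro_rec = sum(r for _, r, _ in per_label.values()) / len(per_label)
# Assumption: macro F1 is the harmonic mean of the two macro values,
# which is what matches the figure reported in the table.
macro_f1 = 2 * macro_prec * macro_rec / (macro_prec + macro_rec)

# Micro average: pool the counts across labels first, then compute once.
total_tp = sum(c[0] for c in counts.values())
total_fp = sum(c[1] for c in counts.values())
total_fn = sum(c[2] for c in counts.values())
micro_prec, micro_rec, micro_f1 = prf(total_tp, total_fp, total_fn)

for label, (p, r, f) in per_label.items():
    print(f"{label:<9} prec={p:.4f} rec={r:.4f} f1={f:.4f}")
print(f"macro prec={macro_prec:.4f} rec={macro_rec:.4f} f1={macro_f1:.4f}")
print(f"micro prec={micro_prec:.4f} rec={micro_rec:.4f} f1={micro_f1:.4f}")
```

Micro precision and recall coincide here because the total false positives and false negatives are both 36.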
144 changes: 144 additions & 0 deletions
docs/_posts/josejuanmartinez/2022-09-27-legner_bert_indemnifications_en.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,144 @@ | ||
--- | ||
layout: model | ||
title: Legal Indemnification NER (Bert, base) | ||
author: John Snow Labs | ||
name: legner_bert_indemnifications | ||
date: 2022-09-27 | ||
tags: [indemnifications, en, licensed] | ||
task: Named Entity Recognition | ||
language: en | ||
edition: Spark NLP for Legal 1.0.0 | ||
spark_version: 3.0 | ||
supported: true | ||
article_header: | ||
type: cover | ||
use_language_switcher: "Python-Scala-Java" | ||
--- | ||
|
||
## Description | ||
|
||
This is a Legal Named Entity Recognition model that identifies the Subject (who), Action (verb), Object (the indemnification), and Indirect Object (to whom) in indemnification clauses. | ||
|
||
## Predicted Entities | ||
|
||
`INDEMNIFICATION`, `INDEMNIFICATION_SUBJECT`, `INDEMNIFICATION_ACTION`, `INDEMNIFICATION_INDIRECT_OBJECT` | ||
|
||
{:.btn-box} | ||
[Live Demo](https://demo.johnsnowlabs.com/legal/LEGALRE_INDEMNIFICATION/){:.button.button-orange} | ||
<button class="button button-orange" disabled>Open in Colab</button> | ||
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_bert_indemnifications_en_1.0.0_3.0_1664273651991.zip){:.button.button-orange.button-orange-trans.arr.button-icon} | ||
|
||
## How to use | ||
|
||
|
||
|
||
<div class="tabs-box" markdown="1"> | ||
{% include programmingLanguageSelectScalaPythonNLU.html %} | ||
```python | ||
documentAssembler = DocumentAssembler()\ | ||
.setInputCol("text")\ | ||
.setOutputCol("document") | ||
|
||
sentencizer = SentenceDetectorDLModel\ | ||
.pretrained("sentence_detector_dl", "en") \ | ||
.setInputCols(["document"])\ | ||
.setOutputCol("sentence") | ||
|
||
tokenizer = Tokenizer()\ | ||
.setInputCols(["sentence"])\ | ||
.setOutputCol("token") | ||
|
||
tokenClassifier = legal.BertForTokenClassification.pretrained("legner_bert_indemnifications", "en", "legal/models")\ | ||
.setInputCols("token", "sentence")\ | ||
.setOutputCol("label")\ | ||
.setCaseSensitive(True) | ||
|
||
ner_converter = NerConverter()\ | ||
.setInputCols(["sentence","token","label"])\ | ||
.setOutputCol("ner_chunk") | ||
|
||
nlpPipeline = Pipeline(stages=[ | ||
documentAssembler, | ||
sentencizer, | ||
tokenizer, | ||
tokenClassifier, | ||
ner_converter | ||
]) | ||
|
||
empty_data = spark.createDataFrame([[""]]).toDF("text") | ||
|
||
model = nlpPipeline.fit(empty_data) | ||
|
||
text='''The Company shall protect and indemnify the Supplier against any damages, losses or costs whatsoever''' | ||
|
||
# The pipeline was already fitted on empty data above (all stages are pretrained), so reuse that model | ||
lmodel = LightPipeline(model) | ||
res = lmodel.annotate(text) | ||
``` | ||
|
||
</div> | ||
|
||
## Results | ||
|
||
```bash | ||
+----------+---------------------------------+ | ||
| token| ner_label| | ||
+----------+---------------------------------+ | ||
| The| O| | ||
| Company| O| | ||
| shall| B-INDEMNIFICATION_ACTION| | ||
| protect| I-INDEMNIFICATION_ACTION| | ||
| and| O| | ||
| indemnify| B-INDEMNIFICATION_ACTION| | ||
| the| O| | ||
| Supplier|B-INDEMNIFICATION_INDIRECT_OBJECT| | ||
| against| O| | ||
| any| O| | ||
| damages| B-INDEMNIFICATION| | ||
| ,| O| | ||
| losses| B-INDEMNIFICATION| | ||
| or| O| | ||
| costs| B-INDEMNIFICATION| | ||
|whatsoever| O| | ||
+----------+---------------------------------+ | ||
``` | ||
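The token-level `B-`/`I-`/`O` labels above are merged into entity chunks by the converter stage in the pipeline. As a rough illustration of that grouping, here is a minimal plain-Python sketch. It is a simplified stand-in, not the actual `NerConverter` implementation, and the function name `bio_to_chunks` is hypothetical.

```python
# (token, label) pairs copied from the results table above.
tagged = [
    ("The", "O"), ("Company", "O"),
    ("shall", "B-INDEMNIFICATION_ACTION"),
    ("protect", "I-INDEMNIFICATION_ACTION"),
    ("and", "O"),
    ("indemnify", "B-INDEMNIFICATION_ACTION"),
    ("the", "O"),
    ("Supplier", "B-INDEMNIFICATION_INDIRECT_OBJECT"),
    ("against", "O"), ("any", "O"),
    ("damages", "B-INDEMNIFICATION"), (",", "O"),
    ("losses", "B-INDEMNIFICATION"), ("or", "O"),
    ("costs", "B-INDEMNIFICATION"), ("whatsoever", "O"),
]


def bio_to_chunks(pairs):
    """Group BIO-tagged tokens into (chunk_text, entity_type) tuples."""
    chunks = []
    tokens, entity_type = [], None

    def flush():
        # Emit the chunk collected so far, if any.
        if tokens:
            chunks.append((" ".join(tokens), entity_type))

    for token, label in pairs:
        prefix, _, entity = label.partition("-")
        if prefix == "B" or (prefix == "I" and entity != entity_type):
            # A new chunk starts (an I- tag with a different type is
            # treated as a start as well).
            flush()
            tokens, entity_type = [token], entity
        elif prefix == "I":
            # Continuation of the current chunk.
            tokens.append(token)
        else:
            # "O" closes any open chunk.
            flush()
            tokens, entity_type = [], None
    flush()
    return chunks


chunks = bio_to_chunks(tagged)
for text, entity in chunks:
    print(f"{entity}: {text}")
```

On the example sentence this yields six chunks, with "shall protect" merged into a single `INDEMNIFICATION_ACTION` chunk.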
|
||
{:.model-param} | ||
## Model Information | ||
|
||
{:.table-model} | ||
|---|---| | ||
|Model Name:|legner_bert_indemnifications| | ||
|Compatibility:|Spark NLP for Legal 1.0.0+| | ||
|License:|Licensed| | ||
|Edition:|Official| | ||
|Input Labels:|[sentence, token]| | ||
|Output Labels:|[ner]| | ||
|Language:|en| | ||
|Size:|412.2 MB| | ||
|Case sensitive:|true| | ||
|Max sentence length:|128| | ||
|
||
## References | ||
|
||
In-house annotated examples from the CUAD legal dataset | ||
|
||
## Benchmarking | ||
|
||
```bash | ||
precision recall f1-score support | ||
|
||
B-INDEMNIFICATION 0.91 0.89 0.90 36 | ||
B-INDEMNIFICATION_ACTION 0.92 0.71 0.80 17 | ||
B-INDEMNIFICATION_INDIRECT_OBJECT 0.88 0.88 0.88 40 | ||
B-INDEMNIFICATION_SUBJECT 0.71 0.56 0.63 9 | ||
I-INDEMNIFICATION 0.88 0.78 0.82 9 | ||
I-INDEMNIFICATION_ACTION 0.81 0.87 0.84 15 | ||
I-INDEMNIFICATION_INDIRECT_OBJECT 1.00 0.53 0.69 17 | ||
O 0.97 0.91 0.94 510 | ||
|
||
accuracy 0.88 654 | ||
macro avg 0.71 0.61 0.81 654 | ||
weighted avg 0.95 0.88 0.91 654 | ||
``` |