Models hub legal (#12835)
* 2022-09-19-legre_indemnifications_en (#12758)

* Add model 2022-09-19-legre_indemnifications_en

* Add model 2022-09-19-legner_bert_indemnifications_en

Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>

* 2022-09-20-legclf_cuad_confidentiality_clause_en (#12770)

* Add model 2022-09-20-legclf_cuad_confidentiality_clause_en

* Add model 2022-09-20-legclf_cuad_indemnifications_clause_en

* Add model 2022-09-20-legclf_cuad_licenses_clause_en

* Add model 2022-09-20-legclf_cuad_obligations_clause_en

* Add model 2022-09-20-legclf_cuad_whereas_clause_en

Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>

* Add model 2022-09-27-legclf_cuad_licenses_clause_en (#12827)

Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>

* Add model 2022-09-27-legclf_cuad_indemnifications_clause_en (#12828)

Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>

* Add model 2022-09-27-legner_bert_indemnifications_en (#12831)

Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>

* Add model 2022-09-27-legassertion_time_en (#12832)

Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>

Co-authored-by: jsl-models <74001263+jsl-models@users.noreply.github.com>
josejuanmartinez and jsl-models authored Sep 27, 2022
1 parent 5d3024a commit 9bd64ae
Showing 2 changed files with 258 additions and 0 deletions.
114 changes: 114 additions & 0 deletions docs/_posts/josejuanmartinez/2022-09-27-legassertion_time_en.md
@@ -0,0 +1,114 @@
---
layout: model
title: Temporality / Certainty Assertion Status
author: John Snow Labs
name: legassertion_time
date: 2022-09-27
tags: [en, licensed]
task: Assertion Status
language: en
edition: Spark NLP for Legal 1.0.0
spark_version: 3.0
supported: true
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This is an Assertion Status model that detects temporality (`PRESENT`, `PAST`, `FUTURE`) or certainty (`POSSIBLE`) for entities in your legal documents.

## Predicted Entities

`PRESENT`, `PAST`, `FUTURE`, `POSSIBLE`

{:.btn-box}
[Live Demo](https://demo.johnsnowlabs.com/legal/LEGASSERTION_TEMPORALITY){:.button.button-orange}
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legassertion_time_en_1.0.0_3.0_1664274039847.zip){:.button.button-orange.button-orange-trans.arr.button-icon}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
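# Imports for the open-source stages. `ChunkConverter` and the `legal` module
# come from the licensed Spark NLP for Legal library; `spark` is assumed to be
# an active licensed session.
from pyspark.ml import Pipeline
from sparknlp.base import LightPipeline
from sparknlp.annotator import BertEmbeddings

# YOUR NER HERE
# ...
# Define `documentAssembler`, `tokenizer` and your NER stage (`ner`) here.
# The stages below expect `sentence` and `token` columns, and an NER output
# column named `entity`.
embeddings = BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

# Convert the NER output into chunks the assertion model can consume
chunk_converter = ChunkConverter() \
    .setInputCols(["entity"]) \
    .setOutputCol("ner_chunk")

assertion = legal.AssertionDLModel.pretrained("legassertion_time", "en", "legal/models") \
    .setInputCols(["sentence", "ner_chunk", "embeddings"]) \
    .setOutputCol("assertion")

nlpPipeline = Pipeline(stages=[
    documentAssembler,
    tokenizer,
    embeddings,
    ner,
    chunk_converter,
    assertion
])

# All stages are pretrained, so fitting on an empty DataFrame is enough
empty_data = spark.createDataFrame([[""]]).toDF("text")

model = nlpPipeline.fit(empty_data)

lp = LightPipeline(model)

texts = ["The subsidiaries of Atlantic Inc will participate in a merging operation",
         "The Conditions and Warranties of this agreement might be modified"]

lp.annotate(texts)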
```

</div>

## Results

```bash
chunk,begin,end,entity_type,assertion
Atlantic Inc,20,31,ORG,FUTURE

chunk,begin,end,entity_type,assertion
Conditions and Warranties,4,28,DOC,POSSIBLE
```
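
The `begin` and `end` columns read as character offsets into the input text, with `end` inclusive. The following is a minimal plain-Python check of that reading against the two example sentences. The `rows` list and the `chunk_at` helper are illustrative names introduced here, not part of Spark NLP.

```python
# Illustrative check: treat `begin`/`end` as inclusive character offsets.
texts = [
    "The subsidiaries of Atlantic Inc will participate in a merging operation",
    "The Conditions and Warranties of this agreement might be modified",
]

# (text index, chunk, begin, end, entity_type, assertion), copied from the results
rows = [
    (0, "Atlantic Inc", 20, 31, "ORG", "FUTURE"),
    (1, "Conditions and Warranties", 4, 28, "DOC", "POSSIBLE"),
]


def chunk_at(text, begin, end):
    """Return the substring covered by the inclusive offsets [begin, end]."""
    return text[begin:end + 1]


for idx, chunk, begin, end, entity_type, assertion in rows:
    recovered = chunk_at(texts[idx], begin, end)
    print(f"{recovered!r} ({entity_type}) -> {assertion}; offsets match: {recovered == chunk}")
```

Because `end` is inclusive, slicing needs `end + 1`; a plain `text[begin:end]` would drop the last character of each chunk.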

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|legassertion_time|
|Compatibility:|Spark NLP for Legal 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Input Labels:|[document, doc_chunk, embeddings]|
|Output Labels:|[assertion]|
|Language:|en|
|Size:|2.2 MB|

## References

In-house annotations on financial and legal corpora

## Benchmarking

```bash
label     tp   fp  fn  prec        rec         f1
PRESENT   201  11  16  0.9481132   0.92626727  0.937063
POSSIBLE  171  3   6   0.98275864  0.9661017   0.9743589
FUTURE    119  6   4   0.952       0.96747965  0.95967746
PAST      270  16  10  0.9440559   0.96428573  0.9540636

tp: 761  fp: 36  fn: 36  labels: 4
Macro-average  prec: 0.9567319, rec: 0.9560336, f1: 0.95638263
Micro-average  prec: 0.9548306, rec: 0.9548306, f1: 0.9548306
```
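
The benchmark figures can be recomputed from the `tp`, `fp` and `fn` counts alone. The sketch below uses only the counts from the table; the `prf` helper is a hypothetical name. Judging by the arithmetic, the reported macro F1 matches the harmonic mean of macro precision and macro recall rather than the mean of the per-label F1 scores, so that is how it is computed here.

```python
# Recompute the benchmark metrics from the tp/fp/fn counts in the table.
counts = {
    "PRESENT": (201, 11, 16),
    "POSSIBLE": (171, 3, 6),
    "FUTURE": (119, 6, 4),
    "PAST": (270, 16, 10),
}


def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


per_label = {label: prf(*c) for label, c in counts.items()}

# Micro-average: pool the counts across labels, then compute once.
total_tp = sum(c[0] for c in counts.values())
total_fp = sum(c[1] for c in counts.values())
total_fn = sum(c[2] for c in counts.values())
micro = prf(total_tp, total_fp, total_fn)

# Macro-average: unweighted mean of per-label precision and recall,
# with F1 taken as the harmonic mean of those two averages.
macro_prec = sum(p for p, _, _ in per_label.values()) / len(per_label)
macro_rec = sum(r for _, r, _ in per_label.values()) / len(per_label)
macro_f1 = 2 * macro_prec * macro_rec / (macro_prec + macro_rec)

for label, (p, r, f) in per_label.items():
    print(f"{label:<9} prec={p:.4f} rec={r:.4f} f1={f:.4f}")
print(f"micro     prec={micro[0]:.4f} rec={micro[1]:.4f} f1={micro[2]:.4f}")
print(f"macro     prec={macro_prec:.4f} rec={macro_rec:.4f} f1={macro_f1:.4f}")
```

Micro precision and recall coincide here because the pooled `fp` and `fn` totals are both 36.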
@@ -0,0 +1,144 @@
---
layout: model
title: Legal Indemnification NER (Bert, base)
author: John Snow Labs
name: legner_bert_indemnifications
date: 2022-09-27
tags: [indemnifications, en, licensed]
task: Named Entity Recognition
language: en
edition: Spark NLP for Legal 1.0.0
spark_version: 3.0
supported: true
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This is a Legal Named Entity Recognition model that identifies the Subject (who), Action (verb), Object (the indemnification) and Indirect Object (to whom) in indemnification clauses.

## Predicted Entities

`INDEMNIFICATION`, `INDEMNIFICATION_SUBJECT`, `INDEMNIFICATION_ACTION`, `INDEMNIFICATION_INDIRECT_OBJECT`

{:.btn-box}
[Live Demo](https://demo.johnsnowlabs.com/legal/LEGALRE_INDEMNIFICATION/){:.button.button-orange}
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_bert_indemnifications_en_1.0.0_3.0_1664273651991.zip){:.button.button-orange.button-orange-trans.arr.button-icon}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
# Imports for the open-source stages. The `legal` module comes from the
# licensed Spark NLP for Legal library; `spark` is assumed to be an active
# licensed session.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, NerConverter

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentencizer = SentenceDetectorDLModel \
    .pretrained("sentence_detector_dl", "en") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

tokenizer = Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")

tokenClassifier = legal.BertForTokenClassification.pretrained("legner_bert_indemnifications", "en", "legal/models") \
    .setInputCols(["token", "sentence"]) \
    .setOutputCol("label") \
    .setCaseSensitive(True)

# Merge B-/I- token labels into entity chunks
ner_converter = NerConverter() \
    .setInputCols(["sentence", "token", "label"]) \
    .setOutputCol("ner_chunk")

nlpPipeline = Pipeline(stages=[
    documentAssembler,
    sentencizer,
    tokenizer,
    tokenClassifier,
    ner_converter
])

# All stages are pretrained, so fitting on an empty DataFrame is enough
empty_data = spark.createDataFrame([[""]]).toDF("text")

model = nlpPipeline.fit(empty_data)

text = '''The Company shall protect and indemnify the Supplier against any damages, losses or costs whatsoever'''

lmodel = LightPipeline(model)
res = lmodel.annotate(text)
```

</div>

## Results

```bash
+----------+---------------------------------+
| token| ner_label|
+----------+---------------------------------+
| The| O|
| Company| O|
| shall| B-INDEMNIFICATION_ACTION|
| protect| I-INDEMNIFICATION_ACTION|
| and| O|
| indemnify| B-INDEMNIFICATION_ACTION|
| the| O|
| Supplier|B-INDEMNIFICATION_INDIRECT_OBJECT|
| against| O|
| any| O|
| damages| B-INDEMNIFICATION|
| ,| O|
| losses| B-INDEMNIFICATION|
| or| O|
| costs| B-INDEMNIFICATION|
|whatsoever| O|
+----------+---------------------------------+
```
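
The `B-` and `I-` prefixes follow the usual BIO tagging convention: `B-` opens a chunk, `I-` continues it, and `O` marks tokens outside any entity. The sketch below shows in plain Python how such labels can be grouped into chunks, using the tokens and labels from the table. It is a simplified illustration and not the actual `NerConverter` implementation; `bio_to_chunks` is a hypothetical helper name.

```python
# Simplified BIO grouping; not the actual NerConverter logic.
tokens = ["The", "Company", "shall", "protect", "and", "indemnify", "the",
          "Supplier", "against", "any", "damages", ",", "losses", "or",
          "costs", "whatsoever"]
labels = ["O", "O", "B-INDEMNIFICATION_ACTION", "I-INDEMNIFICATION_ACTION",
          "O", "B-INDEMNIFICATION_ACTION", "O",
          "B-INDEMNIFICATION_INDIRECT_OBJECT", "O", "O",
          "B-INDEMNIFICATION", "O", "B-INDEMNIFICATION", "O",
          "B-INDEMNIFICATION", "O"]


def bio_to_chunks(tokens, labels):
    """Group BIO-tagged tokens into (chunk text, entity) pairs."""
    chunks = []
    current_tokens, current_entity = [], None
    for token, label in zip(tokens, labels):
        prefix, _, entity = label.partition("-")
        if prefix == "I" and entity == current_entity and current_tokens:
            # Continuation of the chunk that is currently open
            current_tokens.append(token)
            continue
        # Any other label closes the open chunk, if there is one
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_entity))
        if prefix in ("B", "I"):
            # "B-" (or a stray "I-") opens a new chunk
            current_tokens, current_entity = [token], entity
        else:
            current_tokens, current_entity = [], None
    if current_tokens:
        chunks.append((" ".join(current_tokens), current_entity))
    return chunks


chunks = bio_to_chunks(tokens, labels)
for chunk_text, entity in chunks:
    print(f"{entity:<32} {chunk_text}")
```

With these labels, `shall protect` forms a single `INDEMNIFICATION_ACTION` chunk, while `damages`, `losses` and `costs` become three separate `INDEMNIFICATION` chunks because each carries its own `B-` tag.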

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|legner_bert_indemnifications|
|Compatibility:|Spark NLP for Legal 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Input Labels:|[sentence, token]|
|Output Labels:|[ner]|
|Language:|en|
|Size:|412.2 MB|
|Case sensitive:|true|
|Max sentence length:|128|

## References

In-house annotated examples from CUAD legal dataset

## Benchmarking

```bash
                                   precision    recall  f1-score   support

                B-INDEMNIFICATION       0.91      0.89      0.90        36
         B-INDEMNIFICATION_ACTION       0.92      0.71      0.80        17
B-INDEMNIFICATION_INDIRECT_OBJECT       0.88      0.88      0.88        40
        B-INDEMNIFICATION_SUBJECT       0.71      0.56      0.63         9
                I-INDEMNIFICATION       0.88      0.78      0.82         9
         I-INDEMNIFICATION_ACTION       0.81      0.87      0.84        15
I-INDEMNIFICATION_INDIRECT_OBJECT       1.00      0.53      0.69        17
                                O       0.97      0.91      0.94       510

                         accuracy                           0.88       654
                        macro avg       0.71      0.61      0.81       654
                     weighted avg       0.95      0.88      0.91       654
```
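
The `weighted avg` row is a support-weighted mean of the per-label scores. The sketch below recomputes it from the rows in the report. Two caveats apply: the per-label scores are already rounded to two decimals, and the listed supports sum to 653 rather than the reported 654, so the sketch normalizes by the listed rows and only approximately reproduces the reported figures. The `weighted_avg` helper is a hypothetical name.

```python
# Support-weighted averages recomputed from the per-label rows of the report.
# (label, precision, recall, f1, support)
report = [
    ("B-INDEMNIFICATION", 0.91, 0.89, 0.90, 36),
    ("B-INDEMNIFICATION_ACTION", 0.92, 0.71, 0.80, 17),
    ("B-INDEMNIFICATION_INDIRECT_OBJECT", 0.88, 0.88, 0.88, 40),
    ("B-INDEMNIFICATION_SUBJECT", 0.71, 0.56, 0.63, 9),
    ("I-INDEMNIFICATION", 0.88, 0.78, 0.82, 9),
    ("I-INDEMNIFICATION_ACTION", 0.81, 0.87, 0.84, 15),
    ("I-INDEMNIFICATION_INDIRECT_OBJECT", 1.00, 0.53, 0.69, 17),
    ("O", 0.97, 0.91, 0.94, 510),
]

total_support = sum(row[4] for row in report)


def weighted_avg(column):
    """Mean of one metric column (1=precision, 2=recall, 3=f1), weighted by support."""
    return sum(row[column] * row[4] for row in report) / total_support


weighted_precision = weighted_avg(1)
weighted_recall = weighted_avg(2)
weighted_f1 = weighted_avg(3)

print(f"listed support: {total_support}")
print(f"weighted avg  prec={weighted_precision:.3f} "
      f"rec={weighted_recall:.3f} f1={weighted_f1:.3f}")
```

Because the `O` label accounts for 510 of the listed tokens, the weighted averages sit close to the `O` row; the per-entity rows are the more informative ones for judging extraction quality.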
