
TypedDependencyParser returning <no-type> as dep type #2775

Closed
albertoandreottiATgmail opened this issue Apr 19, 2021 · 6 comments · Fixed by #13648


albertoandreottiATgmail commented Apr 19, 2021

Description

TypedDependencyParser is apparently not producing correct outputs, according to experiments in this notebook:

https://colab.research.google.com/drive/1PF8PQfvH1qMmk630rQZST4SJx_EtGGAC?usp=sharing#scrollTo=RysvWpG7hUdk

What I've found out so far:
a) This is not a serialization issue.
b) This is not only happening in 3.0.x.
c) The original code can be found here: https://github.com/shentianxiao/RBGParser/tree/labeling
d) The algorithm uses an internal structure that is sparsely filled, so most likely the training was not enough to cover all cases.

Next action: check whether more training improves the situation.

Expected Behavior

Current Behavior

Possible Solution

Steps to Reproduce

Context

Your Environment

  • Spark NLP version sparknlp.version():
  • Apache NLP version spark.version:
  • Java version java -version:
  • Setup and installation (Pypi, Conda, Maven, etc.):
  • Operating System and version:
  • Link to your project (if any):

albertoandreottiATgmail commented Apr 20, 2021

More info, @danilojsl, @maziyarpanahi, @vkocaman:
a) The problem is not related to different tokenization in the CoNLL training data compared to our tokenizer.
b) The problem is not related to a different POS input to the parser.
c) The problem is not related to OOV, i.e., words never seen during training.
d) Narrowing things down as much as possible, we get:
"He reports that he feels well and denies any problems, or pain." --> fails!
"He reports that he feels well and denies any problems." --> succeeds!

I will check whether the problem is different parses coming out of DependencyParserModel.
Ideas?

@albertoandreottiATgmail

Some additional updates here. I suspect the punctuation is the problem:

"he denies problems or pain" -> works

"he denies problems, or pain" -> fails

This is probably a mismatch in the encoding of the labels between the test and training datasets.
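If that's the hypothesis, the failure pattern fits: one unseen punctuation key poisons label lookup for the entire sentence. A hypothetical Python sketch of this failure mode (the dictionary contents and the all-or-nothing fallback are assumptions for illustration, not Spark NLP's actual internals):

```python
# Hypothetical POS-key dictionary that was trained without a comma entry.
pos_index = {"cpos=PRP": 1, "cpos=VBZ": 2, "cpos=NNS": 3, "cpos=CC": 4, "cpos=NN": 5}

def lookup(tags):
    """Map POS tags to dictionary ids; return None if any tag is unknown,
    mimicking a sentence-level failure that yields <no-type> for every token."""
    ids = [pos_index.get(f"cpos={t}", -1) for t in tags]
    return None if -1 in ids else ids

# "he denies problems or pain" -- all tags known, lookup succeeds
print(lookup(["PRP", "VBZ", "NNS", "CC", "NN"]))        # [1, 2, 3, 4, 5]
# "he denies problems, or pain" -- the comma tag is missing, lookup fails
print(lookup(["PRP", "VBZ", "NNS", ",", "CC", "NN"]))   # None
```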

@albertoandreottiATgmail

More progress on this one. There seems to be a mismatch between the contents of the map that the model uses to represent POS tags and lemmas,

TypedDependencyParserModel.dependencyPipe.getDictionariesSet.getDictionaries

between a model that has just been trained and a model that has been loaded from disk.
So it seems the serialization is dropping some content from those dictionaries.
I haven't looked at the serialization process in detail, but it seems the dictionaries are converted to Strings with a comma as a separator,

{cpos=DT=41,feat=Degree=Pos=55,cpos=CD=31,cpos=''=47,#TO........

so there may be a collision between the actual values and the separator. It was very suspicious that the map was missing the entry cpos=,=46, among some others.
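A minimal Python sketch of this suspected failure mode (an illustration of the separator collision, not the actual Scala serialization code): when entries whose keys contain a comma are joined with commas, the round trip cannot split them back reliably.

```python
def serialize(d):
    # naive scheme resembling the dump above: "key=value" pairs joined by ","
    return ",".join(f"{k}={v}" for k, v in d.items())

def deserialize(s):
    out = {}
    for pair in s.split(","):
        key, _, value = pair.rpartition("=")
        if key and value.isdigit():
            out[key] = int(value)
    return out

trained = {"cpos=DT": 41, "cpos=,": 46, "cpos=CD": 31}
loaded = deserialize(serialize(trained))

# "cpos=," survives serialization as "cpos=,=46", but its comma is then
# mistaken for the pair separator, so the entry never comes back.
print(loaded)  # {'cpos=DT': 41, 'cpos=CD': 31} -- the comma entry is gone
```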
I tried adding back some of the missing content during deserialization:

  private def deserializeDictionaries(dictionariesValues: List[(TObjectIntHashMap[_], Int, Boolean)]): DictionarySet = {

    val dictionarySet = getDictionarySetInstance

    dictionariesValues.zipWithIndex.foreach { case (dictionaryValue, index) =>
      // TODO this is not a fix! - values taken from training
      if (index == 0)
        dictionaryValue._1.asInstanceOf[TObjectIntHashMap[String]].put("cpos=,", 46)

      if (index == 1) {
        dictionaryValue._1.asInstanceOf[TObjectIntHashMap[String]].put("form=,", 39)
        dictionaryValue._1.asInstanceOf[TObjectIntHashMap[String]].put("lemma=,", 58)
      }
      // ... rest of the original deserialization logic elided ...
    }

    dictionarySet
  }

But that was not enough; the problem persisted.
What to do next?

  • Try the problematic sentence on a model that hasn't gone through the serialization process. This means training a new model and trying the problematic sentence on it, without using pretrained(). If the error is gone, it means we have more serialization problems.
  • In general, the model has a lot of commented code, TODOs, and things that are disabled. I would work on enabling more of these things.
  • Investigate discrepancies in the sentence representation between training and inference.
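If the comma separator is indeed the culprit, one possible direction (a sketch of the idea, not a drop-in fix for the Scala code) is to serialize the dictionaries with a format that escapes special characters, e.g. JSON, so keys containing "," or "=" round-trip intact:

```python
import json

# Dictionary entries whose keys contain the old separator characters.
trained = {"cpos=DT": 41, "cpos=,": 46, "form=,": 39, "lemma=,": 58}

# JSON escapes the keys, so nothing is dropped or mis-split on reload.
serialized = json.dumps(trained)
loaded = json.loads(serialized)

print(loaded["cpos=,"])  # 46 -- the punctuation entry survives
```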

@github-actions

This issue is stale because it has been open 120 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.

@luca-martial

@sillystring13, thanks for reporting. I'm reopening the issue - please describe how you've been able to replicate it (library version, code, issue description).

@luca-martial luca-martial reopened this Feb 21, 2023
@github-actions github-actions bot removed the Stale label Feb 22, 2023
@w2o-hbrashear

Issue: DependencyParserModel.pretrained('dependency_conllu') returns dependency_type=<no-type> on some input with punctuation

Steps to Reproduce

I used the code from the display notebook: https://github.com/JohnSnowLabs/spark-nlp-display/blob/main/tutorials/Spark_NLP_Display.ipynb
You can reproduce this by changing the code in cell 5:

  • text = "he denies problems or pain" works
    {'dependency': [Annotation(dependency, 0, 1, denies, {'head': '2', 'head.begin': '3', 'head.end': '8', 'sentence': '0'}, []), Annotation(dependency, 3, 8, problems, {'head': '3', 'head.begin': '10', 'head.end': '17', 'sentence': '0'}, []), Annotation(dependency, 10, 17, ROOT, {'head': '0', 'head.begin': '-1', 'head.end': '-1', 'sentence': '0'}, []), Annotation(dependency, 19, 20, pain, {'head': '5', 'head.begin': '22', 'head.end': '25', 'sentence': '0'}, []), Annotation(dependency, 22, 25, problems, {'head': '3', 'head.begin': '10', 'head.end': '17', 'sentence': '0'}, [])], 'dependency_type': [Annotation(labeled_dependency, 0, 1, nsubj, {'sentence': '0'}, []), Annotation(labeled_dependency, 3, 8, parataxis, {'sentence': '0'}, []), Annotation(labeled_dependency, 10, 17, root, {'sentence': '0'}, []), Annotation(labeled_dependency, 19, 20, compound, {'sentence': '0'}, []), Annotation(labeled_dependency, 22, 25, amod, {'sentence': '0'}, [])], 'document': [Annotation(document, 0, 25, he denies problems or pain, {}, [])], 'pos': [Annotation(pos, 0, 1, PRP, {'word': 'he', 'sentence': '0'}, []), Annotation(pos, 3, 8, VBZ, {'word': 'denies', 'sentence': '0'}, []), Annotation(pos, 10, 17, NNS, {'word': 'problems', 'sentence': '0'}, []), Annotation(pos, 19, 20, CC, {'word': 'or', 'sentence': '0'}, []), Annotation(pos, 22, 25, NN, {'word': 'pain', 'sentence': '0'}, [])], 'token': [Annotation(token, 0, 1, he, {'sentence': '0'}, []), Annotation(token, 3, 8, denies, {'sentence': '0'}, []), Annotation(token, 10, 17, problems, {'sentence': '0'}, []), Annotation(token, 19, 20, or, {'sentence': '0'}, []), Annotation(token, 22, 25, pain, {'sentence': '0'}, [])]}
  • text = "he denies problems, or pain" fails with the <no-type> label on everything
    {'dependency': [Annotation(dependency, 0, 1, denies, {'head': '2', 'head.begin': '3', 'head.end': '8', 'sentence': '0'}, []), Annotation(dependency, 3, 8, problems, {'head': '3', 'head.begin': '10', 'head.end': '17', 'sentence': '0'}, []), Annotation(dependency, 10, 17, ROOT, {'head': '0', 'head.begin': '-1', 'head.end': '-1', 'sentence': '0'}, []), Annotation(dependency, 18, 18, pain, {'head': '6', 'head.begin': '23', 'head.end': '26', 'sentence': '0'}, []), Annotation(dependency, 20, 21, pain, {'head': '6', 'head.begin': '23', 'head.end': '26', 'sentence': '0'}, []), Annotation(dependency, 23, 26, problems, {'head': '3', 'head.begin': '10', 'head.end': '17', 'sentence': '0'}, [])], 'dependency_type': [Annotation(labeled_dependency, 0, 1, <no-type>, {'sentence': '0'}, []), Annotation(labeled_dependency, 3, 8, <no-type>, {'sentence': '0'}, []), Annotation(labeled_dependency, 10, 17, <no-type>, {'sentence': '0'}, []), Annotation(labeled_dependency, 18, 18, <no-type>, {'sentence': '0'}, []), Annotation(labeled_dependency, 20, 21, <no-type>, {'sentence': '0'}, []), Annotation(labeled_dependency, 23, 26, <no-type>, {'sentence': '0'}, [])], 'document': [Annotation(document, 0, 26, he denies problems, or pain, {}, [])], 'pos': [Annotation(pos, 0, 1, PRP, {'word': 'he', 'sentence': '0'}, []), Annotation(pos, 3, 8, VBZ, {'word': 'denies', 'sentence': '0'}, []), Annotation(pos, 10, 17, NNS, {'word': 'problems', 'sentence': '0'}, []), Annotation(pos, 18, 18, ,, {'word': ',', 'sentence': '0'}, []), Annotation(pos, 20, 21, CC, {'word': 'or', 'sentence': '0'}, []), Annotation(pos, 23, 26, NN, {'word': 'pain', 'sentence': '0'}, [])], 'token': [Annotation(token, 0, 1, he, {'sentence': '0'}, []), Annotation(token, 3, 8, denies, {'sentence': '0'}, []), Annotation(token, 10, 17, problems, {'sentence': '0'}, []), Annotation(token, 18, 18, ,, {'sentence': '0'}, []), Annotation(token, 20, 21, or, {'sentence': '0'}, []), Annotation(token, 23, 26, pain, {'sentence': '0'}, [])]}

Your Environment

  • Spark NLP version sparknlp.version(): 4.2.7
  • Apache NLP version spark.version: 3.3.1
  • Java version java -version: openjdk version "1.8.0_345"; OpenJDK Runtime Environment (Zulu 8.64.0.19-CA-linux64) (build 1.8.0_345-b01); OpenJDK 64-Bit Server VM (Zulu 8.64.0.19-CA-linux64) (build 25.345-b01, mixed mode)
  • Setup and installation (Pypi, Conda, Maven, etc.):
    spark-nlp-display==4.1.0
    johnsnowlabs-for-databricks==4.3.2
    jsl.jar
    dbfs:/FileStore/johnsnowlabs/libs/spark-nlp-jsl-4.3.0.jar
    dbfs:/FileStore/johnsnowlabs/libs/spark_nlp_jsl-4.3.0-py3-none-any.whl
    assembly.jar
    dbfs:/FileStore/johnsnowlabs/libs/spark-ocr-assembly-4.3.0.jar
    dbfs:/FileStore/johnsnowlabs/libs/spark_ocr-4.3.0-py3-none-any.whl
    spark-nlp==4.3.0
    com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.0
  • Operating System and version: DataBricks 9.1 LTS ML (includes Apache Spark 3.1.2, Scala 2.12) - Fresh install on GCP from the DB Installer
  • Link to your project (if any):
