TypedDependencyParser returning <no-type> as dep type #2775
Comments
More info, @danilojsl, @maziyarpanahi, @vkocaman: I will check whether the problem is caused by different parses coming out of DependencyParserModel.
Some additional updates here: I suspect the punctuation is the problem. "he denies problems or pain" works, but "he denies problems, or pain" fails, probably because of a mismatch in the encoding of the labels between the test and training datasets.
More progress on this one: there seems to be a mismatch in the contents of the map that the model uses to represent POS tags and lemmas (TypedDependencyParserModel.dependencyPipe.getDictionariesSet.getDictionaries) between a model that has just been trained and a model that has been loaded from disk. {cpos=DT=41,feat=Degree=Pos=55,cpos=CD=31,cpos=''=47,#TO........ So there may be confusion between the actual value and the separator. It was very suspicious that the map was missing the entry cpos=,=46, along with some others.
But it seems that was not enough; the problem persisted.
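The separator hypothesis can be illustrated with a small, self-contained sketch. This is not Spark NLP's serialization code: `naive_dump` and `naive_load` are hypothetical helpers that only assume a flat `key=value` format joined with commas, which is what the printed map resembles. Under that assumption, a key that contains the comma itself does not survive a round trip.

```python
# Hypothetical illustration only: NOT the actual Spark NLP serialization.
# It assumes entries are written as "key=value" and joined with ",".

def naive_dump(dictionary):
    """Write the map as key=value pairs separated by commas."""
    return ",".join(f"{key}={value}" for key, value in dictionary.items())

def naive_load(text):
    """Read the map back by splitting on the same separators."""
    result = {}
    for entry in text.split(","):
        # Split on the last "=" so keys such as "cpos=DT" stay intact.
        key, sep, value = entry.rpartition("=")
        if sep and key and value.isdigit():
            result[key] = int(value)
    return result

# A just-trained map: one key holds the comma POS tag.
original = {"cpos=DT": 41, "cpos=,": 46, "cpos=CD": 31}

restored = naive_load(naive_dump(original))
missing = sorted(set(original) - set(restored))
print(missing)
```

Here the entry for the comma tag is split into two fragments that are both rejected, so it disappears from the restored map while the other entries survive. Diffing the key sets of a freshly trained map and a reloaded one in the same way would show whether the real dictionaries lose entries like this.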
This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 5 days.
@sillystring13 thanks for reporting. I'm reopening the issue; please describe how you were able to replicate it (library version, code, issue description).
Issue: DependencyParserModel.pretrained('dependency_conllu') returns dependency_type=<no-type>
Steps to Reproduce
I used the code from the display notebook: https://github.com/JohnSnowLabs/spark-nlp-display/blob/main/tutorials/Spark_NLP_Display.ipynb
Your Environment
Description
TypedDependencyParser is apparently not producing the right outputs, according to experiments in this notebook:
https://colab.research.google.com/drive/1PF8PQfvH1qMmk630rQZST4SJx_EtGGAC?usp=sharing#scrollTo=RysvWpG7hUdk
What I've found out so far:
a) This is not a serialization issue.
b) This is not only happening in 3.0.x.
c) The original code is found here: https://github.com/shentianxiao/RBGParser/tree/labeling
d) The algorithm uses an internal structure which is sparsely filled, so most likely the training was not enough to cover all cases.
Next action: check whether more training helps to improve the situation.
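As a rough illustration of point d), the toy sketch below shows how a sparsely filled lookup structure falls back to `<no-type>` for combinations never seen in training, and how more training data fills the gaps. It is not the RBGParser or Spark NLP algorithm: the table keyed by (head POS, dependent POS), the helper names, and the example data are all assumptions made for the illustration.

```python
# Toy illustration only: NOT the RBGParser / Spark NLP implementation.
# The (head POS, dependent POS) key and all data are assumed for the example.
from collections import Counter, defaultdict

NO_TYPE = "<no-type>"

def train(examples):
    """Count the labels observed for each (head POS, dependent POS) pair."""
    table = defaultdict(Counter)
    for head_pos, dep_pos, label in examples:
        table[(head_pos, dep_pos)][label] += 1
    return table

def predict(table, head_pos, dep_pos):
    """Return the most frequent label, or NO_TYPE for an unseen pair."""
    counts = table.get((head_pos, dep_pos))
    if not counts:
        return NO_TYPE
    return counts.most_common(1)[0][0]

# A small training set leaves most of the table empty.
small = [("VERB", "NOUN", "obj"), ("NOUN", "DET", "det")]
# A larger set also covers punctuation and conjunctions.
larger = small + [("NOUN", "CCONJ", "cc"), ("VERB", "PUNCT", "punct")]

queries = [("VERB", "NOUN"), ("VERB", "PUNCT"), ("NOUN", "CCONJ")]
before = [predict(train(small), head, dep) for head, dep in queries]
after = [predict(train(larger), head, dep) for head, dep in queries]
print(before)
print(after)
```

If the real structure behaves in a comparable way, sentences with punctuation would be the first to hit unfilled entries, which would be consistent with the comma example failing while the unpunctuated one works.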
Expected Behavior
Current Behavior
Possible Solution
Steps to Reproduce
Context
Your Environment
sparknlp.version():
spark.version:
java -version: