Zero-Shot NER gives wrong entities with labels #14159

Closed
1 task done
RakshaRaoKaraya opened this issue Feb 2, 2024 · 12 comments · Fixed by #14190

@RakshaRaoKaraya

RakshaRaoKaraya commented Feb 2, 2024

Is there an existing issue for this?

  • I have searched the existing issues and did not find a match.

Who can help?

@marz @josejuanmartinez

What are you working on?

I was trying out the official example: https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/1.6.ZeroShot_Clinical_NER.ipynb. However, the results shown there do not match what I see in my notebook.
image
It seems that the row number of each entity is shifted by +1.

Current Behavior

The only difference from the official documentation is that I am using the RoBERTa model rather than the clinical one.
from sparknlp.annotator import ZeroShotNerModel

# Zero-shot NER with the general RoBERTa model (not the clinical one).
# The chained calls are wrapped in parentheses so they parse as one expression.
zero_shot_ner = (
    ZeroShotNerModel.pretrained("zero_shot_ner_roberta", "en")
    .setEntityDefinitions(
        {
            "PROBLEM": ["What is the disease?", "What is his symptom?", "What is her disease?", "What is his disease?",
                        "What is the problem?", "What does a patient suffer", "What was the reason that the patient is admitted to the clinic?"],
            "DRUG": ["Which drug?", "Which is the drug?", "What is the drug?", "Which drug does he use?", "Which drug does she use?", "Which drug do I use?", "Which drug is prescribed for a symptom?"],
            "ADMISSION_DATE": ["When did patient admitted to a clinic?"],
            "PATIENT_AGE": ["How old is the patient?", "What is the gae of the patient?"],
        }
    )
    .setInputCols(["sentence", "token"])
    .setOutputCol("zero_shot_ner")
    .setPredictionThreshold(0.1)  # default 0.01
)

Expected Behavior

I would like to see the same results as those given in the official documentation.

Steps To Reproduce

https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/1.6.ZeroShot_Clinical_NER.ipynb

Spark NLP version and Apache Spark

Spark NLP version: 5.2.3
Apache Spark version: 3.4.1

Type of Spark Application

Python Application

Java Version

No response

Java Home Directory

No response

Setup and installation

No response

Operating System and Version

No response

Link to your project (if available)

No response

Additional Information

No response

@RakshaRaoKaraya
Author

Can someone please help me with this issue?

@RakshaRaoKaraya
Author

RakshaRaoKaraya commented Feb 2, 2024

@maziyarpanahi, @josejuanmartinez : I have tried the example in https://sparknlp.org/2023/02/08/zero_shot_ner_roberta_en.html . It still gives me the same issue. It picks the next entity and gives it the label. The labels are right, but the entities are wrong.
image

@maziyarpanahi
Member

@RakshaRaoKaraya

There seems to be an issue with NerConverter and ZeroShotNER. As you can see, the ZeroShot detected the entities and their labels correctly: https://colab.research.google.com/drive/1xX_9Rrnvxb7O-C9ICvTs9rSmA-rLSB_E?usp=sharing

result.select("zero_shot_ner.result").show(1, False)
+-------------------------------------------------------------------------------------+
|result                                                                               |
+-------------------------------------------------------------------------------------+
|[O, O, O, O, O, O, O, O, O, O, O, O, O, B-NAME, O, O, O, O, O, O, O, O, O, O, B-CITY]|
+-------------------------------------------------------------------------------------+

result.select("zero_shot_ner").show(1, False)
+--------------------------------------------------------------------------------------------------------------+
|zero_shot_ner                                                                                                 |
+--------------------------------------------------------------------------------------------------------------+
|[{named_entity, 0, 5, O, {sentence -> 0, word -> Hellen}, []},
| {named_entity, 7, 11, O, {sentence -> 0, word -> works}, []},
| {named_entity, 13, 14, O, {sentence -> 0, word -> in}, []},
| {named_entity, 16, 21, O, {sentence -> 0, word -> London}, []},
| {named_entity, 22, 22, O, {sentence -> 0, word -> ,}, []},
| {named_entity, 24, 28, O, {sentence -> 0, word -> Paris}, []},
| {named_entity, 30, 32, O, {sentence -> 0, word -> and}, []},
| {named_entity, 34, 39, O, {sentence -> 0, word -> Berlin}, []},
| {named_entity, 40, 40, O, {sentence -> 0, word -> .}, []},
| {named_entity, 42, 43, O, {sentence -> 0, word -> My}, []},
| {named_entity, 45, 48, O, {sentence -> 0, word -> name}, []},
| {named_entity, 50, 51, O, {sentence -> 0, word -> is}, []},
| {named_entity, 53, 57, O, {sentence -> 0, word -> Clara}, []},
| {named_entity, 58, 58, B-NAME, {sentence -> 0, word -> ,, confidence -> 0.94376206, question -> What is my name?}, []},
| {named_entity, 60, 60, O, {sentence -> 0, word -> I}, []},
| {named_entity, 62, 65, O, {sentence -> 0, word -> live}, []},
| {named_entity, 67, 68, O, {sentence -> 0, word -> in}, []},
| {named_entity, 70, 72, O, {sentence -> 0, word -> New}, []},
| {named_entity, 74, 77, O, {sentence -> 0, word -> York}, []},
| {named_entity, 79, 81, O, {sentence -> 0, word -> and}, []},
| {named_entity, 83, 88, O, {sentence -> 0, word -> Hellen}, []},
| {named_entity, 90, 94, O, {sentence -> 0, word -> lives}, []},
| {named_entity, 96, 97, O, {sentence -> 0, word -> in}, []},
| {named_entity, 99, 103, O, {sentence -> 0, word -> Paris}, []},
| {named_entity, 104, 104, B-CITY, {sentence -> 0, word -> ., confidence -> 0.3440236, question -> Which is the city?}, []}]
+--------------------------------------------------------------------------------------------------------------+
only showing top 1 row

But it failed to put them together via NerConverter. This seems to be a bug; we'll take a look and fix it for the next release.
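To make the symptom easier to inspect outside Spark, here is a minimal pure-Python sketch that pairs the tokens and tags copied from the output above and groups them into chunks. The `group_iob` helper is hypothetical and only approximates what a converter from IOB tags to chunks does; it is not Spark NLP's NerConverter implementation. In the posted output the non-`O` tags sit on the punctuation that follows "Clara" and "Paris", which looks consistent with the +1 shift reported earlier, though the sketch does not establish where in the pipeline the shift comes from.

```python
# Tokens and tags copied from the zero_shot_ner output posted above.
tokens = ["Hellen", "works", "in", "London", ",", "Paris", "and", "Berlin", ".",
          "My", "name", "is", "Clara", ",", "I", "live", "in", "New", "York",
          "and", "Hellen", "lives", "in", "Paris", "."]
tags = ["O"] * 13 + ["B-NAME"] + ["O"] * 10 + ["B-CITY"]


def group_iob(tokens, tags):
    """Hypothetical helper: merge B-/I- runs into (text, label) chunks."""
    chunks = []
    words, label = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; flush any chunk still being built.
            if words:
                chunks.append((" ".join(words), label))
            words, label = [token], tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:
            # Continuation of the current entity.
            words.append(token)
        else:
            # An O tag (or a stray I- tag) closes the current chunk.
            if words:
                chunks.append((" ".join(words), label))
            words, label = [], None
    if words:
        chunks.append((" ".join(words), label))
    return chunks


# Chunks as implied by the tags exactly as posted.
chunks = group_iob(tokens, tags)

# Token immediately before each tagged position, for comparison.
preceding = [(tokens[i - 1], tag[2:])
             for i, tag in enumerate(tags) if tag != "O" and i > 0]

print(chunks)
print(preceding)
```

Comparing the two printed lists shows which token each label landed on and which token comes directly before it.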

@maziyarpanahi maziyarpanahi added bug and removed question labels Feb 4, 2024
@RakshaRaoKaraya
Author

@maziyarpanahi: Thank you very much for your reply. Would you be able to give me a tentative date for when this could be resolved? Or are there any other NER models that I could use in the meantime to detect entities based on a question-answering model?

@maziyarpanahi
Member

You are very welcome. We have many NER models, either based on our own architecture or all-in-one models from Hugging Face (trained by us).

However, they are not zero-shot, meaning the labels are pre-defined. You can have a look at them in the meantime while we fix this small part:

@RakshaRaoKaraya
Author

@maziyarpanahi : Thank you for the suggestions. I was hoping to get a question answer type of NER. Even though I am capturing pre-defined entities, I need to capture answers to specific questions in the text. I think Zero-shot would help better. I didn't come across any other model that does that.

@RakshaRaoKaraya
Author

Hello @maziyarpanahi: Could you please tell me when I can expect a resolution of this bug?

@maziyarpanahi
Member

> Hello @maziyarpanahi: Could you please tell me when I can expect a resolution of this bug?

Hi, we are still investigating this issue. I'll update here once we have a fix for it

@RakshaRaoKaraya
Author

@maziyarpanahi : Thank you so much for your response. I will wait for the resolution then.

@RakshaRaoKaraya
Author

Hello @maziyarpanahi: Thank you for looking into this issue. Is the model ready to use now in a Python environment? Do I need to change my Spark NLP version for this to work?

@maziyarpanahi
Member

> Hello @maziyarpanahi: Thank you for looking into this issue. Is the model ready to use now in a Python environment? Do I need to change my Spark NLP version for this to work?

You are welcome. You need Spark NLP 5.3.0, and everything should work without any other change.
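For anyone checking whether an environment already meets that requirement, a minimal sketch of the version comparison follows. The helper names `parse_version` and `has_fix` are hypothetical, and the sketch only compares version strings; in a live session the installed string would typically come from `sparknlp.version()`, which is left out here so the snippet runs without Spark.

```python
import re


def parse_version(version):
    """Hypothetical helper: turn a string like '5.3.0' into (5, 3, 0)."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:3])


def has_fix(installed, minimum="5.3.0"):
    """True if the installed version is at least the release named in the thread."""
    return parse_version(installed) >= parse_version(minimum)


# 5.2.3 is the version from the original report; 5.3.0 is the one named above.
for candidate in ("5.2.3", "5.3.0"):
    print(candidate, has_fix(candidate))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "5.10.0" as older than "5.3.0".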

@RakshaRaoKaraya
Author

Works perfectly well now. Thank you once again 👍

2 participants