Add more information to outputs (add instance indices for error types) #72
Conversation
Hi @jackboyla, and thanks a lot for your collaboration. Going back to the original issue:
I see that the question was to have a way to find which instances were marked as correct, incorrect, spurious, etc. Maybe it's also useful to have a function that prints a nice output to the console with those entities: just the indices, but optionally, if the text is given, the surface strings themselves. What do you think @ivyleavedtoadflax? By the way, a late happy New Year to you all :)
Hi @jackboyla, thanks for your updates - I've left a few comments, mostly regarding variable names. I fear that this code is starting to become a bit spaghetti; that's the reason I'm being so strict. By the way, did you run the code quality checks locally?
Hey @davidsbatista, thanks for the feedback! I appreciate the strictness, it's in danger of getting messy 😅 I will run the code quality checks now. I just had one more question: the current implementation will provide output along these lines.
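A minimal sketch of the shape, with illustrative `*_indices` key names (the exact names in the code may differ):

```python
{
    "strict": {
        "correct": 2,
        "incorrect": 1,
        "missed": 1,
        "spurious": 0,
        # new, illustrative index lists:
        "correct_indices": [0, 0],    # instance 0 contains two correct entities
        "incorrect_indices": [1],
        "missed_indices": [2],
        "spurious_indices": [],
    },
    # "exact", "partial" and "ent_type" would carry the same extra keys
}
```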
Each value in an index list represents an instance containing an entity that was predicted correctly/erroneously, so you can see that the same index may appear more than once when an instance holds several such entities. I could add more information so that each element is a tuple, pairing the instance index with the entity's position inside that instance.
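Purely as a sketch of that idea (the tuple layout below is my own illustration, not a final format):

```python
{
    "strict": {
        # (instance_index, entity_position_within_that_instance) -- illustrative
        "correct_indices": [(0, 0), (0, 1)],
        "incorrect_indices": [(1, 0)],
        "missed_indices": [(2, 0)],
    }
}
```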
Do you think this addition would be valuable?
Sorry, I am a bit confused about how this can happen. How can a predicted entity be correct twice under the same evaluation scenario? Maybe you can show me an example? As mentioned earlier, I haven't looked at or touched the code in a while. I thought that the index represents each instance in the prediction list, as shown below:
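(Reconstructed as a sketch in nervaluate's list-of-dicts entity format; the labels and offsets are made up.)

```python
pred = [
    [{"label": "PER", "start": 2, "end": 4}],   # index 0
    [{"label": "LOC", "start": 1, "end": 2}],   # index 1
    [{"label": "LOC", "start": 1, "end": 2},
     {"label": "ORG", "start": 5, "end": 7}],   # index 2
]
```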
Regarding this suggestion, I think this starts to become really convoluted. The end user just wants a nice report: in this case, the entities themselves or the offsets of the entities in the document which were incorrect under each scenario. Please see the two functions below:
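(The sketch below is only a hypothetical illustration of such a report helper; the name and signature are not the actual functions in the repository.)

```python
def print_error_report(result_indices, texts=None):
    """Illustrative only: print which instances fall under each error category.

    `result_indices` is assumed to map category name -> list of instance
    indices; if `texts` is given, the instance text is shown as well.
    """
    for category, indices in result_indices.items():
        print(f"{category}:")
        for i in indices:
            if texts is not None:
                print(f"  instance {i}: {texts[i]}")
            else:
                print(f"  instance {i}")
```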
If you run those functions, you will see a nice report; see the examples folder for an example that calls these functions. We can use the format that you proposed as an intermediate representation, but I don't think that should ever be exposed to the user. You can rely on that intermediate representation to output a nice report to the console or to a file. Sorry if this seems like more work than you had initially thought, but I think it's better if we tackle this in a structured and clean way to avoid having more and more spaghetti code.
Thanks for taking the time to give this feedback 😄 I think I didn't make it clear in my last comment. Here's an example: We have predictions for 3 separate instances:
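(Reconstructed for illustration in nervaluate's list-of-dicts entity format; the labels and offsets are made up.)

```python
true = [
    [{"label": "PER", "start": 0, "end": 1}, {"label": "LOC", "start": 3, "end": 4}],  # instance 0
    [{"label": "ORG", "start": 0, "end": 2}],                                          # instance 1
    [{"label": "LOC", "start": 2, "end": 3}],                                          # instance 2
]
pred = [
    [{"label": "PER", "start": 0, "end": 1}, {"label": "LOC", "start": 3, "end": 4}],  # both correct
    [{"label": "PER", "start": 0, "end": 2}],                                          # right span, wrong type
    [],                                                                                 # entity missed
]
```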
We see that the index of the first instance appears twice in the correct index list, because that single instance contains two correctly predicted entities. This tells the user that 2 entities in that instance were predicted correctly. I then proposed:
This would include the position of the predictions within the instance where the error occurred:
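(Again just a sketch; the tuple layout is my own illustration.)

```python
{
    # (instance_index, position_of_the_prediction_within_that_instance)
    "correct_indices": [(0, 0), (0, 1)],
    "incorrect_indices": [(1, 0)],
    "missed_indices": [(2, 0)],
}
```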
With regards to the above, I do agree that it becomes too convoluted, and it would be impossible to print this nicely when dealing with many instances and entities. Looking at it from the end-user's point of view, they just want an output that shows them at which instances the NER model failed. If there is an error in the instance, the user should look at the entire instance. Regarding the printing, I can do it in the style of the functions you mentioned above. I hope this makes it a bit clearer what I'm trying to say 😃
Hi @davidsbatista, I've added print functions for both overall evaluation indices and per-entity indices -- Here's an example:
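(The snippet below is only an illustration of the kind of report, with made-up data; the actual helper functions added in the PR have their own names and signatures.)

```python
# Illustrative only -- the PR adds dedicated print helpers; this just shows
# the kind of report I mean.
result_indices = {
    "correct": [0, 0],
    "incorrect": [1],
    "missed": [2],
    "spurious": [],
}
docs = ["Alice visited Paris", "Acme Corp opened an office", "He flew to Berlin"]

for category, indices in result_indices.items():
    shown = ", ".join(f"instance {i} ({docs[i]!r})" for i in indices) or "-"
    print(f"{category:>9}: {shown}")
```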
You have the option to provide the original text, which will return the surface strings of the entities alongside the instance indices, or leave it out, in which case only the indices are printed.
On a per-entity level, we can use:
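(Again illustrative only; shown here as plain Python rather than the actual helper.)

```python
# Illustrative only -- the per-entity (per-tag) view of the same idea.
result_indices_by_tag = {
    "PER": {"correct": [0], "incorrect": [1]},
    "LOC": {"correct": [0], "missed": [2]},
}

for tag, categories in result_indices_by_tag.items():
    for category, indices in categories.items():
        if indices:
            print(f"{tag} / {category}: instances {indices}")
```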
Again, the text argument is optional here.
Hello @jackboyla and thanks once again for your efforts. This seems to be in line with what I envision as a report summary. I will approve the workflow so that the code quality checks can be run and we can start reviewing the code together.
@ivyleavedtoadflax it seems the coverage badge is giving issues again. Do you have any idea why? Maybe we can disable it or try to find a replacement, what do you think?
@ivyleavedtoadflax shall I just open a PR to disable the coverage badge?
@ivyleavedtoadflax bump
Hey, sorry for the slow reply. Let's go with the easy option of disabling it for now to unblock this PR. Are you ok to put in a PR? 🙏
@jackboyla merge the
done! :) @ivyleavedtoadflax do you know if I have permissions to make a new release? |
great stuff thank you! 😄 I just realised I forgot to include in the README how this changes things.
Additionally, I wanted to include how users can pretty print this new info:
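(A stand-in example; the PR's own print helpers give a nicer report than this quick JSON dump.)

```python
import json

# Illustrative only: `result_indices` stands in for the new index output;
# the PR's own print helpers produce a nicer report than this quick dump.
result_indices = {"correct": [0, 0], "incorrect": [1], "missed": [2], "spurious": []}
print(json.dumps(result_indices, indent=2))
```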
@davidsbatista would it be possible for you to add this to the README so we don't have to revert the whole PR?
feel free to open another PR
ok cool #74 |
Hi 👋 I would like `nervaluate`'s results to show me the examples where my NER model makes mistakes, i.e. false positives/false negatives. This functionality has already been mentioned in issue #68. I have made some simple additions to `evaluate.py`, but this will break a lot of tests. I'm happy to adapt them, but I want to first check if we are happy with this form of additional information before I continue 😄 Here's an example of how the changes affect the output:
Our overall results will look like:
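(Sketched with made-up counts; the `*_indices` keys are illustrative names for the new information.)

```python
{
    "strict": {
        "correct": 2, "incorrect": 1, "partial": 0, "missed": 1, "spurious": 0,
        "possible": 4, "actual": 3,
        "precision": 0.6666666666666666, "recall": 0.5, "f1": 0.5714285714285714,
        # new: which instances each count came from (illustrative key names)
        "correct_indices": [0, 0],
        "incorrect_indices": [1],
        "missed_indices": [2],
        "spurious_indices": [],
    },
    # "exact", "partial" and "ent_type" follow the same pattern
}
```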
The relevant indices will also be added on the per-tag output.
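For illustration, the per-tag structure would carry the same index lists under each tag (key names again illustrative, other counts omitted for brevity):

```python
{
    "PER": {"strict": {"correct": 1, "correct_indices": [0], "incorrect_indices": [1]}},
    "LOC": {"strict": {"correct": 1, "correct_indices": [0], "missed_indices": [2]}},
}
```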