
[FIX] Fix label mismatch of evaluation and validation with large dataset in semantic segmentation #1851

Merged: 9 commits into develop on Mar 7, 2023

Conversation

@supersoob (Contributor) commented Mar 3, 2023

Issue
(1) 'otx eval' (the inference step and the final evaluation score after training) reports a very low score for models trained on datasets with many classes (>10)
(2) The order of classes written in the validation log is wrong for datasets with many classes (>10)
(3) [TODO] Semantic segmentation always produces a lower evaluation score than the best validation score -> (solution found, but it needs some discussion)

Root Causes
(1)(2): When converting a segmentation mask to an OTX 2D numpy array (mapping pixels to class indices), the labels are string-sorted by their id key:

labels = sorted(labels) # type: ignore

Because of this, once num_classes exceeds 10 the labels end up in lexicographic order (e.g. "1", "10", "11", "2", ...). That order no longer matches the unsorted label_schema.json, which is saved in the order of the initial dataset_meta.json, and it also mismatches the actual prediction order of the model head output.
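To make the failure mode concrete, here is a minimal sketch (the ids are illustrative, not taken from an actual dataset_meta.json):

```python
# Minimal sketch of the failure mode with illustrative numeric ids.
label_ids = [str(i) for i in range(12)]  # "0", "1", ..., "11"

# str-sorting places "10" and "11" before "2", scrambling the class order:
print(sorted(label_ids))
# ['0', '1', '10', '11', '2', '3', '4', '5', '6', '7', '8', '9']
```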

(3): Two factors contribute to this issue: the ignored background label and soft_threshold. The final evaluation score does not include the background label's score, so it can never equal the best validation score. In addition, soft_threshold defaults to 0.5 and any pixel whose maximum prediction score falls below it is ignored; but in the eval hook (validation), soft_threshold is not applied at all, which is equivalent to a threshold of 0.0.
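To illustrate the mismatch, here is a hypothetical sketch; predict_mask and its parameters are illustrative names, not OTX's actual implementation:

```python
import numpy as np

def predict_mask(probs, soft_threshold=0.5, ignore_index=255):
    """Argmax over per-class probabilities; pixels whose best score is
    below soft_threshold are dropped (mapped to ignore_index)."""
    pred = probs.argmax(axis=0)
    pred[probs.max(axis=0) < soft_threshold] = ignore_index
    return pred

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(5), size=(4, 4)).transpose(2, 0, 1)  # (C, H, W)

eval_pred = predict_mask(probs, soft_threshold=0.5)  # 'otx eval' path
val_pred = predict_mask(probs, soft_threshold=0.0)   # eval hook: effectively 0.0
# With 5 roughly uniform classes, many pixels fall below 0.5 at eval time
# but survive at validation time, so the two scores diverge.
```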

Solution
(1): The label dictionary is sorted before evaluation in inference.py, rather than removing the sorted(labels) call from mask_from_annotation, because anomalib also uses this function and a change there could affect Geti, which generates random-byte ids.
(2): project_labels are sorted before the GT mask is converted to an OTX mask, and self.CLASSES is realigned to the sorted order (a rough sketch of the idea follows this list).
(3): To be addressed in a follow-up PR (needs discussion).
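As a rough sketch of the alignment idea behind (1) and (2), assuming labels expose id and name attributes (the helper is illustrative, not the exact merged code):

```python
def align_with_mask_order(project_labels):
    """Apply the same str-sort (by id) that mask_from_annotation uses, so
    GT masks, model predictions, and self.CLASSES agree on one order."""
    sorted_labels = sorted(project_labels, key=lambda label: str(label.id))
    class_names = [label.name for label in sorted_labels]
    return sorted_labels, class_names
```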

Checklists

  • [WIP] Unit test code
  • [WIP] CLI Test

@supersoob supersoob requested a review from a team as a code owner March 3, 2023 05:01
@github-actions github-actions bot added the ALGO (Any changes in OTX Algo Tasks implementation) and API (Any changes in OTX API) labels Mar 3, 2023
@supersoob supersoob requested review from JihwanEom and kprokofi March 3, 2023 05:03
@sungmanc (Contributor) commented Mar 3, 2023

@ashwinvaidya17, could you double-check the effect of the mask_from_annotation function? As Soobee said, the sort logic looks weird.

@sungchul2, you may also have knowledge about this segmentation issue; do you have any ideas or comments?

sungmanc previously approved these changes Mar 3, 2023
@sungmanc (Contributor) left a comment:

Thanks for the nice work. BTW, a unit-test check is still needed.

@jaegukhyun jaegukhyun added this to the 1.1.0 milestone Mar 3, 2023
sungchul2 previously approved these changes Mar 3, 2023
@sungchul2 (Contributor) left a comment:

LGTM.
But it would be better to handle this inside mask_from_annotation, if there is a way to solve it there.

@kprokofi (Collaborator) commented Mar 3, 2023

Unfortunately, I tested your branch and nothing changed. otx eval still outputs a much lower score than expected, and many classes aren't predicted at all during training (screenshots of the training log and of the otx eval output were attached).

I'm also looking into that problem, but I think sorting the classes didn't solve this issue.

@kprokofi (Collaborator) commented Mar 3, 2023

During otx eval it seems like there are no labels in the dataset at all (see the attached screenshot).

@supersoob (Contributor, Author) commented Mar 5, 2023

> Unfortunately, I tested your branch and nothing changed. otx eval outputs a much lower score than expected, and many classes aren't predicted at all during training. [...] I'm also looking into that problem, but I think sorting the classes didn't solve this issue.

@kprokofi Hmm, my VOC dataset is working well. Could you share your model checkpoint and validation dataset? FYI, this will be completely solved once my follow-up PR is merged (please see (3) in this PR description: the bg label is still not included, and soft_threshold defaults to 0.5).

@supersoob (Contributor, Author) commented Mar 5, 2023

> During otx eval it seems like there are no labels in the dataset at all.

It's expected that the MPASegDataset built for the test dataset contains no labels except bg, because the otx eval CLI passes the validation dataset to the infer function via with_empty_annotations(). Score measurement is then performed against the dataset with the actual annotations on the OTX task side, not inside MPA.
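As a toy, self-contained illustration of that flow (these classes are stand-ins for the real OTX API; only the with_empty_annotations() idea mirrors it):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Item:
    annotations: List[str] = field(default_factory=list)

@dataclass
class Dataset:
    items: List[Item]

    def with_empty_annotations(self) -> "Dataset":
        # Mirror of the OTX idea: strip GT labels before inference.
        return Dataset([Item() for _ in self.items])

val_ds = Dataset([Item(["road"]), Item(["car"])])  # validation set with real GT
infer_ds = val_ds.with_empty_annotations()         # what MPASegDataset sees: bg only
assert all(not item.annotations for item in infer_ds.items)
# Scoring later compares predictions against val_ds, which still carries GT.
```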

@supersoob supersoob dismissed stale reviews from sungchul2 and sungmanc via 509d392 March 6, 2023 06:47
@github-actions github-actions bot added the TEST (Any changes in tests) label Mar 6, 2023
@codecov-commenter

Codecov Report

Patch coverage: 100.00%; project coverage change: -0.02% ⚠️

Comparison is base (4f1a47c) 80.52% compared to head (509d392) 80.51%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1851      +/-   ##
===========================================
- Coverage    80.52%   80.51%   -0.02%     
===========================================
  Files          477      477              
  Lines        32813    32834      +21     
===========================================
+ Hits         26423    26435      +12     
- Misses        6390     6399       +9     
Impacted Files Coverage Δ
otx/api/usecases/evaluation/basic_operations.py 98.30% <ø> (ø)
...rithms/segmentation/adapters/mmseg/data/dataset.py 88.88% <100.00%> (+0.29%) ⬆️
otx/algorithms/segmentation/tasks/inference.py 87.82% <100.00%> (ø)
otx/cli/tools/find.py 86.44% <0.00%> (-4.64%) ⬇️
otx/cli/tools/build.py 90.24% <0.00%> (-2.26%) ⬇️
...hms/detection/adapters/mmdet/utils/config_utils.py 93.29% <0.00%> (-1.22%) ⬇️
.../api/usecases/exportable_code/streamer/streamer.py 89.40% <0.00%> (-0.67%) ⬇️
otx/cli/manager/config_manager.py 84.98% <0.00%> (-0.28%) ⬇️
otx/algorithms/anomaly/tasks/inference.py 82.51% <0.00%> (-0.27%) ⬇️
otx/algorithms/common/tasks/training_base.py 50.57% <0.00%> (-0.15%) ⬇️
... and 8 more



@kprokofi (Collaborator) commented Mar 6, 2023

@supersoob thank you for this update. Testing Cityscapes, I found a label mismatch caused by the index shift shown in the attached screenshot. There is no need to shift the labels in that case. Could you please remove it (I mean the "k+1")? Simply delete this line in training_extensions/otx/core/data/adapter/segmentation_dataset_adapter.py.
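For illustration, the kind of shift under discussion looks roughly like this (the class names are made up; this is not the actual adapter code):

```python
# Reserving index 0 for background shifts every class id by one.
class_names = ["road", "sidewalk", "building"]
index_map = {name: k + 1 for k, name in enumerate(class_names)}
print(index_map)  # {'road': 1, 'sidewalk': 2, 'building': 3}
# For a dataset like Cityscapes with no separate background label, this
# shift puts GT indices out of step with the model head's class order.
```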

@sungmanc (Contributor) commented Mar 7, 2023

> Testing Cityscapes, I found a label mismatch caused by the index shift [...] Could you please remove it (I mean the "k+1")? Simply delete this line in training_extensions/otx/core/data/adapter/segmentation_dataset_adapter.py.

Currently, the Cityscapes dataset is not supported due to the background label issue.

@supersoob (Contributor, Author) commented Mar 7, 2023

> Testing Cityscapes, I found a label mismatch caused by the index shift [...] Could you please remove it (I mean the "k+1")? Simply delete this line in training_extensions/otx/core/data/adapter/segmentation_dataset_adapter.py.

Thank you for testing with various datasets. Unfortunately, the background insertion everywhere in OTX is the root cause of Cityscapes failing. Even if I shift the labels, Cityscapes won't work as long as I don't remove all bg labels from OTX, which is a huge job, and that change would also affect VOC and other datasets that have a bg label. I plan to stabilize VOC first and then look at Cityscapes. I hope to merge this PR if you see no performance problems with VOC or with custom segmentation datasets with >10 labels (including backgrounds).

@kprokofi (Collaborator) left a comment:

Let's merge it now! Thank you for solving the label mismatch problem.

@kprokofi kprokofi merged commit 872f119 into develop Mar 7, 2023
@kprokofi kprokofi deleted the soobee/fix-eval-seg branch March 7, 2023 09:18