Test & Score misclassification with SVM model #7046

Open
VGBarauna opened this issue Mar 8, 2025 · 3 comments
Labels
bug report Bug is reported by user, not yet confirmed by the core team

Comments

@VGBarauna

What's wrong?

Image
Some samples have a higher predicted probability for "positive" but are classified as "negative".

How can we reproduce the problem?

Chagas teste.xlsx

Image

What's your environment?

  • Operating system: Windows 11
  • Orange version: 3.37.00
    Downloaded from the website
@VGBarauna VGBarauna added the bug report Bug is reported by user, not yet confirmed by the core team label Mar 8, 2025
@ales-erjavec
Contributor

Probably https://scikit-learn.org/stable/modules/svm.html#scores-and-probabilities

TL;DR: By construction, the base SVM method does not estimate probabilities. The probabilities reported by scikit-learn/libsvm come from a separate model (Platt scaling) that is trained specifically to estimate probabilities from the SVM decision function scores. As a result, the reported probabilities can be inconsistent with the base SVM prediction.
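The mismatch can be checked directly in scikit-learn. The following is a minimal sketch, assuming a synthetic dataset from `make_classification` (not the reporter's data) and an `SVC` with `probability=True`. It compares the label from `predict` with the label implied by the largest value of `predict_proba`. Whether any samples disagree depends on the data, so the count is printed rather than assumed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic data, used only for illustration (an assumption, not the issue's dataset).
X, y = make_classification(n_samples=60, n_features=5, random_state=0)

# probability=True fits an extra calibration model on top of the SVM scores.
clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X, y)

pred = clf.predict(X)            # label from the SVM decision function
proba = clf.predict_proba(X)     # calibrated class probabilities
pred_from_proba = clf.classes_[np.argmax(proba, axis=1)]  # label implied by the probabilities

# Samples where the two labels differ are the ones that look "misclassified" in a table.
n_disagree = int(np.sum(pred != pred_from_proba))
print(f"{n_disagree} of {len(y)} samples have a prediction that differs from the argmax probability")
```

Disagreements, when they occur, tend to involve samples close to the decision boundary, where the calibrated probability is near 0.5.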

@VGBarauna
Author

Dear Ales-erjavec,
Thank you very much for your explanations. So, this data is about scores and not probabilities. So I should consider only the prediction column in the data table and ignore these metrics, right? The next problem is that the ROC curve will be wrong because it is built with the probabilities of each sample, and in this case, the widget is getting the scores, right? Is there no way to make the ROC curve from the SVM since the data table generates scores and not probabilities?

@thocevar
Contributor

@VGBarauna You can treat the "SVM (positive)" column alone as the predicted probability of the positive class and compute anything else you need from it (e.g. predict the positive class when that probability is > 0.5). You can also use these probabilities for the ROC curve.
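Outside of Orange, the same idea can be sketched with scikit-learn. This example assumes synthetic data and a held-out test split (both illustrative choices). It takes the positive-class probability, thresholds it at 0.5 to get labels, and passes it to `roc_curve` and `roc_auc_score`.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary data, for illustration only; class 1 plays the role of "positive".
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = SVC(probability=True, random_state=0).fit(X_train, y_train)

# Column of predict_proba that corresponds to the positive class.
pos_index = list(clf.classes_).index(1)
p_pos = clf.predict_proba(X_test)[:, pos_index]

# Labels derived from the probability, consistent with the probability column.
pred_at_half = (p_pos > 0.5).astype(int)

# ROC curve and AUC computed from the positive-class probability.
fpr, tpr, thresholds = roc_curve(y_test, p_pos, pos_label=1)
auc = roc_auc_score(y_test, p_pos)
print(f"AUC from positive-class probability: {auc:.3f}")
```

A ROC curve depends only on how the samples are ranked by the score, so any monotonic score for the positive class produces a valid curve.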

We might want to consider overriding the SklModel's class predictions with the argmax of the computed probability distribution (when available) for consistency.

return value, probs
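The proposal above could look roughly like the following sketch. It is not Orange's actual `SklModel` code. `predict_consistent` is a hypothetical helper that wraps any fitted scikit-learn classifier and returns `(value, probs)`, taking the class from the argmax of the probabilities when `predict_proba` is available and falling back to `predict` otherwise.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC


def predict_consistent(skl_model, X):
    """Hypothetical helper: return (value, probs) with value derived from probs.

    If the model exposes predict_proba, the predicted class is the argmax of
    the probability distribution; otherwise the model's own predict is used
    and probs is None.
    """
    if hasattr(skl_model, "predict_proba"):
        probs = skl_model.predict_proba(X)
        value = skl_model.classes_[np.argmax(probs, axis=1)]
        return value, probs
    return skl_model.predict(X), None


# Illustrative usage on synthetic data (an assumption, not the issue's dataset).
X, y = make_classification(n_samples=60, n_features=5, random_state=0)

with_probs = SVC(probability=True, random_state=0).fit(X, y)
value, probs = predict_consistent(with_probs, X)

# With probability=False, SVC does not expose predict_proba, so predict is used.
without_probs = SVC(probability=False).fit(X, y)
value_plain, probs_plain = predict_consistent(without_probs, X)
```

The trade-off is that, for an SVM, the returned class may then differ from what the decision function alone would predict. In exchange, the class and probability columns always agree.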
