[python-package] Interpreting the contents of dump_model for multi-label classification #6157

Closed
Opt-Mucca opened this issue Oct 27, 2023 · 3 comments

@Opt-Mucca

I am currently having issues determining which trees correspond to which class from the output of LGBMClassifier.booster_.dump_model().

Take a simple scenario where an LGBMClassifier has n_classes_=3 and n_estimators_=2. From my understanding of GBDTs and RFs in LightGBM, n_estimators_ trees are constructed for each class label. This is supported by the output of raw = LGBMClassifier.booster_.dump_model(), where raw["tree_info"] is a list containing 2*3 = 6 decision trees.

My question is:
How do I know which set of n_estimators_ decision trees corresponds to a given class?

In the output of raw = LGBMClassifier.booster_.dump_model(), I can find no information on which class from the set {0, 1, 2} raw["tree_info"][i] corresponds to, for i in {0,...,5}. Are the 6 trees assumed to be ordered in either of the following ways: [0, 1, 2, 0, 1, 2] or [0, 0, 1, 1, 2, 2]? If not, where would I find this information? (I have looked into the reader and believe it uses the first option, [0, 1, 2, 0, 1, 2], but would like some confirmation. I would also expect this information to be somewhere in the output of dump_model, but I cannot find it.)

@jameslamb
Collaborator

Thanks for using LightGBM.

> for each class label n_estimators_ are constructed

This is correct specifically in the case of multi-class classification.

For binary classification, there will be n_estimators_ trees (assuming no early stopping).

> are the 6 trees assumed to be ordered in any of the following ways?

Using the terminology from your example, the trees for a multiclass classification model are ordered [0, 1, 2, 0, 1, 2, ...].

You could also confirm this by checking the predictions, e.g. like this example with the R package: #5223 (comment).
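
As a rough illustration of the ordering described above, here is a minimal sketch that groups trees by class index. It assumes the dumped model is a dict with a "tree_info" list and a "num_tree_per_iteration" entry, and that trees are stored iteration by iteration. The helper name group_trees_by_class and the mock_dump dict are hypothetical, introduced only for this example. The mock stands in for a real dump so the snippet runs without training a model.

```python
from collections import defaultdict


def group_trees_by_class(model_dump):
    """Map each class index to the list of its tree_index values.

    Assumes trees are stored iteration by iteration, i.e. in the order
    [class 0, class 1, ..., class K-1, class 0, class 1, ...].
    """
    trees_per_iteration = model_dump["num_tree_per_iteration"]
    groups = defaultdict(list)
    for position, tree in enumerate(model_dump["tree_info"]):
        # Position modulo trees-per-iteration gives the class index.
        groups[position % trees_per_iteration].append(tree["tree_index"])
    return dict(groups)


# Hypothetical stand-in for LGBMClassifier(...).booster_.dump_model()
# with 3 classes and 2 boosting rounds; a real dump carries many more keys.
mock_dump = {
    "num_class": 3,
    "num_tree_per_iteration": 3,
    "tree_info": [{"tree_index": i} for i in range(6)],
}

trees_by_class = group_trees_by_class(mock_dump)
print(trees_by_class)  # {0: [0, 3], 1: [1, 4], 2: [2, 5]}
```

With a real model, the class index here would presumably be a position in the classifier's classes_ array rather than the original label value, so it is worth mapping back through classes_ before interpreting the groups.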

@Opt-Mucca
Author

Thanks for the great response! It perfectly answers the question. Additional thanks for even specifying the binary case.

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 30, 2024