Increase robustness of cell type prioritization #30
Comments
Hi Theo - I'm not entirely sure I understand your question. Augur doesn't test for statistical significance; it simply returns the feature importance from the random forest algorithm. There are a number of reasons to take these importances with a grain of salt, though, and if you are interested in identifying statistically significant differences, a conventional differential expression (DE) analysis, as implemented in our Libra package, might make more sense. Beyond that, you can set
Sorry, I was not clear enough. I guess my question can be rephrased as: what does "feature importance" actually mean, and how can it be interpreted?
I am probably not going to give a better explanation than the one in the randomForest documentation. In Augur, the importance values are then averaged over the repeated subsamples for each cell type. In general, I would recommend using the results of a DE analysis with Libra to identify genes that change between conditions within individual cell types, rather than relying on feature importance.
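To make the averaging step concrete, here is a minimal Python sketch (not Augur's actual R code). The gene names and importance values are made up, and averaging each gene only over the subsamples in which it received an importance is an assumption for illustration; the thread does not say how Augur treats genes that are missing from a given subsample.

```python
from collections import defaultdict

def average_importance(per_subsample):
    """Average per-gene importance over subsamples.

    `per_subsample` is a list of dicts, one per subsample, mapping gene -> importance.
    A gene is averaged over the subsamples in which it appears (an assumption),
    and the number of such subsamples is returned alongside the mean.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for importances in per_subsample:
        for gene, value in importances.items():
            totals[gene] += value
            counts[gene] += 1
    return {gene: (totals[gene] / counts[gene], counts[gene]) for gene in totals}

# Hypothetical importances from three subsamples of one cell type.
subsamples = [
    {"Gad1": 0.5, "Sst": 0.25},
    {"Gad1": 0.25, "Pvalb": -0.125},
    {"Gad1": 0.75, "Sst": 0.0},
]

averaged = average_importance(subsamples)
for gene, (mean_importance, n_used) in sorted(averaged.items()):
    print(f"{gene}: mean importance {mean_importance:+.3f} from {n_used} subsample(s)")
```

In this toy case three genes end up with an averaged importance, but only one of them was seen in every subsample, and one has a negative mean.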
Thank you again. I was wondering more about your view on how usable "feature importance" is in cell type prioritization. For example, would 20,000 important features indicate higher robustness than 15,000? Or would you say that a larger number of features signals more confidence, e.g. "AUC=0.8 based on 18,000 important features" compared to "AUC=0.7 based on 10,000 important features"?
In general, I don't really factor this in and go solely by the AUC. Many subsamples of equal size are drawn for each cell type (50 by default), so the fact that 18,000 features have an assigned feature importance doesn't mean that all 18,000 were used by every classifier trained for that cell type. Feature importance can also be zero or negative, so an assigned importance doesn't mean that the gene actually has a positive impact on classification.
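As an illustration of how an importance can come out zero or negative, here is a small Python sketch of a permutation-style importance (drop in accuracy after shuffling one feature). It is a generic toy, not Augur's or randomForest's implementation: the cells, the three gene names, and the hand-written `toy_model` are all hypothetical, and whether Augur reports this particular kind of importance is not stated in this thread.

```python
import random

# Hypothetical held-out cells: (gene_a, gene_b, gene_c) expression and a condition label.
CELLS = [
    ((0.9, 0.1, 0.5), 1),
    ((0.8, 0.9, 0.2), 1),
    ((0.7, 0.95, 0.7), 1),
    ((0.1, 0.3, 0.4), 0),
    ((0.2, 0.2, 0.9), 0),
    ((0.3, 0.4, 0.1), 0),
]
GENES = ("gene_a", "gene_b", "gene_c")

def toy_model(x):
    """Stand-in classifier: gene_a is informative, the gene_b rule is spurious
    (as if overfit during training), and gene_c is never used."""
    if x[1] > 0.8:
        return 0
    return int(x[0] > 0.5)

def accuracy(model, cells):
    return sum(model(x) == y for x, y in cells) / len(cells)

def permutation_importance(model, cells, feature, n_repeats=200, seed=0):
    """Mean drop in accuracy when one feature column is shuffled across cells."""
    rng = random.Random(seed)
    baseline = accuracy(model, cells)
    drops = []
    for _ in range(n_repeats):
        column = [x[feature] for x, _ in cells]
        rng.shuffle(column)
        permuted = [
            (x[:feature] + (value,) + x[feature + 1:], y)
            for (x, y), value in zip(cells, column)
        ]
        drops.append(baseline - accuracy(model, permuted))
    return sum(drops) / n_repeats

importances = {
    gene: permutation_importance(toy_model, CELLS, index)
    for index, gene in enumerate(GENES)
}
for gene, value in importances.items():
    print(f"{gene}: {value:+.3f}")
```

All three genes receive a number here, but only `gene_a` helps classification: the unused `gene_c` gets exactly zero, and shuffling `gene_b` tends to improve accuracy, so its importance is negative. Counting genes with an assigned importance therefore says little about how many genes support the AUC.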
Hi,
Best,
Theo