Effect of cell type abundance on AUC values #19

Open
hayfre opened this issue Jun 29, 2022 · 1 comment

Comments


hayfre commented Jun 29, 2022

Hi! I have been testing out Augur on my single cell dataset that contains 25 different cell types that range in abundance from 50 cells to over 4000 cells. I’ve noticed that Augur seems to produce higher AUC values for some of the least abundant cell populations that I do not expect to be majorly changed between the conditions. I wonder if this is due to the subsample size – does randomly drawing 20/100 cells repeatedly train the classifier to cover more variation in the cell population than drawing 20/2000 cells repeatedly? If this is the case, do you have any recommendations for how to address this potential bias/which arguments to adjust in the calculate_auc function? Thanks!

@skinnider
Collaborator

Hi @hayfre - without knowing more about your particular dataset I can only speak in generalities, but my intuition is that if you can rule out a biological effect, there may be a substantial technical effect on this population (for instance, cells of this type from one of your libraries are stressed or dying). If there are only a few cells of this type, those same cells would be present in every subsample and would make the two conditions easier for the RF (random forest) to separate.
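To put rough numbers on the "same cells in every subsample" point, here is a minimal sketch of the arithmetic. It assumes cells are drawn uniformly without replacement and independently across subsamples, and it does not call Augur at all. The function names are made up for illustration, and the figure of 50 subsamples used in the note below is an assumption, not something stated in this thread.

```python
def inclusion_prob(pop_size: int, subsample_size: int) -> float:
    """Chance that one specific cell lands in a single subsample."""
    k = min(subsample_size, pop_size)  # cannot draw more cells than exist
    return k / pop_size


def expected_overlap(pop_size: int, subsample_size: int) -> float:
    """Expected number of cells shared by two independent subsamples."""
    k = min(subsample_size, pop_size)
    # Each of the k cells in the first draw is in the second with prob k / N.
    return k * k / pop_size


def prob_seen_at_least_once(pop_size: int, subsample_size: int,
                            n_subsamples: int) -> float:
    """Chance that one specific cell appears in at least one subsample."""
    p = inclusion_prob(pop_size, subsample_size)
    return 1.0 - (1.0 - p) ** n_subsamples


if __name__ == "__main__":
    for pop in (100, 2000):
        print(pop,
              expected_overlap(pop, 20),
              round(prob_seen_at_least_once(pop, 20, 50), 3))
```

Under these assumptions, two draws of 20 from 100 cells share 4 cells on average, versus 0.2 cells for draws of 20 from 2000. Over 50 subsamples, essentially every cell in the 100-cell population is used at least once, while only about 40% of the 2000-cell population is. A handful of atypical cells therefore recurs far more often in a rare population.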

This is just one potential explanation, but you could experiment with changing the subsample size and see whether your results are stable. We found they generally were (Supp. Figs. 6 and 10 in the Augur paper), but the AUC for your small cell population may be more sensitive. The only thing I would suggest is that if you lower the subsample size, you may want to increase the number of subsamples to give the prioritization a better chance to 'converge'.
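As a toy illustration of why more subsamples help the averaged AUC settle, the sketch below treats each per-subsample AUC as an independent noisy draw around a fixed value. That is a strong simplifying assumption: real subsamples from a small population overlap and are correlated, so the gain in practice would be smaller. The function name, the noise level, and the "true" AUC are all hypothetical, and nothing here uses Augur itself.

```python
import random
import statistics


def mean_auc_spread(n_subsamples: int, per_subsample_sd: float = 0.1,
                    true_auc: float = 0.6, n_repeats: int = 300,
                    seed: int = 0) -> float:
    """Std. dev. of the averaged AUC across repeated simulated runs."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    run_means = []
    for _ in range(n_repeats):
        # One simulated run: n_subsamples noisy AUCs, clipped to [0, 1].
        aucs = [min(1.0, max(0.0, rng.gauss(true_auc, per_subsample_sd)))
                for _ in range(n_subsamples)]
        run_means.append(statistics.mean(aucs))
    return statistics.stdev(run_means)


if __name__ == "__main__":
    for n in (50, 200):
        print(n, round(mean_auc_spread(n), 4))
```

With independent draws, the spread of the averaged AUC shrinks roughly as one over the square root of the number of subsamples, so quadrupling the count about halves it. If a smaller subsample size makes each individual AUC noisier, raising the number of subsamples is one way to compensate.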
