Hi! I have been testing Augur on my single-cell dataset, which contains 25 cell types ranging in abundance from 50 cells to over 4000 cells. I've noticed that Augur seems to produce higher AUC values for some of the least abundant cell populations, which I do not expect to change much between the conditions. I wonder if this is due to the subsample size: does repeatedly drawing 20 of 100 cells train the classifier to cover more of the variation in a cell population than repeatedly drawing 20 of 2000 cells? If so, do you have any recommendations for addressing this potential bias, or for which arguments to adjust in the `calculate_auc` function? Thanks!
Hi @hayfre, without knowing more about your particular dataset I can only speak in generalities, but my intuition is that if you can rule out a biological effect, there may be a significant technical effect in this population (for instance, cells of this type from one of your libraries are stressed or dying). If there are only a few cells of this type, those cells would be present in every subsample and would make the two conditions easier for the random forest (RF) to separate.
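To make the subsampling intuition concrete, here is a back-of-the-envelope calculation. It is a minimal sketch, assuming each subsample draws a fixed number of cells uniformly without replacement from a single pool. Whether Augur draws per condition or from the pooled cell type is an assumption here; the arithmetic applies to whichever pool is being sampled. It computes how likely one specific cell is to be drawn, how many of 50 subsamples it appears in on average, and how many cells two subsamples share on average. `inclusion_probability`, `expected_overlap` and `simulated_overlap` are hypothetical helpers, not part of Augur.

```python
import random

# Assumption: uniform sampling without replacement from one pool of cells.
# This illustrates the arithmetic only; it is not Augur's own code.

def inclusion_probability(n_cells: int, subsample_size: int) -> float:
    """Probability that one specific cell is drawn in a single subsample."""
    k = min(subsample_size, n_cells)  # cannot draw more cells than exist
    return k / n_cells

def expected_overlap(n_cells: int, subsample_size: int) -> float:
    """Expected number of cells shared by two independent subsamples."""
    k = min(subsample_size, n_cells)
    # Each of the k cells in the first draw is in the second with prob k / n.
    return k * inclusion_probability(n_cells, subsample_size)

def simulated_overlap(n_cells: int, subsample_size: int,
                      trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo check of expected_overlap using random.sample."""
    rng = random.Random(seed)
    k = min(subsample_size, n_cells)
    total = 0
    for _ in range(trials):
        a = set(rng.sample(range(n_cells), k))
        b = set(rng.sample(range(n_cells), k))
        total += len(a & b)  # cells drawn in both subsamples
    return total / trials

if __name__ == "__main__":
    n_subsamples = 50  # assumed number of repeated draws
    for n_cells in (20, 50, 100, 2000):
        p = inclusion_probability(n_cells, 20)
        print(f"n={n_cells:5d}  P(cell drawn)={p:.2f}  "
              f"mean draws per cell={n_subsamples * p:5.1f}  "
              f"expected overlap={expected_overlap(n_cells, 20):5.2f}")
```

Drawing 20 cells, a pool of 20 or fewer puts every cell in every subsample. A pool of 50 gives each cell a 0.40 chance per draw (about 20 appearances in 50 subsamples), and two subsamples share 8 cells on average. A pool of 2000 gives 0.01, about 0.5 appearances, and 0.2 shared cells. So in a small population the same cells, including any stressed or dying ones, recur across many subsamples, while a large population is resampled almost fresh each time.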
This is just one potential explanation, but you could experiment with changing the subsample size and see whether your results are stable. We found that they generally were (Supp. Figs. 6 and 10 in the Augur paper), but it may be that the AUC for your small cell population is more sensitive. The only other thing I would suggest: if you lower the subsample size, you may want to increase the number of subsamples to give the prioritization a better chance to 'converge'.
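The advice above, checking stability across subsample sizes and adding subsamples when the size goes down, could be organized as a small parameter sweep. This is a minimal sketch, assuming a baseline of 20 cells per subsample and 50 subsamples. I believe these are the defaults of the `subsample_size` and `n_subsamples` arguments of `calculate_auc`, but check `?calculate_auc` in your installed version. The inverse scaling rule, the size grid and the 0.05 tolerance are hypothetical choices for illustration, not something from the Augur paper. The AUCs themselves would come from running Augur in R once per setting; this Python only plans the runs and compares their results.

```python
import math

# Hypothetical helpers for planning a sensitivity sweep; not part of Augur.

def scaled_n_subsamples(subsample_size: int,
                        base_size: int = 20,
                        base_n_subsamples: int = 50) -> int:
    """Scale n_subsamples inversely with subsample size so the total number
    of cells drawn (n_subsamples * subsample_size) is at least the baseline."""
    return math.ceil(base_n_subsamples * base_size / subsample_size)

def sensitivity_grid(sizes=(10, 15, 20)) -> list:
    """(subsample_size, n_subsamples) pairs to run through calculate_auc."""
    return [(k, scaled_n_subsamples(k)) for k in sizes]

def unstable_cell_types(auc_by_setting: dict, tolerance: float = 0.05) -> list:
    """Given {setting: {cell_type: auc}}, return the cell types whose AUC
    spread (max - min) across settings exceeds `tolerance`."""
    cell_types = set()
    for aucs in auc_by_setting.values():
        cell_types.update(aucs)
    flagged = []
    for ct in sorted(cell_types):
        values = [aucs[ct] for aucs in auc_by_setting.values() if ct in aucs]
        if max(values) - min(values) > tolerance:
            flagged.append(ct)
    return flagged

if __name__ == "__main__":
    for size, n in sensitivity_grid():
        print(f"subsample_size={size:3d}  n_subsamples={n}")
```

With the default grid this plans runs of (10, 100), (15, 67) and (20, 50). Keeping the total number of drawn cells roughly constant is one simple way to follow the "more subsamples for smaller sizes" advice. Sizes larger than the number of cells available in your smallest population are probably not meaningful, so the grid deliberately stops at the baseline. Cell types returned by `unstable_cell_types` would be the ones whose prioritization depends on the subsampling settings.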