Effect of a very small number of trees on AUC, and comparing AUC across experiments #14

vincentrose88 · 2021-11-04T09:59:33Z

Hi

Great work on the Augur R package!

I'm using it to prioritise cell-type responses to a treatment in a disease, in two setups: 1x treatment and 5x treatment. I have a couple of questions on how to interpret and use the AUC results:

AUC comparison across experiments?

My question is: Can I compare the AUC across these experiments directly, or can I only use the rank?

For example, in the results below: does Cell-type_A in G2 have a response comparable to Cell-type_A in G1, while Cell-type_I has a significantly bigger response in G2 than in G1?

Results

The experimental groups and results are as follows (anonymised because this is a client's data):

G1: 1x treatment + disease (case) VS 1x placebo + disease (control)

  cell_type     auc
  <chr>         <dbl>
1 Cell-type_B   0.952
2 Cell-type_A   0.944
3 Cell-type_C   0.838
4 Cell-type_E   0.719
5 Cell-type_D   0.707
6 Cell-type_F   0.668
7 Cell-type_H   0.666
8 Cell-type_G   0.666
9 Cell-type_I   0.640

G2: 5x treatment + disease (case) VS 5x placebo + disease (control)

  cell_type    auc
  <chr>        <dbl>
1 Cell-type_A  0.991
2 Cell-type_B  0.976
3 Cell-type_C  0.974
4 Cell-type_D  0.957
5 Cell-type_E  0.957
6 Cell-type_F  0.953
7 Cell-type_G  0.946
8 Cell-type_H  0.935
9 Cell-type_I  0.931

Effect of the number of trees on AUC

For experimental group G2 (5x treatment vs 5x placebo), I only get useful results if I use a very low number of trees, as you suggest in your paper (Methods: Hyperparameter analysis):

[…] Empirically, we suggest decreasing the number of trees in the random forest classifier in scenarios where perfect classification can be achieved for many cell types (Supplementary Fig. 10g).

My question is simply: does it make sense to use so few trees?

Results

(Only the number of trees changes; all other options are default.)
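
A sweep like this could be scripted roughly as below. This is a minimal sketch, not the code used for these results: it assumes that `Augur::calculate_auc()` accepts an `rf_params` list with a `trees` entry and returns a list whose `AUC` element has `cell_type` and `auc` columns (check `?Augur::calculate_auc` for your installed version). The helper name `sweep_trees` is hypothetical, and the `mtry`/`min_n`/`importance` values are assumed to mirror Augur's defaults.

```r
# Hypothetical helper: run Augur once per forest size and stack the AUC tables.
# Assumes calculate_auc() takes rf_params = list(trees, mtry, min_n, importance)
# and returns a list whose $AUC element has cell_type and auc columns.
sweep_trees <- function(input, tree_grid = c(50, 10, 5, 3, 2, 1), ...) {
  per_setting <- lapply(tree_grid, function(n_trees) {
    augur <- Augur::calculate_auc(
      input,
      # Only `trees` varies; the other values are assumed Augur defaults
      rf_params = list(trees = n_trees, mtry = 2, min_n = NULL,
                       importance = "accuracy"),
      ...  # any other calculate_auc() arguments, passed through unchanged
    )
    # Tag each AUC table with the forest size that produced it
    data.frame(num_tree  = n_trees,
               cell_type = augur$AUC$cell_type,
               auc       = augur$AUC$auc)
  })
  do.call(rbind, per_setting)
}
```

Calling `sweep_trees(seurat_obj)`, where `seurat_obj` stands in for your own object, would return one row per cell type and forest size.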

Num_tree = 50

  cell_type       auc
  <chr>         <dbl>
1 Cell-type_E  1   
2 Cell-type_A  1   
3 Cell-type_I  1
4 Cell-type_D  1
5 Cell-type_H  1
6 Cell-type_F  1
7 Cell-type_C  1
8 Cell-type_B  1
9 Cell-type_G  1

Num_tree = 10

  cell_type       auc
  <chr>         <dbl>
1 Cell-type_E  1   
2 Cell-type_A  1   
3 Cell-type_I  1.00
4 Cell-type_D  1.00
5 Cell-type_H  1.00
6 Cell-type_F  1.00
7 Cell-type_C  1.00
8 Cell-type_B  1.00
9 Cell-type_G  1.00

Num_tree = 5

  cell_type       auc
  <chr>         <dbl>
1 Cell-type_A  1.00 
2 Cell-type_B  0.999
3 Cell-type_E  0.998
4 Cell-type_C  0.996
5 Cell-type_D  0.996
6 Cell-type_F  0.995
7 Cell-type_H  0.993
8 Cell-type_G  0.993
9 Cell-type_I  0.990

Num_tree = 3

  cell_type       auc
  <chr>         <dbl>
1 Cell-type_A  0.996
2 Cell-type_B  0.995
3 Cell-type_C  0.989
4 Cell-type_F  0.984
5 Cell-type_D  0.983
6 Cell-type_E  0.982
7 Cell-type_G  0.979
8 Cell-type_H  0.975
9 Cell-type_I  0.965

Num_tree = 2

  cell_type       auc
  <chr>         <dbl>
1 Cell-type_A  0.991
2 Cell-type_B  0.976
3 Cell-type_C  0.974
4 Cell-type_D  0.957
5 Cell-type_E  0.957
6 Cell-type_F  0.953
7 Cell-type_G  0.946
8 Cell-type_H  0.935
9 Cell-type_I  0.931

Num_tree = 1

  cell_type       auc
  <chr>         <dbl>
1 Cell-type_A  0.942
2 Cell-type_B  0.933
3 Cell-type_C  0.893
4 Cell-type_G  0.873
5 Cell-type_E  0.870
6 Cell-type_D  0.870
7 Cell-type_F  0.857
8 Cell-type_H  0.839
9 Cell-type_I  0.812
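
To make the ceiling effect in these tables concrete, the base R sketch below types in the G2 values as printed (rounded to three decimals) and counts, for each forest size, how many cell types reach an AUC of at least 0.99, along with the spread between the highest and lowest AUC. The 0.99 cutoff is an arbitrary illustrative threshold, not an Augur parameter.

```r
# G2 AUCs as printed in the tables above, one vector per num_tree setting.
# Cell-type order does not matter for these summaries.
g2_auc <- list(
  `50` = rep(1, 9),
  `10` = rep(1, 9),
  `5`  = c(1.00, 0.999, 0.998, 0.996, 0.996, 0.995, 0.993, 0.993, 0.990),
  `3`  = c(0.996, 0.995, 0.989, 0.984, 0.983, 0.982, 0.979, 0.975, 0.965),
  `2`  = c(0.991, 0.976, 0.974, 0.957, 0.957, 0.953, 0.946, 0.935, 0.931),
  `1`  = c(0.942, 0.933, 0.893, 0.873, 0.870, 0.870, 0.857, 0.839, 0.812)
)

ceiling_cutoff <- 0.99  # illustrative threshold, not an Augur setting

saturation <- data.frame(
  num_tree     = as.integer(names(g2_auc)),
  # number of cell types at or above the cutoff
  n_at_ceiling = vapply(g2_auc, function(a) sum(a >= ceiling_cutoff), integer(1)),
  # spread between best and worst cell type
  auc_range    = vapply(g2_auc, function(a) max(a) - min(a), numeric(1))
)
print(saturation)
```

With these numbers, all nine cell types reach the cutoff at 50, 10 and 5 trees, two at 3 trees, one at 2 trees and none at 1 tree.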

Looking forward to your feedback and thanks in advance!

Kind regards

jordansquair · 2021-11-04T16:08:30Z

Are you using a Seurat object as input, or a count/normalized matrix directly? If a Seurat object, can you check the default assay?

vincentrose88 · 2021-11-04T18:01:15Z

Yes, I'm using a Seurat object, and the default assay is "integrated":

> DefaultAssay(seurat_obj)
[1] "integrated"

jordansquair · 2021-11-04T18:05:34Z

You will want to switch that back to "RNA" or directly input the count matrix.

DefaultAssay(obj) <- "RNA"

Then run Augur.

To answer your question about the experimental design: yes, you can compare the AUCs themselves.
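
As an illustration of comparing AUCs directly, the base R sketch below joins the G1 and G2 tables from the question by cell type and reports the change in AUC alongside each cell type's rank within its own experiment. These particular values came from the integrated assay (and the G2 table matches the two-tree run), so the output only demonstrates the mechanics. A raw difference also says nothing about whether a change exceeds what chance would produce.

```r
# AUCs as printed in the G1 (1x) and G2 (5x) tables in the question.
g1 <- data.frame(
  cell_type = paste0("Cell-type_", c("B", "A", "C", "E", "D", "F", "H", "G", "I")),
  auc_g1    = c(0.952, 0.944, 0.838, 0.719, 0.707, 0.668, 0.666, 0.666, 0.640)
)
g2 <- data.frame(
  cell_type = paste0("Cell-type_", c("A", "B", "C", "D", "E", "F", "G", "H", "I")),
  auc_g2    = c(0.991, 0.976, 0.974, 0.957, 0.957, 0.953, 0.946, 0.935, 0.931)
)

# Join on cell type, then compare raw AUCs and within-experiment ranks.
cmp <- merge(g1, g2, by = "cell_type")
cmp$delta_auc <- cmp$auc_g2 - cmp$auc_g1
cmp$rank_g1   <- rank(-cmp$auc_g1, ties.method = "min")  # 1 = highest AUC
cmp$rank_g2   <- rank(-cmp$auc_g2, ties.method = "min")
cmp[order(-cmp$delta_auc), ]  # largest increase first
```

Here Cell-type_I shows the largest increase (0.291) and Cell-type_B the smallest (0.024).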

You may want to consider using differential prioritization for this case also. You can see our protocol: https://www.nature.com/articles/s41596-021-00561-x for more details (specifically Case Study #4).
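
Based on our reading of the Augur README, a differential prioritization run for the 1x and 5x experiments might look roughly like the sketch below. It assumes `calculate_auc()` supports `augur_mode = "permute"` and that `calculate_differential_prioritization()` takes the two observed and two permuted results in that order; confirm both against Case Study #4 of the protocol and your installed version. The wrapper name `differential_augur` and the inputs `g1_obj`/`g2_obj` are hypothetical.

```r
# Hypothetical wrapper; g1_obj / g2_obj would be the 1x and 5x datasets,
# each with 'label' (case vs control) and 'cell_type' metadata columns.
# Extra arguments in ... go to every calculate_auc() call (omit augur_mode).
differential_augur <- function(g1_obj, g2_obj, ...) {
  # Observed AUCs for each experiment
  augur1 <- Augur::calculate_auc(g1_obj, ...)
  augur2 <- Augur::calculate_auc(g2_obj, ...)
  # Permute-mode runs, assumed to supply the null distribution for the comparison
  permuted1 <- Augur::calculate_auc(g1_obj, augur_mode = "permute", ...)
  permuted2 <- Augur::calculate_auc(g2_obj, augur_mode = "permute", ...)
  Augur::calculate_differential_prioritization(augur1, augur2,
                                               permuted1, permuted2)
}
```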

vincentrose88 · 2021-11-04T18:22:31Z

Thanks!

I'll give that a try!

vincentrose88 · 2021-11-05T14:00:00Z

Using RNA as the default assay, I get more sensible results (with num_tree = 50):

  annotation  auc
1 Cell-type_A 0.6052060
2 Cell-type_B 0.5276417
3 Cell-type_C 0.5242139
4 Cell-type_D 0.5189135
5 Cell-type_E 0.5170862
6 Cell-type_F 0.5112566
7 Cell-type_G 0.5066270
8 Cell-type_H 0.4989002

Thanks for the help! You can consider this issue closed 👍

vincentrose88 · 2021-11-15T12:36:13Z

Thinking more about these results, I'm surprised that the AUC is so much higher when running on the Seurat integrated assay than on the RNA assay:

RNA (num_tree = 50)

  annotation  auc
1 Cell-type_A 0.6052060
2 Cell-type_B 0.5276417
3 Cell-type_C 0.5242139
4 Cell-type_D 0.5189135
5 Cell-type_E 0.5170862
6 Cell-type_F 0.5112566
7 Cell-type_G 0.5066270
8 Cell-type_H 0.4989002

Integrated (num_tree = 2)

  cell_type       auc
  <chr>         <dbl>
1 Cell-type_A  0.991
2 Cell-type_B  0.976
3 Cell-type_C  0.974
4 Cell-type_D  0.957
5 Cell-type_E  0.957
6 Cell-type_F  0.953
7 Cell-type_G  0.946
8 Cell-type_H  0.935
9 Cell-type_I  0.931

Do you have any explanation for this?
