adata.raw overwritten with normalized counts in SCENIC+ pipeline #484

cjiang310437 · 2024-10-17T19:55:19Z

Hello,

First of all, thank you for developing such a powerful and intuitive tool.

Describe the bug
According to the documentation, the RNA count input for the SCENIC+ pipeline should consist of raw counts, which are expected to be stored in the adata.raw slot. However, after following the tutorial's steps, I observed that the adata.raw slot seems to be overwritten with normalized counts instead of retaining the raw counts.

Here are the key details:

- I confirmed that my raw count matrix was correctly loaded into the AnnData object initially.
- After running the normalization steps as described in the tutorial, I noticed that adata.raw now contains the normalized data, not the raw counts.
- This appears to contradict the documentation, which specifies that the adata.raw slot should contain raw counts and that these should be used as input for the SCENIC+ pipeline.

Additionally, I tested running the pipeline using both raw and normalized RNA counts, and the results were significantly different. The results generated using normalized counts seem more promising. Could you kindly clarify which input (raw or normalized counts) is appropriate for running the SCENIC+ pipeline? It would also be helpful to understand why one input should be preferred over the other and how this impacts the pipeline results.

I appreciate your guidance and look forward to your response. Thank you again for your continued efforts in developing and maintaining this tool.

To Reproduce

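```python
import scanpy as sc

# adata: AnnData object whose .X holds the raw count matrix
adata.raw = adata  # store the current (raw count) matrix in adata.raw
print(adata.raw.X.max())
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
print(adata.raw.X.max())  # expected to be unchanged
```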

Error output
```
1384.0
6.714874659931793
```
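For context, the second value is on the scale expected for log-normalized data. Below is a minimal NumPy sketch of library-size normalization followed by log1p. It assumes a dense matrix and no gene exclusion, and it mirrors what `sc.pp.normalize_total(target_sum=1e4)` and `sc.pp.log1p` compute conceptually, but it is not scanpy's code. The helper name `normalize_log1p` and the toy counts are hypothetical.

```python
import numpy as np

def normalize_log1p(counts: np.ndarray, target_sum: float = 1e4) -> np.ndarray:
    """Hypothetical helper: scale each cell (row) to `target_sum` total counts,
    then apply log(1 + x). A conceptual sketch, not scanpy's implementation."""
    totals = counts.sum(axis=1, keepdims=True)
    # Leave empty cells as all-zero instead of dividing by zero.
    totals[totals == 0] = 1
    scaled = counts / totals * target_sum
    return np.log1p(scaled)

# Toy raw-count matrix: 3 cells x 4 genes (hypothetical values).
raw = np.array([
    [1384, 10, 0, 6],
    [0, 2, 3, 5],
    [7, 0, 0, 1],
], dtype=float)

norm = normalize_log1p(raw)
print(raw.max())   # 1384.0
print(norm.max())  # at most log1p(1e4), roughly 9.21
```

Each scaled entry is at most the per-cell target sum, so log-normalized values cannot exceed log1p(1e4). A maximum of about 6.7 in adata.raw is therefore consistent with it holding normalized rather than raw counts.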

Expected behavior
adata.raw should be identical before and after normalization.
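One way this behavior can arise is aliasing. If the matrix stored as "raw" refers to the same memory that normalization later modifies in place, the "raw" values change as well. The NumPy sketch below illustrates only this general mechanism. It does not show what anndata does internally, which this thread does not establish and which may differ between versions. The helper `normalize_log1p_inplace` is a hypothetical stand-in for the in-place scanpy steps. Keeping an explicit copy of the counts before normalizing, for example `adata.layers["counts"] = adata.X.copy()`, avoids depending on that behavior.

```python
import numpy as np

def normalize_log1p_inplace(x: np.ndarray, target_sum: float = 1e4) -> None:
    """Hypothetical helper: in-place library-size normalization + log1p.
    A toy stand-in for in-place preprocessing, not scanpy's implementation."""
    totals = x.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1
    x *= target_sum / totals   # modifies x's buffer directly
    np.log1p(x, out=x)         # still the same buffer

counts = np.array([[1384.0, 16.0], [3.0, 5.0]])

# Case 1: "raw" is a view sharing memory with the working matrix (no copy).
x_shared = counts.copy()
raw_alias = x_shared.view()
normalize_log1p_inplace(x_shared)
print(raw_alias.max())     # no longer 1384.0: the view now shows log-normalized values

# Case 2: "raw" is an explicit snapshot taken before normalization.
x_work = counts.copy()
raw_snapshot = x_work.copy()
normalize_log1p_inplace(x_work)
print(raw_snapshot.max())  # 1384.0, untouched
```

The copy in case 2 costs memory equal to one extra matrix. In exchange, the stored counts are guaranteed to stay independent of any later in-place step.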


Version:

Python version: 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0]
Scanpy version: 1.8.2
SCENIC+ version: 1.0a1


Neofita22 · 2024-10-23T04:18:49Z

First, thank you @SeppeDeWinter and Aertslab for your amazing tool!

I have been wondering the same thing. I was also comparing my gene expression matrix before and after running the perturbation simulation. I noticed that, before running the perturbation, the data I obtain is already normalized. I had expected this analysis to start from raw data, with the function normalizing and/or transforming it internally:

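```python
# Expression matrix from the original AnnData object, as a DataFrame
raw_data = adata.to_df()

# Expression matrix stored in the SCENIC+ output object, as a DataFrame
gex_scenic = scplus_mdata["scRNA_counts"].to_df()
```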

The perturbation simulation takes this matrix as its starting point, and I think the perturbed output also appears to be normalized:

```python
# Simulated expression taken from the perturbation output at index 5
simulation_scenic = perturbation_over_iter[5]
```

So I am not sure whether starting from already normalized data significantly alters the perturbation analysis, or whether it is fine to use it this way. I don't have much experience analyzing this kind of data, especially knockouts, so any information would be very useful.
Thank you.
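A quick way to check which kind of matrix a step receives or returns is a sanity check like the one below. It assumes that raw UMI counts are non-negative whole numbers and that log-normalized values are mostly fractional. The helper name `looks_like_raw_counts` is hypothetical, and a `True` result is only a heuristic, not proof of how a matrix was produced.

```python
import numpy as np

def looks_like_raw_counts(matrix) -> bool:
    """Hypothetical helper: True if every value is a non-negative whole number.

    Log-normalized data (e.g. after normalize_total + log1p) is almost always
    fractional, so it returns False. This is a heuristic sanity check only.
    """
    values = np.asarray(matrix, dtype=float)
    if values.size == 0:
        return False
    # Exact comparison, so large fractional values are not mistaken for integers.
    return bool((values >= 0).all() and np.array_equal(values, np.round(values)))

raw = np.array([[0, 3, 12], [1, 0, 1384]])
lognorm = np.log1p(raw / raw.sum(axis=1, keepdims=True) * 1e4)

print(looks_like_raw_counts(raw))      # True
print(looks_like_raw_counts(lognorm))  # False
```

For the DataFrames above, one could pass `gex_scenic.to_numpy()` or `simulation_scenic` converted with `np.asarray`, assuming they are dense numeric tables.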
