Incorrect preparation of non-multiome data causes error in the step 'Calculating region to gene importance, using GBM method' #371
Hello, thank you for developing scenicplus and its helpful documentation. I am trying to use scenicplus on non-multiome data that I previously integrated with ArchR. Following the scRNA and scATAC preprocessing tutorials, I reached the point of running scenicplus through the Snakefile, after adjusting the config.yaml file accordingly. Importantly, I made sure that the anndata and cistopic objects both contain a variable named 'ACC:RNA_barcodes' with the same cell names, based on the integration of the two modalities (5956 cells in total). The pipeline progresses smoothly until it reaches the region-to-gene importance calculation, where it gives the following error:
While trying to figure out what went wrong, I realized that the resulting ACC_GEX.h5mu file, which should contain the two modalities, is not prepared correctly: it seems to lack both the cell names and the expression/fragment matrices, as shown here:
No error occurred while preparing the non-multiome data, as all cells were found in both modalities, but there is a suspicious output during ingestion, as shown here:
Do the zeros next to the cell names mean that no metacells and no pseudo-multiome data are created? I tried running the pipeline with slight modifications to the anndata and cistopic objects but could not figure out the problem. Do you have any idea why this problem comes up? Thank you! I am using:
Replies: 1 comment 2 replies
Hi @AthanasiaSt
In the case of non-multiome data, the cell barcodes of the RNA and ATAC modalities do not match. For that reason, we integrate the samples based on common cell type (or state) labels. This label is the variable you should provide in the yaml file (where you provided 'ACC:RNA_barcodes'). Based on these labels, multiome data will be simulated by sampling cells from each label and for each modality.
If you did the integration elsewhere and have one-to-one cell barcode matches, as I believe is the case for you (?), you can run the analysis as if the data were multiome. However, in that case the cell barcodes of both modalities should have exactly the same names.
I hope this helps?
All the best,
Seppe
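The two routes described in the reply can be illustrated with a small, self-contained sketch in plain Python. This is not the scenicplus implementation: the function names (`group_by_label`, `simulate_pseudo_multiome`, `barcodes_match`), the parameters `n_metacells` and `cells_per_metacell`, and all toy barcodes and labels are hypothetical. It only shows the general idea of grouping cells by a shared label in each modality and sampling from matching groups, and it suggests why a per-cell identifier may be a poor grouping variable: every group then holds a single cell.

```python
import random
from collections import defaultdict


def group_by_label(cell_to_label):
    """Map each label to the sorted list of cell barcodes carrying it."""
    groups = defaultdict(list)
    for barcode, label in cell_to_label.items():
        groups[label].append(barcode)
    return {label: sorted(cells) for label, cells in groups.items()}


def simulate_pseudo_multiome(rna_labels, atac_labels, n_metacells=2,
                             cells_per_metacell=2, seed=0):
    """Hypothetical sketch: pair RNA and ATAC cells that share a label.

    For each label present in both modalities, draw up to
    `cells_per_metacell` cells per modality (without replacement)
    and report them together as one pseudo-multiome cell.
    """
    rng = random.Random(seed)
    rna_groups = group_by_label(rna_labels)
    atac_groups = group_by_label(atac_labels)
    shared_labels = sorted(set(rna_groups) & set(atac_groups))
    pseudo_cells = {}
    for label in shared_labels:
        rna, atac = rna_groups[label], atac_groups[label]
        for i in range(n_metacells):
            pseudo_cells[f"{label}_{i}"] = {
                "rna": rng.sample(rna, min(cells_per_metacell, len(rna))),
                "atac": rng.sample(atac, min(cells_per_metacell, len(atac))),
            }
    return pseudo_cells


def barcodes_match(rna_barcodes, atac_barcodes):
    """One-to-one route: both modalities must use identical cell names."""
    return set(rna_barcodes) == set(atac_barcodes)


# Route 1: labels are cell types; barcodes differ between modalities.
rna_labels = {"rna_1": "B_cell", "rna_2": "B_cell",
              "rna_3": "T_cell", "rna_4": "T_cell"}
atac_labels = {"atac_1": "B_cell", "atac_2": "B_cell",
               "atac_3": "T_cell", "atac_4": "T_cell"}
by_cell_type = simulate_pseudo_multiome(rna_labels, atac_labels)

# Degenerate case: the "label" is a per-cell identifier,
# so every group contains exactly one cell.
rna_ids = {"rna_1": "cell_1", "rna_2": "cell_2"}
atac_ids = {"atac_1": "cell_1", "atac_2": "cell_2"}
by_barcode = simulate_pseudo_multiome(rna_ids, atac_ids)
group_sizes = {label: len(cells)
               for label, cells in group_by_label(rna_ids).items()}

# Route 2: treat the data as multiome only if the names agree exactly.
names_agree = barcodes_match(["cell_1", "cell_2"], ["cell_2", "cell_1"])
names_differ = barcodes_match(rna_labels, atac_labels)
```

In this toy run, grouping by cell type pools two cells per modality into each pseudo-cell, whereas grouping by the per-cell identifier leaves groups of size one. Whether that is what produces the zeros reported in the question is an assumption that would need to be checked against the actual pipeline output.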
…