Using salmon quantification with DeSeq2 #437

thkitapci · 2019-10-22T18:39:52Z

Hi,
I run salmon then used quantmerge to combine the results as

salmon quantmerge --quants cat list_of_quant_folders--column numreads -o Merged_quants.txt

Can I use this as input for the DeSeq2 analysis ? One problem is "numreads" column is not integer and DeSeq2 requires integer input (read counts) can I convert the numbers in this column to integer then use as DeSeq2 input?

Thanks
Hamdi

The text was updated successfully, but these errors were encountered:

roryk · 2019-10-22T19:07:08Z

Hi Hamdi,

http://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html has a really great description of how to use the output from Salmon with DESeq2.

thkitapci · 2019-10-22T23:42:48Z

Hi @roryk
I know about tximport but is there any way to generate the input data for DESeq2 without using R? I am processing the data on one platform and then transfer to another platform for R/DESeq2 analysis. I would like to be able to generate the output of the first part (salmon) without using an R library.

If it is not possible and I have run R to get the count matrix for DEseq2, I can figure out a way to do it.

DESeq2 input file is a simple matrix of counts and "salmon quantmerge" already generates this, can you please explain to me why an external library is required ? Is there something I am missing that tximport package is doing to the data? Does tximport takes into account gene lengths or library size to generate the output?

Thanks
Best Regards
Hamdi

tamuanand · 2019-11-03T22:25:22Z

Hi Hamdi

Bit confused about your logic here - why would you not want to use tximport in R when your next step (DESeq2) is still going to be R?

I am curious to know your reasoning

I am processing the data on one platform and then transfer to another platform for R/DESeq2 analysis. I would like to be able to generate the output of the first part (salmon) without using an R library.

DustinChen1986 · 2020-05-22T04:52:02Z

This is a really torturous step for none R knowledge to use tximport for data transmission to DESeq2. Main problem is that the count matrix for DESeq2, which can be easily prepared by Python, must be integer, but tximport did not explain how to deal with the decimals.

qins · 2020-12-25T06:49:54Z

I also wonder why salmon not output original reads counts

update:

Because it is more accurate.

DeSeq2 should accept it.

As for now, maybe we could simply round to the nearest integer

NumReads — This is salmon’s estimate of the number of reads mapping to each transcript that was quantified. It is an “estimate” insofar as it is the expected number of reads that have originated from each transcript given the structure of the uniquely mapping and multi-mapping reads and the relative abundance estimates for each transcript.

rob-p · 2020-12-25T06:57:20Z

If you can already use DESeq2, then using tximport should not make it any harder at all. Given the tximport data, getting it into DESeq2 is as easy as

dds <- DESeqDataSetFromTximport(txi, sampleTable, ~condition)

as shown in the tximport vignette.

Regarding outputting "original read counts"; salmon does output the estimates for the number of reads deriving from each transcript. If the question is, why is this number not an integer, that's because the best estimate (the maximum likelihood estimate) is often not integral. Tools that simply count reads (e.g. HTSeq) produce integer counts, but these are in no way "original read counts" for the corresponding genes, and are usually less accurate (farther from the true number of fragments deriving from a transcript / gene) than the estimates produced by salmon. The fact that the best estimate is often not an integer is a direct result of the fact one is considering a statistical model and taking expectations.

qins · 2020-12-25T07:09:01Z

Still maybe it's better to have an integer version of read counts file.
There are cloud services and they do not accept non-integer read counts.

rob-p · 2020-12-25T07:20:33Z

You mean like cloud services to perform the DE analysis? It’s always possible to round the non-integer counts to the nearest integer. However, reliable abundance estimation tools (e.g. RSEM) have been around long enough now that it’s worth pushing any cloud service you might be using to properly deal with these types of inputs. We do differential analysis quite commonly with DESeq2, and salmon -> tximport -> DESeq2 is a quite low-friction solution.

authentic-zz · 2021-02-20T15:35:12Z

I am also confused about how to use salmon quantification for DeSeq2, because my further use of DeSeq2 is on an online tool, not r. I would be very grateful if someone could answer my doubts.

DustinChen1986 · 2021-02-20T16:42:34Z

Hi authentic-zz:
I have succeeded to use tximport package to transmit salmon results to DESeq2.
Now I have 6 salmon results named T11.sf, T12.sf, T13.sf，T21.sf，T22.sf，T23.sf. And I have the transcripts and genes relationship table file trans2gene.csv (each line have one transcript and gene separated by tab).
library(DESeq2)
library("tximport")
library("readr")
tx2gene <- read_csv("trans2gene.csv")
samples <- data.frame(run = c('T11', 'T12', 'T13', 'T21', 'T22', 'T23'), condition = c("T1","T1","T1","T2","T2","T2"))
files <- file.path('.', paste(substring(samples$run, 1,3),".sf", sep = "")) # set salmon result files, each sample names plus '.sf'
names(files) <- samples$run
txi <- tximport(files, type="salmon", tx2gene=tx2gene)
head(txi$counts)
dds <- DESeqDataSetFromTximport(txi, colData=samples, design= ~ condition)
dds <- DESeq(dds)
res <- results(dds)
write.table(res,"T1_2.csv", sep = ",", row.names = TRUE)

Hope give you some help.

authentic-zz · 2021-02-21T05:34:47Z

Hi DustinChen1986：
Thank you very much for your useful help, but the problem I am currently experiencing is rather special.
The data in my hand is quantified by salmon and sorted into a text file (row name is the ID of the gene, and the column name is the sample name), which seems to be different from the file input by tximport package, which is a troublesome thing.
If you have a corresponding solution, I would be very grateful.

DustinChen1986 · 2021-02-22T02:10:44Z

To authentic-zz:
You mean you have get the counts into a data frame. I don't know how to transfer the salmon data frame into DESeq2，and I fail to handle data frame from salmon, too. So I have these recommendations to you:

Get the salmon raw results, the sf files, as the tximport input files.
Run the salmon again or choose other pipeline like HISAT2-Stringtie/featurecounts/HTseq.
Try to generate the salmon sf file. The sf file's row name is "Name Length EffectiveLength TPM NumReads". You have the gene ID and counts, so fill the length, effect length, and TPM by tab or something. But I do not sure if this handle is correct. https://salmon.readthedocs.io/en/latest/file_formats.html#fileformats

Yang-amd · 2022-11-16T17:17:43Z

@DustinChen1986
Hello DustinChen1986
Your codes are clean and easy to read,while I have a problem.How can I create "transcript2Gene.csv“?
Thank you for your patient.
Yang RT

DustinChen1986 · 2022-11-17T15:11:15Z

@DustinChen1986 Hello DustinChen1986 Your codes are clean and easy to read,while I have a problem.How can I create "transcript2Gene.csv“? Thank you for your patient. Yang RT

The first column is the transcript's names, the second column is the gene's names, and the header is required

Yang-amd · 2022-11-18T11:13:49Z

@DustinChen1986 Hello DustinChen1986 Your codes are clean and easy to read,while I have a problem.How can I create "transcript2Gene.csv“? Thank you for your patient. Yang RT

The first column is the transcript's names, the second column is the gene's names, and the header is required

Thank you very much,it helps me a lot! @DustinChen1986

caodudu · 2023-05-04T16:29:22Z

You mean like cloud services to perform the DE analysis? It’s always possible to round the non-integer counts to the nearest integer. However, reliable abundance estimation tools (e.g. RSEM) have been around long enough now that it’s worth pushing any cloud service you might be using to properly deal with these types of inputs. We do differential analysis quite commonly with DESeq2, and salmon -> tximport -> DESeq2 is a quite low-friction solution.

I noticed that now salmon can export the quant.gene.sf file if I add the parameters"-g xx.gtf". What's difference between this file and the result of tximport? Can I use the result to replace tximport?

rob-p · 2023-05-04T16:43:37Z

It is possible to output gene counts directly, but using tximport is the preferred and officially supported method. The reason for this is that the gene-level aggregation built into salmon is per-sample, that is each sample is aggregated to the gene level independently. On the other hand, tximport considers all samples in an experiment to determine e.g. the average expressed length of a gene over all samples. This leads to better aggregation. Likewise, tximport (specifically via tximeta) provides a nice interface to track and propagate reference annotation provenance, which ultimately leads to more reproducible analyses.

davidecarlson mentioned this issue Jan 28, 2025

File "salmon.merged.gene_counts.tsv" Contains Floating-Point Values Instead of Raw Integer Counts nf-core/rnaseq#1491

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using salmon quantification with DeSeq2 #437

Using salmon quantification with DeSeq2 #437

thkitapci commented Oct 22, 2019

roryk commented Oct 22, 2019

thkitapci commented Oct 22, 2019

tamuanand commented Nov 3, 2019

DustinChen1986 commented May 22, 2020

qins commented Dec 25, 2020 •

edited

Loading

rob-p commented Dec 25, 2020

qins commented Dec 25, 2020 •

edited

Loading

rob-p commented Dec 25, 2020

authentic-zz commented Feb 20, 2021

DustinChen1986 commented Feb 20, 2021

authentic-zz commented Feb 21, 2021

DustinChen1986 commented Feb 22, 2021

Yang-amd commented Nov 16, 2022

DustinChen1986 commented Nov 17, 2022

Yang-amd commented Nov 18, 2022

caodudu commented May 4, 2023

rob-p commented May 4, 2023

Using salmon quantification with DeSeq2 #437

Using salmon quantification with DeSeq2 #437

Comments

thkitapci commented Oct 22, 2019

roryk commented Oct 22, 2019

thkitapci commented Oct 22, 2019

tamuanand commented Nov 3, 2019

DustinChen1986 commented May 22, 2020

qins commented Dec 25, 2020 • edited Loading

rob-p commented Dec 25, 2020

qins commented Dec 25, 2020 • edited Loading

rob-p commented Dec 25, 2020

authentic-zz commented Feb 20, 2021

DustinChen1986 commented Feb 20, 2021

authentic-zz commented Feb 21, 2021

DustinChen1986 commented Feb 22, 2021

Yang-amd commented Nov 16, 2022

DustinChen1986 commented Nov 17, 2022

Yang-amd commented Nov 18, 2022

caodudu commented May 4, 2023

rob-p commented May 4, 2023

qins commented Dec 25, 2020 •

edited

Loading

qins commented Dec 25, 2020 •

edited

Loading