Using salmon quantification with DeSeq2 #437
Comments
Hi Hamdi, http://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html has a really great description of how to use the output from Salmon with DESeq2. |
Hi @roryk, if it is not possible and I have to run R to get the count matrix for DESeq2, I can figure out a way to do it. But the DESeq2 input file is a simple matrix of counts, and "salmon quantmerge" already generates this, so can you please explain why an external library is required? Is there something I am missing that the tximport package does to the data? Does tximport take gene lengths or library size into account when generating its output? Thanks |
Hi Hamdi Bit confused about your logic here - why would you not want to use tximport in R when your next step (DESeq2) is still going to be R? I am curious to know your reasoning
|
For someone without R knowledge, using tximport just to pass data to DESeq2 is a really torturous step. The main problem is that the count matrix for DESeq2, which can easily be prepared in Python, must contain integers, but tximport does not explain how to deal with the decimals. |
I also wonder why salmon does not output the original read counts. Update: because the estimates are more accurate. DESeq2 should accept them. For now, maybe we could simply round to the nearest integer.
|
If you can already use DESeq2, then using tximport should not make it any harder at all. Given the tximport data, getting it into DESeq2 is as easy as
as shown in the tximport vignette. Regarding outputting "original read counts": salmon does output estimates of the number of reads deriving from each transcript. If the question is why this number is not an integer, that's because the best estimate (the maximum likelihood estimate) is often not integral. Tools that simply count reads (e.g. HTSeq) produce integer counts, but these are in no way "original read counts" for the corresponding genes, and are usually less accurate (farther from the true number of fragments deriving from a transcript / gene) than the estimates produced by salmon. The fact that the best estimate is often not an integer is a direct result of the fact that one is considering a statistical model and taking expectations. |
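To make the point about expectations concrete, here is a toy sketch in Python. It is not salmon's algorithm: it assumes just two hypothetical transcripts of equal length, a handful of reads that map uniquely to each, and a few reads that map equally well to both. A simple EM loop assigns each ambiguous read fractionally, so the resulting estimates are not integers.

```python
# Toy illustration only. Salmon's real model is far richer; the transcripts,
# read numbers and function name here are made up for the example.

def toy_em(unique_a, unique_b, ambiguous, iterations=100):
    """Split `ambiguous` reads between two equal-length transcripts A and B."""
    total = unique_a + unique_b + ambiguous
    est_a, est_b = total / 2, total / 2  # start from an even split
    for _ in range(iterations):
        # E-step: each ambiguous read is assigned fractionally,
        # in proportion to the current abundance estimates.
        share_a = est_a / (est_a + est_b)
        # M-step: expected count = unique reads + fractional share of ambiguous reads.
        est_a = unique_a + ambiguous * share_a
        est_b = unique_b + ambiguous * (1 - share_a)
    return est_a, est_b

est_a, est_b = toy_em(unique_a=10, unique_b=5, ambiguous=4)
print(round(est_a, 3), round(est_b, 3))  # 12.667 6.333
```

The 19 reads are all accounted for, but neither transcript ends up with a whole number of them.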
Still, maybe it would be better to have an integer version of the read counts file. |
You mean like cloud services to perform the DE analysis? It’s always possible to round the non-integer counts to the nearest integer. However, reliable abundance estimation tools (e.g. RSEM) have been around long enough now that it’s worth pushing any cloud service you might be using to properly deal with these types of inputs. We do differential analysis quite commonly with DESeq2, and salmon -> tximport -> DESeq2 is a quite low-friction solution. |
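For anyone who does need integers outside of R, the rounding mentioned above could be sketched in Python as follows. The table layout (a `Name` column followed by one column per sample, tab-separated) and the sample values are assumptions for illustration; check them against your own merged file. Note that this only rounds the estimates and does not carry over any of the other information tximport passes along.

```python
import csv
import io

# Hypothetical merged table: transcript name, then one NumReads column per sample.
merged = (
    "Name\tsampleA\tsampleB\n"
    "tx1\t12.667\t3.2\n"
    "tx2\t6.333\t0.0\n"
    "tx3\t100.4\t41.49\n"
)

def round_counts(text):
    """Return the header and rows with every estimate rounded to the nearest integer."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    rows = []
    for name, *values in reader:
        rows.append([name] + [round(float(v)) for v in values])
    return header, rows

header, rows = round_counts(merged)
for row in rows:
    print(*row, sep="\t")
```

To use a real file, replace `io.StringIO(text)` with an opened file handle.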
I am also confused about how to use salmon quantification with DESeq2, because I run DESeq2 through an online tool, not R. I would be very grateful if someone could answer my questions. |
Hi authentic-zz: I hope this helps. |
Hi DustinChen1986: |
To authentic-zz:
|
@DustinChen1986 |
The first column holds the transcript names, the second column holds the gene names, and a header row is required. |
Thank you very much, it helps me a lot! @DustinChen1986 |
I noticed that salmon can now export a quant.genes.sf file if I add the parameter "-g xx.gtf". What is the difference between this file and the result of tximport? Can I use it in place of tximport? |
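As a rough picture of what such a two-column mapping is used for, the Python sketch below sums transcript-level read estimates into gene-level totals. It is a simplified illustration, not tximport's implementation, and the column names (`TXNAME`, `GENEID`), the CSV layout, and all values are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical mapping file: header row, transcript name, gene name.
tx2gene_text = "TXNAME,GENEID\ntx1,geneA\ntx2,geneA\ntx3,geneB\n"

# Hypothetical transcript-level read estimates for one sample.
num_reads = {"tx1": 12.667, "tx2": 6.333, "tx3": 100.4}

def load_tx2gene(text):
    """Read the two-column mapping, skipping the required header row."""
    reader = csv.reader(io.StringIO(text))
    next(reader)
    return {tx: gene for tx, gene in reader}

def gene_level(reads, tx2gene):
    """Sum transcript-level estimates over all transcripts of each gene."""
    totals = defaultdict(float)
    for tx, n in reads.items():
        totals[tx2gene[tx]] += n
    return dict(totals)

tx2gene = load_tx2gene(tx2gene_text)
gene_counts = gene_level(num_reads, tx2gene)
print(gene_counts)
```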
It is possible to output gene counts directly, but using |
Hi,
I ran salmon, then used quantmerge to combine the results:
salmon quantmerge --quants `cat list_of_quant_folders` --column numreads -o Merged_quants.txt
Can I use this as input for the DESeq2 analysis? One problem is that the "numreads" column is not integer and DESeq2 requires integer input (read counts). Can I convert the numbers in this column to integers and then use them as DESeq2 input?
Thanks
Hamdi