-
Notifications
You must be signed in to change notification settings - Fork 270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MACS2 stalls pre-computing pvalue-qvalue table.. #101
Comments
Is it related to #86? |
I confirm there is still a problem in current master (9ddacb0). Using the following SAM input file:
and the following command line:
|
I found this thread when troubleshooting run times for jobs that I had running, where MACS2 was part of an analysis pipeline to process ChIPseq data. I looked at the error files as the jobs had been running for several days, which was odd. Tail of the error files showed the jobs were hung at the same step in MACS2 reported here - issue #101 and #86. INFO @ Fri, 29 Jul 2022 10:03:32: #3 Call peaks... I've tried using macs2 --version 2.2.6 and 2.2.7.1 The odd part is that when I ran MACS2 with all default parameters the jobs finished in a few hours. However, when I add the the following parameters the program is hung at the indicated steps: or --nomodel --extsize 147 --broad --broad-cutoff 0.1 Single threads being used seem to be computing efficiently at 100%, but the jobs simply spin for up to 7 days. These run times seem way too long, as file read in for macs2 is super fast - 8.5G BAM files load and are read in (steps 1-3) ~ 10 minutes. So, this seems like a bug rather than a compute environment issues. Do you have suggestions on potential fixes? One other note - I recently used the same versions of MACS2 on the same compute system on ATACseq data and used similar --no-model parameters, but only had treatment files, no control input files. These jobs used even larger files and finished within 12-24 hours. |
A follow up regarding this issue - I found that I get different results with different files. I tested pre-processed BED files from SRA - also ChIPseq data, and macs2 or macs3 completed peak calling in about 30 minutes, no issues. However, I downloaded the raw data from SRA and processed with my pipeline and I get the same issue - the process hangs at Pre-compute pvalue-quvalue table.. My pipeline uses trimmomatic to trim illimina adaptors, hisat2 alignment (-p 8 -k 1 --no-spliced-alignment), bamsormadup to remove duplicates and sort bam output. macs2 doesn't throw and error, but just hangs. Any ideas would be much appreciated. |
@jcohn42 Is it possible that the raw data processed from SRA contains many random chromosomes? The process hangs at p-q table calculation is an old issue I dealt with years ago where when the hash table (we used |
@taoliu, yes that is right, the SRA data does have many chromosomes (10 maize chromosomes and a few dozed contigs). They are present in the pre-processed BED files downloaded from SRA, I used the same reference genome for aligning the raw data using hisat2, so the bam files resulting have the same chromosomes. The only major difference I noted is that the external BED files seem to have been pre-filtered for alignment quality; $cut -5 |sort -u gives single value of 255. hisat2 uses different quality measures, but I could try filtering on 60. I don't think macs2 uses the quality information, though, is that right? I've tried macs2 and macs3 - both get hung on the same step. I agree, q values are great to have, but one can filter on p values, or use IDR or combine replicates with MSPC, etc. Since my original data was PE I also tried -f BAMPE and -f BAM on the deduplicated BAM files, but both get stuck at the same place. I could try to filter input files for only 'real' chromosome data, but I've already tried filtering out a single chromosome of data for troubleshooting, and I have the same issue. |
@jcohn42 Could you share me the SRA accession number so that I can give it a try? |
@taoliu The files come from https://www.ncbi.nlm.nih.gov/bioproject/PRJNA492464 Here is my macs3 call with the processed BED files that worked beautifully, completing within 30 minutes FASTQ files were aligned to B73_v4 genome, an internal copy, but should be very similar to B73 RefGen_v4, AGPv4, available from here: https://www.maizegdb.org/assembly. Note, there is a newer genome, but this was the genome used to generate the processed files tested. I used the same macs3 call, with the change "-f BAM" and -t <deduplicated, sorted bam files for test> -c <deduplicated, sorted bam file for input". No errors, but still paused on #3 Pre-compute pvalue-qvalue table..., running now ~ 20 hours. Do you have recommendations for buffer size, # slots, node parameters, etc? I've requested > 10G RAM, which should be fine, I think? |
@taoliu, can you clarify if the problem you described was 'fixed' "The process hangs at p-q table calculation is an old issue I dealt with years ago where when the hash table (we used khash) becomes too big." What was the fix, is there some work around, like adjusting input file sizes, processing one chromosome at a time, etc, that I could use to get MACS2 (or MACS3) working for my data? Are there particular filters of aligned reads that should be used? Thanks for your recommendations and help. |
@taoliu I discovered the problem - it was an issue with my genome size input - a typo! Thanks again for your replies; btw, MACS3 is also working quite well with my data and the external data. Thanks for providing a very useful tool! |
Hello,
MACS2 stalls at the following line:
I have two BAM files, one converted to BAM from Bowtie2 output and the other which has underwent some processing. Here is an example alignment from the raw BAM file:
and here is an example from the processed BAM file:
MACS2 stalls when I input the processed BAM file but not the raw file. The processed BAM file was created by converting the raw BAM to a BED file, processing the data with AWK, and then converting back to a BAM file (both conversions done using BEDtools).
I call peaks using the following command:
Thank you
The text was updated successfully, but these errors were encountered: