Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reading a BAM file uses up all available CPU cores #1195

Closed
eldariont opened this issue Jul 24, 2019 · 21 comments
Closed

Reading a BAM file uses up all available CPU cores #1195

eldariont opened this issue Jul 24, 2019 · 21 comments
Assignees
Labels
bug faulty or wrong behaviour of code

Comments

@eldariont
Copy link
Collaborator

eldariont commented Jul 24, 2019

EDIT: Updated problem description on May 21

Hi,

while testing seqan/iGenVar on a real-world bam file I observed extreme CPU load caused by reading the BAM file.

Platform

  • SeqAn version: 3.0.3 (on branch release-3.0.3)
  • Operating system: Linux elcattivo.molgen.mpg.de 5.10.24.mx64.375 [INTERNAL] Renamed sequence file headers. #1 SMP Fri Mar 19 12:29:21 CET 2021 x86_64 GNU/Linux
  • Compiler: g++ (GCC) 7.5.0

Description

How to repeat the problem

Test input: download

The following minimal example should just open the given test BAM file and print all read names in the file. This is the code:

#include <seqan3/io/sam_file/input.hpp>

int main(int /**/, char ** argv)
{
    using my_fields = seqan3::fields<seqan3::field::id>;
    seqan3::sam_file_input sam_file{argv[1], my_fields{}};
    
    for (auto & record : sam_file)
    {
        std::string const query_name = record.id(); 
        std::cout << query_name << std::endl;
    }
}

Expected behaviour

When running this code, I would expect it to print the read names and produce approximately 100% CPU load.

Actual behaviour

The program is printing the read names alright but completely uses up all available CPUs (visible in htop or below).

/usr/bin/time -v ./build/test_bam_load test.bam

...
m64070_190824_163708/156762735/7917_22377
m64070_190824_163708/145687260/2901_5429
m64070_190824_163708/41091648/0_1505
m64070_190824_163708/112723090/7596_28230
        Command being timed: "./build/test_bam_load test.bam"
        User time (seconds): 50.48
        System time (seconds): 8.67
        Percent of CPU this job got: 8005%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.73
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 140488
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 55
        Minor (reclaiming a frame) page faults: 33988
        Voluntary context switches: 1194
        Involuntary context switches: 768499
        Swaps: 0
        File system inputs: 88
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Cheers
David

@eldariont eldariont added the bug faulty or wrong behaviour of code label Jul 24, 2019
@smehringer
Copy link
Member

Hi David,

thanks for reporting back to us so much! It really helps to get rid of a lot of bugs when actual use cases arise.

The heavy CPU load is expected, you are not doing anything wrong. Since our files behave as ranges, they have to buffer the first record as soon as they are initialized. For SAM/BAM this is only the header and the first line. For a genome file this probably reads in chromosome 1, which I guess is reasonable to take roughly 30s.

with both initializations: takes ~47s, 800% CPU load (I have an 8 core machine)

First, 800% is actually great :D thanks for reporting. You are probably reading in a BAM file? Because decompression is automatically multithreaded. In my (probably outdated) benchmarks, I only reached about 600% CPU load on 8 cores.

I am at a loss though why initializing both takes more resources than the individual commands, I can only speculate that this might be because of concurrent IO behaviour of the system.

@eldariont
Copy link
Collaborator Author

Hi Svenja,

thanks for your quick reply. Now it makes sense that the initialization of the sequence file takes a while. Thanks for the explanation.

The 800% CPU load are still weird in my opinion. When I print a debug message after each initialization, I see that the alignment file (yes, it's a BAM) is initialized in no time (as would be expected because header + first record are short). The initialization of the sequence file (a simple FASTA) takes ~45s and puts full load on all 8 cores. This cannot be explained by decompression because the FASTA is not compressed.

Even weirder is that if I change the order of the two initializations (i.e. the sequence file gets initialized first) the sequence file uses only 1 core and takes ~30s.

So it looks as if there is something strange happening when the alignment file is initialized before the sequence file. Can you reproduce this and do you have any idea what might be going on?

@smehringer
Copy link
Member

smehringer commented Jul 25, 2019

Hi David,

you are right this is unexpected behaviour.
I can reproduce your issue on GCC7 and GCC8:
(times are for GCC8)

int main(int /**/, char ** argv)
{
    sequence_file_input sin{argv[1]};
    alignment_file_input ain{argv[2]};
}

|

Input: BAM
24,93s user 0,20s system 100% cpu 24,989 total
25,50s user 0,19s system 100% cpu 25,464 total
25,38s user 0,18s system 100% cpu 25,343 total
int main(int /**/, char ** argv)
{
    alignment_file_input ain{argv[2]};
    sequence_file_input sin{argv[1]};
}
Input: BAM
226,65s user 143,37s system 726% cpu 50,914 total
225,79s user 146,36s system 730% cpu 50,975 total
228,34s user 146,28s system 734% cpu 50,985 total
Input: SAM
46,57s user 0,20s system 99% cpu 46,998 total
46,70s user 0,24s system 99% cpu 46,949 total
47,27s user 0,20s system 99% cpu 47,523 total
47,19s user 0,24s system 99% cpu 47,498 total

Scoping solves this:

int main(int /**/, char ** argv)
{
    {
        alignment_file_input ain{argv[2]};
    }
    sequence_file_input sin{argv[1]};
}
Input: BAM
24,79s user 0,20s system 100% cpu 24,762 total
25,41s user 0,25s system 100% cpu 25,415 total
25,16s user 0,19s system 100% cpu 25,125 total

GCC9

GCC9 seq before alignment Input: BAM
15,78s user 0,12s system 100% cpu 15,771 total
15,60s user 0,14s system 100% cpu 15,624 total
15,64s user 0,19s system 100% cpu 15,729 total
GCC9 alignment before seq Input: BAM
112,16s user 72,84s system 712% cpu 25,951 total
109,05s user 71,41s system 726% cpu 24,834 total
107,60s user 70,81s system 720% cpu 24,759 total
GCC9 **alignment before seq but scoped ** Input: BAM
15,00s user 0,15s system 100% cpu 15,036 total
14,98s user 0,13s system 100% cpu 14,999 total
15,04s user 0,16s system 100% cpu 15,080 total

This needs to be investigated further although I don't if I'll time for this.
I'll assign this to @h-2 who is back next week and wanted to work on IO anyway. 😁

@h-2
Copy link
Member

h-2 commented Aug 2, 2019

I have difficulties reproducing this, for me the program always immediately exits (as expected because only one record is buffered). Can you provide the files that trigger this behaviour?

@eldariont
Copy link
Collaborator Author

Hi Hannes,

I generated a tiny BAM file which triggers the behaviour on my Linux machine: https://ws.molgen.mpg.de/ws/635147/test.bam

As reference FASTA, I used

wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
gunzip hs37d5.fa.gz

I observe a difference between Release and Debug mode, though. In Debug mode (-DCMAKE_BUILD_TYPE=Debug), the CPU load is sustained much longer. In Relase mode, it takes only about one second.

@h-2 h-2 assigned rrahn and unassigned h-2 Sep 12, 2019
@h-2
Copy link
Member

h-2 commented Sep 12, 2019

@rrahn You said you would have a look at this? Or maybe @marehr can have a look?

@rrahn
Copy link
Contributor

rrahn commented Sep 12, 2019

@h-2 I said I look into the issues we have with the queue and see if it might be related. But there were other things that needed to be done before.

@marehr
Copy link
Member

marehr commented Sep 30, 2019

maybe related to #1081

@rrahn
Copy link
Contributor

rrahn commented Nov 29, 2019

@eldariont can you again give me the bam file. I am finally looking into it. Sorry for the long delay.

@rrahn
Copy link
Contributor

rrahn commented Nov 30, 2019

Ok, I cannot validate the issue anymore. Note, that in the meantime we already fixed the issue of loading the first record on initialisation of the file. Thus, with the first call of begin the first record is being fetched and not on construction of the file. Can you verify that this issue is also fixed for your case?

@marehr
Copy link
Member

marehr commented Nov 30, 2019

That means if you change the program to:

int main(int /**/, char ** argv)
{
    sequence_file_input sin{argv[1]};
    alignment_file_input ain{argv[2]};
    std::ranges::begin(sin);
    std::ranges::begin(ain);
}

This might be still a problem?

@eldariont
Copy link
Collaborator Author

@rrahn Sorry for the long delay. In case you still need the bam file: https://ws.molgen.mpg.de/ws/921352/test.bam

@eldariont
Copy link
Collaborator Author

@marehr Yes, I can still run into exactly the same behavior if I call begin() after initializing the files.

  • loading only the alignment file takes <0.1s
  • loading only the sequence file takes ~30s, 100% CPU load
  • loading both files regardless of their order takes ~47s, 800% CPU load

@rrahn
Copy link
Contributor

rrahn commented Dec 2, 2019

Ok thanks. I am looking into it then. With another bam file I couldn’t reproduce this.

Sent with GitHawk

@rrahn
Copy link
Contributor

rrahn commented Dec 3, 2019

Mhmm, ok I really can't reproduce this. Neither on macOS nor on linux.
I've compiled in release and debug mode. And I couldn't spot any difference (except of course longer running times, but they were also coherent).

Here are some timings in release mode

fasta then bam

Open seq file: 1 ms
Open bam file: 10 ms
Reading seq record: 462 ms
Reading bam record: 4 ms

real    0m0.488s
user    0m4.358s
sys     0m2.466s

bam then fasta

Open seq file: 1 ms
Open bam file: 12 ms
Reading bam record: 4 ms
Reading seq record: 450 ms

real    0m0.478s
user    0m4.594s
sys     0m2.233s

only fasta

Open seq file: 1 ms
Reading seq record: 500 ms

real    0m0.521s
user    0m0.457s
sys     0m0.052s

only bam

Open bam file: 9 ms
Reading bam record: 4 ms

real    0m0.030s
user    0m0.059s
sys     0m0.030s

@smehringer
Copy link
Member

smehringer commented Dec 12, 2019

I do not know what you are testing. But I can still reproduce this. 👀

Only seq : 0,28s  99% CPU chrono: 363ms
Only bam : 0,06s 291% CPU chrono: 12ms
Both s b : 2,44s 762% CPU chrono: 599ms

With this snippet:

#include <seqan3/argument_parser/all.hpp>   // includes all necessary headers
#include <seqan3/core/debug_stream.hpp>     // our custom output stream
#include <seqan3/std/filesystem>            // use std::filesystem::path
#include <seqan3/io/sequence_file/all.hpp>  // FASTA support
#include <seqan3/io/alignment_file/all.hpp> // SAM/BAM support

#include <chrono>

int main(int /**/, char ** argv)
{
    std::chrono::time_point<std::chrono::system_clock> start, end;

    start = std::chrono::system_clock::now();

    seqan3::sequence_file_input sin{argv[1]};
    seqan3::alignment_file_input ain{argv[2]};
    std::ranges::begin(sin);
    std::ranges::begin(ain);

    end = std::chrono::system_clock::now();

    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end-start).count() << std::endl;
}

the bam file from David and the fasta file /group/ag_abi/takifugu-data/SeqAn/homo_sapiens/chr10/chr10.fa

@marehr
Copy link
Member

marehr commented Apr 12, 2021

This might be fixed in the meantime? (Maybe fixed by #2455? Maybe releated by #2282?)

@eldariont eldariont changed the title Initializing one sequence and one alignment file takes 47s and uses all 8 CPU cores Reading a BAM file uses up all available CPU cores May 21, 2021
@eldariont
Copy link
Collaborator Author

eldariont commented May 21, 2021

This might be fixed in the meantime? (Maybe fixed by #2455? Maybe releated by #2282?)

@marehr @eseiler : Unfortunately, this problem still seems to persist. I didn't retry with FASTA files but observed the extreme CPU load in iGenVar when it is reading a BAM file. iGenVar actually consumes all available cores (>80) and I have no idea what it is doing with that compute power 😄

I updated the description of this issue at the top now with a new minimal example and a new test file.

@eseiler
Copy link
Member

eseiler commented May 21, 2021

This might be fixed in the meantime? (Maybe fixed by #2455? Maybe releated by #2282?)

@marehr @eseiler : Unfortunately, this problem still seems to persist. I didn't retry with FASTA files but observed the extreme CPU load in iGenVar when it is reading a BAM file. iGenVar actually consumes all available cores (>80) and I have no idea what it is doing with that compute power 😄

I updated the description of this issue at the top now with a new minimal example and a new test file.

BAM will use all threads by default. You should be able to change that via https://docs.seqan.de/seqan/3-master-user/cookbook.html#setting_compression_threads

So this is expected. The weird behaviour with also having a FASTA file was not expected.
We will at some point change the interface for configuring threads, and I'd like the default to not be all threads.

@eldariont
Copy link
Collaborator Author

So this is expected. The weird behaviour with also having a FASTA file was not expected.

Thanks, for some reason I forgot that using all threads is the default 🤦 Sorry, I should not have changed the issue title and description then.

However, I agree with you that using all cores should not be the default (particularly in a research setting where the machines have lots of cores). I think it would be more intuitive to use only one core by default. But in any case we need to set the number of threads in iGenVar so that I can run it on one our servers without annoying other users 😄

@SGSSGene
Copy link
Contributor

This is fixed in #2911 the new default is 4 threads if not otherwise specified.
Thanks for reporting!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug faulty or wrong behaviour of code
Projects
None yet
Development

No branches or pull requests

7 participants