Reading a BAM file uses up all available CPU cores #1195

eldariont · 2019-07-24T12:52:10Z

EDIT: Updated problem description on May 21

Hi,

while testing seqan/iGenVar on a real-world bam file I observed extreme CPU load caused by reading the BAM file.

Platform

SeqAn version: 3.0.3 (on branch release-3.0.3)
Operating system: Linux elcattivo.molgen.mpg.de 5.10.24.mx64.375 [INTERNAL] Renamed sequence file headers. #1 SMP Fri Mar 19 12:29:21 CET 2021 x86_64 GNU/Linux
Compiler: g++ (GCC) 7.5.0

Description

How to repeat the problem

Test input: download

The following minimal example should just open the given test BAM file and print all read names in the file. This is the code:

#include <seqan3/io/sam_file/input.hpp>

int main(int /**/, char ** argv)
{
    using my_fields = seqan3::fields<seqan3::field::id>;
    seqan3::sam_file_input sam_file{argv[1], my_fields{}};
    
    for (auto & record : sam_file)
    {
        std::string const query_name = record.id(); 
        std::cout << query_name << std::endl;
    }
}

Expected behaviour

When running this code, I would expect it to print the read names and produce approximately 100% CPU load.

Actual behaviour

The program is printing the read names alright but completely uses up all available CPUs (visible in htop or below).

/usr/bin/time -v ./build/test_bam_load test.bam

...
m64070_190824_163708/156762735/7917_22377
m64070_190824_163708/145687260/2901_5429
m64070_190824_163708/41091648/0_1505
m64070_190824_163708/112723090/7596_28230
        Command being timed: "./build/test_bam_load test.bam"
        User time (seconds): 50.48
        System time (seconds): 8.67
        Percent of CPU this job got: 8005%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.73
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 140488
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 55
        Minor (reclaiming a frame) page faults: 33988
        Voluntary context switches: 1194
        Involuntary context switches: 768499
        Swaps: 0
        File system inputs: 88
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Cheers
David

The text was updated successfully, but these errors were encountered:

smehringer · 2019-07-25T07:18:16Z

Hi David,

thanks for reporting back to us so much! It really helps to get rid of a lot of bugs when actual use cases arise.

The heavy CPU load is expected, you are not doing anything wrong. Since our files behave as ranges, they have to buffer the first record as soon as they are initialized. For SAM/BAM this is only the header and the first line. For a genome file this probably reads in chromosome 1, which I guess is reasonable to take roughly 30s.

with both initializations: takes ~47s, 800% CPU load (I have an 8 core machine)

First, 800% is actually great :D thanks for reporting. You are probably reading in a BAM file? Because decompression is automatically multithreaded. In my (probably outdated) benchmarks, I only reached about 600% CPU load on 8 cores.

I am at a loss though why initializing both takes more resources than the individual commands, I can only speculate that this might be because of concurrent IO behaviour of the system.

eldariont · 2019-07-25T08:48:26Z

Hi Svenja,

thanks for your quick reply. Now it makes sense that the initialization of the sequence file takes a while. Thanks for the explanation.

The 800% CPU load are still weird in my opinion. When I print a debug message after each initialization, I see that the alignment file (yes, it's a BAM) is initialized in no time (as would be expected because header + first record are short). The initialization of the sequence file (a simple FASTA) takes ~45s and puts full load on all 8 cores. This cannot be explained by decompression because the FASTA is not compressed.

Even weirder is that if I change the order of the two initializations (i.e. the sequence file gets initialized first) the sequence file uses only 1 core and takes ~30s.

So it looks as if there is something strange happening when the alignment file is initialized before the sequence file. Can you reproduce this and do you have any idea what might be going on?

smehringer · 2019-07-25T13:17:29Z

Hi David,

you are right this is unexpected behaviour.
I can reproduce your issue on GCC7 and GCC8:
(times are for GCC8)

int main(int /**/, char ** argv)
{
    sequence_file_input sin{argv[1]};
    alignment_file_input ain{argv[2]};
}

|

Input: BAM
24,93s user	0,20s system	100% cpu	24,989 total
25,50s user	0,19s system	100% cpu	25,464 total
25,38s user	0,18s system	100% cpu	25,343 total

int main(int /**/, char ** argv)
{
    alignment_file_input ain{argv[2]};
    sequence_file_input sin{argv[1]};
}

Input: BAM
226,65s user	143,37s system	726% cpu	50,914 total
225,79s user	146,36s system	730% cpu	50,975 total
228,34s user	146,28s system	734% cpu	50,985 total

Input: SAM
46,57s user	0,20s system	99% cpu	46,998 total
46,70s user	0,24s system	99% cpu	46,949 total
47,27s user	0,20s system	99% cpu	47,523 total
47,19s user	0,24s system	99% cpu	47,498 total

Scoping solves this:

int main(int /**/, char ** argv)
{
    {
        alignment_file_input ain{argv[2]};
    }
    sequence_file_input sin{argv[1]};
}

Input: BAM
24,79s user	0,20s system	100% cpu	24,762 total
25,41s user	0,25s system	100% cpu	25,415 total
25,16s user	0,19s system	100% cpu	25,125 total

GCC9

GCC9	seq before alignment	Input: BAM
15,78s user	0,12s system	100% cpu	15,771 total
15,60s user	0,14s system	100% cpu	15,624 total
15,64s user	0,19s system	100% cpu	15,729 total

GCC9	alignment before seq	Input: BAM
112,16s user	72,84s system	712% cpu	25,951 total
109,05s user	71,41s system	726% cpu	24,834 total
107,60s user	70,81s system	720% cpu	24,759 total

GCC9	alignment before seq but scoped	Input: BAM
15,00s user	0,15s system	100% cpu	15,036 total
14,98s user	0,13s system	100% cpu	14,999 total
15,04s user	0,16s system	100% cpu	15,080 total

This needs to be investigated further although I don't if I'll time for this.
I'll assign this to @h-2 who is back next week and wanted to work on IO anyway. 😁

h-2 · 2019-08-02T14:16:18Z

I have difficulties reproducing this, for me the program always immediately exits (as expected because only one record is buffered). Can you provide the files that trigger this behaviour?

eldariont · 2019-08-05T10:26:28Z

Hi Hannes,

I generated a tiny BAM file which triggers the behaviour on my Linux machine: https://ws.molgen.mpg.de/ws/635147/test.bam

As reference FASTA, I used

wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
gunzip hs37d5.fa.gz

I observe a difference between Release and Debug mode, though. In Debug mode (-DCMAKE_BUILD_TYPE=Debug), the CPU load is sustained much longer. In Relase mode, it takes only about one second.

h-2 · 2019-09-12T19:41:15Z

@rrahn You said you would have a look at this? Or maybe @marehr can have a look?

rrahn · 2019-09-12T20:41:10Z

@h-2 I said I look into the issues we have with the queue and see if it might be related. But there were other things that needed to be done before.

marehr · 2019-09-30T14:45:52Z

maybe related to #1081

rrahn · 2019-11-29T14:34:35Z

@eldariont can you again give me the bam file. I am finally looking into it. Sorry for the long delay.

rrahn · 2019-11-30T16:07:35Z

Ok, I cannot validate the issue anymore. Note, that in the meantime we already fixed the issue of loading the first record on initialisation of the file. Thus, with the first call of begin the first record is being fetched and not on construction of the file. Can you verify that this issue is also fixed for your case?

marehr · 2019-11-30T16:58:50Z

That means if you change the program to:

int main(int /**/, char ** argv)
{
    sequence_file_input sin{argv[1]};
    alignment_file_input ain{argv[2]};
    std::ranges::begin(sin);
    std::ranges::begin(ain);
}

This might be still a problem?

eldariont · 2019-12-02T12:38:27Z

@rrahn Sorry for the long delay. In case you still need the bam file: https://ws.molgen.mpg.de/ws/921352/test.bam

eldariont · 2019-12-02T12:55:22Z

@marehr Yes, I can still run into exactly the same behavior if I call begin() after initializing the files.

loading only the alignment file takes <0.1s
loading only the sequence file takes ~30s, 100% CPU load
loading both files regardless of their order takes ~47s, 800% CPU load

rrahn · 2019-12-02T13:21:23Z

Ok thanks. I am looking into it then. With another bam file I couldn’t reproduce this.

_{Sent with GitHawk}

rrahn · 2019-12-03T14:38:27Z

Mhmm, ok I really can't reproduce this. Neither on macOS nor on linux.
I've compiled in release and debug mode. And I couldn't spot any difference (except of course longer running times, but they were also coherent).

Here are some timings in release mode

fasta then bam

Open seq file: 1 ms
Open bam file: 10 ms
Reading seq record: 462 ms
Reading bam record: 4 ms

real    0m0.488s
user    0m4.358s
sys     0m2.466s

bam then fasta

Open seq file: 1 ms
Open bam file: 12 ms
Reading bam record: 4 ms
Reading seq record: 450 ms

real    0m0.478s
user    0m4.594s
sys     0m2.233s

only fasta

Open seq file: 1 ms
Reading seq record: 500 ms

real    0m0.521s
user    0m0.457s
sys     0m0.052s

only bam

Open bam file: 9 ms
Reading bam record: 4 ms

real    0m0.030s
user    0m0.059s
sys     0m0.030s

smehringer · 2019-12-12T12:43:49Z

I do not know what you are testing. But I can still reproduce this. 👀

Only seq : 0,28s  99% CPU chrono: 363ms
Only bam : 0,06s 291% CPU chrono: 12ms
Both s b : 2,44s 762% CPU chrono: 599ms

With this snippet:

#include <seqan3/argument_parser/all.hpp>   // includes all necessary headers
#include <seqan3/core/debug_stream.hpp>     // our custom output stream
#include <seqan3/std/filesystem>            // use std::filesystem::path
#include <seqan3/io/sequence_file/all.hpp>  // FASTA support
#include <seqan3/io/alignment_file/all.hpp> // SAM/BAM support

#include <chrono>

int main(int /**/, char ** argv)
{
    std::chrono::time_point<std::chrono::system_clock> start, end;

    start = std::chrono::system_clock::now();

    seqan3::sequence_file_input sin{argv[1]};
    seqan3::alignment_file_input ain{argv[2]};
    std::ranges::begin(sin);
    std::ranges::begin(ain);

    end = std::chrono::system_clock::now();

    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end-start).count() << std::endl;
}

the bam file from David and the fasta file /group/ag_abi/takifugu-data/SeqAn/homo_sapiens/chr10/chr10.fa

marehr · 2021-04-12T12:31:10Z

This might be fixed in the meantime? (Maybe fixed by #2455? Maybe releated by #2282?)

eldariont · 2021-05-21T14:03:13Z

This might be fixed in the meantime? (Maybe fixed by #2455? Maybe releated by #2282?)

@marehr @eseiler : Unfortunately, this problem still seems to persist. I didn't retry with FASTA files but observed the extreme CPU load in iGenVar when it is reading a BAM file. iGenVar actually consumes all available cores (>80) and I have no idea what it is doing with that compute power 😄

I updated the description of this issue at the top now with a new minimal example and a new test file.

eseiler · 2021-05-21T14:06:44Z

This might be fixed in the meantime? (Maybe fixed by #2455? Maybe releated by #2282?)

@marehr @eseiler : Unfortunately, this problem still seems to persist. I didn't retry with FASTA files but observed the extreme CPU load in iGenVar when it is reading a BAM file. iGenVar actually consumes all available cores (>80) and I have no idea what it is doing with that compute power 😄

I updated the description of this issue at the top now with a new minimal example and a new test file.

BAM will use all threads by default. You should be able to change that via https://docs.seqan.de/seqan/3-master-user/cookbook.html#setting_compression_threads

So this is expected. The weird behaviour with also having a FASTA file was not expected.
We will at some point change the interface for configuring threads, and I'd like the default to not be all threads.

eldariont · 2021-05-21T14:58:39Z

So this is expected. The weird behaviour with also having a FASTA file was not expected.

Thanks, for some reason I forgot that using all threads is the default 🤦 Sorry, I should not have changed the issue title and description then.

However, I agree with you that using all cores should not be the default (particularly in a research setting where the machines have lots of cores). I think it would be more intuitive to use only one core by default. But in any case we need to set the number of threads in iGenVar so that I can run it on one our servers without annoying other users 😄

SGSSGene · 2022-05-16T11:43:00Z

This is fixed in #2911 the new default is 4 threads if not otherwise specified.
Thanks for reporting!

eldariont added the bug faulty or wrong behaviour of code label Jul 24, 2019

smehringer assigned h-2 Jul 25, 2019

h-2 assigned rrahn and unassigned h-2 Sep 12, 2019

marehr assigned eseiler Apr 12, 2021

eldariont changed the title ~~Initializing one sequence and one alignment file takes 47s and uses all 8 CPU cores~~ Reading a BAM file uses up all available CPU cores May 21, 2021

eldariont mentioned this issue May 21, 2021

[FIX] Restrict number of decompression threads seqan/iGenVar#129

Closed

SGSSGene closed this as completed May 16, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reading a BAM file uses up all available CPU cores #1195

Reading a BAM file uses up all available CPU cores #1195

eldariont commented Jul 24, 2019 •

edited

Loading

smehringer commented Jul 25, 2019

eldariont commented Jul 25, 2019

smehringer commented Jul 25, 2019 •

edited

Loading

h-2 commented Aug 2, 2019

eldariont commented Aug 5, 2019

h-2 commented Sep 12, 2019

rrahn commented Sep 12, 2019

marehr commented Sep 30, 2019

rrahn commented Nov 29, 2019 •

edited

Loading

rrahn commented Nov 30, 2019

marehr commented Nov 30, 2019 •

edited

Loading

eldariont commented Dec 2, 2019

eldariont commented Dec 2, 2019

rrahn commented Dec 2, 2019

rrahn commented Dec 3, 2019 •

edited

Loading

smehringer commented Dec 12, 2019 •

edited

Loading

marehr commented Apr 12, 2021

eldariont commented May 21, 2021 •

edited

Loading

eseiler commented May 21, 2021 •

edited

Loading

eldariont commented May 21, 2021

SGSSGene commented May 16, 2022

Reading a BAM file uses up all available CPU cores #1195

Reading a BAM file uses up all available CPU cores #1195

Comments

eldariont commented Jul 24, 2019 • edited Loading

Platform

Description

How to repeat the problem

Expected behaviour

Actual behaviour

smehringer commented Jul 25, 2019

eldariont commented Jul 25, 2019

smehringer commented Jul 25, 2019 • edited Loading

GCC9

h-2 commented Aug 2, 2019

eldariont commented Aug 5, 2019

h-2 commented Sep 12, 2019

rrahn commented Sep 12, 2019

marehr commented Sep 30, 2019

rrahn commented Nov 29, 2019 • edited Loading

rrahn commented Nov 30, 2019

marehr commented Nov 30, 2019 • edited Loading

eldariont commented Dec 2, 2019

eldariont commented Dec 2, 2019

rrahn commented Dec 2, 2019

rrahn commented Dec 3, 2019 • edited Loading

fasta then bam

bam then fasta

only fasta

only bam

smehringer commented Dec 12, 2019 • edited Loading

marehr commented Apr 12, 2021

eldariont commented May 21, 2021 • edited Loading

eseiler commented May 21, 2021 • edited Loading

eldariont commented May 21, 2021

SGSSGene commented May 16, 2022

eldariont commented Jul 24, 2019 •

edited

Loading

smehringer commented Jul 25, 2019 •

edited

Loading

rrahn commented Nov 29, 2019 •

edited

Loading

marehr commented Nov 30, 2019 •

edited

Loading

rrahn commented Dec 3, 2019 •

edited

Loading

smehringer commented Dec 12, 2019 •

edited

Loading

eldariont commented May 21, 2021 •

edited

Loading

eseiler commented May 21, 2021 •

edited

Loading