Skip to content

Commit

Permalink
Merge branch 'release/1.0.0'
Browse files Browse the repository at this point in the history
  • Loading branch information
David Jones committed Feb 26, 2018
2 parents b3f16cf + a907f0b commit 7629539
Show file tree
Hide file tree
Showing 37 changed files with 489 additions and 66 deletions.
8 changes: 8 additions & 0 deletions CHANGES.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,13 @@
# CHANGES

## 1.0.0

- First full release
- Add overlapping reads support to `bam2bw`, `bam2bwbases` and `bam2bedgraph` via commandline flag `--overlap` `-a`
- If the same readname is encountered twice at a position it is considred an overlapping read pair.
- Where the same base [ACGT] was encountered on each of the reads it will only be counted once. If a different base was encountered then the coverage count is incremented once for each base.
- Updated License headers with new email address

## 0.5.0

- Update to HTSlib 1.5 (for consistency across tools)
Expand Down
2 changes: 1 addition & 1 deletion CODE_OF_CONDUCT.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@ This Code of Conduct applies both within project spaces and in public spaces whe

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at cgpit@sanger.ac.uk. The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at cgphelp@sanger.ac.uk. The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.

Expand Down
36 changes: 20 additions & 16 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,20 +6,20 @@ Package of C scripts for generation of [BigWig][BigWig] coverage files
|-----------------------------------------------|----------------------------------------------|
| [![Master Badge][travis-master]][travis-base] | [![Dev Badge][travis-develop]][travis-base] |

## Contents

- [cgpBigWig](#cgpbigwig)
- [Contents](#contents)
- [Installation](#installation)
- [Docker](#docker_and_singularity)
- [Programs](#programs)
- [bwcat](#bwcat)
- [bwjoin](#bwjoin)
- [bam2bw](#bam2bw)
- [bg2bw](#bg2bw)
- [bam2bwbases](#bam2bwbases)
- [bam2bedgraph](#bam2bedgraph)
- [License](#license)
<!-- TOC depthFrom:2 depthTo:6 withLinks:1 updateOnSave:1 orderedList:0 -->

- [Installation](#installation)
- [Docker and Singularity](#docker-and-singularity)
- [Programs](#programs)
- [bwcat](#bwcat)
- [bwjoin](#bwjoin)
- [bam2bw](#bam2bw)
- [bg2bw](#bg2bw)
- [bam2bwbases](#bam2bwbases)
- [bam2bedgraph](#bam2bedgraph)
- [License](#license)

<!-- /TOC -->

## Installation

Expand Down Expand Up @@ -85,7 +85,7 @@ Other:
-v --version Prints the version number.
```

##### bam2bw
### bam2bw
Generate bw coverage file from bam
```
Usage: bam2bw -i input.[b|cr]am -o output.bw
Expand All @@ -99,6 +99,7 @@ Optional:
-c --region [file] A samtools style region (contig:start-stop) or a bed file of regions over which to produce the bigwig file
-z --include-zeroes Include zero coverage regions as additional entries to the bw file
-r --reference [file] Path to reference genome.fa file (required for cram if ref_path cannot be resolved)
-a --overlap Turn on overlaping reads check
Other:
-h --help Display this usage information.
Expand Down Expand Up @@ -129,6 +130,7 @@ bam2bwbases can be used to generate four bw files of per base proportions.
-i --input [file] Path to the input [b|cr]am file.
-F --filter [int] SAM flags to filter. [default: 4]
-o --outfile [file] Path to the output .bw file produced. Per base results wiillbe output as four bw files [ACGT].outputname.bw [default:'(null)']
-a --overlap Turn on overlaping reads check
Optional:
-c --region [file] A samtools style region (contig:start-stop) or a bed file of regions over which to produce the bigwig file
Expand All @@ -151,6 +153,8 @@ Create a BEDGraph file of genomic coverage. BAM file must be sorted.
Optional:
-r --region Region in bam to access.
-f --filter Ignore reads with the filter flags [int].
-a --overlap Turn on overlaping reads check
Other:
-h --help Display this usage information.
-v --version Prints the version number.
Expand All @@ -160,7 +164,7 @@ Other:
```
Copyright (c) 2017-2018 Genome Research Ltd.
Author: David Jones <cgpit@sanger.ac.uk>
Author: David Jones <cgphelp@sanger.ac.uk>
This file is part of cgpBigWig.
Expand Down
2 changes: 1 addition & 1 deletion VERSION.txt
Original file line number Diff line number Diff line change
@@ -1 +1 @@
0.5.0
1.0.0
68 changes: 62 additions & 6 deletions c/bam2bedgraph.c
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
/** LICENSE
* Copyright (c) 2016 Genome Research Ltd.
* Copyright (c) 2016-2018 Genome Research Ltd.
*
* Author: Cancer Genome Project cgpit@sanger.ac.uk
* Author: Cancer Genome Project cgphelp@sanger.ac.uk
*
* This file is part of cgpBigWig.
*
Expand Down Expand Up @@ -41,6 +41,9 @@ static char *input_file = NULL;
static char *output_file = NULL;
static char *region = NULL;
static int filter = 0;
uint8_t is_overlap = 0;

KHASH_MAP_INIT_STR(strh,uint8_t)

void print_usage (int exit_code){

Expand All @@ -51,6 +54,7 @@ void print_usage (int exit_code){
printf ("Optional:\n");
printf ("-r --region Region in bam to access.\n");
printf ("-f --filter Ignore reads with the filter flags [int].\n");
printf ("-a --overlap Use overlapping read check.\n\n");
printf ("Other:\n");
printf ("-h --help Display this usage information.\n");
printf ("-v --version Prints the version number.\n\n");
Expand All @@ -66,6 +70,7 @@ void options(int argc, char *argv[]){
{"region",required_argument,0,'r'},
{"filter",required_argument,0,'f'},
{"output",required_argument,0,'o'},
{"overlap", no_argument, 0, 'a'},
{"rna",no_argument,0, 'a'},
{ NULL, 0, NULL, 0}

Expand All @@ -75,7 +80,7 @@ void options(int argc, char *argv[]){
int iarg = 0;

//Iterate through options
while((iarg = getopt_long(argc, argv, "i:o:r:f:vh", long_opts, &index)) != -1){
while((iarg = getopt_long(argc, argv, "i:o:r:f:avh", long_opts, &index)) != -1){
switch(iarg){
case 'i':
input_file = optarg;
Expand All @@ -95,6 +100,11 @@ void options(int argc, char *argv[]){
print_usage(1);
}
break;

case 'a':
is_overlap = 1;
break;

case 'h':
print_usage(0);
break;
Expand Down Expand Up @@ -151,6 +161,47 @@ static int pileup_func(uint32_t tid, uint32_t position, int n, const bam_pileup1
return 0;
}

// callback for bam_plbuf_init()
static int pileup_func_overlap(uint32_t tid, uint32_t position, int n, const bam_pileup1_t *pl, void *data){
tmpstruct_t *tmp = (tmpstruct_t*)data;
int pos = (int)position;
int coverage = n;
int i;
khash_t(strh) *h;
khiter_t k;
h = kh_init(strh);
for (i=0;i<n;i++){ // Iterate through each pileup object
if (pl[i].is_del){
coverage--;
continue;
}
int absent;
//Testing overlapping reads
k = kh_put(strh, h, bam_get_qname(pl[i].b), &absent);
uint8_t cbase = bam_seqi(bam_get_seq(pl[i].b),pl[i].qpos);
uint8_t pre_b;
if(!absent){ //Read already processed to get base processed (we only increment if base is different between overlapping read pairs)
k = kh_get(strh, h, bam_get_qname(pl[i].b));
pre_b = kh_val(h,k);
}else{
//Add the value to the hash
kh_value(h, k) = cbase;
}
if(!absent && pre_b == cbase) coverage--; //Remove one from the total coverage if this is an overlap site
}

if (tmp->ltid != tid || tmp->lcoverage != coverage || pos > tmp->lpos+1) {
if (tmp->lpos > 0 && tmp->lcoverage > 0)
fprintf(tmp->out,"%s\t%d\t%d\t%d\n", tmp->head->target_name[tmp->ltid], tmp->lstart,tmp->lpos+1, tmp->lcoverage);
tmp->ltid = tid;
tmp->lstart = pos;
tmp->lcoverage = coverage;
}
tmp->lpos = pos;
kh_destroy(strh, h);
return 0;
}

int main(int argc, char *argv[]){
options(argc, argv);
tmpstruct_t tmp;
Expand All @@ -163,11 +214,17 @@ int main(int argc, char *argv[]){
check(out!=NULL,"Failed to open output file for %s writing.",output_file);
tmp.out = out;
int check = 0;
bw_func func = &pileup_func;
bw_func_reg func_reg = &pileup_func;
if(is_overlap == 1){
func = &pileup_func_overlap;
func_reg = &pileup_func_overlap;
}
if(region == NULL){
check = process_bam_file(input_file,pileup_func, &tmp,filter,NULL);
check = process_bam_file(input_file, func, &tmp, filter, NULL);
check(check==1,"Error parsing bam file.");
}else{
check = process_bam_region(input_file, pileup_func, &tmp, filter, region,NULL);
check = process_bam_region(input_file, func_reg, &tmp, filter, region, NULL);
check(check==1,"Error parsing bam region.");
}
fprintf(out,"%s\t%d\t%d\t%d\n", tmp.head->target_name[tmp.ltid], tmp.lstart,tmp.lpos+1, tmp.lcoverage);
Expand All @@ -183,4 +240,3 @@ int main(int argc, char *argv[]){
if(out) fclose(out);
return -1;
}

83 changes: 78 additions & 5 deletions c/bam2bw.c
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
/** LICENSE
* Copyright (c) 2016 Genome Research Ltd.
* Copyright (c) 2016-2018 Genome Research Ltd.
*
* Author: Cancer Genome Project cgpit@sanger.ac.uk
* Author: Cancer Genome Project cgphelp@sanger.ac.uk
*
* This file is part of cgpBigWig.
*
Expand Down Expand Up @@ -40,6 +40,8 @@
#include "bam_access.h"
#include "utils.h"

KHASH_MAP_INIT_STR(strh,uint8_t)

char *out_file = "output.bam.bw";
char *input_file = NULL;
char *region_store = NULL;
Expand All @@ -50,6 +52,7 @@ int is_regions_file = 0;
uint8_t is_base = 0;
int filter = 4;
char base = 0;
uint8_t is_overlap = 0;
int include_zeroes = 0;
uint32_t single = 1;
char *last_contig = "";
Expand All @@ -63,7 +66,8 @@ void print_usage (int exit_code){
printf("Optional: \n");
printf("-c --region [file] A samtools style region (contig:start-stop) or a bed file of regions over which to produce the bigwig file\n");
printf("-z --include-zeroes Include zero coverage regions as additional entries to the bw file\n");
printf("-r --reference [file] Path to reference genome.fa file (required for cram if ref_path cannot be resolved)\n\n");
printf("-r --reference [file] Path to reference genome.fa file (required for cram if ref_path cannot be resolved)\n");
printf("-a --overlap Use overlapping read check\n\n");
printf ("Other:\n");
printf("-h --help Display this usage information.\n");
printf("-v --version Prints the version number.\n\n");
Expand Down Expand Up @@ -98,6 +102,7 @@ void setup_options(int argc, char *argv[]){
{"region",required_argument, 0, 'c'},
{"reference",required_argument, 0, 'r'},
{"include-zeroes",no_argument, 0, 'z'},
{"overlap", no_argument, 0, 'a'},
{"help", no_argument, 0, 'h'},
{"version", no_argument, 0, 'v'},
{ NULL, 0, NULL, 0}
Expand All @@ -108,7 +113,7 @@ void setup_options(int argc, char *argv[]){
int iarg = 0;

//Iterate through options
while((iarg = getopt_long(argc, argv, "F:i:o:c:r:zhv",long_opts, &index)) != -1){
while((iarg = getopt_long(argc, argv, "F:i:o:c:r:azhv",long_opts, &index)) != -1){
switch(iarg){
case 'F':
if(sscanf(optarg, "%i", &filter) != 1){
Expand Down Expand Up @@ -147,6 +152,9 @@ void setup_options(int argc, char *argv[]){
case 'z':
include_zeroes = 1;
break;
case 'a':
is_overlap = 1;
break;
case '?':
print_usage (1);
break;
Expand Down Expand Up @@ -205,6 +213,67 @@ static int pileup_func(uint32_t tid, uint32_t position, int n, const bam_pileup1
return 1;
}

// callback for bam_plbuf_init() for overlapping reads
static int pileup_func_overlap(uint32_t tid, uint32_t position, int n, const bam_pileup1_t *pl, void *data, uint32_t reg_start){
tmpstruct_t *tmp = (tmpstruct_t*)data;
int pos = (int)position;
int coverage = n;
int i;

khash_t(strh) *h;
khiter_t k;
h = kh_init(strh);

for (i=0;i<n;i++){
if (pl[i].is_del){
coverage--;
continue;
}
int absent;
//Testing overlapping reads
k = kh_put(strh, h, bam_get_qname(pl[i].b), &absent);
uint8_t cbase = bam_seqi(bam_get_seq(pl[i].b),pl[i].qpos);
uint8_t pre_b;
if(!absent){ //Read already processed to get base processed (we only increment if base is different between overlapping read pairs)
k = kh_get(strh, h, bam_get_qname(pl[i].b));
pre_b = kh_val(h,k);
}else{
//Add the value to the hash
kh_value(h, k) = cbase;
}
if(!absent && pre_b == cbase) coverage--; //Remove one from the total coverage if this is an overlap site
}

if((uint32_t)pos == reg_start-1){
tmp->ltid = tid;
tmp->lstart = pos;
tmp->lcoverage = coverage;
}
if (tmp->ltid != tid || tmp->lcoverage != coverage || pos > tmp->lpos+1) {
if (tmp->inczero == 1 || tmp->lcoverage > 0 ){
uint32_t start = tmp->lstart;
uint32_t stop = pos;
float cvg = (float)tmp->lcoverage;
if(tmp->lcoverage == 0 && tmp->ltid != tid-1 && tmp->ltid != tid){
tmp->ltid = tid;
}
int chck = bwAddIntervals(tmp->bwout,
&tmp->head->target_name[tmp->ltid],&start,&stop,&cvg,single);
check(chck==0,"Error adding bw interval %s:%"PRIu32"-%"PRIu32" = %f . Error code: %d",tmp->head->target_name[tmp->ltid],start,stop,cvg,chck);
}
//if(tmp->inczero == 1 && tmp->ltid != tid && pos != tmp->head->target_len[tmp->ltid]){
tmp->ltid = tid;
tmp->lstart = pos;
tmp->lcoverage = coverage;
}
tmp->lpos = pos;
kh_destroy(strh, h);
return 0;
error:
kh_destroy(strh, h);
return 1;
}

bigWigFile_t *initialise_bw_output(char *out_file, chromList_t *chromList){
//Open output file
bigWigFile_t *fp = bwOpen(out_file, NULL, "w");
Expand Down Expand Up @@ -360,9 +429,13 @@ int main(int argc, char *argv[]){
float cvg;
//Now we generate the bw info
int chck = 0;
bw_func_reg func_reg = &pileup_func;
if(is_overlap==1){
func_reg = &pileup_func_overlap;
}
int i=0;
for(i=0;i<no_of_regions;i++){
chck = process_bam_region(input_file, pileup_func, &tmp, filter, our_region_list[i], reference);
chck = process_bam_region(input_file, func_reg, &tmp, filter, our_region_list[i], reference);
check(chck==1,"Error parsing bam region.");
start = tmp.lstart;
stop = tmp.lpos+1;
Expand Down
Loading

0 comments on commit 7629539

Please sign in to comment.