Store balsamic #551

Merged (76 commits) on Mar 25, 2020
Changes from 59 commits
Commits
bad7473
store-like-mip
patrikgrenfeldt Jan 24, 2020
ac07248
Test store
patrikgrenfeldt Jan 24, 2020
2ccbe94
store balsamic files in meta-file
patrikgrenfeldt Feb 12, 2020
e7f35b5
Merge branch 'master' into store-balsamic
patrikgrenfeldt Feb 12, 2020
2662751
remove unused argument
patrikgrenfeldt Feb 12, 2020
9d38d82
Merge branch 'store-balsamic' of https://github.com/Clinical-Genomics…
patrikgrenfeldt Feb 12, 2020
1634ef4
add missing required parameter
patrikgrenfeldt Feb 12, 2020
9215da1
move calls to store object
patrikgrenfeldt Feb 12, 2020
4bac8a2
change black pre-commit line length to what we use
patrikgrenfeldt Feb 13, 2020
ee8174b
Remove Python2 charset declaration. Format code
patrikgrenfeldt Feb 13, 2020
5ee1c81
Remove Python2 charset declaration
patrikgrenfeldt Feb 13, 2020
820d73d
Capitalize in log message
patrikgrenfeldt Feb 13, 2020
71889f2
add method description
patrikgrenfeldt Feb 13, 2020
7bef772
output with click instead of print
patrikgrenfeldt Feb 13, 2020
523278c
remove unused imports
patrikgrenfeldt Feb 13, 2020
2770128
remove unneccesary whitespace
patrikgrenfeldt Feb 13, 2020
abc5b5c
describe methods
patrikgrenfeldt Feb 13, 2020
418673f
Merge branch 'master' into store-balsamic
patrikgrenfeldt Feb 17, 2020
6825863
use existing case instead of refetching
patrikgrenfeldt Feb 19, 2020
b471db6
remove hard coded dummy values
patrikgrenfeldt Feb 19, 2020
3617120
Merge branch 'master' into store-balsamic
patrikgrenfeldt Feb 27, 2020
c9805c5
remove unused assignment
patrikgrenfeldt Feb 27, 2020
c2f1bb1
add missing argument in call
patrikgrenfeldt Feb 27, 2020
f6dee12
make mock behave more like real thing
patrikgrenfeldt Feb 27, 2020
c526de9
store root_dir in hk_api
patrikgrenfeldt Feb 27, 2020
a33836d
fix string
patrikgrenfeldt Feb 27, 2020
642aa90
handle directories and multiple tags
patrikgrenfeldt Mar 20, 2020
55c6f42
Merge branch 'master' into store-balsamic
patrikgrenfeldt Mar 20, 2020
685e806
Merge with master
patrikgrenfeldt Mar 20, 2020
d6c118d
clarify the priority mapping
patrikgrenfeldt Mar 23, 2020
dd761d0
simplify test
patrikgrenfeldt Mar 23, 2020
e4d6471
remove debug print
patrikgrenfeldt Mar 23, 2020
3a3740b
move test helpers to a module
patrikgrenfeldt Mar 23, 2020
28067a0
fix import of store_helpers
patrikgrenfeldt Mar 23, 2020
1ae475f
remove redundant test setup code
patrikgrenfeldt Mar 23, 2020
3770d5a
fix store_helper module path
patrikgrenfeldt Mar 23, 2020
8b60548
remove redundant import
patrikgrenfeldt Mar 23, 2020
74e94d9
fix import order
patrikgrenfeldt Mar 23, 2020
1c6a013
Add docstring
patrikgrenfeldt Mar 23, 2020
f5d2153
linitng
patrikgrenfeldt Mar 23, 2020
54b485b
linting
patrikgrenfeldt Mar 23, 2020
31e0a76
Merge branch 'master' into store-balsamic
patrikgrenfeldt Mar 23, 2020
5187a59
skip conversion to path
patrikgrenfeldt Mar 23, 2020
16e1b8f
Merge branch 'store-balsamic' of https://github.com/Clinical-Genomics…
patrikgrenfeldt Mar 23, 2020
29c3178
use full path
patrikgrenfeldt Mar 23, 2020
027cccb
rework the path and tag parsing
patrikgrenfeldt Mar 24, 2020
b120d21
return actual compressed filename
patrikgrenfeldt Mar 24, 2020
e78476e
fix broken test
patrikgrenfeldt Mar 24, 2020
0e29ae9
fix path to generated .hk file
patrikgrenfeldt Mar 24, 2020
e1d0726
fix path to generated .hk file
patrikgrenfeldt Mar 24, 2020
5d3188d
fix deliverables file path
patrikgrenfeldt Mar 24, 2020
755a033
Merge branch 'master' into store-balsamic
patrikgrenfeldt Mar 24, 2020
8dbbefb
merge with master
patrikgrenfeldt Mar 24, 2020
e9c9c3b
Protect the store in housekeeper from direct usage
patrikgrenfeldt Mar 24, 2020
68fea14
format code
patrikgrenfeldt Mar 24, 2020
164e476
linting
patrikgrenfeldt Mar 24, 2020
44b4c13
linting
patrikgrenfeldt Mar 24, 2020
585e94f
check black with our decided linelength
patrikgrenfeldt Mar 24, 2020
f8ca62d
format code
patrikgrenfeldt Mar 24, 2020
a4f1f24
call fixture what it is
patrikgrenfeldt Mar 25, 2020
279b8c0
test magic __getattr__
patrikgrenfeldt Mar 25, 2020
b3b02f4
Update cg/cli/workflow/balsamic/base.py
patrikgrenfeldt Mar 25, 2020
e45ef80
simplify creation of balsamic command
patrikgrenfeldt Mar 25, 2020
4530b17
Merge branch 'store-balsamic' of https://github.com/Clinical-Genomics…
patrikgrenfeldt Mar 25, 2020
4c7b169
create the deliverables file path in one way only
patrikgrenfeldt Mar 25, 2020
ab1b7ec
simplify balsamic command
patrikgrenfeldt Mar 25, 2020
539af7a
restore unsecure call
patrikgrenfeldt Mar 25, 2020
259b4bc
Merge branch 'store-balsamic' of https://github.com/Clinical-Genomics…
patrikgrenfeldt Mar 25, 2020
2cc3810
Merge branch 'master' into store-balsamic
patrikgrenfeldt Mar 25, 2020
656223a
remove erronous usage of store on API
patrikgrenfeldt Mar 25, 2020
54a7fdd
forward all arguments to wrapped add_commit
patrikgrenfeldt Mar 25, 2020
596b6d1
fix docstring
patrikgrenfeldt Mar 25, 2020
31ad4f0
Wrap version method in Store
patrikgrenfeldt Mar 25, 2020
518927a
capture log at right level
patrikgrenfeldt Mar 25, 2020
12226ef
more tests
patrikgrenfeldt Mar 25, 2020
635f4f3
fix store name according to real implementation
patrikgrenfeldt Mar 25, 2020
1 change: 1 addition & 0 deletions .pre-commit-config.yaml
@@ -3,3 +3,4 @@ repos:
     rev: 19.3b0
     hooks:
       - id: black
+        args: [--line-length=100]
2 changes: 1 addition & 1 deletion .travis.yml
@@ -60,7 +60,7 @@ jobs:
     - name: "Code formatting"
       if: type = pull_request
       install: pip install black
-      script: git --no-pager diff --name-only --diff-filter=AM $TRAVIS_COMMIT_RANGE | grep -F ".py" | xargs black --check --diff
+      script: git --no-pager diff --name-only --diff-filter=AM $TRAVIS_COMMIT_RANGE | grep -F ".py" | xargs black --check -l 100
 
     - name: "Pylint score"
       if: type = pull_request
3 changes: 1 addition & 2 deletions cg/__init__.py
@@ -1,5 +1,4 @@
-# -*- coding: utf-8 -*-
 import pkg_resources
 
-__title__ = 'cg'
+__title__ = "cg"
 __version__ = pkg_resources.get_distribution(__title__).version
1 change: 0 additions & 1 deletion cg/apps/__init__.py
@@ -1,4 +1,3 @@
-# -*- coding: utf-8 -*-
 """
 These are independent interfaces to tools outside the package.
 
9 changes: 2 additions & 7 deletions cg/apps/balsamic/fastq.py
@@ -1,4 +1,3 @@
-# -*- coding: utf-8 -*-
 """
 This module handles concatenation of balsamic fastq files.
 
@@ -133,9 +132,7 @@ def link(self, case: str, sample: str, files: List):
             linked_fastq_path = wrk_dir / linked_fastq_name
 
             linked_reads_paths[fastq_data["read"]].append(linked_fastq_path)
-            concatenated_paths[
-                fastq_data["read"]
-            ] = f"{wrk_dir}/{concatenated_fastq_name}"
+            concatenated_paths[fastq_data["read"]] = f"{wrk_dir}/{concatenated_fastq_name}"
 
             if not linked_fastq_path.exists():
                 LOGGER.info("linking: %s -> %s", original_fastq_path, linked_fastq_path)
@@ -145,9 +142,7 @@ def link(self, case: str, sample: str, files: List):
 
         LOGGER.info("Concatenation in progress for sample %s.", sample)
         for read in linked_reads_paths:
-            FastqFileConcatenator().concatenate(
-                linked_reads_paths[read], concatenated_paths[read]
-            )
+            FastqFileConcatenator().concatenate(linked_reads_paths[read], concatenated_paths[read])
             self._remove_files(linked_reads_paths[read])
 
     @staticmethod
31 changes: 25 additions & 6 deletions cg/apps/beacon.py
@@ -1,4 +1,3 @@
-# -*- coding: utf-8 -*-
 from typing import List
 import datetime as dt
 import logging
@@ -7,24 +6,44 @@
 
 LOG = logging.getLogger(__name__)
 
-class BeaconApi():
+
+class BeaconApi:
     """
     Interface with Beacon importer (github.com/Clinical-Genomics/cgbeacon)
     Inserts variants from a VCF file inside a Beacon server.
     """
 
     def __init__(self, config: dict):
         super(BeaconApi, self).__init__()
-        self.connection = use_mysqlalchemy(config['cgbeacon']['database'])
-
+        self.connection = use_mysqlalchemy(config["cgbeacon"]["database"])
 
-    def upload(self, vcf_path: str, panel_path: str, dataset: str, outfile: str, customer: str, samples: List[str], quality: int, genome_reference: str):
+    def upload(
+        self,
+        vcf_path: str,
+        panel_path: str,
+        dataset: str,
+        outfile: str,
+        customer: str,
+        samples: List[str],
+        quality: int,
+        genome_reference: str,
+    ):
         """ Uploads variants from a VCF file to a MySQL Beacon database
             Returns: number of new variants in the Beacon
         """
 
         LOG.info("Uploading variants to beacon db.")
-        upload_result = Utility.beacon_upload(self.connection, vcf_path, panel_path, dataset, outfile, customer, samples, quality, genome_reference)
+        upload_result = Utility.beacon_upload(
+            self.connection,
+            vcf_path,
+            panel_path,
+            dataset,
+            outfile,
+            customer,
+            samples,
+            quality,
+            genome_reference,
+        )
         LOG.info("Upload complete!")
 
     def remove_vars(self, sample, vcf_path, panel_path=None, qual=20):
37 changes: 18 additions & 19 deletions cg/apps/gt.py
@@ -1,4 +1,3 @@
-# -*- coding: utf-8 -*-
 import logging
 
 from subprocess import CalledProcessError
@@ -22,42 +21,42 @@ class GenotypeAPI(Manager):
     """
 
     def __init__(self, config: dict):
-        alchy_config = dict(SQLALCHEMY_DATABASE_URI=config['genotype']['database'])
+        alchy_config = dict(SQLALCHEMY_DATABASE_URI=config["genotype"]["database"])
         super(GenotypeAPI, self).__init__(config=alchy_config, Model=models.Model)
 
-        self.genotype_config = config['genotype']['config_path']
-        self.genotype_binary = config['genotype']['binary_path']
-        self.base_call = [self.genotype_binary, '--config', self.genotype_config]
+        self.genotype_config = config["genotype"]["config_path"]
+        self.genotype_binary = config["genotype"]["binary_path"]
+        self.base_call = [self.genotype_binary, "--config", self.genotype_config]
 
-    def upload(self, bcf_path: str, samples_sex: dict, force: bool=False):
+    def upload(self, bcf_path: str, samples_sex: dict, force: bool = False):
         """Upload genotypes for a family of samples."""
         snps = api.snps()
         analyses = load_vcf(bcf_path, snps)
         for analysis_obj in analyses:
-            LOG.debug('loading VCF genotypes for sample: %s', analysis_obj.sample_id)
+            LOG.debug("loading VCF genotypes for sample: %s", analysis_obj.sample_id)
             is_saved = api.add_analysis(self, analysis_obj, replace=force)
             if is_saved:
-                LOG.info('loaded VCF genotypes for sample: %s', analysis_obj.sample_id)
+                LOG.info("loaded VCF genotypes for sample: %s", analysis_obj.sample_id)
             else:
-                LOG.warning('skipped, found previous analysis: %s', analysis_obj.sample_id)
+                LOG.warning("skipped, found previous analysis: %s", analysis_obj.sample_id)
 
             if is_saved or force:
-                analysis_obj.sex = samples_sex[analysis_obj.sample_id]['analysis']
-                analysis_obj.sample.sex = samples_sex[analysis_obj.sample_id]['pedigree']
+                analysis_obj.sex = samples_sex[analysis_obj.sample_id]["analysis"]
+                analysis_obj.sample.sex = samples_sex[analysis_obj.sample_id]["pedigree"]
                 self.commit()
 
     def export_sample(self, days: int = 0) -> str:
         """Export sample info."""
         trending_call = self.base_call[:]
-        trending_call.extend(['export-sample', '-d', days])
+        trending_call.extend(["export-sample", "-d", days])
         try:
-            LOG.info('Running Genotype API to get data.')
+            LOG.info("Running Genotype API to get data.")
             LOG.debug(trending_call)
             output = subprocess.check_output(trending_call)
         except CalledProcessError as error:
-            LOG.critical("Could not run command: %s", ' '.join(trending_call))
+            LOG.critical("Could not run command: %s", " ".join(trending_call))
             raise error
-        output = output.decode('utf-8')
+        output = output.decode("utf-8")
         # If sample not in genotype db, stdout of genotype command will be empty.
         if not output:
             raise CaseNotFoundError("samples not found in genotype db")
@@ -66,15 +65,15 @@ def export_sample(self, days: int = 0) -> str:
     def export_sample_analysis(self, days: int = 0) -> str:
         """Export analysis."""
         trending_call = self.base_call[:]
-        trending_call.extend(['export-sample-analysis', '-d', days])
+        trending_call.extend(["export-sample-analysis", "-d", days])
         try:
-            LOG.info('Running Genotype API to get data.')
+            LOG.info("Running Genotype API to get data.")
             LOG.debug(trending_call)
             output = subprocess.check_output(trending_call)
         except CalledProcessError as error:
-            LOG.critical("Could not run command: %s", ' '.join(trending_call))
+            LOG.critical("Could not run command: %s", " ".join(trending_call))
             raise error
-        output = output.decode('utf-8')
+        output = output.decode("utf-8")
         # If sample not in genotype db, stdout of genotype command will be empty.
         if not output:
             raise CaseNotFoundError("samples not found in genotype db")
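Review note on the `export_sample` and `export_sample_analysis` hunks: `days` is annotated as `int` and is placed in the argument list unchanged. `subprocess.check_output` expects every element of the list to be a string, bytes, or path-like object, and `" ".join(...)` in the error branch also requires strings, so an integer value would likely raise a `TypeError` unless callers already pass a string. A minimal sketch of one way to guard against that, assuming a hypothetical `build_export_call` helper that is not part of cg:

```python
from typing import List


def build_export_call(base_call: List[str], command: str, days: int = 0) -> List[str]:
    """Return a copy of base_call extended with an export command.

    Hypothetical helper for illustration only; not part of cg.
    """
    call = base_call[:]  # copy so the shared base call is not mutated
    call.extend([command, "-d", str(days)])  # convert the int before it reaches subprocess
    return call


# The binary name and config path below are placeholder values.
call = build_export_call(["genotype", "--config", "config.yaml"], "export-sample", days=7)
print(" ".join(call))  # genotype --config config.yaml export-sample -d 7
```

Converting at the point where the command is assembled keeps the `int` type hint on the public methods while making both the subprocess call and the log message safe.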
91 changes: 80 additions & 11 deletions cg/apps/hk.py
@@ -1,23 +1,81 @@
-# -*- coding: utf-8 -*-
+""" Module to decouple cg code from Housekeeper code """
 import datetime as dt
 import logging
 import os
 from pathlib import Path
+from typing import List
 
 from housekeeper.exc import VersionIncludedError
 from housekeeper.include import include_version, checksum as hk_checksum
 from housekeeper.store import Store, models
 
-log = logging.getLogger(__name__)
+LOG = logging.getLogger(__name__)
 
 
-class HousekeeperAPI(Store):
+class HousekeeperAPI:
+    """ API to decouple cg code from Housekeeper """
+
     def __init__(self, config):
-        super(HousekeeperAPI, self).__init__(
-            config["housekeeper"]["database"], config["housekeeper"]["root"]
-        )
+        self._store = Store(config["housekeeper"]["database"], config["housekeeper"]["root"])
         self.root_dir = config["housekeeper"]["root"]
 
+    def __getattr__(self, name):
+        LOG.warning(
+            "Called undefined method %s on %s, please implement", name, self.__class__.__name__
+        )
+        return getattr(self._store, name)
+
+    def add_bundle(self, bundle_data):
+        """ Build a new bundle version of files """
+        return self._store.add_bundle(bundle_data)
+
+    def new_file(
+        self, path: str, checksum: str = None, to_archive: bool = False, tags: list = None
+    ):
+        """ Create a new file """
+        return self._store.new_file(path, checksum, to_archive, tags)
+
+    def tag(self, name: str):
+        """ Fetch a tag """
+        return self._store.tag(name)
+
+    def bundle(self, name: str):
+        """ Fetch a bundle """
+        return self._store.bundle(name)
+
+    def bundles(self):
+        """ Fetch bundles """
+        return self._store.bundles()
+
+    def files(
+        self, *, bundle: str = None, tags: List[str] = None, version: int = None, path: str = None
+    ):
+        """ Fetch files """
+        return self._store.files(bundle=bundle, tags=tags, version=version, path=path)
+
+    def new_tag(self, name: str, category: str = None):
+        """ Create a new tag """
+        return self._store.new_tag(name, category)
+
+    def new_bundle(self, name: str, created_at: dt.datetime = None):
+        """ Create a new file bundle """
+        return self._store.new_bundle(name, created_at)
+
+    def new_version(self, created_at: dt.datetime, expires_at: dt.datetime = None):
+        """ Create a new bundle version """
+        return self._store.new_version(created_at, expires_at)
+
+    def add_commit(self, db_obj):
+        """ Wrap method in Housekeeper Store """
+        return self._store.add_commit(db_obj)
+
+    def commit(self):
+        """ Wrap method in Housekeeper Store """
+        return self._store.commit()
+
+    def session_no_autoflush(self):
+        """ Wrap property in Housekeeper Store """
+        return self._store.session.no_autoflush
+
     def include(self, version_obj: models.Version):
         """Call the include version function to import related assets."""
         include_version(self.get_root_dir(), version_obj)
@@ -30,26 +88,28 @@ def include_file(self, file_obj: models.File, version_obj: models.Version):
         # generate root directory
         version_root_dir = global_root_dir / version_obj.relative_root_dir
         version_root_dir.mkdir(parents=True, exist_ok=True)
-        log.info(f"created new bundle version dir: {version_root_dir}")
+        LOG.info("Created new bundle version dir: %s", version_root_dir)
 
         if file_obj.to_archive:
             # calculate sha1 checksum if file is to be archived
             file_obj.checksum = HousekeeperAPI.checksum(file_obj.path)
         # hardlink file to the internal structure
         new_path = version_root_dir / Path(file_obj.path).name
         os.link(file_obj.path, new_path)
-        log.info(f"linked file: {file_obj.path} -> {new_path}")
+        LOG.info("Linked file: %s -> %s", file_obj.path, new_path)
         file_obj.path = str(new_path).replace(f"{global_root_dir}/", "", 1)
 
     def last_version(self, bundle: str) -> models.Version:
+        """Gets the latest version of a bundle"""
         return (
-            self.Version.query.join(models.Version.bundle)
+            self._store.Version.query.join(models.Version.bundle)
             .filter(models.Bundle.name == bundle)
             .order_by(models.Version.created_at.desc())
             .first()
         )
 
     def get_root_dir(self):
+        """Returns the root dir of Housekeeper"""
         return self.root_dir
 
     def get_files(self, bundle: str, tags: list, version: int = None):
@@ -59,7 +119,7 @@ def get_files(self, bundle: str, tags: list, version: int = None):
         Returns:
             iterable(hk.Models.File)
         """
-        return self.files(bundle=bundle, tags=tags, version=version)
+        return self._store.files(bundle=bundle, tags=tags, version=version)
 
     def add_file(self, file, version_obj: models.Version, tags, to_archive=False):
         """Add a file to housekeeper."""
@@ -77,4 +137,13 @@ def add_file(self, file, version_obj: models.Version, tags, to_archive=False):
 
     @staticmethod
     def checksum(path):
+        """Calculate the checksum"""
         return hk_checksum(path)
+
+    def initialise_db(self):
+        """Create all tables in the store."""
+        self._store.create_all()
+
+    def destroy_db(self):
+        """Drop all tables in the store"""
+        self._store.drop_all()
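The `cg/apps/hk.py` change replaces inheritance from `Store` with a wrapped `_store` attribute plus a `__getattr__` fallback. Python only calls `__getattr__` when normal attribute lookup fails, so explicitly wrapped methods are used silently while anything not yet wrapped is forwarded with a warning. A minimal sketch of that pattern, assuming hypothetical `FakeStore` and `Wrapper` classes that stand in for Housekeeper's `Store` and for `HousekeeperAPI`:

```python
import logging

LOG = logging.getLogger(__name__)


class FakeStore:
    """Hypothetical stand-in for housekeeper.store.Store."""

    def bundle(self, name: str) -> str:
        return f"bundle:{name}"

    def tag(self, name: str) -> str:
        return f"tag:{name}"


class Wrapper:
    """Holds a store instead of inheriting from it."""

    def __init__(self, store):
        self._store = store

    def __getattr__(self, name):
        # Reached only when `name` is not found on Wrapper itself.
        LOG.warning(
            "Called undefined method %s on %s, please implement", name, self.__class__.__name__
        )
        return getattr(self._store, name)

    def bundle(self, name: str) -> str:
        """Explicitly wrapped, so no warning is logged."""
        return self._store.bundle(name)


api = Wrapper(FakeStore())
print(api.bundle("case1"))  # explicit wrapper method
print(api.tag("vcf"))  # forwarded through __getattr__, logs a warning
```

The warning gives a running list of store methods that callers still reach indirectly, which is presumably why the PR keeps the fallback while adding explicit wrappers one by one.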
5 changes: 1 addition & 4 deletions cg/apps/invoice/render.py
@@ -1,4 +1,3 @@
-# -*- coding: utf-8 -*-
 import datetime as dt
 from pkg_resources import resource_filename
 
@@ -38,9 +37,7 @@ def render_xlsx(data: dict) -> Workbook:
     pkg_dir = __name__.rpartition(".")[0]
     sample_type = "pool" if data["pooled_samples"] else "sample"
     costcenter = data["costcenter"]
-    template_path = resource_filename(
-        pkg_dir, f"templates/{costcenter}_{sample_type}_invoice.xlsx"
-    )
+    template_path = resource_filename(pkg_dir, f"templates/{costcenter}_{sample_type}_invoice.xlsx")
     workbook = load_workbook(template_path)
     if data["pooled_samples"]:
         worksheet = workbook["Bilaga Prover"]
1 change: 0 additions & 1 deletion cg/apps/lims/__init__.py
@@ -1,4 +1,3 @@
-# -*- coding: utf-8 -*-
 from .api import LimsAPI
 from .orderform import parse_orderform
 from .limsjson import parse_json