pm storage cleanup and pm storage archive-to-swestore #34

Merged
Changes from 20 commits
f971677
Add clean script
guillermo-carrasco Dec 8, 2014
955e5b3
Loag config file from YAML. Fixes #31
guillermo-carrasco Dec 8, 2014
76ba272
Use biuilt in logging. Fixes #30
guillermo-carrasco Dec 8, 2014
da5cbf3
Code restructure. Starting storage functionality
guillermo-carrasco Dec 8, 2014
72bee69
Merge branch 'storage'
guillermo-carrasco Dec 8, 2014
7d4d16c
More logic about the runs within the code
guillermo-carrasco Dec 9, 2014
b455c11
Decoupled execute external command
guillermo-carrasco Dec 12, 2014
d3b404f
Fix several bigs
guillermo-carrasco Dec 12, 2014
f157467
Fix some problems with bcl2fast absolute path command
guillermo-carrasco Dec 12, 2014
d7ac3d4
Load irods and connect
guillermo-carrasco Dec 17, 2014
f91a359
Don't use irodsclient, not very stable
guillermo-carrasco Dec 18, 2014
7469a75
Support for X-Ten machine IDs
guillermo-carrasco Jan 22, 2015
bdbdf5f
cleanup done. Regex fixed
guillermo-carrasco Jan 22, 2015
c432dc2
Compressing runs. Fixed some external command line issues
guillermo-carrasco Jan 23, 2015
2c2e50a
Correct tar command, dude
guillermo-carrasco Jan 23, 2015
299e552
Get path, use commandline run arg
guillermo-carrasco Jan 23, 2015
f8c394f
Fixed some small bugs. TODO: Check adler32
guillermo-carrasco Jan 23, 2015
c850c02
Remove correct file
guillermo-carrasco Jan 23, 2015
e35ca3b
use os to remove single file
guillermo-carrasco Jan 23, 2015
fb85539
No need of calculating adler32 because -K is doing so with md5
guillermo-carrasco Jan 23, 2015
d803d81
Remove basemask. Fix #33
guillermo-carrasco Jan 26, 2015
74f0e2e
Move runs if they're older than 1 day, thanks @senthil10
guillermo-carrasco Jan 26, 2015
df91247
Modify regexp to pick up MiSeqs as well. Thanks @senthil10
guillermo-carrasco Jan 26, 2015
6359cef
Don't hide checkpoint file. Tack @remiolsen
guillermo-carrasco Jan 26, 2015
f64cc4c
If no config file, log in HOME, not ~/.pm
guillermo-carrasco Jan 26, 2015
2 changes: 1 addition & 1 deletion .gitignore
@@ -3,6 +3,6 @@ pm.egg-info
 dist
 build
 _build
-*log
+*.log
 .DS*

5 changes: 5 additions & 0 deletions clean.sh
@@ -0,0 +1,5 @@
# Remove building leftovers
rm -rf build dist *egg-info

# Remove pyc files
find . -type f -name '*.pyc' -exec rm {} +
28 changes: 28 additions & 0 deletions pm/controllers/__init__.py
@@ -0,0 +1,28 @@
""" Core module.

Place for controllers and other structural stuff.
"""
from cement.core import controller

class BaseController(controller.CementBaseController):
    """ Define an application BaseController

    The most basic controller. To be used as a template for new and more complex
    controllers.
    """
    class Meta:
        label = 'base'
        description = "Project Management - A tool for miscellaneous tasks at NGI"


    @controller.expose(hide=True)
    def default(self):
        print("Execute pm --help to display available commands")

    @controller.expose(hide=True, help="Prints a hello message")
    def hello(self):
        """ Testing method that just prints a hello message.

        Will not be listed as an available option (hide=True)
        """
        self.app.log.info("Welcome to Project Management tools!")
96 changes: 96 additions & 0 deletions pm/controllers/storage.py
@@ -0,0 +1,96 @@
""" PM controllers
"""
import os
import re
import shutil

from cement.core import controller

from pm.controllers import BaseController
from pm.utils import filesystem, misc

class StorageController(BaseController):
    """ Storage Controller

    Entry point for all functionalities related to storage
    """
    class Meta:
        label = 'storage'
        description = "Entry point for all functionalities related to storage"
        stacked_on = 'base'
        stacked_type = 'nested'
        arguments = [
            (['-r', '--run'], dict(type=str, help="Work with a specific run")),
            (['-d', '--days'], dict(type=int, default=10, help="Days to consider a run \"old\""))
        ]

    #######################
    # Storage subcommands #
    #######################

    @controller.expose(help="Move old runs to nosync directory so they're not synced to the processing server")
    def cleanup(self):
        for data_dir in self.app.config.get('storage', 'data_dirs'):
            with filesystem.chdir(data_dir):
                for run in [r for r in os.listdir(data_dir) if re.match(filesystem.RUN_RE, r)]:

Is this only meant for HiSeq runs? You defined RUN_RE = '\d{6}_[a-zA-Z\d\-]+_\d{4}_[AB][A-Z\d]{9}', so it would not work for MiSeq, would it?

Author

damn! no it will not work on MiSeq because of the last part of the regexp, I'll have to change it, well spotted! Thanks!
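
To make the point of this thread concrete, here is one possible way to accept MiSeq run names as well: add an alternative for the flowcell part of the pattern. This is only a sketch and not necessarily the regexp that was committed later. `RUN_RE_SKETCH`, `looks_like_run` and all run names below are made up for illustration, and the MiSeq flowcell layout (nine digits, a dash, five characters) is an assumption.

```python
import re

# Hypothetical pattern: the last group accepts either a HiSeq-style flowcell
# (side A or B followed by 9 characters) or, by assumption, a MiSeq-style one
# (nine digits, a dash, five characters). Like the original RUN_RE it is not
# anchored at the end, so compressed runs (<run>.tar.bz2) still match.
RUN_RE_SKETCH = r'\d{6}_[a-zA-Z\d\-]+_\d{4}_(?:[AB][A-Z\d]{9}|\d{9}-[A-Z\d]{5})'


def looks_like_run(name):
    """Return True if `name` starts like an Illumina run directory name."""
    return re.match(RUN_RE_SKETCH, name) is not None


# Made-up run names, for illustration only
for name in ('141208_D00415_0059_AC6N5KANXX',       # HiSeq-style
             '150122_ST-E00214_0010_BH2ABCCCXX',    # X Ten-style machine ID
             '150126_M01320_0130_000000000-ABCDE',  # MiSeq-style
             'nosync'):
    print(name, looks_like_run(name))
```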

                    if os.path.exists(os.path.join(run, 'RTAComplete.txt')):

Also, we don't check how old RTAComplete.txt is. In the worst case, if this script is called just a few seconds after RTAComplete.txt was created, could the run be moved to nosync immediately, before lsync has finished copying all the data to the processing server?

P.S.: I am asking this with a complete lack of knowledge about lsync and how fast it works, so correct me if I am wrong ☺️

Author

Hmmm, hmmm... well, even if it may look a bit paranoid, I think you're right, and it is not that much effort to check. Let's say 1 day old? ;-)
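
The age check discussed here could look roughly like the sketch below, which compares the modification time of a file with the current time. `is_older_than` and `SECONDS_PER_DAY` are hypothetical names, not code from this PR, and using the modification time of RTAComplete.txt as the reference is an assumption.

```python
import os
import time

SECONDS_PER_DAY = 24 * 60 * 60


def is_older_than(path, days=1, now=None):
    """Return True if `path` was last modified more than `days` days ago.

    `now` (seconds since the epoch) can be passed in to make testing easier.
    """
    if now is None:
        now = time.time()
    return (now - os.path.getmtime(path)) > days * SECONDS_PER_DAY
```

In `cleanup`, a run would then be moved only when RTAComplete.txt exists and `is_older_than(os.path.join(run, 'RTAComplete.txt'), days=1)` is true.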

                        self.app.log.info('Moving run {} to nosync directory'.format(os.path.basename(run)))
                        shutil.move(run, 'nosync')


    @controller.expose(help="Archive old runs to SWESTORE")
    def archive_to_swestore(self):
        # If the run is specified on the command line, check that it exists and archive it
        if self.app.pargs.run:
            if re.match(filesystem.RUN_RE, os.path.basename(self.app.pargs.run)):
                if not os.path.exists(self.app.pargs.run):
                    self.app.log.error(("Run {} not found. Please make sure to specify "
                        "the absolute path or relative path being in the correct directory.".format(self.app.pargs.run)))
                else:
                    self._archive_run(self.app.pargs.run)
            else:
                self.app.log.error("The name {} doesn't look like an Illumina run".format(os.path.basename(self.app.pargs.run)))
        # Otherwise find all runs in every data dir on the nosync partition
        else:
            self.app.log.info("Archiving old runs to SWESTORE")
            for data_dir in self.app.config.get('storage', 'data_dirs'):
                to_send_dir = os.path.join(data_dir, 'nosync')
                self.app.log.info('Checking {} directory'.format(to_send_dir))
                with filesystem.chdir(to_send_dir):
                    for run in [r for r in os.listdir(to_send_dir) if re.match(filesystem.RUN_RE, r)]:
                        self._archive_run(run)

    #############################################################
    # Class helper methods, not exposed as commands/subcommands #
    #############################################################
    def _archive_run(self, run):
        """ Archive a specific run to swestore

        :param str run: Run directory
        """
        def _send_to_swestore(f, dest, remove=True):
            """ Send file to swestore, letting iput verify the checksum (-K), and
            optionally removing the file from disk

            :param str f: File to send
            :param str dest: Destination directory in Swestore
            :param bool remove: If True, remove original file from source
            """
            self.app.log.info("Sending {} to swestore".format(f))
            misc.call_external_command('iput -K -P {file} {dest}'.format(file=f, dest=dest),
                    with_log_files=True)
            self.app.log.info('Run {} sent correctly and checksum was okay.'.format(f))
            if remove:
                self.app.log.info('Removing run {}'.format(f))
                os.remove(f)


        if run.endswith('bz2'):
            _send_to_swestore(run, self.app.config.get('storage', 'irods').get('irodsHome'))
        else:
            self.app.log.info("Compressing run {}".format(run))
            # Compress with pbzip2
            misc.call_external_command('tar --use-compress-program=pbzip2 -cf {run}.tar.bz2 {run}'.format(run=run))
            self.app.log.info('Run {} successfully compressed! Removing from disk...'.format(run))
            shutil.rmtree(run)

It looks fully safe, but I am still feeling paranoid 😂
Lets delete everything 😈

            _send_to_swestore('{}.tar.bz2'.format(run), self.app.config.get('storage', 'irods').get('irodsHome'))
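
For readers who share the paranoia voiced in the last comment, one option is to check that every file of the run is present in the tarball before deleting the directory. The sketch below is not part of this PR. It uses the standard-library `tarfile` module in place of the external `tar`/`pbzip2` call so that it stays self-contained, and `compress_and_verify` is a hypothetical helper name.

```python
import os
import shutil
import tarfile


def compress_and_verify(run_dir):
    """Create <run_dir>.tar.bz2, verify its file list, then remove run_dir.

    Returns the archive path. Raises RuntimeError, leaving the directory in
    place, if any file of the run is missing from the archive.
    """
    run_dir = os.path.normpath(run_dir)
    base = os.path.basename(run_dir)
    parent = os.path.dirname(run_dir) or '.'
    archive = run_dir + '.tar.bz2'

    with tarfile.open(archive, 'w:bz2') as tar:
        tar.add(run_dir, arcname=base)

    # Files on disk, named relative to the parent directory as in the archive
    expected = set()
    for root, _dirs, files in os.walk(run_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), parent)
            expected.add(rel.replace(os.sep, '/'))

    with tarfile.open(archive, 'r:bz2') as tar:
        archived = set(m.name for m in tar.getmembers() if m.isfile())

    missing = expected - archived
    if missing:
        raise RuntimeError('{} file(s) missing from {}'.format(len(missing), archive))

    shutil.rmtree(run_dir)
    return archive
```

This only compares file names. Comparing sizes or checksums as well would be stricter, at the cost of reading the data again.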
4 changes: 0 additions & 4 deletions pm/core/__init__.py

This file was deleted.

49 changes: 0 additions & 49 deletions pm/core/controllers.py

This file was deleted.

2 changes: 1 addition & 1 deletion pm/log/__init__.py
@@ -1,2 +1,2 @@
-""" Project Management logging module
+""" PM logging module for external scripts
 """
26 changes: 10 additions & 16 deletions pm/log/loggers.py
@@ -4,18 +4,14 @@
 import os
 import sys

-from ConfigParser import NoOptionError, NoSectionError
-
 from pm.utils import config as cl


 def minimal_logger(namespace, config_file=None, to_file=True, debug=False):
 """Make and return a minimal console logger. Optionally write to a file as well.
-
 :param str namespace: Namespace of logger
 :param bool to_file: Log to a file (location in configuration file)
 :param bool debug: Log in DEBUG level or not
-
 :return: A logging.Logger object
 :rtype: logging.Logger
 """
@@ -36,16 +32,14 @@ def minimal_logger(namespace, config_file=None, to_file=True, debug=False):
 log_path = os.path.join(os.environ['HOME'], '.pm', 'pm.log')
 if config_file or os.environ.get('PM_CONFIG'):
 if os.environ.get('PM_CONFIG'):
-config = cl.load_config(os.environ.get('PM_CONFIG'))
-else:
-config = cl.load_config(config_file)
-try:
-log_path = config.get('log', 'log_dir')
-except (NoOptionError, NoSectionError) as e:
-raise e("Section [log] or option 'log_dir' were not found in the configuration file.")
+config = cl.load_yaml_config(os.environ.get('PM_CONFIG'))
 else:
-fh = logging.FileHandler(log_path)
-fh.setLevel(log_level)
-fh.setFormatter(formatter)
-log.addHandler(fh)
-return log
+config = cl.load_yaml_config(config_file)
+log_path = config.get('log', {}).get('log_dir')
+if not log_path:
+raise RuntimeError("Section [log] or option 'log_dir' were not found in the configuration file.")
+fh = logging.FileHandler(log_path)
+fh.setLevel(log_level)
+fh.setFormatter(formatter)
+log.addHandler(fh)
+return log
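
The chained lookup `config.get('log', {}).get('log_dir')` treats the parsed YAML file as a nested dict. The sketch below isolates that pattern under a few assumptions: `resolve_log_path` and `DEFAULT_LOG_PATH` are hypothetical names, the config is a plain dict as a YAML parser would return it, and the extra `or {}` guard (for a `log:` key with no value) is an addition that is not in the diff.

```python
import os

# Same default location as in the diff, built with expanduser instead of $HOME
DEFAULT_LOG_PATH = os.path.join(os.path.expanduser('~'), '.pm', 'pm.log')


def resolve_log_path(config=None, default=DEFAULT_LOG_PATH):
    """Return the log file path from a parsed YAML config (a nested dict).

    Without a config, fall back to `default`. With a config that lacks
    log.log_dir, raise RuntimeError, mirroring the new code in this diff.
    """
    if config is None:
        return default
    # `or {}` also covers a 'log' key whose value is empty (None)
    log_path = (config.get('log') or {}).get('log_dir')
    if not log_path:
        raise RuntimeError("Option 'log_dir' not found in section 'log' of the configuration")
    return log_path
```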
2 changes: 0 additions & 2 deletions pm/storage.py

This file was deleted.

2 changes: 2 additions & 0 deletions pm/utils/filesystem.py
@@ -3,6 +3,8 @@
 import contextlib
 import os

+RUN_RE = '\d{6}_[a-zA-Z\d\-]+_\d{4}_[AB][A-Z\d]{9}'
+
 @contextlib.contextmanager
 def chdir(new_dir):
Nicely done 👍 :)

Author

Thanks :-)

"""Context manager to temporarily change to a new directory.
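
The body of `chdir` is collapsed in this view. A directory-changing context manager is commonly written along the lines below. This is a generic sketch and an assumption about the shape of the function, not the code from the repository.

```python
import contextlib
import os


@contextlib.contextmanager
def chdir(new_dir):
    """Temporarily change the working directory, restoring it on exit."""
    cur_dir = os.getcwd()
    os.chdir(new_dir)
    try:
        yield
    finally:
        # Runs even if the body of the `with` block raises
        os.chdir(cur_dir)
```

The `try`/`finally` matters for `cleanup` and `archive_to_swestore`: if moving or archiving one run fails, the process still ends up back in its original directory.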
36 changes: 36 additions & 0 deletions pm/utils/misc.py
@@ -0,0 +1,36 @@
""" Miscellaneous or general-use methods
"""
import os
import subprocess
import sys

from datetime import datetime

def call_external_command(cl, with_log_files=False):
    """ Executes an external command

    :param string cl: Command line to be executed (command + options and parameters)
    :param bool with_log_files: Create log files for stdout and stderr
    """
    if isinstance(cl, str):
        cl = cl.split(' ')
    command = os.path.basename(cl[0])
    stdout = sys.stdout
    stderr = sys.stderr

    if with_log_files:
        # Append to the log files ('wa' is not a valid mode in Python 3)
        stdout = open(command + '.out', 'a')
        stderr = open(command + '.err', 'a')
        started = "Started command {} on {}".format(' '.join(cl), datetime.now())
        stdout.write(started + '\n')
        # Underline the header and flush it before the command writes its own output
        stdout.write('=' * len(started) + '\n')
        stdout.flush()

    try:
        subprocess.check_call(cl, stdout=stdout, stderr=stderr)
    except subprocess.CalledProcessError as e:
        e.message = "The command {} failed.".format(' '.join(cl))
        raise e
    finally:
        if with_log_files:
            stdout.close()
            stderr.close()
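
One limitation of the helper above is that `cl.split(' ')` breaks arguments that contain quoted spaces. A variant that splits with `shlex` could look like the sketch below. It is a suggestion under the assumption that Python 3 is available, not code from this PR, and it keeps the same function name only for comparison.

```python
import os
import shlex
import subprocess
from datetime import datetime


def call_external_command(cl, with_log_files=False):
    """Run an external command, optionally appending its output to log files.

    :param cl: command line, as a string or a list of arguments
    :param bool with_log_files: write stdout/stderr to <command>.out/.err
    """
    if isinstance(cl, str):
        cl = shlex.split(cl)  # respects quoting, unlike str.split(' ')
    command = os.path.basename(cl[0])
    stdout = stderr = None  # None means: inherit the parent's streams
    if with_log_files:
        stdout = open(command + '.out', 'a')
        stderr = open(command + '.err', 'a')
        started = "Started command {} on {}".format(' '.join(cl), datetime.now())
        stdout.write(started + '\n' + '=' * len(started) + '\n')
        stdout.flush()  # keep the header ahead of the child's output
    try:
        # Raises subprocess.CalledProcessError on a non-zero exit status
        subprocess.check_call(cl, stdout=stdout, stderr=stderr)
    finally:
        if with_log_files:
            stdout.close()
            stderr.close()
```

With this variant, a call such as `call_external_command('tar -cf "my run.tar" "my run"')` would pass `my run.tar` and `my run` as single arguments.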
22 changes: 15 additions & 7 deletions scripts/pm
@@ -2,28 +2,36 @@

 """ This is the entry point for PM.
 """
+import os
+
 from cement.core import foundation, handler
-from pm.core import controllers
+from cement.ext.ext_yaml import YamlConfigHandler
+from pm.controllers import BaseController
+from pm.controllers import storage
+
+CONFIG_FLIE = os.path.join(os.getenv('HOME'), '.pm', 'pm.yaml')

 class PmApp(foundation.CementApp):
     class Meta:
         label = 'PM'
-        base_controller = controllers.BaseController
+        base_controller = BaseController
+        config_handler = YamlConfigHandler

 # Create the main application
 app = PmApp()

 try:
     # Register handlers
-    handler.register(controllers.StorageController)
+    handler.register(storage.StorageController)

     # Setup the application
     app.setup()

-    # Add any common arguments to all handlers
-    app.args.add_argument('-c', '--config', type=str, help="Path to the configuration file")
-
-    app.run()
+    # Run the application
+    if not app.config.parse_file(CONFIG_FLIE):
+        app.log.error('No config file {}; please create and set relevant config sections'.format(CONFIG_FLIE))
+    else:
+        app.run()
 finally:
     # close the app
     app.close()