
Computing


Learning to use the computing cluster:

To get access to the computing cluster, send Nathan Skene an email with your username and he will add you.

The Imperial CX1 cluster uses PBS as its job scheduler. PBS has many versions, so you cannot necessarily assume that a command you find in an online manual will work exactly as described there. The commands that work on the cluster are best checked by typing man qstat while logged in to an interactive session.

There is a weekly HPC clinic and I strongly recommend making use of it. You can turn up and experts will help you; even if you are just unsure about something, go and speak to them. The clinics are held in South Kensington but it's well worth going.

Imperial regularly runs a beginner's guide to high performance computing course. If you have not previously used HPC, you'll want to register for this as soon as possible. You can do so through their website:

Imperial also runs a course on software carpentry. If you are unfamiliar with Git and Linux, then you should take this course.

Combiz wrote useful notes on using the HPC (how to log in, etc.):

A version of RStudio is installed on the computing cluster and can be accessed through your browser. This gives you access to a 24-core machine and is probably better than programming on your laptop.

Saving your password

Rather than entering your password every time you log in over ssh, you can set up key-based authentication once with ssh-keygen and ssh-copy-id:

ssh-copy-id <username>@login.hpc.ic.ac.uk

This will ask you for your HPC password. Afterwards you will no longer have to enter your password from your local computer. For a more detailed explanation, see here.
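
For example, a minimal sketch of the whole key setup (press Enter to accept the defaults when generating the key):

# Generate a key pair once (skip if you already have ~/.ssh/id_rsa)
ssh-keygen
# Copy the public key to the cluster; you will be asked for your HPC password one last time
ssh-copy-id <username>@login.hpc.ic.ac.uk
# Subsequent logins should no longer prompt for a password
ssh <username>@login.hpc.ic.ac.uk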

Installing software on the cluster

If you want an R package installed and part of its setup needs root permissions, then request that it be installed via ASK.

You might want to try joining RCS Slack and raising the issue there.

Before doing either of these, check whether the software is listed under module avail. If it is, you may be able to access it within RStudio using the RLinuxModules package. Another option is to set up the software within conda, but that won't help with RStudio.
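
As a rough sketch of both routes (samtools and the environment name mytools are just hypothetical examples; substitute whatever you actually need):

# Check whether the software is already available as an environment module
# (module avail prints to stderr, hence the redirect)
module avail 2>&1 | grep -i samtools
# If it is listed, load it for the current session
module load samtools
# Otherwise, one option is a conda environment (this won't help with RStudio)
conda create -n mytools -c bioconda samtools
conda activate mytools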

Create interactive jobs on the cluster

To create an interactive session on the cluster (and avoid overloading the login nodes), use the following command:

qsub -I -l select=01:ncpus=8:mem=96gb -l walltime=08:00:00

That command requests the most resources that can be obtained for interactive jobs; decrease these if you can. On the main queue it will take a long time to start. If I've given you access to the med-bio queue (ask), then you will be better off using the following:

qsub -I -l select=01:ncpus=2:mem=8gb -l walltime=01:00:00 -q med-bio
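
Once submitted, you can check where your job sits in the queue with qstat (standard PBS), e.g.:

# List your own jobs and their states (Q = queued, R = running)
qstat -u $USER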

SSH into the cluster

VPN into the network, then connect with

ssh YOURUSERNAME@login.cx1.hpc.imperial.ac.uk

Set up ssh for the cluster

To set up ssh on your computer for accessing the cluster, add the following to ~/.ssh/config:

Host *
    AddKeysToAgent yes
    IdentityFile ~/.ssh/id_rsa
Host imperial
    User nskene
    AddKeysToAgent yes
    HostName login.cx1.hpc.imperial.ac.uk
    ForwardX11Trusted yes
    ForwardX11 yes
    HostKeyAlgorithms=+ssh-dss
Host imperial-7
    User nskene
    AddKeysToAgent yes
    HostName login-7.cx1.hpc.imperial.ac.uk
    ForwardX11Trusted yes
    ForwardX11 yes
    HostKeyAlgorithms=+ssh-dss

If you use imperial-7 to log in, then you'll always connect to the same login node, which makes using screen/tmux easier.
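
With that config in place, logging in and keeping a persistent session looks like this (the session name work is just an example):

# Connect to the fixed login node defined above
ssh imperial-7
# First time: start a named tmux session
tmux new -s work
# On later logins to the same node: reattach to it
tmux attach -t work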

Joint workspaces on the Imperial cluster

We have two shared project spaces on the cluster. If you are involved in the DRI Multiomics Atlas project, then use projects/ukdrimultiomicsprojects/. Otherwise, please use projects/neurogenomics-lab/.

You will not be able to write into the main directory of either of these. They have two folders: live and ephemeral. Read about the differences between these here:
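
As a sketch, both spaces sit under the usual RDS projects path (assuming the standard /rds/general/projects/ layout; adjust if your allocation differs):

# Shared lab space; write into live/ or ephemeral/, not the top level
ls /rds/general/projects/neurogenomics-lab/live/
# DRI Multiomics Atlas project space
ls /rds/general/projects/ukdrimultiomicsprojects/live/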

The medbio cluster (for faster job submissions)

What is medbio?

The MedBio cluster has additional computational resources and is accessed via a separate queue. Read about it here: https://www.imperial.ac.uk/bioinformatics-data-science-group/resources/uk-med-bio/

How do we access it?

We have access to it but I do not have admin rights to grant access to individuals.

To get access, it was previously necessary to email p.blakeley@imperial.ac.uk, but he is no longer at Imperial and the access management doesn't seem to have been sorted out since. Another contact email is medbio-help@imperial.ac.uk. Most recently, Brian got access by emailing m.futschik@imperial.ac.uk.

Nathan can access a list of users that currently have admin rights over MedBio through the self-service portal. Ask him to check who is on there, and then contact one of them. Currently, Abbas Dehghan seems like a good candidate.

How to use medbio?

To run on the MedBio cluster, just add this to the end of your submission commands: -q med-bio
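
For batch jobs, the queue can also be set inside the PBS script itself. A minimal sketch (the resource requests, myscript.R and the script name myscript.pbs are placeholders):

#!/bin/bash
#PBS -l select=1:ncpus=2:mem=8gb
#PBS -l walltime=01:00:00
#PBS -q med-bio

# Run from the directory the job was submitted from
cd $PBS_O_WORKDIR
Rscript myscript.R

Submit it with qsub myscript.pbs, or keep the -q med-bio flag on the command line as above.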

Express (charged) access to the cluster

It can take a long time to get jobs running on the cluster. We can pay to get jobs run faster. Details are here: https://www.imperial.ac.uk/admin-services/ict/self-service/research-support/rcs/computing/express-access/. I would rather pay for faster results if this is slowing you down; let me know if this would be useful and I'll add you to the list. Express jobs are submitted with qsub -q express -P exp-XXXXX, substituting your express account code.
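
For example, submitting a batch script (myscript.pbs is a placeholder) to the express queue:

qsub -q express -P exp-XXXXX myscript.pbs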

Running docker containers on the HPC with singularity

Creating a singularity container from a docker container

You'll need to use Singularity to run Docker containers on the HPC. To run a Rocker R container in interactive mode, run the following ($USER expands to your username automatically):

mkdir -p /rds/general/user/$USER/ephemeral/tmp/
mkdir -p /rds/general/user/$USER/ephemeral/rtmp/
singularity exec -B /rds/general/user/$USER/ephemeral/tmp/:/tmp,/rds/general/user/$USER/ephemeral/tmp/:/var/tmp,/rds/general/user/$USER/ephemeral/rtmp/:/usr/local/lib/R/site-library/ --writable-tmpfs docker://rocker/tidyverse:latest R

To create a Singularity image from a local Docker image, first archive the Docker image into a tar file. Obtain the IMAGE_ID with docker images, then archive it (substituting your IMAGE_ID):

docker save 409ad1cbd54c -o singlecell.tar

On a system running singularity-container (>v3), e.g. the HPC cluster, generate the Singularity Image File (SIF) from the local tar file with:

/usr/bin/singularity build singlecell.sif docker-archive://singlecell.tar

This singlecell.sif Singularity Image File is now ready to use.
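
For instance, you can then run commands inside the locally built image:

# Run R inside the image
singularity exec singlecell.sif R
# Or open an interactive shell in the container
singularity shell singlecell.sif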

Running a singularity container

On the HPC, Rocker containers can be run through Singularity with a single command, much like the native Docker commands, e.g.

singularity exec docker://rocker/tidyverse:latest R

By default, Singularity bind-mounts /home/$USER, /tmp, and $PWD into your container at runtime.
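
Other directories (e.g. the shared project space) have to be bound explicitly with -B; a sketch assuming the /rds/general/projects/ path used above:

# Mount the lab's shared space at /data inside the container
singularity exec -B /rds/general/projects/neurogenomics-lab/live:/data docker://rocker/tidyverse:latest R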

More info is available here.

Extending wall time for jobs that are already running

If you go to this link, your jobs are listed under the Compute tab. For some jobs you can extend the walltime.

Remotely connecting to a linux workstation

I recommend using TeamViewer to connect. Set the computer up as a saved workstation and remote use becomes almost as easy as sitting at it.
