This repository has been archived by the owner on Oct 12, 2023. It is now read-only.
Feature/custom package #272
Merged
18 commits (this view shows changes from 16 commits):
25f7c0a Added custom package script (brnleehng)
0a3fd72 Added feature custom download (brnleehng)
f7cefcf Fixed typo (brnleehng)
8c02328 Fixed directory for installation (brnleehng)
75081f5 Fixed full folder directory (brnleehng)
ac9d4a6 Add dependencies and fix pattern (brnleehng)
e255111 Fix pattern not found (brnleehng)
f19680c Added repo (brnleehng)
f7b2026 Switching to devtools (brnleehng)
9409f62 Fixing devtools install with directory (brnleehng)
48d8d4d Fix in for merger.R (brnleehng)
dce215b Working cluster custom packages (brnleehng)
6a0a176 Removed printed statements (brnleehng)
e14ec0e Working on custom docs (brnleehng)
25c42f0 Custom packages sample docs (brnleehng)
687ec7d Fixed typo in azure files typo (brnleehng)
e519c9d Fixed typos based on PR (brnleehng)
14c50ed Fixed download install custom path (brnleehng)
@@ -38,29 +38,37 @@ You can install packages by specifying the package(s) in your JSON pool configur | |
} | ||
``` | ||
|
||
## Installing Packages per-*foreach* Loop | ||
|
||
You can also install CRAN packages by using the **.packages** option in the *foreach* loop, and GitHub or Bioconductor packages by using the **github** and **bioconductor** options. Instead of installing packages during pool creation, packages (and their dependencies) can be installed before each iteration of the loop runs on your Azure cluster.
|
||
### Installing a Github Package | ||
|
||
doAzureParallel supports GitHub packages through the **github** option.
|
||
Do not use "https://github.com/" as a prefix for the GitHub package name; pass the name in `owner/repository` form.
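
If package references are copied from a browser, they may still carry the URL prefix. The following is a minimal sketch of stripping it before the value is passed to the **github** option. `normalize_github_ref` is a hypothetical helper written for illustration and is not part of doAzureParallel.

```R
# Hypothetical helper (not part of doAzureParallel): reduce a GitHub
# reference to the "owner/repository" form.
normalize_github_ref <- function(ref) {
  ref <- sub("^https?://github\\.com/", "", ref)  # drop the URL prefix
  ref <- sub("/+$", "", ref)                      # drop trailing slashes
  sub("\\.git$", "", ref)                         # drop a trailing ".git"
}

normalize_github_ref(c("https://github.com/Azure/rAzureBatch",
                       "Azure/doAzureParallel"))
# "Azure/rAzureBatch" "Azure/doAzureParallel"
```

References that are already in `owner/repository` form pass through unchanged, so the helper can be applied to every entry.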
|
||
## Installing packages from a private GitHub repository | ||
|
||
Clusters can be configured to install packages from a private GitHub repository by setting the __githubAuthenticationToken__ property. If this property is blank, only public repositories can be used. If a token is set, public and private GitHub repositories can be used together.
Clusters can be configured to install packages from a private GitHub repository by setting the __githubAuthenticationToken__ property in the credentials file. If this property is blank, only public repositories can be used. If a token is set, public and private GitHub repositories can be used together.
|
||
When the cluster is created, the token is passed to the nodes on start-up as an environment variable named GITHUB\_PAT. The variable lasts for the life of the cluster and is looked up whenever devtools::install_github is called.
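
One way to confirm that the token reached a node is to check the environment variable from R. This is a minimal sketch; `has_github_pat` is a hypothetical helper name, and the check only reports whether the variable is non-empty, not whether the token is valid.

```R
# Hypothetical helper: TRUE when GITHUB_PAT is set to a non-empty value
# in the current R session.
has_github_pat <- function() {
  nzchar(Sys.getenv("GITHUB_PAT", unset = ""))
}

if (has_github_pat()) {
  message("GITHUB_PAT is set; private repositories should be reachable.")
} else {
  message("GITHUB_PAT is empty; only public repositories can be installed.")
}
```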
|
||
Credentials File for github authentication token | ||
``` json | ||
{ | ||
... | ||
"githubAuthenticationToken": "", | ||
... | ||
} | ||
|
||
``` | ||
|
||
Cluster File | ||
```json | ||
{ | ||
{ | ||
"name": <your pool name>, | ||
"vmSize": <your pool VM size name>, | ||
"maxTasksPerNode": <num tasks to allocate to each node>, | ||
"poolSize": { | ||
"dedicatedNodes": { | ||
"min": 2, | ||
"max": 2 | ||
}, | ||
"lowPriorityNodes": { | ||
"min": 1, | ||
"max": 10 | ||
}, | ||
"autoscaleFormula": "QUEUE" | ||
}, | ||
... | ||
"rPackages": { | ||
"cran": [], | ||
"github": ["<project/some_private_repository>"], | ||
|
@@ -71,10 +79,18 @@ When the cluster is created the token is passed in as an environment variable ca | |
} | ||
``` | ||
|
||
_More information regarding github authentication tokens can be found [here](https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/)_ | ||
_More information regarding github authentication tokens can be found [here](https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/) | ||
|
||
## Installing Packages per-*foreach* Loop | ||
You can also install CRAN packages by using the **.packages** option in the *foreach* loop, and GitHub or Bioconductor packages by using the **github** and **bioconductor** options. Instead of installing packages during pool creation, packages (and their dependencies) can be installed before each iteration of the loop runs on your Azure cluster.
### Installing Multiple Packages | ||
Pass character vectors of package names to install several packages in one loop:
|
||
```R | ||
number_of_iterations <- 10 | ||
results <- foreach(i = 1:number_of_iterations, | ||
.packages=c('package_1', 'package_2'), | ||
github = c('Azure/rAzureBatch', 'Azure/doAzureParallel'), | ||
bioconductor = c('IRanges', 'Biobase')) %dopar% { ... } | ||
``` | ||
|
||
To install a single cran package: | ||
```R | ||
|
@@ -94,7 +110,6 @@ number_of_iterations <- 10 | |
results <- foreach(i = 1:number_of_iterations, github='azure/rAzureBatch') %dopar% { ... } | ||
``` | ||
|
||
Please do not use "https://github.com/" as prefix for the github package name above. | ||
|
||
To install multiple github packages: | ||
```R | ||
|
@@ -114,7 +129,7 @@ number_of_iterations <- 10 | |
results <- foreach(i = 1:number_of_iterations, bioconductor=c('package_1', 'package_2')) %dopar% { ... } | ||
``` | ||
|
||
## Installing Packages from BioConductor | ||
## Installing a BioConductor Package | ||
The default deployment of R used in the cluster (see [Customizing the cluster](./30-customize-cluster.md) for more information) includes the Bioconductor installer. Add packages to the cluster by listing them in the **bioconductor** array.
|
||
```json | ||
|
@@ -134,17 +149,27 @@ The default deployment of R used in the cluster (see [Customizing the cluster](. | |
}, | ||
"autoscaleFormula": "QUEUE" | ||
},
"containerImage": "rocker/tidyverse:latest",
"rPackages": { | ||
"cran": [], | ||
"github": [], | ||
"bioconductor": ["IRanges"] | ||
}, | ||
"commandLine": [] | ||
"commandLine": [], | ||
"subnetId": "" | ||
} | ||
} | ||
``` | ||
|
||
Note: Container references that are not provided by tidyverse do not support Bioconductor installs. If you choose another container, you must make sure that Biocondunctor is installed. | ||
Note: Container references that are not provided by tidyverse do not support Bioconductor installs. If you choose another container, you must make sure that Bioconductor is installed. | ||
|
||
## Installing Custom Packages | ||
doAzureParallel supports custom package installation in the cluster. Custom package installation at the per-*foreach* loop level is not supported.
|
||
Steps for installing custom packages can be found [here](../samples/package_management/custom/README.md).
|
||
Note: If the package requires a compilation such as apt-get installations, users will be require to build their own containers.
Review comment: require --> required
|
||
## Uninstalling packages | ||
## Uninstalling a Package | ||
Uninstalling packages from your pool is not supported. However, you may consider rebuilding your pool. |
@@ -0,0 +1,49 @@ | ||
args <- commandArgs(trailingOnly = TRUE) | ||
|
||
sharedPackageDirectory <- file.path( | ||
Sys.getenv("AZ_BATCH_NODE_SHARED_DIR"), | ||
"R", | ||
"packages") | ||
|
||
tempDir <- file.path( | ||
Sys.getenv("AZ_BATCH_NODE_STARTUP_DIR"), | ||
"tmp") | ||
|
||
.libPaths(c(sharedPackageDirectory, .libPaths())) | ||
|
||
pattern <- NULL | ||
if (length(args) > 1) { | ||
if (!is.null(args[2])) { | ||
pattern <- args[2] | ||
} | ||
} | ||
|
||
devtoolsPackage <- "devtools" | ||
if (!require(devtoolsPackage, character.only = TRUE)) { | ||
install.packages(devtoolsPackage) | ||
require(devtoolsPackage, character.only = TRUE) | ||
} | ||
|
||
packageDirs <- list.files( | ||
path = tempDir, | ||
full.names = TRUE, | ||
recursive = FALSE) | ||
|
||
for (i in seq_along(packageDirs)) {
print("Package Directories") | ||
print(packageDirs[i]) | ||
|
||
devtools::install(packageDirs[i], | ||
args = c( | ||
paste0( | ||
"--library=", | ||
"'", | ||
sharedPackageDirectory, | ||
"'"))) | ||
|
||
print("Package Directories Completed") | ||
} | ||
|
||
unlink( | ||
tempDir, | ||
recursive = TRUE) |
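
When a directory listing can come back empty, the way the loop index is built matters. The sketch below uses only base R and an empty placeholder vector to show the difference between `1:length(x)` and `seq_along(x)`; it is an illustration, not part of the script in this pull request.

```R
# An empty character vector, which is what list.files() returns when the
# directory contains nothing.
packageDirs <- character(0)

length(1:length(packageDirs))   # 2: the sequence is c(1, 0), so a loop runs twice
length(seq_along(packageDirs))  # 0: a loop over this index is skipped

visited <- 0
for (i in seq_along(packageDirs)) {
  visited <- visited + 1
}
visited  # 0
```

With `seq_along`, a node that has no extracted packages skips the install loop instead of attempting to install a missing path.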
File renamed without changes.
File renamed without changes.
@@ -0,0 +1,32 @@ | ||
## Installing Custom Packages | ||
doAzureParallel supports custom package installation in the cluster. Custom packages are R packages that cannot be hosted on GitHub or be built on a Docker image. The recommended approach for custom packages is building them from source and uploading them to an Azure File Share.
|
||
Note: If the package requires a compilation such as apt-get installations, users will be require to build their own containers.
Review comment: require --> required
|
||
### Building Package from Source in RStudio | ||
1. Open *RStudio* | ||
2. Go to *Build* on the navigation bar | ||
3. Go to *Build From Source* | ||
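
A source package can also be built without RStudio by calling `R CMD build` from an R session. The sketch below is one way to do that under stated assumptions: it creates a throwaway package in a temporary directory so that it is self-contained, and the DESCRIPTION contents, the `hello` function body, and the version number are placeholders chosen for illustration.

```R
# Create a minimal placeholder package so the example is self-contained.
pkg_dir <- file.path(tempdir(), "customR")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines(c(
  "Package: customR",
  "Title: Placeholder Custom Package",
  "Version: 0.1.0",
  "Author: Example Author",
  "Maintainer: Example Author <author@example.com>",
  "Description: Minimal package used only to illustrate a source build.",
  "License: GPL-3"
), file.path(pkg_dir, "DESCRIPTION"))
writeLines("export(hello)", file.path(pkg_dir, "NAMESPACE"))
writeLines('hello <- function() "hello from customR"',
           file.path(pkg_dir, "R", "hello.R"))

# R CMD build writes the tarball into the current working directory.
old_wd <- setwd(tempdir())
status <- system2(file.path(R.home("bin"), "R"),
                  c("CMD", "build", shQuote(pkg_dir)))
setwd(old_wd)

# Name follows the <Package>_<Version>.tar.gz convention.
tarball <- file.path(tempdir(), "customR_0.1.0.tar.gz")
file.exists(tarball)
```

The resulting `.tar.gz` file is the kind of artifact that would then be uploaded to the file share.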
|
||
### Uploading Custom Package to Azure Files | ||
Detailed steps for uploading files to Azure Files in the Portal can be found
[here](https://docs.microsoft.com/en-us/azure/storage/files/storage-how-to-use-files-portal) | ||
|
||
### Notes | ||
1) To build the custom packages' dependencies, the R packages must be untarred and built within their directories. By default, custom packages are extracted and built in the *$AZ_BATCH_NODE_STARTUP_DIR/tmp* directory.
2) By default, the custom package cluster configuration file will install any packages that are a *.tar.gz file in the file share. If users want to specify R packages, they must use change this line in the cluster configuration file. | ||
Review comment: use change -> "change" or "use"
||
|
||
Finds files that end with *.tar.gz in the current Azure File Share directory | ||
``` json | ||
{ | ||
... | ||
"commandLine": [ | ||
... | ||
"mkdir $AZ_BATCH_NODE_STARTUP_DIR/tmp | for i in `ls $AZ_BATCH_NODE_SHARED_DIR/data/*.tar.gz | awk '{print $NF}'`; do tar -xvf $i -C $AZ_BATCH_NODE_STARTUP_DIR/tmp; done", | ||
... | ||
] | ||
} | ||
``` | ||
3) For more information on using Azure Files on Batch, see our other [sample](./azure_files/readme.md) on using Azure Files.
4) Replace your Storage Account name, endpoint and key in the cluster configuration file |
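
The extraction step described in note 2 can also be expressed in R, which makes the order of operations explicit: the target directory is created first, then each tarball is unpacked into it. This is a sketch using base R only; `extract_tarballs` is a hypothetical helper, and the paths in the commented usage line assume the Batch environment variables used elsewhere in this sample.

```R
# Hypothetical helper: unpack every *.tar.gz file found in share_dir
# into target_dir and return the tarball paths invisibly.
extract_tarballs <- function(share_dir, target_dir) {
  dir.create(target_dir, recursive = TRUE, showWarnings = FALSE)
  tarballs <- list.files(share_dir, pattern = "\\.tar\\.gz$",
                         full.names = TRUE)
  for (tarball in tarballs) {
    utils::untar(tarball, exdir = target_dir)
  }
  invisible(tarballs)
}

# Possible usage on a node (assumes the Batch environment variables are set):
# extract_tarballs(
#   file.path(Sys.getenv("AZ_BATCH_NODE_SHARED_DIR"), "data"),
#   file.path(Sys.getenv("AZ_BATCH_NODE_STARTUP_DIR"), "tmp"))
```

Changing the `pattern` argument is one way to restrict installation to specific packages.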
@@ -0,0 +1,24 @@ | ||
#Please see documentation at docs/20-package-management.md for more details on packagement management. | ||
Review comment: packagement -> package
||
|
||
# import the doAzureParallel library and its dependencies | ||
library(doAzureParallel) | ||
|
||
# set your credentials | ||
doAzureParallel::setCredentials("credentials.json") | ||
|
||
# Create your cluster if it does not exist
cluster <- doAzureParallel::makeCluster("custom_packages_cluster.json") | ||
|
||
# register your parallel backend | ||
doAzureParallel::registerDoAzureParallel(cluster) | ||
|
||
# check that your workers are up | ||
doAzureParallel::getDoParWorkers() | ||
|
||
summary <- foreach(i = 1:1, .packages = c("customR")) %dopar% { | ||
sessionInfo() | ||
# Method from customR | ||
hello() | ||
} | ||
|
||
summary |
27 changes: 27 additions & 0 deletions
samples/package_management/custom/custom_packages_cluster.json
@@ -0,0 +1,27 @@ | ||
{ | ||
"name": "custom-package-pool", | ||
"vmSize": "Standard_D2_v2", | ||
"maxTasksPerNode": 1, | ||
"poolSize": { | ||
"dedicatedNodes": { | ||
"min": 2, | ||
"max": 2 | ||
}, | ||
"lowPriorityNodes": { | ||
"min": 0, | ||
"max": 0 | ||
}, | ||
"autoscaleFormula": "QUEUE" | ||
}, | ||
"rPackages": { | ||
"cran": [], | ||
"github": [], | ||
"bioconductor": [] | ||
}, | ||
"commandLine": [ | ||
"mkdir /mnt/batch/tasks/shared/data", | ||
"mount -t cifs //<Account Name>.file.core.windows.net/<File Share> /mnt/batch/tasks/shared/data -o vers=3.0,username=<Account Name>,password=<Account Key>,dir_mode=0777,file_mode=0777,sec=ntlmssp", | ||
"mkdir $AZ_BATCH_NODE_STARTUP_DIR/tmp | for i in `ls $AZ_BATCH_NODE_SHARED_DIR/data/*.tar.gz | awk '{print $NF}'`; do tar -xvf $i -C $AZ_BATCH_NODE_STARTUP_DIR/tmp; done", | ||
"docker run --rm -v $AZ_BATCH_NODE_ROOT_DIR:$AZ_BATCH_NODE_ROOT_DIR -e AZ_BATCH_NODE_SHARED_DIR=$AZ_BATCH_NODE_SHARED_DIR -e AZ_BATCH_NODE_ROOT_DIR=$AZ_BATCH_NODE_ROOT_DIR -e AZ_BATCH_NODE_STARTUP_DIR=$AZ_BATCH_NODE_STARTUP_DIR rocker/tidyverse:latest Rscript --no-save --no-environ --no-restore --no-site-file --verbose $AZ_BATCH_NODE_STARTUP_DIR/wd/install_custom.R /mnt/batch/tasks/shared/data" | ||
] | ||
} |
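
The mount command in this configuration contains angle-bracket placeholders that must be replaced before the cluster is created. The following base R sketch lists any placeholders still present in a set of configuration lines. `find_placeholders` is a hypothetical helper, and the sample lines are shortened copies of the command lines above.

```R
# Hypothetical helper: return the distinct <...> placeholders found in
# a character vector of configuration lines.
find_placeholders <- function(lines) {
  hits <- regmatches(lines, gregexpr("<[^<>]+>", lines))
  unique(unlist(hits))
}

config_lines <- c(
  '"mkdir /mnt/batch/tasks/shared/data",',
  '"mount -t cifs //<Account Name>.file.core.windows.net/<File Share> /mnt/batch/tasks/shared/data -o vers=3.0,username=<Account Name>,password=<Account Key>"'
)

find_placeholders(config_lines)
# "<Account Name>" "<File Share>" "<Account Key>"
```

In practice the lines could come from `readLines("custom_packages_cluster.json")`; an empty result suggests every placeholder has been filled in.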
Review comment: Is the 2nd "on" redundant?