Install R packages at job run time #29

brnleehng · 2017-06-20T20:51:34Z

Updated version
Added github / cran installation scripts for resource files
Added tests

msftclas · 2017-06-20T20:51:38Z

@brnleehng,
Thanks for your contribution as a Microsoft full-time employee or intern. You do not need to sign a CLA.
Thanks,
Microsoft Pull Request Bot

paselem · 2017-06-20T21:42:23Z

R/cluster.R

+  installGithubCommand <- NULL
+
+  if (!is.null(pool$rPackages) && !is.null(pool$rPackages$cran) && length(pool$rPackages$cran) > 0) {
+    installCranCommand <- getPoolPackageInstallationCommand("cran", pool$rPackages$cran)


prepend rAzureBatch:: to package calls

Added rAzureBatch::

paselem · 2017-06-20T22:24:02Z

R/utility.R

-    installation <- paste0(installation,
-                         sprintf("Rscript -e \'args <- commandArgs(TRUE)\' -e \'install.packages(args[1], dependencies=TRUE)\' %s", package),
-                         ";")
+getJobPackageInstallationCommand <- function(type, packages){


Do we run into issues doing this for each job rather than the cluster? I assume we get the benefit of job-prep running as same user as task.

paselem · 2017-06-20T22:26:11Z

R/utility.R

+  if (type == "cran") {
+    script <- "Rscript $AZ_BATCH_JOB_PREP_WORKING_DIR/install_cran.R"
+  }
+  else if (type == "github") {


should this be if instead of else if? What if a use specifies both?

If the user wants to specify both, the user would need to do each function individually..
getJobPackageInstallationCommand("cran", c("httr", "ludridate"))
getJobPackageInstallationCommand("github", c("Azure/doAzureParallel", "Azure/rAzureBatch"))

Should the function actually do both? This is internal function

getJobPackageInstallationCommand(cran = c("ludridate", "httr"), github = c("Azure/doAzureParallel", "Azure/rAzureBatch"))

I think as long as all packages for both cran and github are installed we are ok. It just seemed that from the code that if 'cran' is specified, the 'github' branch doesn't get executed.

The only issue we have is how to differentiate a github / cran package in the foreach argument

foreach( i = 1:3, .packages =c("httr", "lubridate"), .github = c("Azure/doAzureParallel")) {}

So, does this method get called once per package type?

No, we are creating a command line and the script handles the packages.

Rscript $AZ_BATCH_JOB_PREP_WORKING_DIR/install_cran.R httr ludridate hts

* Added set chunk size * Added cluster configuration validation function (#30) * Added pool config test validation * Added a fix for validation * Added if checks for null tests and more validation tests * Install R packages at job run time (#29) * Added cran/github installation scripts * Added package installation tests * Upgraded package version to 0.3.2 * Output file support (#40) * Output files support * Added createOutputFile method * output files readme documentation * added tests and find container sas * Added more detailed variable names * Enable/disable merge task (#39) * Merge task pass params * Fixed enableMerge cases * Merge task documentation on README.md * Fixed typo on merge task description * Update doAzureParallel.R * Changed enableMerge to enableCloudCombine * Fix/backwards compatible (#68) * Added backwards compatible in make cluster * Added deprecated config validator * Added mismatch label * Added validation for quota limits and bad getPool requests in waitForNodesToComplete (#52) * Added validation for quota limits and bad getPool requests * Fixed based on PR * Fixed progress bar layout to use switch statements instead of if statements * Changed clusterId to poolId * Added comments and fixed messages * Added running state to the node status * Reformatted lines for function * Added end statement for node completion * Feature/custom script and reduce (#70) * Added custom scripts and removed dependencies parameter * Updated roxygen tool version * Added parallelThreads support * Added test coverage * Removed verbose message on command line * Added Reduce function for group of tasks * Fix build because of doc semantics mismatch with function * Removed unused function * Added command line arg * Added docs for custom script * Moved customize cluster to separate doc for future usage * Fixed typo * Bug - Waiting for tasks to completion function ends too early (#69) * Moved wait for tasks to complete to doAzureParallel utility * Removed unneeded variables and progress * Fixed camel case for skiptoken * Travis/lintr (#72) * Added lintr config file * Added travis github package installation * Removed snake case rule * Fixed documents on doAzureParallel * Based on lintr default_settins docs, correctly added default rules * Updated lintr package to use object_name_style * Added package :: operator * Reformatted after merge * Fixed command line tests * Upgraded roxygen to 6.0.1 * Cluster config docs * Removed additional delete job * Fixed warning descriptions in makeCluster * Ramped up versions for DESCRIPTION * Updated CHANGELOG to 0.4.0 * Fixed environmentSettings style from merge conflict * Fixed cluster warning naming style

* Added set chunk size * Added cluster configuration validation function (#30) * Added pool config test validation * Added a fix for validation * Added if checks for null tests and more validation tests * Install R packages at job run time (#29) * Added cran/github installation scripts * Added package installation tests * Upgraded package version to 0.3.2 * Output file support (#40) * Output files support * Added createOutputFile method * output files readme documentation * added tests and find container sas * Added more detailed variable names * Enable/disable merge task (#39) * Merge task pass params * Fixed enableMerge cases * Merge task documentation on README.md * Fixed typo on merge task description * Update doAzureParallel.R * Changed enableMerge to enableCloudCombine * convert getJobResult output from binary to text * Only write vector to temp file * save cloud merge enabled, chunk size and packages as job metadata * update cloudMergeEnabled to cloudCombineEnabled * Fix/backwards compatible (#68) * Added backwards compatible in make cluster * Added deprecated config validator * Added mismatch label * Added validation for quota limits and bad getPool requests in waitForNodesToComplete (#52) * Added validation for quota limits and bad getPool requests * Fixed based on PR * Fixed progress bar layout to use switch statements instead of if statements * Changed clusterId to poolId * Added comments and fixed messages * Added running state to the node status * Reformatted lines for function * Added end statement for node completion * Feature/custom script and reduce (#70) * Added custom scripts and removed dependencies parameter * Updated roxygen tool version * Added parallelThreads support * Added test coverage * Removed verbose message on command line * Added Reduce function for group of tasks * Fix build because of doc semantics mismatch with function * Removed unused function * Added command line arg * Added docs for custom script * Moved customize cluster to separate doc for future usage * Fixed typo * Bug - Waiting for tasks to completion function ends too early (#69) * Moved wait for tasks to complete to doAzureParallel utility * Removed unneeded variables and progress * Fixed camel case for skiptoken * Travis/lintr (#72) * Added lintr config file * Added travis github package installation * Removed snake case rule * Fixed documents on doAzureParallel * Based on lintr default_settins docs, correctly added default rules * Updated lintr package to use object_name_style * Added package :: operator * Reformatted after merge * Fixed command line tests * Upgraded roxygen to 6.0.1 * Cluster config docs * Removed additional delete job * add getJob api (#84) * add getJob api * reformat code * update styling in utility file * fix code styling * update chunksize to chunkSize and remove unused code * handle job metadata in getJob api * fix styling issue * update getJobList parameter from list of job ids to filter object, and output jobs status in data frame (#128) long running job support, getJob, getJobList and getJobResult implementation * reformat code * update styling in utility file * fix code styling * update chunksize to chunkSize and remove unused code * handle job metadata in getJob api * fix styling issue * use counting service api in getJobList * fix coding style * return data frame from getJobList * update getJobList parameter from job id list to filter by state * reformat code * update description for getJobList * remove dup code * address review feedback * jobId parameter check for getJobResult * update documentation for long run job * update version to 0.5.0 * update version * address review feedback * update chunkSizeValue to chunkSizeKeyValuePair

* treat warnings as failures and fail the creation of the cluster (#91) * treat warnings as failures and fail the creation of the cluster * fix unit tests * fix lintr lines too long issue * escape single quotes * Check if existing pool is deleted when makeCluster is called (#99) * Added deleting pool check for makeCluster * Fixed double quotes * cluster logs renamed from pool to cluster * Added correct imports and fix range * Feature/bio conductor docs (#106) * initial command line instructions for bioconductor * initial startup scripts for installing bioconductor * fix if then syntax * force update node environment with update path for R runtime * install bioconductor * wrap bioconductor install command in Rscript * bioconductor sample docs * update bioC docs * remove .gitignore rule for .json files * add pointer to BioC cluster config from docs * Feature/cluster logs (#98) * download merge result gets content raw * Added setHttpTraffic and logging functions docs * Fixed broken links * Shorten lines down to 120 characters * download merge result gets content raw * Added setHttpTraffic and logging functions docs * Fixed broken links * Shorten lines down to 120 characters * Renamed function names from past discussion * Fixed log documentation * Added new operations for storage management * Added dont run examples * Fixed unused arg for running example * Updated docs for storage management * Added a new doc dedicated for managing storage * Added attribute for container name in data frame * Fixed downloadBlob to work with new rAzureBatch function * Updated docs based on PR comments * Changed dependency version to razurebatch 0.5.0 * Feature/add azure files cluster config (#108) * initial command line instructions for bioconductor * initial startup scripts for installing bioconductor * fix if then syntax * force update node environment with update path for R runtime * install bioconductor * wrap bioconductor install command in Rscript * bioconductor sample docs * update bioC docs * remove .gitignore rule for .json files * add pointer to BioC cluster config from docs * add missing azureFiles cluster config to samples * Add 0.4.2 CHANGELOG comments (#111) * Added live scenario test (#107) * Added live scenario test so users do not have to write their own sample code to test * Added file names for test live * Removed single quote linter * Added comment about the reason for this test * Wait for job preparation task function (#109) * Fixed verbose for getDoParWorkers (#112) * Feature/faq (#110) * initial FAQ * rename faq to FAQ * merge FAQ and Troubleshooting docs * add info on how to reboot a node * refrence TSG and FAQ from main docs index page * add more info as per PR feedback * PR feedback * point raw scripts at master branch (#118) * Update DESCRIPTION (#117) Update version for new milestone. * Fix: Removed anaconda from path (#119) * Removed anaconda from environment path * Line is too long for blobxfer command * For BioConductor install, force remove MRO 3.3 prior to installing MRO 3.4 (#120) * force add PATH to current user * Update bioc_setup.sh * Check verbose null case (#121) * Change True/False to TRUE/FALSE in README example (#124) * add .gitiattrributes file to track line endings * True and False are not valid in R; changed to TRUE and FALSE * Fixed worker and merger scripts (#116) * Fixed worker and merger scripts * Fixed verbose logs based on PR comments * Added documentation on error handling * Fixed header on table markdown * Fixed based on PR comments * v0.4.3 Release (#131) * Upgraded description to use rAzureBatch v0.5.1 * Updated change log for job failure * readme.md update * Merge from feature/getjobresult for long running job support (#130) * Added set chunk size * Added cluster configuration validation function (#30) * Added pool config test validation * Added a fix for validation * Added if checks for null tests and more validation tests * Install R packages at job run time (#29) * Added cran/github installation scripts * Added package installation tests * Upgraded package version to 0.3.2 * Output file support (#40) * Output files support * Added createOutputFile method * output files readme documentation * added tests and find container sas * Added more detailed variable names * Enable/disable merge task (#39) * Merge task pass params * Fixed enableMerge cases * Merge task documentation on README.md * Fixed typo on merge task description * Update doAzureParallel.R * Changed enableMerge to enableCloudCombine * convert getJobResult output from binary to text * Only write vector to temp file * save cloud merge enabled, chunk size and packages as job metadata * update cloudMergeEnabled to cloudCombineEnabled * Fix/backwards compatible (#68) * Added backwards compatible in make cluster * Added deprecated config validator * Added mismatch label * Added validation for quota limits and bad getPool requests in waitForNodesToComplete (#52) * Added validation for quota limits and bad getPool requests * Fixed based on PR * Fixed progress bar layout to use switch statements instead of if statements * Changed clusterId to poolId * Added comments and fixed messages * Added running state to the node status * Reformatted lines for function * Added end statement for node completion * Feature/custom script and reduce (#70) * Added custom scripts and removed dependencies parameter * Updated roxygen tool version * Added parallelThreads support * Added test coverage * Removed verbose message on command line * Added Reduce function for group of tasks * Fix build because of doc semantics mismatch with function * Removed unused function * Added command line arg * Added docs for custom script * Moved customize cluster to separate doc for future usage * Fixed typo * Bug - Waiting for tasks to completion function ends too early (#69) * Moved wait for tasks to complete to doAzureParallel utility * Removed unneeded variables and progress * Fixed camel case for skiptoken * Travis/lintr (#72) * Added lintr config file * Added travis github package installation * Removed snake case rule * Fixed documents on doAzureParallel * Based on lintr default_settins docs, correctly added default rules * Updated lintr package to use object_name_style * Added package :: operator * Reformatted after merge * Fixed command line tests * Upgraded roxygen to 6.0.1 * Cluster config docs * Removed additional delete job * add getJob api (#84) * add getJob api * reformat code * update styling in utility file * fix code styling * update chunksize to chunkSize and remove unused code * handle job metadata in getJob api * fix styling issue * update getJobList parameter from list of job ids to filter object, and output jobs status in data frame (#128) long running job support, getJob, getJobList and getJobResult implementation * reformat code * update styling in utility file * fix code styling * update chunksize to chunkSize and remove unused code * handle job metadata in getJob api * fix styling issue * use counting service api in getJobList * fix coding style * return data frame from getJobList * update getJobList parameter from job id list to filter by state * reformat code * update description for getJobList * remove dup code * address review feedback * jobId parameter check for getJobResult * update documentation for long run job * update version to 0.5.0 * update version * address review feedback * update chunkSizeValue to chunkSizeKeyValuePair * Validate job names and pool names (#129) * Added validator class * Added validators for lintr * Added exclusion list for validators * fix bug in metadata handling for packages and enableCloudCombine (#133) * fix bug in metadata handling for packages and enableCloudCombine * call long running job api in test * update test * add test for long running job feature * code style fix * update job state description in readme * use list for job state filter * address review feedback

brnleehng added 3 commits June 20, 2017 11:51

Added cran/github installation scripts

fec6dad

Added package installation tests

04e3835

Upgraded package version to 0.3.2

e1c5750

msftclas added the cla-not-required label Jun 20, 2017

paselem suggested changes Jun 20, 2017

View reviewed changes

paselem approved these changes Jun 30, 2017

View reviewed changes

Merge branch 'develop' into feature/install-package-scripts

b2042a1

brnleehng merged commit ff48bbe into develop Jul 10, 2017

brnleehng deleted the feature/install-package-scripts branch July 13, 2017 00:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Install R packages at job run time #29

Install R packages at job run time #29

brnleehng commented Jun 20, 2017

msftclas commented Jun 20, 2017

paselem Jun 20, 2017

brnleehng Jun 20, 2017

paselem Jun 20, 2017

paselem Jun 20, 2017

paselem Jun 20, 2017

brnleehng Jun 20, 2017 •

edited

Loading

paselem Jun 21, 2017

brnleehng Jun 21, 2017

paselem Jun 26, 2017

brnleehng Jun 26, 2017

Install R packages at job run time #29

Install R packages at job run time #29

Conversation

brnleehng commented Jun 20, 2017

msftclas commented Jun 20, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

brnleehng Jun 20, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

brnleehng Jun 20, 2017 •

edited

Loading