This repository has been archived by the owner on Oct 12, 2023. It is now read-only.

long running job support (#136)
* treat warnings as failures and fail the creation of the cluster (#91)

* treat warnings as failures and fail the creation of the cluster

* fix unit tests

* fix lintr lines too long issue

* escape single quotes

* Check if existing pool is deleted when makeCluster is called (#99)

* Added deleting pool check for makeCluster

* Fixed double quotes

* cluster logs renamed from pool to cluster

* Added correct imports and fix range

* Feature/bio conductor docs (#106)

* initial command line instructions for bioconductor

* initial startup scripts for installing bioconductor

* fix if then syntax

* force update node environment with update path for R runtime

* install bioconductor

* wrap bioconductor install command in Rscript

* bioconductor sample docs

* update bioC docs

* remove .gitignore rule for .json files

* add pointer to BioC cluster config from docs

* Feature/cluster logs (#98)

* download merge result gets content raw

* Added setHttpTraffic and logging functions docs

* Fixed broken links

* Shorten lines down to 120 characters

* Renamed function names from past discussion

* Fixed log documentation

* Added new operations for storage management

* Added dont run examples

* Fixed unused arg for running example

* Updated docs for storage management

* Added a new doc dedicated for managing storage

* Added attribute for container name in data frame

* Fixed downloadBlob to work with new rAzureBatch function

* Updated docs based on PR comments

* Changed dependency version to razurebatch 0.5.0

* Feature/add azure files cluster config (#108)

* add missing azureFiles cluster config to samples

* Add 0.4.2 CHANGELOG comments (#111)

* Added live scenario test (#107)

* Added live scenario test so users do not have to write their own sample code to test

* Added file names for test live

* Removed single quote linter

* Added comment about the reason for this test

* Wait for job preparation task function (#109)

* Fixed verbose for getDoParWorkers (#112)

* Feature/faq (#110)

* initial FAQ

* rename faq to FAQ

* merge FAQ and Troubleshooting docs

* add info on how to reboot a node

* reference TSG and FAQ from main docs index page

* add more info as per PR feedback

* PR feedback

* point raw scripts at master branch (#118)

* Update DESCRIPTION (#117)

Update version for new milestone.

* Fix: Removed anaconda from path (#119)

* Removed anaconda from environment path

* Line is too long for blobxfer command

* For BioConductor install, force remove MRO 3.3 prior to installing MRO 3.4 (#120)

* force add PATH to current user

* Update bioc_setup.sh

* Check verbose null case (#121)

* Change True/False to TRUE/FALSE in README example (#124)

* add .gitattributes file to track line endings

* True and False are not valid in R; changed to TRUE and FALSE

* Fixed worker and merger scripts (#116)

* Fixed worker and merger scripts

* Fixed verbose logs based on PR comments

* Added documentation on error handling

* Fixed header on table markdown

* Fixed based on PR comments

* v0.4.3 Release (#131)

* Upgraded description to use rAzureBatch v0.5.1

* Updated change log for job failure

* readme.md update

* Merge from feature/getjobresult for long running job support (#130)

* Added set chunk size

* Added cluster configuration validation function (#30)

* Added pool config test validation

* Added a fix for validation

* Added if checks for null tests and more validation tests

* Install R packages at job run time (#29)

* Added cran/github installation scripts

* Added package installation tests

* Upgraded package version to 0.3.2

* Output file support (#40)

* Output files support

* Added createOutputFile method

* output files readme documentation

* added tests and find container sas

* Added more detailed variable names

* Enable/disable merge task (#39)

* Merge task pass params

* Fixed enableMerge cases

* Merge task documentation on README.md

* Fixed typo on merge task description

* Update doAzureParallel.R

* Changed enableMerge to enableCloudCombine

* convert getJobResult output from binary to text

* Only write vector to temp file

* save cloud merge enabled, chunk size and packages as job metadata

* update cloudMergeEnabled to cloudCombineEnabled

* Fix/backwards compatible (#68)

* Added backwards compatibility in makeCluster

* Added deprecated config validator

* Added mismatch label

* Added validation for quota limits and bad getPool requests in waitForNodesToComplete (#52)

* Added validation for quota limits and bad getPool requests

* Fixed based on PR

* Fixed progress bar layout to use switch statements instead of if statements

* Changed clusterId to poolId

* Added comments and fixed messages

* Added running state to the node status

* Reformatted lines for function

* Added end statement for node completion

* Feature/custom script and reduce (#70)

* Added custom scripts and removed dependencies parameter

* Updated roxygen tool version

* Added parallelThreads support

* Added test coverage

* Removed verbose message on command line

* Added Reduce function for group of tasks

* Fix build because of doc semantics mismatch with function

* Removed unused function

* Added command line arg

* Added docs for custom script

* Moved customize cluster to separate doc for future usage

* Fixed typo

* Bug - Waiting for tasks to completion function ends too early (#69)

* Moved wait for tasks to complete to doAzureParallel utility

* Removed unneeded variables and progress

* Fixed camel case for skiptoken

* Travis/lintr (#72)

* Added lintr config file

* Added travis github package installation

* Removed snake case rule

* Fixed documents on doAzureParallel

* Based on lintr default_settings docs, correctly added default rules

* Updated lintr package to use object_name_style

* Added package :: operator

* Reformatted after merge

* Fixed command line tests

* Upgraded roxygen to 6.0.1

* Cluster config docs

* Removed additional delete job

* add getJob api (#84)

* add getJob api

* reformat code

* update styling in utility file

* fix code styling

* update chunksize to chunkSize and remove unused code

* handle job metadata in getJob api

* fix styling issue

* update getJobList parameter from list of job ids to filter object, and output jobs status in data frame (#128)

long running job support, getJob, getJobList and getJobResult implementation

* use counting service api in getJobList

* fix coding style

* return data frame from getJobList

* update getJobList parameter from job id list to filter by state

* reformat code

* update description for getJobList

* remove dup code

* address review feedback

* jobId parameter check for getJobResult

* update documentation for long run job

* update version to 0.5.0

* update version

* address review feedback

* update chunkSizeValue to chunkSizeKeyValuePair

* Validate job names and pool names (#129)

* Added validator class

* Added validators for lintr

* Added exclusion list for validators

* fix bug in metadata handling for packages and enableCloudCombine (#133)

* fix bug in metadata handling for packages and enableCloudCombine

* call long running job api in test

* update test

* add test for long running job feature

* code style fix

* update job state description in readme

* use list for job state filter

* address review feedback
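
Taken together, the commits above describe a fire-and-forget workflow: submit a named job with `wait = FALSE`, then recover its status and results later through the new `getJob`, `getJobList`, and `getJobResult` APIs. A hedged sketch of how that might look from user code (the config file names, the job name, and the `setCredentials`/`registerDoAzureParallel` calls are assumptions beyond what this commit shows):

```r
library(doAzureParallel)

# Assumed setup: credential and cluster JSON files generated beforehand
setCredentials("credentials.json")
cluster <- makeCluster("cluster.json")
registerDoAzureParallel(cluster)

# Submit without blocking: name the job and opt out of waiting
jobId <- foreach(i = 1:100,
                 .options.azure = list(job = "mylongjob",
                                       wait = FALSE)) %dopar% {
  mean(rnorm(10000))
}

# Later, possibly from a fresh R session:
job <- getJob(jobId)          # state plus the metadata recorded at submission
jobs <- getJobList(filter = list(state = c("active", "completed")))
results <- getJobResult(jobId)
```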
zfengms authored Oct 3, 2017
1 parent 737c1d5 commit f6ab94a
Showing 17 changed files with 851 additions and 612 deletions.
1 change: 1 addition & 0 deletions .lintr
@@ -0,0 +1 @@
exclusions: list("R/validators.R")
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -1,7 +1,14 @@
# Change Log
## [0.5.1] 2017-09-28
### Added
- Support for users to get job and job results for long running job
### Changed
- [BREAKING CHANGE] Update get job list to take state filter and return job status in a data frame

## [0.4.3] 2017-09-28
### Fixed
- Allow merge task to run on task failures

## [0.4.2] 2017-09-08
### Added
- Support for users to get files from nodes and tasks
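
The `[BREAKING CHANGE]` entry above is the one most likely to affect existing scripts: per the commit history, `getJobList` used to take a list of job ids and now takes a state filter and returns job status as a data frame. A hedged before/after sketch (the exact columns of the returned data frame are an assumption):

```r
# Old shape (pre-0.5.x, per the commit messages):
# getJobList(c("job20170101120000", "job20170102120000"))

# New shape: filter by state, or pass no filter to list every job
allJobs <- getJobList()
finished <- getJobList(filter = list(state = c("completed", "failed")))
# Each row is one job: id, state, and per-state task counts (assumed columns)
```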
5 changes: 3 additions & 2 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: doAzureParallel
Type: Package
Title: doAzureParallel
Version: 0.4.3
Version: 0.5.0
Author: Brian Hoang
Maintainer: Brian Hoang <brhoan@microsoft.com>
Description: The project is for data experts who use R at scale. The project
@@ -19,7 +19,8 @@ Imports:
rAzureBatch (>= 0.5.1),
jsonlite,
rjson,
xml2
xml2,
R6
Suggests:
testthat,
caret,
1 change: 1 addition & 0 deletions NAMESPACE
@@ -6,6 +6,7 @@ export(deleteStorageFile)
export(generateClusterConfig)
export(generateCredentialsConfig)
export(getClusterFile)
export(getJob)
export(getJobFile)
export(getJobList)
export(getJobResult)
10 changes: 9 additions & 1 deletion R/cluster.R
@@ -196,6 +196,14 @@ makeCluster <-
validateClusterConfig(clusterSetting)
}

tryCatch({
`Validators`$isValidPoolName(poolConfig$name)
},
error = function(e){
stop(paste("Invalid pool name: \n",
e))
})

response <- .addPool(
pool = poolConfig,
packages = packages,
@@ -253,7 +261,7 @@ makeCluster <-

clusterNodeMismatchWarning <-
paste(
"There is a mismatched between the projected cluster %s",
"There is a mismatched between the requested cluster %s",
"nodes min/max '%s'/'%s' and the existing cluster %s nodes '%s'.",
"Use the 'resizeCluster' function to get the correct amount",
"of workers."
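
The hunk above wraps pool creation in a `Validators$isValidPoolName` check, but the `Validators` R6 class itself is not part of this view (it lives in the new, lint-excluded `R/validators.R`). A minimal standalone sketch of what such a check could do; the 1-64 character limit and the allowed character set are assumptions based on Azure Batch id conventions, not taken from this diff:

```r
# Hypothetical pool-name validator: a plain-function stand-in for the
# Validators$isValidPoolName method referenced in the diff above.
isValidPoolName <- function(name) {
  if (is.null(name) || nchar(name) < 1 || nchar(name) > 64) {
    stop("Pool name must be between 1 and 64 characters long.")
  }
  if (!grepl("^[A-Za-z0-9_-]+$", name)) {
    stop("Pool name may contain only letters, numbers, hyphens, and underscores.")
  }
  TRUE
}

isValidPoolName("my-pool_01")   # passes, returns TRUE
# isValidPoolName("bad name!")  # stops with an error, as makeCluster now does
```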
68 changes: 49 additions & 19 deletions R/doAzureParallel.R
@@ -195,14 +195,22 @@ setHttpTraffic <- function(value = FALSE) {
assign("packages", obj$packages, .doAzureBatchGlobals)
assign("pkgName", pkgName, .doAzureBatchGlobals)

time <- format(Sys.time(), "%Y%m%d%H%M%S", tz = "GMT")
id <- sprintf("%s%s",
"job",
time)

if (!is.null(obj$options$azure$job)) {
id <- obj$options$azure$job
}
else {
time <- format(Sys.time(), "%Y%m%d%H%M%S", tz = "GMT")
id <- sprintf("%s%s", "job", time)
}

tryCatch({
`Validators`$isValidStorageContainerName(id)
`Validators`$isValidJobName(id)
},
error = function(e){
stop(paste("Invalid job name: \n",
e))
})

wait <- TRUE
if (!is.null(obj$options$azure$wait)) {
@@ -321,13 +329,49 @@ setHttpTraffic <- function(value = FALSE) {
)

# We need to merge any files passed by the calling lib with the resource files specified here

resourceFiles <-
append(resourceFiles, requiredJobResourceFiles)

enableCloudCombineKeyValuePair <-
list(name = "enableCloudCombine", value = as.character(enableCloudCombine))

chunkSize <- 1

if (!is.null(obj$options$azure$chunkSize)) {
chunkSize <- obj$options$azure$chunkSize
}

if (!is.null(obj$options$azure$chunksize)) {
chunkSize <- obj$options$azure$chunksize
}

if (exists("chunkSize", envir = .doAzureBatchGlobals)) {
chunkSize <- get("chunkSize", envir = .doAzureBatchGlobals)
}

chunkSizeKeyValuePair <-
list(name = "chunkSize", value = as.character(chunkSize))

if (is.null(obj$packages)) {
metadata <-
list(enableCloudCombineKeyValuePair, chunkSizeKeyValuePair)
} else {
packagesKeyValuePair <-
list(name = "packages",
value = paste(obj$packages, collapse = ";"))

metadata <-
list(enableCloudCombineKeyValuePair,
chunkSizeKeyValuePair,
packagesKeyValuePair)
}

response <- .addJob(
jobId = id,
poolId = data$poolId,
resourceFiles = resourceFiles,
metadata = metadata,
packages = obj$packages
)

@@ -376,20 +420,6 @@ setHttpTraffic <- function(value = FALSE) {
job <- rAzureBatch::getJob(id)
cat(sprintf("Id: %s", job$id), fill = TRUE)

chunkSize <- 1

if (!is.null(obj$options$azure$chunkSize)) {
chunkSize <- obj$options$azure$chunkSize
}

if (!is.null(obj$options$azure$chunksize)) {
chunkSize <- obj$options$azure$chunksize
}

if (exists("chunkSize", envir = .doAzureBatchGlobals)) {
chunkSize <- get("chunkSize", envir = .doAzureBatchGlobals)
}

ntasks <- length(argsList)

startIndices <- seq(1, length(argsList), chunkSize)
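
The tail of this hunk, `startIndices <- seq(1, length(argsList), chunkSize)`, is what turns the resolved `chunkSize` into Azure Batch tasks. A self-contained sketch of that chunking arithmetic (the `chunks` list here is illustrative; the package builds Batch tasks rather than a local list):

```r
# With chunkSize items per Azure Batch task, start indices step
# through the argument list; the final chunk may be shorter.
argsList <- as.list(1:10)
chunkSize <- 3

startIndices <- seq(1, length(argsList), chunkSize)  # 1 4 7 10
chunks <- lapply(startIndices, function(s) {
  argsList[s:min(s + chunkSize - 1, length(argsList))]
})
lengths(chunks)  # 3 3 3 1
```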
4 changes: 3 additions & 1 deletion R/helpers.R
@@ -138,6 +138,7 @@
.addJob <- function(jobId,
poolId,
resourceFiles,
metadata,
...) {
args <- list(...)
packages <- args$packages
@@ -168,7 +169,8 @@
poolInfo = poolInfo,
jobPreparationTask = jobPreparationTask,
usesTaskDependencies = usesTaskDependencies,
content = "text"
content = "text",
metadata = metadata
)

return(response)
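
`.addJob` now forwards a `metadata` argument so that `getJob` can later reconstruct how a job was submitted. A standalone sketch of building that name/value list; `buildJobMetadata` is a hypothetical wrapper, but the three pairs mirror the `enableCloudCombine`, `chunkSize`, and `packages` entries assembled in the doAzureParallel.R hunk above:

```r
# Hypothetical helper mirroring the metadata assembly in doAzureParallel.R:
# each entry is a name/value pair stored on the Azure Batch job.
buildJobMetadata <- function(enableCloudCombine = TRUE,
                             chunkSize = 1,
                             packages = NULL) {
  metadata <- list(
    list(name = "enableCloudCombine", value = as.character(enableCloudCombine)),
    list(name = "chunkSize", value = as.character(chunkSize))
  )
  if (!is.null(packages)) {
    metadata <- c(metadata,
                  list(list(name = "packages",
                            value = paste(packages, collapse = ";"))))
  }
  metadata
}

m <- buildJobMetadata(chunkSize = 4, packages = c("xml2", "jsonlite"))
m[[3]]$value  # "xml2;jsonlite"
```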
