This repository has been archived by the owner on Oct 12, 2023. It is now read-only.

long running job support (#136)
* treat warnings as failures and fail the creation of the cluster (#91)

* treat warnings as failures and fail the creation of the cluster

* fix unit tests

* fix lintr lines too long issue

* escape single quotes

* Check if existing pool is deleted when makeCluster is called (#99)

* Added deleting pool check for makeCluster

* Fixed double quotes

* cluster logs renamed from pool to cluster

* Added correct imports and fix range

* Feature/bio conductor docs (#106)

* initial command line instructions for bioconductor

* initial startup scripts for installing bioconductor

* fix if then syntax

* force update node environment with update path for R runtime

* install bioconductor

* wrap bioconductor install command in Rscript

* bioconductor sample docs

* update bioC docs

* remove .gitignore rule for .json files

* add pointer to BioC cluster config from docs

* Feature/cluster logs (#98)

* download merge result gets content raw

* Added setHttpTraffic and logging functions docs

* Fixed broken links

* Shorten lines down to 120 characters

* Renamed function names from past discussion

* Fixed log documentation

* Added new operations for storage management

* Added dont run examples

* Fixed unused arg for running example

* Updated docs for storage management

* Added a new doc dedicated for managing storage

* Added attribute for container name in data frame

* Fixed downloadBlob to work with new rAzureBatch function

* Updated docs based on PR comments

* Changed dependency version to razurebatch 0.5.0

* Feature/add azure files cluster config (#108)

* add missing azureFiles cluster config to samples

* Add 0.4.2 CHANGELOG comments (#111)

* Added live scenario test (#107)

* Added live scenario test so users do not have to write their own sample code to test

* Added file names for test live

* Removed single quote linter

* Added comment about the reason for this test

* Wait for job preparation task function (#109)

* Fixed verbose for getDoParWorkers (#112)

* Feature/faq (#110)

* initial FAQ

* rename faq to FAQ

* merge FAQ and Troubleshooting docs

* add info on how to reboot a node

* reference TSG and FAQ from main docs index page

* add more info as per PR feedback

* PR feedback

* point raw scripts at master branch (#118)

* Update DESCRIPTION (#117)

Update version for new milestone.

* Fix: Removed anaconda from path (#119)

* Removed anaconda from environment path

* Line is too long for blobxfer command

* For BioConductor install, force remove MRO 3.3 prior to installing MRO 3.4 (#120)

* force add PATH to current user

* Update bioc_setup.sh

* Check verbose null case (#121)

* Change True/False to TRUE/FALSE in README example (#124)

* add .gitattributes file to track line endings

* True and False are not valid in R; changed to TRUE and FALSE

* Fixed worker and merger scripts (#116)

* Fixed worker and merger scripts

* Fixed verbose logs based on PR comments

* Added documentation on error handling

* Fixed header on table markdown

* Fixed based on PR comments

* v0.4.3 Release (#131)

* Upgraded description to use rAzureBatch v0.5.1

* Updated change log for job failure

* readme.md update

* Merge from feature/getjobresult for long running job support (#130)

* Added set chunk size

* Added cluster configuration validation function (#30)

* Added pool config test validation

* Added a fix for validation

* Added if checks for null tests and more validation tests

* Install R packages at job run time (#29)

* Added cran/github installation scripts

* Added package installation tests

* Upgraded package version to 0.3.2

* Output file support (#40)

* Output files support

* Added createOutputFile method

* output files readme documentation

* added tests and find container sas

* Added more detailed variable names

* Enable/disable merge task (#39)

* Merge task pass params

* Fixed enableMerge cases

* Merge task documentation on README.md

* Fixed typo on merge task description

* Update doAzureParallel.R

* Changed enableMerge to enableCloudCombine

* convert getJobResult output from binary to text

* Only write vector to temp file

* save cloud merge enabled, chunk size and packages as job metadata

* update cloudMergeEnabled to cloudCombineEnabled

* Fix/backwards compatible (#68)

* Added backwards compatibility in makeCluster

* Added deprecated config validator

* Added mismatch label

* Added validation for quota limits and bad getPool requests in waitForNodesToComplete (#52)

* Added validation for quota limits and bad getPool requests

* Fixed based on PR

* Fixed progress bar layout to use switch statements instead of if statements

* Changed clusterId to poolId

* Added comments and fixed messages

* Added running state to the node status

* Reformatted lines for function

* Added end statement for node completion

* Feature/custom script and reduce (#70)

* Added custom scripts and removed dependencies parameter

* Updated roxygen tool version

* Added parallelThreads support

* Added test coverage

* Removed verbose message on command line

* Added Reduce function for group of tasks

* Fix build because of doc semantics mismatch with function

* Removed unused function

* Added command line arg

* Added docs for custom script

* Moved customize cluster to separate doc for future usage

* Fixed typo

* Bug - Waiting for tasks to completion function ends too early (#69)

* Moved wait for tasks to complete to doAzureParallel utility

* Removed unneeded variables and progress

* Fixed camel case for skiptoken

* Travis/lintr (#72)

* Added lintr config file

* Added travis github package installation

* Removed snake case rule

* Fixed documents on doAzureParallel

* Based on lintr default_settings docs, correctly added default rules

* Updated lintr package to use object_name_style

* Added package :: operator

* Reformatted after merge

* Fixed command line tests

* Upgraded roxygen to 6.0.1

* Cluster config docs

* Removed additional delete job

* add getJob api (#84)

* add getJob api

* reformat code

* update styling in utility file

* fix code styling

* update chunksize to chunkSize and remove unused code

* handle job metadata in getJob api

* fix styling issue

* update getJobList parameter from list of job ids to filter object, and output jobs status in data frame (#128)

long running job support, getJob, getJobList and getJobResult implementation

* use counting service api in getJobList

* fix coding style

* return data frame from getJobList

* update getJobList parameter from job id list to filter by state

* reformat code

* update description for getJobList

* remove dup code

* address review feedback

* jobId parameter check for getJobResult

* update documentation for long run job

* update version to 0.5.0

* update version

* address review feedback

* update chunkSizeValue to chunkSizeKeyValuePair

* Validate job names and pool names (#129)

* Added validator class

* Added validators for lintr

* Added exclusion list for validators

* fix bug in metadata handling for packages and enableCloudCombine (#133)

* fix bug in metadata handling for packages and enableCloudCombine

* call long running job api in test

* update test

* add test for long running job feature

* code style fix

* update job state description in readme

* use list for job state filter

* address review feedback
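
Taken together, the commits above describe a fire-and-forget workflow: submit a named job with `wait = FALSE`, then recover its status and results later through the new `getJob`, `getJobList`, and `getJobResult` APIs. A hedged sketch of how that might look from user code (the config file names, the job name, and the `setCredentials`/`registerDoAzureParallel` calls are assumptions beyond what this commit shows):

```r
library(doAzureParallel)

# Assumed setup: credential and cluster JSON files generated beforehand
setCredentials("credentials.json")
cluster <- makeCluster("cluster.json")
registerDoAzureParallel(cluster)

# Submit without blocking: name the job and opt out of waiting
jobId <- foreach(i = 1:100,
                 .options.azure = list(job = "mylongjob",
                                       wait = FALSE)) %dopar% {
  mean(rnorm(10000))
}

# Later, possibly from a fresh R session:
job <- getJob(jobId)          # state plus the metadata recorded at submission
jobs <- getJobList(filter = list(state = c("active", "completed")))
results <- getJobResult(jobId)
```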
zfengms authored Oct 3, 2017
1 parent 737c1d5 commit f6ab94a
Showing 17 changed files with 851 additions and 612 deletions.
1 change: 1 addition & 0 deletions .lintr
@@ -0,0 +1 @@
exclusions: list("R/validators.R")
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -1,7 +1,14 @@
# Change Log
## [0.5.1] 2017-09-28
### Added
- Support for users to get job and job results for long running job
### Changed
- [BREAKING CHANGE] Update get job list to take state filter and return job status in a data frame

## [0.4.3] 2017-09-28
### Fixed
- Allow merge task to run on task failures

## [0.4.2] 2017-09-08
### Added
- Support for users to get files from nodes and tasks
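
The `[BREAKING CHANGE]` entry above is the one most likely to affect existing scripts: per the commit history, `getJobList` used to take a list of job ids and now takes a state filter and returns job status as a data frame. A hedged before/after sketch (the exact columns of the returned data frame are an assumption):

```r
# Old shape (pre-0.5.x, per the commit messages):
# getJobList(c("job20170101120000", "job20170102120000"))

# New shape: filter by state, or pass no filter to list every job
allJobs <- getJobList()
finished <- getJobList(filter = list(state = c("completed", "failed")))
# Each row is one job: id, state, and per-state task counts (assumed columns)
```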
5 changes: 3 additions & 2 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: doAzureParallel
Type: Package
Title: doAzureParallel
Version: 0.4.3
Version: 0.5.0
Author: Brian Hoang
Maintainer: Brian Hoang <brhoan@microsoft.com>
Description: The project is for data experts who use R at scale. The project
@@ -19,7 +19,8 @@ Imports:
rAzureBatch (>= 0.5.1),
jsonlite,
rjson,
xml2
xml2,
R6
Suggests:
testthat,
caret,
1 change: 1 addition & 0 deletions NAMESPACE
@@ -6,6 +6,7 @@ export(deleteStorageFile)
export(generateClusterConfig)
export(generateCredentialsConfig)
export(getClusterFile)
export(getJob)
export(getJobFile)
export(getJobList)
export(getJobResult)
10 changes: 9 additions & 1 deletion R/cluster.R
@@ -196,6 +196,14 @@ makeCluster <-
validateClusterConfig(clusterSetting)
}

tryCatch({
`Validators`$isValidPoolName(poolConfig$name)
},
error = function(e){
stop(paste("Invalid pool name: \n",
e))
})

response <- .addPool(
pool = poolConfig,
packages = packages,
@@ -253,7 +261,7 @@ makeCluster <-

clusterNodeMismatchWarning <-
paste(
"There is a mismatched between the projected cluster %s",
"There is a mismatched between the requested cluster %s",
"nodes min/max '%s'/'%s' and the existing cluster %s nodes '%s'.",
"Use the 'resizeCluster' function to get the correct amount",
"of workers."
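
The hunk above wraps pool creation in a `Validators$isValidPoolName` check, but the `Validators` R6 class itself is not part of this view (it lives in the new, lint-excluded `R/validators.R`). A minimal standalone sketch of what such a check could do; the 1-64 character limit and the allowed character set are assumptions based on Azure Batch id conventions, not taken from this diff:

```r
# Hypothetical pool-name validator: a plain-function stand-in for the
# Validators$isValidPoolName method referenced in the diff above.
isValidPoolName <- function(name) {
  if (is.null(name) || nchar(name) < 1 || nchar(name) > 64) {
    stop("Pool name must be between 1 and 64 characters long.")
  }
  if (!grepl("^[A-Za-z0-9_-]+$", name)) {
    stop("Pool name may contain only letters, numbers, hyphens, and underscores.")
  }
  TRUE
}

isValidPoolName("my-pool_01")   # passes, returns TRUE
# isValidPoolName("bad name!")  # stops with an error, as makeCluster now does
```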
68 changes: 49 additions & 19 deletions R/doAzureParallel.R
@@ -195,14 +195,22 @@ setHttpTraffic <- function(value = FALSE) {
assign("packages", obj$packages, .doAzureBatchGlobals)
assign("pkgName", pkgName, .doAzureBatchGlobals)

time <- format(Sys.time(), "%Y%m%d%H%M%S", tz = "GMT")
id <- sprintf("%s%s",
"job",
time)

if (!is.null(obj$options$azure$job)) {
id <- obj$options$azure$job
}
else {
time <- format(Sys.time(), "%Y%m%d%H%M%S", tz = "GMT")
id <- sprintf("%s%s", "job", time)
}

tryCatch({
`Validators`$isValidStorageContainerName(id)
`Validators`$isValidJobName(id)
},
error = function(e){
stop(paste("Invalid job name: \n",
e))
})

wait <- TRUE
if (!is.null(obj$options$azure$wait)) {
@@ -321,13 +329,49 @@ setHttpTraffic <- function(value = FALSE) {
)

# We need to merge any files passed by the calling lib with the resource files specified here

resourceFiles <-
append(resourceFiles, requiredJobResourceFiles)

enableCloudCombineKeyValuePair <-
list(name = "enableCloudCombine", value = as.character(enableCloudCombine))

chunkSize <- 1

if (!is.null(obj$options$azure$chunkSize)) {
chunkSize <- obj$options$azure$chunkSize
}

if (!is.null(obj$options$azure$chunksize)) {
chunkSize <- obj$options$azure$chunksize
}

if (exists("chunkSize", envir = .doAzureBatchGlobals)) {
chunkSize <- get("chunkSize", envir = .doAzureBatchGlobals)
}

chunkSizeKeyValuePair <-
list(name = "chunkSize", value = as.character(chunkSize))

if (is.null(obj$packages)) {
metadata <-
list(enableCloudCombineKeyValuePair, chunkSizeKeyValuePair)
} else {
packagesKeyValuePair <-
list(name = "packages",
value = paste(obj$packages, collapse = ";"))

metadata <-
list(enableCloudCombineKeyValuePair,
chunkSizeKeyValuePair,
packagesKeyValuePair)
}

response <- .addJob(
jobId = id,
poolId = data$poolId,
resourceFiles = resourceFiles,
metadata = metadata,
packages = obj$packages
)

@@ -376,20 +420,6 @@ setHttpTraffic <- function(value = FALSE) {
job <- rAzureBatch::getJob(id)
cat(sprintf("Id: %s", job$id), fill = TRUE)

chunkSize <- 1

if (!is.null(obj$options$azure$chunkSize)) {
chunkSize <- obj$options$azure$chunkSize
}

if (!is.null(obj$options$azure$chunksize)) {
chunkSize <- obj$options$azure$chunksize
}

if (exists("chunkSize", envir = .doAzureBatchGlobals)) {
chunkSize <- get("chunkSize", envir = .doAzureBatchGlobals)
}

ntasks <- length(argsList)

startIndices <- seq(1, length(argsList), chunkSize)
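
The tail of this hunk, `startIndices <- seq(1, length(argsList), chunkSize)`, is what turns the resolved `chunkSize` into Azure Batch tasks. A self-contained sketch of that chunking arithmetic (the `chunks` list here is illustrative; the package builds Batch tasks rather than a local list):

```r
# With chunkSize items per Azure Batch task, start indices step
# through the argument list; the final chunk may be shorter.
argsList <- as.list(1:10)
chunkSize <- 3

startIndices <- seq(1, length(argsList), chunkSize)  # 1 4 7 10
chunks <- lapply(startIndices, function(s) {
  argsList[s:min(s + chunkSize - 1, length(argsList))]
})
lengths(chunks)  # 3 3 3 1
```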
4 changes: 3 additions & 1 deletion R/helpers.R
@@ -138,6 +138,7 @@
.addJob <- function(jobId,
poolId,
resourceFiles,
metadata,
...) {
args <- list(...)
packages <- args$packages
@@ -168,7 +169,8 @@
poolInfo = poolInfo,
jobPreparationTask = jobPreparationTask,
usesTaskDependencies = usesTaskDependencies,
content = "text"
content = "text",
metadata = metadata
)

return(response)
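
`.addJob` now forwards a `metadata` argument so that `getJob` can later reconstruct how a job was submitted. A standalone sketch of building that name/value list; `buildJobMetadata` is a hypothetical wrapper, but the three pairs mirror the `enableCloudCombine`, `chunkSize`, and `packages` entries assembled in the doAzureParallel.R hunk above:

```r
# Hypothetical helper mirroring the metadata assembly in doAzureParallel.R:
# each entry is a name/value pair stored on the Azure Batch job.
buildJobMetadata <- function(enableCloudCombine = TRUE,
                             chunkSize = 1,
                             packages = NULL) {
  metadata <- list(
    list(name = "enableCloudCombine", value = as.character(enableCloudCombine)),
    list(name = "chunkSize", value = as.character(chunkSize))
  )
  if (!is.null(packages)) {
    metadata <- c(metadata,
                  list(list(name = "packages",
                            value = paste(packages, collapse = ";"))))
  }
  metadata
}

m <- buildJobMetadata(chunkSize = 4, packages = c("xml2", "jsonlite"))
m[[3]]$value  # "xml2;jsonlite"
```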
