This repository has been archived by the owner on Oct 12, 2023. It is now read-only.
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Added set chunk size
* Adding resource files on pool creation
* renaming generate file functions
* Moved worker/merger scripts to doAzureParallel and created common job env
* Added stdout and stderr logs in uploads
* added to docs / README
* Switched params for cluster and added examples
* setCreds, resizeCluster, job management
* cred generator update
* Added samples, moved autoscale, and low-pri/output files
* Added documentation on methods for ??R feature
* Added export for makeCluster
* Namespace missing export
* clusterSetting param name
* cluster id param name
* NumOfNodes param for wait nodes completion fix
* Added proper naming for registerDoAzureParallel
* readme update
* typo readme
* low pri in readme
* monte carlo simulation
* Added new sample for sas resource files
* caret + annotation on montecarlo sim
* samples readme.md
* samples readme
* Fixed the resource files to use proper storage account for example
* Update README.md
* Update 11-autoscale.md
* Fixed autoscale formula for task queue to take maxTaskPerNode
* Added named args to createSasToken
* Update resource-files-example.R
* Update 21-distributing-data.md
* Renamed samples files to underscore format
* Update 21-distributing-data.md
* Update README.md
* Update README.md
* Update README.md
* Edited changelog file
* Update plyr_example.R
* Update README.md
Showing 31 changed files with 1,480 additions and 191 deletions.
```diff
@@ -0,0 +1,6 @@
+0.3.0
+- [BREAKING CHANGE] Two configuration files for easier debugging - credentials and cluster settings
+- [BREAKING CHANGE] Added low priority virtual machine support for additional cost saving
+- Added external method for setting chunk size (setChunkSize)
+- Added getJobList function to check the status of users' jobs
+- Added resizeCluster function to allow users to change their autoscale formulas on the fly
```
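The changelog introduces setChunkSize without saying what a chunk is. Assuming that chunk size means the number of foreach iterations bundled into a single Azure Batch task (an assumption, not something this diff shows), the grouping can be sketched in base R. `chunkIterations` is a hypothetical helper for illustration, not part of doAzureParallel.

```r
# Hypothetical helper (not part of doAzureParallel): group the indices of
# n loop iterations into consecutive chunks of at most chunkSize each,
# where each chunk would become one task.
chunkIterations <- function(n, chunkSize) {
  stopifnot(n >= 1, chunkSize >= 1)
  idx <- seq_len(n)
  # ceiling(idx / chunkSize) labels iterations 1..chunkSize as chunk 1, and so on
  split(idx, ceiling(idx / chunkSize))
}

chunks <- chunkIterations(10, 4)
length(chunks)   # 3 chunks, i.e. 3 tasks instead of 10
lengths(chunks)  # chunk sizes 4, 4 and 2
```

In general, a larger chunk size yields fewer, longer tasks, while a smaller one spreads work more evenly across nodes at the cost of more per-task overhead.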
```diff
@@ -1,16 +1,21 @@
 Package: doAzureParallel
 Type: Package
 Title: doAzureParallel
-Version: 0.2.2
+Version: 0.3.0
 Author: Brian Hoang
-Maintainer: Who to complain to <yourfault@somewhere.net>
-Description: More about what it does (maybe more than one line)
-License: What license is it under?
+Maintainer: Brian Hoang <brhoan@microsoft.com>
+Description: The project is for data experts who use R at scale. The project
+    comes together as an R package that will allow users to run their R code in
+    parallel across a cluster hosted on Azure. The cluster will be created and
+    maintained by Azure Batch and, for the initial version, will be a public/
+    communal pool. The orchestration for each job that needs to be parallelized in
+    the cluster will be done by a middle layer that schedules each request.
+License: Microsoft Corporation
 LazyData: TRUE
 Depends:
     foreach (>= 1.4.3),
     iterators (>= 1.0.8),
-    rAzureBatch (>= 0.1.0)
+    rAzureBatch (>= 0.2.4)
 Suggests:
-    testthat
+    testthat, caret, plyr
 RoxygenNote: 5.0.1
```
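This hunk raises the rAzureBatch minimum from 0.1.0 to 0.2.4, so an environment set up for 0.2.2 may no longer satisfy `Depends`. Below is a minimal base-R sketch for checking a set of installed versions against that field, using `read.dcf` and `package_version`. `parseDepends` and `meetsMinimum` are hypothetical helpers, the installed versions are made up, and the parser only handles the simple `pkg (>= x.y.z)` form used here.

```r
# The Depends field from the DESCRIPTION above, written to a temporary file.
descLines <- c(
  "Package: doAzureParallel",
  "Version: 0.3.0",
  "Depends:",
  "    foreach (>= 1.4.3),",
  "    iterators (>= 1.0.8),",
  "    rAzureBatch (>= 0.2.4)"
)
descFile <- tempfile()
writeLines(descLines, descFile)
desc <- read.dcf(descFile)

# Hypothetical helper: turn "pkg (>= x.y.z), ..." into c(pkg = "x.y.z", ...).
# Only handles the simple ">=" form used in this DESCRIPTION.
parseDepends <- function(field) {
  entries <- trimws(strsplit(field, ",", fixed = TRUE)[[1]])
  entries <- entries[nzchar(entries)]
  pkgs <- sub("^([A-Za-z0-9.]+).*$", "\\1", entries)
  mins <- ifelse(grepl("(>=", entries, fixed = TRUE),
                 sub("^.*\\(>=[[:space:]]*([^)]+)\\).*$", "\\1", entries),
                 NA_character_)
  mins <- trimws(mins)
  names(mins) <- pkgs
  mins
}

# Hypothetical helper: TRUE where the available version satisfies the minimum.
meetsMinimum <- function(required, available) {
  vapply(names(required), function(p) {
    if (is.na(required[[p]])) return(TRUE)
    if (!p %in% names(available)) return(FALSE)
    package_version(available[[p]]) >= required[[p]]
  }, logical(1))
}

required <- parseDepends(desc[1, "Depends"])
# Hypothetical versions from an environment set up for 0.2.2
installed <- c(foreach = "1.4.3", iterators = "1.0.8", rAzureBatch = "0.1.0")
meetsMinimum(required, installed)  # rAzureBatch fails the new >= 0.2.4 bound
```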
```diff
@@ -1,2 +1,13 @@
 # Generated by roxygen2: do not edit by hand
-exportPattern("^[^\\.]")
+
+export(generateClusterConfig)
+export(generateCredentialsConfig)
+export(getJobList)
+export(getJobResult)
+export(makeCluster)
+export(registerDoAzureParallel)
+export(setChunkSize)
+export(setCredentials)
+export(setVerbose)
+export(stopCluster)
+export(waitForNodesToComplete)
```
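This hunk replaces a pattern-based export (every name not starting with a dot) with an explicit list, so any function not named in the list becomes internal. The sketch below compares the two rules with base R's `grepl` and `%in%` on a few names; `.hiddenHelper` is a made-up name for illustration. Note that `resizeCluster` and `getAutoscaleFormula`, both defined in this commit, do not appear in the explicit list.

```r
# The rule removed in this commit: export every name not starting with "."
oldPattern <- "^[^\\.]"

# The explicit export list added in this commit
newExports <- c("generateClusterConfig", "generateCredentialsConfig", "getJobList",
                "getJobResult", "makeCluster", "registerDoAzureParallel",
                "setChunkSize", "setCredentials", "setVerbose", "stopCluster",
                "waitForNodesToComplete")

# Some names defined in the package; ".hiddenHelper" is hypothetical
defined <- c("makeCluster", "resizeCluster", "getAutoscaleFormula", ".hiddenHelper")

exportedBefore <- defined[grepl(oldPattern, defined)]
exportedAfter  <- defined[defined %in% newExports]

# Names that were exported under the old pattern but are internal now
setdiff(exportedBefore, exportedAfter)
```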
```diff
@@ -0,0 +1,79 @@
+AUTOSCALE_WORKDAY_FORMULA <- paste0(
+  "$curTime = time();",
+  "$workHours = $curTime.hour >= 8 && $curTime.hour < 18;",
+  "$isWeekday = $curTime.weekday >= 1 && $curTime.weekday <= 5;",
+  "$isWorkingWeekdayHour = $workHours && $isWeekday;",
+  "$TargetDedicatedNodes = $isWorkingWeekdayHour ? %s:%s;")
+
+AUTOSCALE_WEEKEND_FORMULA <- paste0(
+  "$isWeekend = $curTime.weekday >= 6 && $curTime.weekday <= 7;",
+  "$TargetDedicatedNodes = $isWeekend ? %s:%s;")
+
+AUTOSCALE_MAX_CPU_FORMULA <- "$totalNodes =
+(min($CPUPercent.GetSample(TimeInterval_Minute * 10)) > 0.7) ?
+($CurrentDedicated * 1.1) : $CurrentDedicated; $totalNodes =
+(avg($CPUPercent.GetSample(TimeInterval_Minute * 60)) < 0.2) ?
+($CurrentDedicated * 0.9) : $totalNodes;
+$TargetDedicatedNodes = min(%s, $totalNodes)"
+
+AUTOSCALE_QUEUE_FORMULA <- paste0(
+  "$samples = $ActiveTasks.GetSamplePercent(TimeInterval_Minute * 15);",
+  "$tasks = $samples < 70 ? max(0,$ActiveTasks.GetSample(1)) : max( $ActiveTasks.GetSample(1), avg($ActiveTasks.GetSample(TimeInterval_Minute * 15)));",
+  "$maxTasksPerNode = %s;",
+  "$round = $maxTasksPerNode - 1;",
+  "$targetVMs = $tasks > 0? (($tasks + $round)/ $maxTasksPerNode) : max(0, $TargetDedicated/2) + 0.5;",
+  "$TargetDedicatedNodes = max(%s, min($targetVMs, %s));",
+  "$TargetLowPriorityNodes = max(%s, min($targetVMs, %s));",
+  "$NodeDeallocationOption = taskcompletion;"
+)
+
+AUTOSCALE_FORMULA = list("WEEKEND" = AUTOSCALE_WEEKEND_FORMULA,
+                         "WORKDAY" = AUTOSCALE_WORKDAY_FORMULA,
+                         "MAX_CPU" = AUTOSCALE_MAX_CPU_FORMULA,
+                         "QUEUE" = AUTOSCALE_QUEUE_FORMULA)
+
+getAutoscaleFormula <- function(formulaName, dedicatedMin, dedicatedMax, lowPriorityMin, lowPriorityMax, maxTasksPerNode = 1){
+  formulas <- names(AUTOSCALE_FORMULA)
+
+  if(formulaName == formulas[1]){
+    return(sprintf(AUTOSCALE_WEEKEND_FORMULA, dedicatedMin, dedicatedMax))
+  }
+  else if(formulaName == formulas[2]){
+    return(sprintf(AUTOSCALE_WORKDAY_FORMULA, dedicatedMin, dedicatedMax))
+  }
+  else if(formulaName == formulas[3]){
+    return(sprintf(AUTOSCALE_MAX_CPU_FORMULA, dedicatedMin))
+  }
+  else if(formulaName == formulas[4]){
+    return(sprintf(AUTOSCALE_QUEUE_FORMULA, maxTasksPerNode, dedicatedMin, dedicatedMax, lowPriorityMin, lowPriorityMax))
+  }
+  else{
+    stop("Incorrect autoscale formula: QUEUE, MAX_CPU, WEEKEND, WORKDAY")
+  }
+}
+
+#' Resize an Azure cloud-enabled cluster.
+#'
+#' @param cluster Cluster object returned by \code{makeCluster}
+#' @param dedicatedMin The minimum number of dedicated nodes
+#' @param dedicatedMax The maximum number of dedicated nodes
+#' @param lowPriorityMin The minimum number of low priority nodes
+#' @param lowPriorityMax The maximum number of low priority nodes
+#' @param algorithm Current built-in autoscale formulas: QUEUE, MAX_CPU, WEEKEND, WORKDAY
+#' @param timeInterval How often the autoscale formula is evaluated, as an ISO 8601 duration (for example, "PT5M")
+#'
+#' @examples
+#' resizeCluster(cluster, dedicatedMin = 2, dedicatedMax = 6, lowPriorityMin = 2, lowPriorityMax = 6, algorithm = "QUEUE", timeInterval = "PT10M")
+resizeCluster <- function(cluster,
+                          dedicatedMin,
+                          dedicatedMax,
+                          lowPriorityMin,
+                          lowPriorityMax,
+                          algorithm = "QUEUE",
+                          timeInterval = "PT5M"){
+  pool <- getPool(cluster$poolId)
+
+  resizePool(cluster$poolId,
+             autoscaleFormula = getAutoscaleFormula(algorithm, dedicatedMin, dedicatedMax, lowPriorityMin, lowPriorityMax, maxTasksPerNode = pool$maxTasksPerNode),
+             autoscaleInterval = timeInterval)
+}
```
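The QUEUE formula sizes the pool from the active-task count: it adds `$round = $maxTasksPerNode - 1` before dividing by `$maxTasksPerNode`, then clamps the result between the minimum and maximum node counts. Below is a minimal R sketch of that arithmetic, assuming the intent of `$round` is ceiling division (how Azure Batch itself treats a fractional result of the formula's `/` is not shown in this diff). `queueTargetNodes` is a hypothetical helper that covers only the `$tasks > 0` branch; in the formula, the same clamp is applied separately with the dedicated and the low-priority bounds.

```r
# Hypothetical R mirror of the node-count arithmetic in AUTOSCALE_QUEUE_FORMULA,
# for the branch where there are pending tasks ($tasks > 0) only.
queueTargetNodes <- function(tasks, maxTasksPerNode, minNodes, maxNodes) {
  stopifnot(tasks > 0, maxTasksPerNode >= 1)
  # $round = $maxTasksPerNode - 1; adding it before dividing rounds up
  roundUp <- maxTasksPerNode - 1
  # Integer division here, so this equals ceiling(tasks / maxTasksPerNode)
  targetVMs <- (tasks + roundUp) %/% maxTasksPerNode
  # max(min, min($targetVMs, max)) keeps the target within [minNodes, maxNodes]
  max(minNodes, min(targetVMs, maxNodes))
}

queueTargetNodes(10, 4, 0, 5)  # 10 tasks at 4 per node need 3 nodes
queueTargetNodes(10, 1, 0, 5)  # 10 nodes wanted, capped at maxNodes = 5
queueTargetNodes(1, 4, 2, 6)   # 1 node wanted, raised to minNodes = 2
```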