-
Notifications
You must be signed in to change notification settings - Fork 49
Conversation
brnleehng
commented
Jul 12, 2017
- Added Azure storage support for output file
- Users will be provide a file pattern that will be uploaded to azure storage by supplying a container sas url
@brnleehng, |
R/utility.R
Outdated
azureDomain <- "blob.core.windows.net" | ||
parsedStorageUrl <- strsplit(url, azureDomain)[[1]] | ||
|
||
baseUrl <- paste0(parsedStorageUrl[1], azureDomain) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: Personally I like to have variable placeholders for array indexing, e.g.
accountName <- parsedValue[1]
accountKey <- parsedValue[2]
Makes the code much easier to read and follow since you know what you're expecting from the array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I'll make the change!
README.md
Outdated
@@ -316,6 +316,31 @@ setVerbose(True) | |||
setVerbose(False) | |||
``` | |||
|
|||
|
|||
### Output Files | |||
Batch will automatically handle your output files when the user assigns a file pattern and container url. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably don't want to just use 'container' since people may imply Docker Containers. Please use 'storage container' instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edited.
README.md
Outdated
file.create(fileName) | ||
fileConn<-file(fileName) | ||
close(fileConn) | ||
NULL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a doc around why you're returning NULL here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add documentation
README.md
Outdated
containerName = outputFolder, | ||
sasToken = writeToken) | ||
|
||
output <- createOutputFile("test-*.txt", containerUrl) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These samples/docs are really important for our users since it's the only way they'll know to use this. Can we add a few more examples on how you might construct output files? Perhaps some more complicated use cases would be good to show the flexibility of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kind of examples are you thinking of..? Also should we create a sample instead of a doc then?
Will work on documentation. |
* Added set chunk size * Added cluster configuration validation function (#30) * Added pool config test validation * Added a fix for validation * Added if checks for null tests and more validation tests * Install R packages at job run time (#29) * Added cran/github installation scripts * Added package installation tests * Upgraded package version to 0.3.2 * Output file support (#40) * Output files support * Added createOutputFile method * output files readme documentation * added tests and find container sas * Added more detailed variable names * Enable/disable merge task (#39) * Merge task pass params * Fixed enableMerge cases * Merge task documentation on README.md * Fixed typo on merge task description * Update doAzureParallel.R * Changed enableMerge to enableCloudCombine * Fix/backwards compatible (#68) * Added backwards compatible in make cluster * Added deprecated config validator * Added mismatch label * Added validation for quota limits and bad getPool requests in waitForNodesToComplete (#52) * Added validation for quota limits and bad getPool requests * Fixed based on PR * Fixed progress bar layout to use switch statements instead of if statements * Changed clusterId to poolId * Added comments and fixed messages * Added running state to the node status * Reformatted lines for function * Added end statement for node completion * Feature/custom script and reduce (#70) * Added custom scripts and removed dependencies parameter * Updated roxygen tool version * Added parallelThreads support * Added test coverage * Removed verbose message on command line * Added Reduce function for group of tasks * Fix build because of doc semantics mismatch with function * Removed unused function * Added command line arg * Added docs for custom script * Moved customize cluster to separate doc for future usage * Fixed typo * Bug - Waiting for tasks to completion function ends too early (#69) * Moved wait for tasks to complete to doAzureParallel utility * Removed unneeded variables and progress * Fixed camel case for skiptoken * Travis/lintr (#72) * Added lintr config file * Added travis github package installation * Removed snake case rule * Fixed documents on doAzureParallel * Based on lintr default_settins docs, correctly added default rules * Updated lintr package to use object_name_style * Added package :: operator * Reformatted after merge * Fixed command line tests * Upgraded roxygen to 6.0.1 * Cluster config docs * Removed additional delete job * Fixed warning descriptions in makeCluster * Ramped up versions for DESCRIPTION * Updated CHANGELOG to 0.4.0 * Fixed environmentSettings style from merge conflict * Fixed cluster warning naming style
* Added set chunk size * Added cluster configuration validation function (#30) * Added pool config test validation * Added a fix for validation * Added if checks for null tests and more validation tests * Install R packages at job run time (#29) * Added cran/github installation scripts * Added package installation tests * Upgraded package version to 0.3.2 * Output file support (#40) * Output files support * Added createOutputFile method * output files readme documentation * added tests and find container sas * Added more detailed variable names * Enable/disable merge task (#39) * Merge task pass params * Fixed enableMerge cases * Merge task documentation on README.md * Fixed typo on merge task description * Update doAzureParallel.R * Changed enableMerge to enableCloudCombine * convert getJobResult output from binary to text * Only write vector to temp file * save cloud merge enabled, chunk size and packages as job metadata * update cloudMergeEnabled to cloudCombineEnabled * Fix/backwards compatible (#68) * Added backwards compatible in make cluster * Added deprecated config validator * Added mismatch label * Added validation for quota limits and bad getPool requests in waitForNodesToComplete (#52) * Added validation for quota limits and bad getPool requests * Fixed based on PR * Fixed progress bar layout to use switch statements instead of if statements * Changed clusterId to poolId * Added comments and fixed messages * Added running state to the node status * Reformatted lines for function * Added end statement for node completion * Feature/custom script and reduce (#70) * Added custom scripts and removed dependencies parameter * Updated roxygen tool version * Added parallelThreads support * Added test coverage * Removed verbose message on command line * Added Reduce function for group of tasks * Fix build because of doc semantics mismatch with function * Removed unused function * Added command line arg * Added docs for custom script * Moved customize cluster to separate doc for future usage * Fixed typo * Bug - Waiting for tasks to completion function ends too early (#69) * Moved wait for tasks to complete to doAzureParallel utility * Removed unneeded variables and progress * Fixed camel case for skiptoken * Travis/lintr (#72) * Added lintr config file * Added travis github package installation * Removed snake case rule * Fixed documents on doAzureParallel * Based on lintr default_settins docs, correctly added default rules * Updated lintr package to use object_name_style * Added package :: operator * Reformatted after merge * Fixed command line tests * Upgraded roxygen to 6.0.1 * Cluster config docs * Removed additional delete job * add getJob api (#84) * add getJob api * reformat code * update styling in utility file * fix code styling * update chunksize to chunkSize and remove unused code * handle job metadata in getJob api * fix styling issue * update getJobList parameter from list of job ids to filter object, and output jobs status in data frame (#128) long running job support, getJob, getJobList and getJobResult implementation * reformat code * update styling in utility file * fix code styling * update chunksize to chunkSize and remove unused code * handle job metadata in getJob api * fix styling issue * use counting service api in getJobList * fix coding style * return data frame from getJobList * update getJobList parameter from job id list to filter by state * reformat code * update description for getJobList * remove dup code * address review feedback * jobId parameter check for getJobResult * update documentation for long run job * update version to 0.5.0 * update version * address review feedback * update chunkSizeValue to chunkSizeKeyValuePair
* treat warnings as failures and fail the creation of the cluster (#91) * treat warnings as failures and fail the creation of the cluster * fix unit tests * fix lintr lines too long issue * escape single quotes * Check if existing pool is deleted when makeCluster is called (#99) * Added deleting pool check for makeCluster * Fixed double quotes * cluster logs renamed from pool to cluster * Added correct imports and fix range * Feature/bio conductor docs (#106) * initial command line instructions for bioconductor * initial startup scripts for installing bioconductor * fix if then syntax * force update node environment with update path for R runtime * install bioconductor * wrap bioconductor install command in Rscript * bioconductor sample docs * update bioC docs * remove .gitignore rule for .json files * add pointer to BioC cluster config from docs * Feature/cluster logs (#98) * download merge result gets content raw * Added setHttpTraffic and logging functions docs * Fixed broken links * Shorten lines down to 120 characters * download merge result gets content raw * Added setHttpTraffic and logging functions docs * Fixed broken links * Shorten lines down to 120 characters * Renamed function names from past discussion * Fixed log documentation * Added new operations for storage management * Added dont run examples * Fixed unused arg for running example * Updated docs for storage management * Added a new doc dedicated for managing storage * Added attribute for container name in data frame * Fixed downloadBlob to work with new rAzureBatch function * Updated docs based on PR comments * Changed dependency version to razurebatch 0.5.0 * Feature/add azure files cluster config (#108) * initial command line instructions for bioconductor * initial startup scripts for installing bioconductor * fix if then syntax * force update node environment with update path for R runtime * install bioconductor * wrap bioconductor install command in Rscript * bioconductor sample docs * update bioC docs * remove .gitignore rule for .json files * add pointer to BioC cluster config from docs * add missing azureFiles cluster config to samples * Add 0.4.2 CHANGELOG comments (#111) * Added live scenario test (#107) * Added live scenario test so users do not have to write their own sample code to test * Added file names for test live * Removed single quote linter * Added comment about the reason for this test * Wait for job preparation task function (#109) * Fixed verbose for getDoParWorkers (#112) * Feature/faq (#110) * initial FAQ * rename faq to FAQ * merge FAQ and Troubleshooting docs * add info on how to reboot a node * refrence TSG and FAQ from main docs index page * add more info as per PR feedback * PR feedback * point raw scripts at master branch (#118) * Update DESCRIPTION (#117) Update version for new milestone. * Fix: Removed anaconda from path (#119) * Removed anaconda from environment path * Line is too long for blobxfer command * For BioConductor install, force remove MRO 3.3 prior to installing MRO 3.4 (#120) * force add PATH to current user * Update bioc_setup.sh * Check verbose null case (#121) * Change True/False to TRUE/FALSE in README example (#124) * add .gitiattrributes file to track line endings * True and False are not valid in R; changed to TRUE and FALSE * Fixed worker and merger scripts (#116) * Fixed worker and merger scripts * Fixed verbose logs based on PR comments * Added documentation on error handling * Fixed header on table markdown * Fixed based on PR comments * v0.4.3 Release (#131) * Upgraded description to use rAzureBatch v0.5.1 * Updated change log for job failure * readme.md update * Merge from feature/getjobresult for long running job support (#130) * Added set chunk size * Added cluster configuration validation function (#30) * Added pool config test validation * Added a fix for validation * Added if checks for null tests and more validation tests * Install R packages at job run time (#29) * Added cran/github installation scripts * Added package installation tests * Upgraded package version to 0.3.2 * Output file support (#40) * Output files support * Added createOutputFile method * output files readme documentation * added tests and find container sas * Added more detailed variable names * Enable/disable merge task (#39) * Merge task pass params * Fixed enableMerge cases * Merge task documentation on README.md * Fixed typo on merge task description * Update doAzureParallel.R * Changed enableMerge to enableCloudCombine * convert getJobResult output from binary to text * Only write vector to temp file * save cloud merge enabled, chunk size and packages as job metadata * update cloudMergeEnabled to cloudCombineEnabled * Fix/backwards compatible (#68) * Added backwards compatible in make cluster * Added deprecated config validator * Added mismatch label * Added validation for quota limits and bad getPool requests in waitForNodesToComplete (#52) * Added validation for quota limits and bad getPool requests * Fixed based on PR * Fixed progress bar layout to use switch statements instead of if statements * Changed clusterId to poolId * Added comments and fixed messages * Added running state to the node status * Reformatted lines for function * Added end statement for node completion * Feature/custom script and reduce (#70) * Added custom scripts and removed dependencies parameter * Updated roxygen tool version * Added parallelThreads support * Added test coverage * Removed verbose message on command line * Added Reduce function for group of tasks * Fix build because of doc semantics mismatch with function * Removed unused function * Added command line arg * Added docs for custom script * Moved customize cluster to separate doc for future usage * Fixed typo * Bug - Waiting for tasks to completion function ends too early (#69) * Moved wait for tasks to complete to doAzureParallel utility * Removed unneeded variables and progress * Fixed camel case for skiptoken * Travis/lintr (#72) * Added lintr config file * Added travis github package installation * Removed snake case rule * Fixed documents on doAzureParallel * Based on lintr default_settins docs, correctly added default rules * Updated lintr package to use object_name_style * Added package :: operator * Reformatted after merge * Fixed command line tests * Upgraded roxygen to 6.0.1 * Cluster config docs * Removed additional delete job * add getJob api (#84) * add getJob api * reformat code * update styling in utility file * fix code styling * update chunksize to chunkSize and remove unused code * handle job metadata in getJob api * fix styling issue * update getJobList parameter from list of job ids to filter object, and output jobs status in data frame (#128) long running job support, getJob, getJobList and getJobResult implementation * reformat code * update styling in utility file * fix code styling * update chunksize to chunkSize and remove unused code * handle job metadata in getJob api * fix styling issue * use counting service api in getJobList * fix coding style * return data frame from getJobList * update getJobList parameter from job id list to filter by state * reformat code * update description for getJobList * remove dup code * address review feedback * jobId parameter check for getJobResult * update documentation for long run job * update version to 0.5.0 * update version * address review feedback * update chunkSizeValue to chunkSizeKeyValuePair * Validate job names and pool names (#129) * Added validator class * Added validators for lintr * Added exclusion list for validators * fix bug in metadata handling for packages and enableCloudCombine (#133) * fix bug in metadata handling for packages and enableCloudCombine * call long running job api in test * update test * add test for long running job feature * code style fix * update job state description in readme * use list for job state filter * address review feedback
* treat warnings as failures and fail the creation of the cluster (#91) * treat warnings as failures and fail the creation of the cluster * fix unit tests * fix lintr lines too long issue * escape single quotes * Check if existing pool is deleted when makeCluster is called (#99) * Added deleting pool check for makeCluster * Fixed double quotes * cluster logs renamed from pool to cluster * Added correct imports and fix range * Feature/bio conductor docs (#106) * initial command line instructions for bioconductor * initial startup scripts for installing bioconductor * fix if then syntax * force update node environment with update path for R runtime * install bioconductor * wrap bioconductor install command in Rscript * bioconductor sample docs * update bioC docs * remove .gitignore rule for .json files * add pointer to BioC cluster config from docs * Feature/cluster logs (#98) * download merge result gets content raw * Added setHttpTraffic and logging functions docs * Fixed broken links * Shorten lines down to 120 characters * download merge result gets content raw * Added setHttpTraffic and logging functions docs * Fixed broken links * Shorten lines down to 120 characters * Renamed function names from past discussion * Fixed log documentation * Added new operations for storage management * Added dont run examples * Fixed unused arg for running example * Updated docs for storage management * Added a new doc dedicated for managing storage * Added attribute for container name in data frame * Fixed downloadBlob to work with new rAzureBatch function * Updated docs based on PR comments * Changed dependency version to razurebatch 0.5.0 * Feature/add azure files cluster config (#108) * initial command line instructions for bioconductor * initial startup scripts for installing bioconductor * fix if then syntax * force update node environment with update path for R runtime * install bioconductor * wrap bioconductor install command in Rscript * bioconductor sample docs * update bioC docs * remove .gitignore rule for .json files * add pointer to BioC cluster config from docs * add missing azureFiles cluster config to samples * Add 0.4.2 CHANGELOG comments (#111) * Added live scenario test (#107) * Added live scenario test so users do not have to write their own sample code to test * Added file names for test live * Removed single quote linter * Added comment about the reason for this test * Wait for job preparation task function (#109) * Fixed verbose for getDoParWorkers (#112) * Feature/faq (#110) * initial FAQ * rename faq to FAQ * merge FAQ and Troubleshooting docs * add info on how to reboot a node * refrence TSG and FAQ from main docs index page * add more info as per PR feedback * PR feedback * point raw scripts at master branch (#118) * Update DESCRIPTION (#117) Update version for new milestone. * Fix: Removed anaconda from path (#119) * Removed anaconda from environment path * Line is too long for blobxfer command * For BioConductor install, force remove MRO 3.3 prior to installing MRO 3.4 (#120) * force add PATH to current user * Update bioc_setup.sh * Check verbose null case (#121) * Change True/False to TRUE/FALSE in README example (#124) * add .gitiattrributes file to track line endings * True and False are not valid in R; changed to TRUE and FALSE * Fixed worker and merger scripts (#116) * Fixed worker and merger scripts * Fixed verbose logs based on PR comments * Added documentation on error handling * Fixed header on table markdown * Fixed based on PR comments * v0.4.3 Release (#131) * Upgraded description to use rAzureBatch v0.5.1 * Updated change log for job failure * readme.md update * Merge from feature/getjobresult for long running job support (#130) * Added set chunk size * Added cluster configuration validation function (#30) * Added pool config test validation * Added a fix for validation * Added if checks for null tests and more validation tests * Install R packages at job run time (#29) * Added cran/github installation scripts * Added package installation tests * Upgraded package version to 0.3.2 * Output file support (#40) * Output files support * Added createOutputFile method * output files readme documentation * added tests and find container sas * Added more detailed variable names * Enable/disable merge task (#39) * Merge task pass params * Fixed enableMerge cases * Merge task documentation on README.md * Fixed typo on merge task description * Update doAzureParallel.R * Changed enableMerge to enableCloudCombine * convert getJobResult output from binary to text * Only write vector to temp file * save cloud merge enabled, chunk size and packages as job metadata * update cloudMergeEnabled to cloudCombineEnabled * Fix/backwards compatible (#68) * Added backwards compatible in make cluster * Added deprecated config validator * Added mismatch label * Added validation for quota limits and bad getPool requests in waitForNodesToComplete (#52) * Added validation for quota limits and bad getPool requests * Fixed based on PR * Fixed progress bar layout to use switch statements instead of if statements * Changed clusterId to poolId * Added comments and fixed messages * Added running state to the node status * Reformatted lines for function * Added end statement for node completion * Feature/custom script and reduce (#70) * Added custom scripts and removed dependencies parameter * Updated roxygen tool version * Added parallelThreads support * Added test coverage * Removed verbose message on command line * Added Reduce function for group of tasks * Fix build because of doc semantics mismatch with function * Removed unused function * Added command line arg * Added docs for custom script * Moved customize cluster to separate doc for future usage * Fixed typo * Bug - Waiting for tasks to completion function ends too early (#69) * Moved wait for tasks to complete to doAzureParallel utility * Removed unneeded variables and progress * Fixed camel case for skiptoken * Travis/lintr (#72) * Added lintr config file * Added travis github package installation * Removed snake case rule * Fixed documents on doAzureParallel * Based on lintr default_settins docs, correctly added default rules * Updated lintr package to use object_name_style * Added package :: operator * Reformatted after merge * Fixed command line tests * Upgraded roxygen to 6.0.1 * Cluster config docs * Removed additional delete job * add getJob api (#84) * add getJob api * reformat code * update styling in utility file * fix code styling * update chunksize to chunkSize and remove unused code * handle job metadata in getJob api * fix styling issue * update getJobList parameter from list of job ids to filter object, and output jobs status in data frame (#128) long running job support, getJob, getJobList and getJobResult implementation * reformat code * update styling in utility file * fix code styling * update chunksize to chunkSize and remove unused code * handle job metadata in getJob api * fix styling issue * use counting service api in getJobList * fix coding style * return data frame from getJobList * update getJobList parameter from job id list to filter by state * reformat code * update description for getJobList * remove dup code * address review feedback * jobId parameter check for getJobResult * update documentation for long run job * update version to 0.5.0 * update version * address review feedback * update chunkSizeValue to chunkSizeKeyValuePair * Validate job names and pool names (#129) * Added validator class * Added validators for lintr * Added exclusion list for validators * fix bug in metadata handling for packages and enableCloudCombine (#133) * fix bug in metadata handling for packages and enableCloudCombine * call long running job api in test * update test * add test for long running job feature * code style fix * update job state description in readme * use list for job state filter * address review feedback