This section explains how Azure works, how to get the most out of it, and best practices for using the doAzureParallel package.
-
Azure Introduction (link)
Using the Data Science Virtual Machine (DSVM) & Azure Batch
-
Virtual Machine Sizes (link)
How do you choose the best VM type/size for your workload?
-
Autoscale (link)
Automatically scale your cluster up or down to save time and money.
-
Azure Limitations (link)
Learn about the limits on the size of your cluster and on the number of foreach jobs you can run in Azure.
-
Package Management (link)
Best practices for managing your R packages across your Azure pool.
-
Distributing your Data (link)
Best practices and limitations for working with distributed data.
-
Parallelizing on each VM Core (link)
Best practices and limitations for parallelizing your R code across every core of each VM in your pool.
-
Persistent Storage (link)
Taking advantage of persistent storage for long-running jobs.