Commit

Merge pull request #13 from databio/dev
v 0.3 release
nsheff authored Aug 21, 2017
2 parents 41d6261 + eb363dc commit b4e1e8b
Showing 16 changed files with 152 additions and 89 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,6 +1,6 @@
Package: simpleCache
Version: 0.2.1
Date: 2017-08-03
Version: 0.3.0
Date: 2017-08-21
Title: Simply Caching R Objects
Description: Provides intuitive functions for caching R objects, encouraging
reproducible, restartable, and distributed R analysis. The user selects a
2 changes: 1 addition & 1 deletion NAMESPACE
@@ -1,7 +1,7 @@
# Generated by roxygen2: do not edit by hand

export(addCacheSearchEnvironment)
export(availCaches)
export(listCaches)
export(loadCaches)
export(resetCacheSearchEnvironment)
export(setCacheBuildDir)
16 changes: 10 additions & 6 deletions NEWS
@@ -1,19 +1,23 @@
# Change log
All notable changes to this project will be documented in this file.

## [0.2.1] -- 2017-07-30
## [0.3.0] -- 2017-08-21

### Added
- Switched default cache dir to tempdir()
- Changed availCaches() to listCaches()
- Changed cache building to happen in parent.frame(), so that any loaded
packages are available for cache building

- Added examples.

## [0.2.0] -- 2017-07-30
## [0.2.1] -- 2017-07-30

### Added
- Added examples

## [0.2.0] -- 2017-07-30

- support for batchjobs parallel processing
- docs, prep for submission to CRAN

## [0.0.1]

Long-term stable version
- Long-term stable version
4 changes: 2 additions & 2 deletions R/examples/example.R
@@ -1,5 +1,5 @@
# choose location to store caches
cacheDir = system.file("cache", package="simpleCache")
cacheDir = tempdir()
cacheDir
setCacheDir(cacheDir)

@@ -13,7 +13,7 @@ normSample2 = rnorm(10, 0, 1)
storeCache("normSample2")

# what's available?
availCaches()
listCaches()

# load a cache
simpleCache("normSample")
14 changes: 14 additions & 0 deletions R/listCaches.R
@@ -0,0 +1,14 @@
#' Show available caches.
#'
#' Lists any cache files in the cache directory.
#'
#' @param cacheSubDir Optional parameter to specify a subdirectory of the cache folder.
#' @return \code{character} vector in which each element is the path to a file that
#' represents an available cache (within \code{getOption("RCACHE.DIR")})
#' @export
#' @example
#' R/examples/example.R
listCaches = function(cacheSubDir="") {
list.files(paste0(getOption("RCACHE.DIR"), cacheSubDir))
}
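
A minimal sketch of the same listing logic, assuming only base R. The name `listCachesSketch` and its `cacheDir` argument are hypothetical and not part of simpleCache, and the demo stores caches as `.RData` files purely for illustration. It swaps `paste0()` for `file.path()`, so that an `RCACHE.DIR` value set without a trailing slash is still joined to `cacheSubDir` with a separator.

```r
# Hypothetical helper, not part of simpleCache: list the files in a cache
# directory, or in a subdirectory of it.
listCachesSketch = function(cacheSubDir = "",
                            cacheDir = getOption("RCACHE.DIR", tempdir())) {
  # file.path() inserts the "/" separator that paste0() omits when cacheDir
  # has no trailing slash
  list.files(file.path(cacheDir, cacheSubDir))
}

# Usage: save two objects as .RData files in a fresh directory, then list them
demoDir = tempfile("listCachesDemo")
dir.create(file.path(demoDir, "sub"), recursive = TRUE)
normSample = rnorm(10)
save(normSample, file = file.path(demoDir, "normSample.RData"))
save(normSample, file = file.path(demoDir, "sub", "normSample2.RData"))
listCachesSketch(cacheDir = demoDir)         # "normSample.RData" "sub"
listCachesSketch("sub", cacheDir = demoDir)  # "normSample2.RData"
```

Note that a non-recursive `list.files()` also returns subdirectory names (`sub` here), so a listing built this way mixes cache files and cache subfolders.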

14 changes: 0 additions & 14 deletions R/loadCaches.R
@@ -16,17 +16,3 @@ loadCaches = function(cacheNames, ...) {
simpleCache(cacheNames[i], loadEnvir=parent.frame(n=2), ...)
}
}


#' Show available caches.
#'
#' Lists any cache files in the cache directory.
#'
#' @param cacheSubDir Optional parameter to specify a subdirectory of the cache folder.
#' @export
#' @example
#' R/examples/example.R
availCaches = function(cacheSubDir="") {
list.files(paste0(getOption("RCACHE.DIR"), cacheSubDir))
}

36 changes: 29 additions & 7 deletions R/simpleCache.R
@@ -112,9 +112,11 @@ simpleCache = function(cacheName, instruction=NULL, buildEnvir=NULL,
cacheDir = file.path(cacheDir, cacheSubDir)
}
if (is.null(cacheDir)) {
message(strwrap("You must set global option RCACHE.DIR with setSharedCacheDir(),
or specify a cacheDir parameter directly to simpleCache()."))
return(NA)
message(strwrap("No cacheDir specified. You should set global option
RCACHE.DIR with setCacheDir(), or specify a cacheDir parameter directly
to simpleCache(). With no other option, simpleCache will use tempdir():
", initial="", prefix=" "), tempdir())
cacheDir = tempdir()
}
if (!"character" %in% class(cacheName)) {
stop("simpleCache expects the cacheName variable to be a character
@@ -245,23 +247,43 @@ simpleCache = function(cacheName, instruction=NULL, buildEnvir=NULL,
# No cluster submission request, so just run it here!
# "ret," for return, is the name the cacheName is stored under.
if (parse) {
ret = eval(parse(text=instruction))
ret = eval(parse(text=instruction), envir=parent.frame())
} else {
ret = eval( instruction )
# Here we do the evaluation in the parent frame so that
# it will have access to any packages the user has loaded
# that may be required to run the code. Otherwise, it will
# run in the simpleCache namespace which could lack these
# packages (or have a different search path hierarchy),
# leading to failures. The `substitute` call here ensures
# the code isn't evaluated at argument stage, but is retained
# until it makes it to the `eval` call.
ret = eval(instruction, envir=parent.frame())
}
}
if (timer) { toc() }
} else {
# Build environment was provided.
# we must place the instruction in the environment to build from
if (exists("instruction", buildEnvir)) {
stop("Can't provide a variable named 'instruction' in buildEnvir")
}
buildEnvir$instruction = instruction
be = as.environment(buildEnvir)
# As described above, this puts global package functions into
# scope so instructions can use them.
parent.env(be) = parent.frame()
if (timer) { tic() }
if (parse) {
ret = with(buildEnvir, eval(parse(text=instruction)))
ret = with(be, eval(parse(text=instruction)))
} else {
ret = with(buildEnvir, eval(instruction))
#ret = with(buildEnvir, evalq(instruction))
ret = with(be, eval(instruction))

}
if (timer) { toc() }
}
}

# tryCatch
}, error = function(e) { if (nofail) warning(e) else stop(e) })
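
The two evaluation strategies in this hunk can be isolated in plain R. This is an illustrative sketch only: `evalInCaller`, `evalHere`, `evalWithBuildEnvir` and `analysis` are hypothetical names, and `evalWithBuildEnvir` calls `eval(instruction, envir = be)` directly, whereas simpleCache uses `with(be, eval(instruction))`. It shows why evaluating in `parent.frame()` lets the instruction see the caller's variables, and why a list-based `buildEnvir` needs its parent environment set.

```r
# Hypothetical helpers, not simpleCache's code.

# No buildEnvir: evaluate the captured instruction in the caller's frame.
evalInCaller = function(instruction) {
  instruction = substitute(instruction)  # keep the code unevaluated
  eval(instruction, envir = parent.frame())
}

# Same, but with eval()'s default environment (this function's own frame),
# which cannot see the caller's local variables.
evalHere = function(instruction) {
  instruction = substitute(instruction)
  eval(instruction)
}

# buildEnvir given as a list: as.environment() gives it emptyenv() as parent,
# so not even `*` would be found. Re-parenting it onto the caller's frame lets
# lookups fall through to the caller, then the global env and attached packages.
evalWithBuildEnvir = function(instruction, buildEnvir) {
  instruction = substitute(instruction)
  be = as.environment(buildEnvir)
  parent.env(be) = parent.frame()
  eval(instruction, envir = be)
}

analysis = function() {
  scaleFactor = 10
  viaCaller = evalInCaller(scaleFactor * 2)
  viaHere = tryCatch(evalHere(scaleFactor * 2),
                     error = function(e) conditionMessage(e))
  viaBuild = evalWithBuildEnvir(x * scaleFactor, list(x = 5))
  list(viaCaller = viaCaller, viaHere = viaHere, viaBuild = viaBuild)
}

res = analysis()
res$viaCaller  # 20
res$viaHere    # error message: scaleFactor is not visible from evalHere's frame
res$viaBuild   # 50
```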

71 changes: 53 additions & 18 deletions README.md
@@ -1,39 +1,56 @@
simpleCache: R caching for restartable analysis
-----------------------------------------------

<a href="https://travis-ci.org/databio/simpleCache"><img src="https://travis-ci.org/databio/simpleCache.svg?branch=master" alt="Travis CI status"></img>
</a>
<a href="https://travis-ci.org/databio/simpleCache"><img src="https://travis-ci.org/databio/simpleCache.svg?branch=master" alt="Travis CI status"></img></a>

`simpleCache` is an R package providing functions for caching R objects. Its purpose is to encourage writing reusable, restartable, and reproducible analysis pipelines for projects with massive data and computational requirements.
`simpleCache` is an R package providing functions for caching R objects. Its
purpose is to encourage writing reusable, restartable, and reproducible analysis
pipelines for projects with massive data and computational requirements.

Like its name indicates, `simpleCache` is intended to be simple. You choose a location to store your caches, and then provide the function with nothing more than a cache name and instructions (R code) for how to produce the R object. While simple, `simpleCache` also provides some advanced options like environment assignments, recreating caches, reloading caches, and even cluster compute bindings (using the `batchtools` package) making it flexible enough for use in large-scale data analysis projects.
Like its name indicates, `simpleCache` is intended to be simple. You choose a
location to store your caches, and then provide the function with nothing more
than a cache name and instructions (R code) for how to produce the R object.
While simple, `simpleCache` also provides some advanced options like environment
assignments, recreating caches, reloading caches, and even cluster compute
bindings (using the `batchtools` package) making it flexible enough for use in
large-scale data analysis projects.

--------------------------------------------------------------------------------
### Installing simpleCache
Install the development version directly from github with devtools

`simpleCache` is on
[CRAN](https://cran.r-project.org/web/packages/simpleCache/index.html) and can
be installed as usual:

```
install.packages("simpleCache")
```

If you like, you may install the development version directly from GitHub with
`devtools`:

```
devtools::install_github("databio/simpleCache")
```

To install a local copy:
```
packageFolder = "~/R/simpleCache";
install.packages(packageFolder, repos=NULL)
packageFolder = "~/R/simpleCache"; install.packages(packageFolder, repos=NULL)
```

--------------------------------------------------------------------------------
### Running simpleCache

`simpleCache` comes with a single primary function that will do almost everything you need. In short, you run it with a few lines like this:
`simpleCache` comes with a single primary function that will do almost
everything you need. In short, you run it with a few lines like this:
```
library(simpleCache)
setCacheDir("~")
simpleCache("normSample", { rnorm(1e7, 0,1) }, recreate=TRUE)
simpleCache("normSample", { rnorm(1e7, 0,1) })
library(simpleCache)
setCacheDir(tempdir())
simpleCache("normSample", { rnorm(1e7, 0,1) }, recreate=TRUE)
simpleCache("normSample", { rnorm(1e7, 0,1) })
```

`simpleCache` also interfaces with the `batchtools` package to let you build caches on any cluster resource manager. I have produced some [R vignettes](vignettes/) to get you started.
`simpleCache` also interfaces with the `batchtools` package to let you build
caches on any cluster resource manager. I have produced some [R
vignettes](vignettes/) to get you started.

* [An introduction to simpleCache](vignettes/simpleCacheIntroduction.Rmd)
* [Sharing caches across projects](vignettes/sharingCaches.Rmd)
@@ -42,11 +59,29 @@ simpleCache("normSample", { rnorm(1e7, 0,1) })
--------------------------------------------------------------------------------
### simpleCache Philosophy

The use case I had in mind for `simpleCache` is that you find yourself constantly recalculating the same R object in several different scripts, or repeatedly in the same script, every time you open it and want to continue that project. SimpleCache is well-suited for interactive analysis, allowing you to pick up right where you left off in a new R session, without having to recalculate everything. It is equally useful in automatic pipelines, where separate scripts may benefit from loading, instead of recalculating, the same R objects produced by other scripts.

R provides some base functions (`save`, `serialize`, and `load`) to let you save and reload such objects, but these low-level functions are a bit cumbersome. `simpleCache` simply provides a convenient, user-friendly interface to these functions, streamlining the process. For example, a single `simpleCache` call will check for a cache and load it if it exists, or create it if it does not. With the base R `save` and `load` functions, you can't just write a single function call and then run the same thing every time you start the script -- even this simple use case requires additional logic to check for an existing cache. SimpleCache just does all this for you.

The thing to keep in mind with simpleCache is that **the cache name is paramount**. SimpleCache assumes that your name for an object is a perfect identifier for that object; in other words, don't cache things that you plan to change.
The use case I had in mind for `simpleCache` is that you find yourself
constantly recalculating the same R object in several different scripts, or
repeatedly in the same script, every time you open it and want to continue that
project. SimpleCache is well-suited for interactive analysis, allowing you to
pick up right where you left off in a new R session, without having to
recalculate everything. It is equally useful in automatic pipelines, where
separate scripts may benefit from loading, instead of recalculating, the same R
objects produced by other scripts.

R provides some base functions (`save`, `serialize`, and `load`) to let you save
and reload such objects, but these low-level functions are a bit cumbersome.
`simpleCache` simply provides a convenient, user-friendly interface to these
functions, streamlining the process. For example, a single `simpleCache` call
will check for a cache and load it if it exists, or create it if it does not.
With the base R `save` and `load` functions, you can't just write a single
function call and then run the same thing every time you start the script --
even this simple use case requires additional logic to check for an existing
cache. SimpleCache just does all this for you.
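
To make that extra logic concrete, here is a minimal sketch of the check-then-load-or-build step that base R `save()` and `load()` leave to the user. It is a hypothetical helper (`cacheOrBuild`), not simpleCache's implementation, and it assumes one `.RData` file per cache, named after the cache and holding a single object called `ret`.

```r
# Hypothetical helper, not simpleCache's implementation.
cacheOrBuild = function(cacheName, instruction, cacheDir = tempdir()) {
  cacheFile = file.path(cacheDir, paste0(cacheName, ".RData"))
  if (file.exists(cacheFile)) {
    # Cache hit: load() restores the saved object "ret" into a fresh environment
    loadEnv = new.env()
    load(cacheFile, envir = loadEnv)
    return(loadEnv$ret)
  }
  # Cache miss: R arguments are lazy, so `instruction` is only evaluated here
  ret = instruction
  save(ret, file = cacheFile)
  ret
}

demoDir = tempfile("cacheOrBuildDemo")
dir.create(demoDir)
builds = new.env()
builds$n = 0
first = cacheOrBuild("normSample", { builds$n = builds$n + 1; rnorm(5) }, demoDir)
second = cacheOrBuild("normSample", { builds$n = builds$n + 1; rnorm(5) }, demoDir)
identical(first, second)  # TRUE: the second call loaded the cache
builds$n                  # 1: the instruction ran only once
```

Relying on lazy argument evaluation means the expensive instruction is never run on a cache hit, which is what makes a single unconditional call safe to leave at the top of a script.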

The thing to keep in mind with simpleCache is that **the cache name is
paramount**. SimpleCache assumes that your name for an object is a perfect
identifier for that object; in other words, don't cache things that you plan to
change.

16 changes: 10 additions & 6 deletions man/availCaches.Rd → man/listCaches.Rd

4 changes: 2 additions & 2 deletions man/loadCaches.Rd

4 changes: 2 additions & 2 deletions man/setCacheDir.Rd

4 changes: 2 additions & 2 deletions man/simpleCache.Rd

4 changes: 2 additions & 2 deletions man/storeCache.Rd

6 changes: 2 additions & 4 deletions vignettes/clusterCaches.Rmd
@@ -18,21 +18,19 @@ To do this, first, create a `batchtools` registry. You can follow more detailed

```{r Try it out, eval=FALSE}
library(simpleCache)
setCacheDir(getwd())
setCacheDir(tempdir())
registry = batchtools::makeRegistry(NA)
templateFile = system.file("templates/slurm-advanced.tmpl", package = "simpleCache")
registry$cluster.functions = batchtools::makeClusterFunctionsSlurm(
template = templateFile)
registry
```

Notice that I'm using a custom slurm template here. With a registry in hand, we next need to define the resources this cache job will require:

```
```{r}
resources = list(ncpus=1, memory=1000, walltime=60, partition="serial")
```

Then, we simply add these as arguments to `simpleCache()` like so: