This repository has been archived by the owner on Jan 27, 2022. It is now read-only.

feat(CodeChunk): Add more properties to CodeChunk
nokome authored and beneboy committed Sep 2, 2019
1 parent 68a183e commit 49c3543
Showing 3 changed files with 221 additions and 0 deletions.
13 changes: 13 additions & 0 deletions schema/Code.md
@@ -0,0 +1,13 @@
---
title: Code
---

# Structure

## `language`

The `language` property specifies the

## `text`

The most important, and only required, property of a `Code` node is `text`. This property is a `string` of the chunk's source code. There are no restrictions on the length or content of `text`, and it is possible for it to be syntactically invalid for the `language`.
137 changes: 137 additions & 0 deletions schema/CodeChunk.md
@@ -0,0 +1,137 @@
---
title: CodeChunk
---

# Structure

## Source properties

A `CodeChunk` has two 'source' properties, `text` and `language`, from which its other properties are derived during compilation (e.g. `imports`, `declares`) or execution (e.g. `outputs`).

### `text` and `language`

All `CodeChunk`s are required to have a `text` property, containing the actual code, and most will also specify the programming `language`. If `language` is not specified then it defaults to the language of the previous `CodeChunk` or `CodeExpression` in the document. So, if you are writing a document that only uses one language, you only need to specify `language` on the first chunk. For more on these properties see [`Code`](./Code.html).
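
The defaulting rule can be illustrated with a minimal Python sketch. It assumes the document's code nodes are available as a flat list of dictionaries in document order; the `resolve_languages` helper is hypothetical and is not part of Stencila.

```python
from typing import List, Optional


def resolve_languages(nodes: List[dict]) -> List[Optional[str]]:
    """Return the effective language of each code node, in document order."""
    resolved = []
    previous = None  # language of the most recent CodeChunk or CodeExpression
    for node in nodes:
        # Fall back to the previous node's language when none is specified
        language = node.get("language") or previous
        resolved.append(language)
        previous = language
    return resolved


nodes = [
    {"type": "CodeChunk", "language": "python", "text": "x = 1"},
    {"type": "CodeExpression", "text": "x * 2"},
    {"type": "CodeChunk", "language": "r", "text": "y <- 2"},
    {"type": "CodeChunk", "text": "y + 1"},
]
languages = resolve_languages(nodes)
```

In this sketch the second node inherits `python` and the fourth inherits `r`. A node with no preceding language resolves to `None`.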

An example of a simple `CodeChunk`:

```json import=helloWorld
{
  "type": "CodeChunk",
  "language": "python",
  "text": "print('Hello world')"
}
```

## Compilation properties

The following properties of a `CodeChunk` are populated during compilation. You can also override the compiler and specify them manually.

### `imports`

The `imports` property lists the packages that a `CodeChunk` imports into the execution context. It is used by the compiler to populate the `requires` property of the document containing the chunk so that it can build an execution environment for it (e.g. a Docker container).

- dokta regex style
- AST walking

```json import=importPython
{
  "type": "CodeChunk",
  "language": "python",
  "text": "import matplotlib"
}
```
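
For a Python chunk like the one above, the AST-walking approach could be sketched as follows. This is only an illustration built on Python's standard `ast` module, not the Stencila compiler itself. The `detect_imports` helper is hypothetical, and it makes no attempt to separate standard library modules from installable packages.

```python
import ast


def detect_imports(text: str) -> list:
    """Collect the top-level package names imported by Python source code."""
    packages = []
    for node in ast.walk(ast.parse(text)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute `from package.module import name` statements only
            names = [node.module]
        else:
            continue
        for name in names:
            package = name.split(".")[0]  # `os.path` counts as `os`
            if package not in packages:
                packages.append(package)
    return packages


chunk = {"type": "CodeChunk", "language": "python", "text": "import matplotlib"}
chunk["imports"] = detect_imports(chunk["text"])
```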

```yaml import=importJavascript
type: CodeChunk
language: javascript
text: |
  import * as d3 from 'd3'
  const kmeans = require('ml-kmeans');
imports:
  - d3
  - ml-kmeans
```

You can manually add packages to the `imports` property of a `CodeChunk`. When the chunk is compiled, only packages that are not already listed are added. If for some reason you want full control and do not want the compiler to add anything to `imports`, set the first value to the empty string.

For example, in R you can call a function in a package without first importing it via a call to `library`:

```yaml import=importJavascript
type: CodeChunk
language: r
text: |
  superdoopa::func("beep", "boop")
imports:
  - ''
  - superdoopa
```
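
The rule for combining manually specified and detected packages might be sketched like this. The `merge_imports` helper and the `dplyr` value in the comment are hypothetical. The sketch also assumes that the empty-string marker is kept in the list after compilation, which the text above does not specify.

```python
def merge_imports(manual: list, detected: list) -> list:
    """Combine user-specified imports with those detected by a compiler."""
    # An empty string in first position means "do not add anything"
    if manual and manual[0] == "":
        return list(manual)
    merged = list(manual)
    for package in detected:
        if package not in merged:  # only missing packages are added
            merged.append(package)
    return merged


# The user listed `d3`; the compiler also detected `ml-kmeans`
combined = merge_imports(["d3"], ["d3", "ml-kmeans"])

# The user took full control, so a detected `dplyr` is not added
locked = merge_imports(["", "superdoopa"], ["dplyr"])
```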

[//] TODO: Real-life example with dplyr or something

### `declares`

The `declares` property lists the variables that a `CodeChunk` declares. It is used by the compiler to build a graph of the dependencies among `CodeChunk`s and `CodeExpression`s. This in turn allows for reactivity: when a user changes a chunk containing a declaration, all of the other chunks or expressions that use that variable will be re-executed.

```yaml import=pythonFunction
type: CodeChunk
language: python
text: |
  def greet(who: str):
      return 'Hello %s!' % who
declares:
  - type: Function
    name: greet
    parameters:
      - type: Parameter
        name: who
        schema:
          - type: StringSchema
```
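
To show how `declares` and `uses` could drive reactivity, here is a minimal sketch of finding the chunks to re-execute after a change. The `id` field and both helper functions are hypothetical, and the real compiler's graph may be built quite differently.

```python
from collections import deque


def declared_names(chunk: dict) -> list:
    """Names declared by a chunk; items may be strings or named nodes."""
    return [
        item if isinstance(item, str) else item["name"]
        for item in chunk.get("declares", [])
    ]


def dependents(chunks: list, changed_id: str) -> list:
    """Ids of chunks to re-execute, in document order, when one chunk changes."""
    by_id = {chunk["id"]: chunk for chunk in chunks}
    # Map each variable name to the chunks that use it
    users = {}
    for chunk in chunks:
        for name in chunk.get("uses", []):
            users.setdefault(name, []).append(chunk["id"])

    stale = set()
    queue = deque([changed_id])
    while queue:
        current = queue.popleft()
        for name in declared_names(by_id[current]):
            for user in users.get(name, []):
                if user != changed_id and user not in stale:
                    stale.add(user)
                    queue.append(user)  # its own declarations may be used too
    return [chunk["id"] for chunk in chunks if chunk["id"] in stale]


chunks = [
    {"id": "a", "declares": [{"type": "Function", "name": "greet"}]},
    {"id": "b", "uses": ["greet"], "declares": ["message"]},
    {"id": "c", "uses": ["message"]},
    {"id": "d", "declares": ["other"]},
]
to_rerun = dependents(chunks, "a")
```

Changing chunk `a` marks `b` as stale because it uses `greet`, and then `c` because it uses `message`, which `b` declares.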

### Code comment overrides

#### R

In R, code chunk properties can be specified using `@property` tags in comments:

```r
#' @imports package1, package2
#' @uses variable1, variable2
#' @declares variable3, variable4
#' @alters variable5, variable6
```

One situation in R where you may want to use a comment override is to enable reactivity for a chunk that uses a function with [non-standard evaluation](http://adv-r.had.co.nz/Computing-on-the-language.html), for example `subset`, or many of the functions in the `tidyverse`. The Stencila R compiler ignores some of the arguments supplied to these functions because of their peculiar scoping rules, so variables used in those arguments need to be listed manually. For example:

```r
#' @uses begin, end
data <- subset(all_data, year >= begin & year <= end)
```

See `stencila::nse_funcs` for a list of registered NSE functions.

Another situation is where you have a function that reads files from the file system but is not one of the registered "readers" (see `stencila::reader_funcs` for a list of registered reader functions). In this case, supply a comma-separated list of the files read, e.g.

```r
#' @reads ./data/all.csv, ./data/categories.csv
all_data <- special_read_csv('./data/all.csv', './data/categories.csv')
```
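
As a rough illustration of how such comment tags could be extracted from a chunk's `text`, the sketch below uses a regular expression. The `parse_overrides` helper and its pattern are assumptions made for this example and are not taken from the Stencila R compiler.

```python
import re

# Matches lines such as: #' @uses begin, end
TAG_PATTERN = re.compile(r"^#'\s*@(\w+)\s+(.*)$")


def parse_overrides(text: str) -> dict:
    """Map each `@property` tag in the comments to its list of values."""
    overrides = {}
    for line in text.splitlines():
        match = TAG_PATTERN.match(line.strip())
        if match:
            tag, values = match.groups()
            items = [value.strip() for value in values.split(",") if value.strip()]
            overrides.setdefault(tag, []).extend(items)
    return overrides


code = "#' @reads ./data/all.csv, ./data/categories.csv\nall_data <- special_read_csv('./data/all.csv', './data/categories.csv')"
overrides = parse_overrides(code)
```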


## Execution properties

The following properties of a `CodeChunk` are populated during execution.

### `outputs`

When a code chunk is executed...

[//] TODO:

### `errors`

[//] TODO:

### `duration`

[//] TODO:
71 changes: 71 additions & 0 deletions schema/CodeChunk.schema.yaml
@@ -6,9 +6,80 @@ status: unstable
category: code
description: An executable chunk of code.
properties:
  imports:
    '@id': stencila:imports
    description: Software packages that the code chunk imports
    type: array
    items:
      anyOf:
        - type: string
        - $ref: SoftwareSourceCode
        - $ref: SoftwareApplication
  declares:
    '@id': stencila:declares
    description: Variables that the code chunk declares.
    type: array
    items:
      anyOf:
        # String for when the user wants to specify manually that a cell
        # declares a variable
        - type: string
        # For when we want to store type and possibly value of variable
        - $ref: Variable
        # Functions and datatables already have a name so it is
        # redundant / confusing to wrap them in a variable
        - $ref: Function
        - $ref: Datatable
  assigns:
    '@id': stencila:assigns
    description: Variables that the code chunk assigns to.
    type: array
    items:
      type: string
    $comment: |
      In languages such as Python and R we are unable
      to differentiate between an assignment that
      is a declaration and an assignment that is an alteration
      (a re-assignment).
      The "primary" compiler keeps track of the variables
      declared in code chunks prior to the current chunk in the
      document and may treat an assignment as a declaration.
  alters:
    '@id': stencila:alters
    description: Names of variables that the code chunk alters.
    type: array
    items:
      type: string
    $comment: |
      This property allows users to add undetected
      alterations (i.e. not assignments) to this list. Chunks that alter a variable
      need to be inserted into the dependency graph between `declares` and `uses`
  uses:
    '@id': stencila:uses
    description: Names of variables that the code chunk uses (but does not alter).
    type: array
    items:
      type: string
  reads:
    '@id': stencila:reads
    description: Filesystem paths that this code chunk reads from.
    type: array
    items:
      type: string
  outputs:
    '@id': stencila:outputs
    description: Outputs from executing the chunk.
    type: array
    items:
      $ref: Node
  errors:
    '@id': stencila:errors
    description: Errors when compiling or executing the chunk.
    type: array
    items:
      $ref: CodeChunkError
  duration:
    '@id': stencila:duration
    description: Duration in seconds of the last execution of the chunk.
    type: number
    minimum: 0
