tree_hackathon.Rmd

# tree hackathon!

## Implementation idea

The idea would be that users do `chrono.subsets` with all their desired options to generate the time bins (this is already implemented and solid), and then they could use the function `get.tree` to extract the trees from the resulting `dispRity` objects.
In the case of the time bins, it would return a list of length number of time bins with each list element containing a `multiPhylo` object that itself contains at least one tree (the one in the time bin).

For example, if we have this tree:

```{r}
## Testing detect edges and get new tree
set.seed(1)
simple_tree <- rtree(5)
simple_tree$edge.length <- rep(1, Nedge(simple_tree))
simple_tree$root.time <- 4
simple_tree$node.label <- letters[1:4]
plot(simple_tree, show.tip.label = FALSE); axisPhylo()
nodelabels()
nodelabels(simple_tree$node.label, adj = -1, col = "blue")
edgelabels()
tiplabels()
tiplabels(simple_tree$tip.label, adj = -1, col = "blue")
abline(v = c(0, 1, 2, 3, 3.8), col = "grey", lty = 2)
```

We can have the pipeline looking something like:

```{r, eval = FALSE}
## Some data (to be ignored)
ignore_data <- matrix(1, ncol = 1, nrow = 9, dimnames = list(c(simple_tree$tip.label, simple_tree$node.label)))

## Using chrono.subsets to do the time bins
my_bins <- chrono.subsets(data = ignore_data, tree = simple_tree, method = "discrete", time = c(0, 1, 2, 3, 3.8))

## Get the trees
my_subtrees <- get.tree(my_bins)
```

You can then measure branch length for each tree using something like:

```{r}
my_branch_lengths_per_bin <- lapply(my_subtrees, lapply, function(x) sum(x$edge.length))
```

Using `chrono.subsets` and `get.tree` allows to generalise the method for any type of time bins (e.g. with tips or nodes spanning across time - FADLAD), and any number of trees (e.g. using a tree distribution rather than a single tree).

### Output format needed

For all that to work smoothly we want `my_subtrees` in this example to be something like:

```
my_subtrees
   |
   \---<first_bin_name>
   |    |
   |    \--<"multiPhylo"> = list of trees in the first bin for the first tree
   |                        (if only one subtree is in that bin, it's a multiPhylo of length 1)
   |
   \---<second_bin_name>
   |    |
   |    \--<"multiPhylo"> = list of trees in the first bin for the first tree
   |
   \--- etc.
```

Easy!

# What does `chrono.subsets` outputs

(this generalises to `custom.subsets` by the way - but ignore for this part of the implementation).

```
object
    |
    \---$matrix* = class:"list" (a list containing the orginal matrix/matrices)
    |   |
    |   \---[[1]]* = class:"matrix" (the matrix (trait-space))
    |   |
    |   \---[[...]] = class:"matrix" (any additional matrices)
    |
    |
    \---$tree* = class:"multiPhylo" (a list containing the attached tree(s) or NULL)
    |   |
    |   \---[[1]] = class:"phylo" (the first tree)
    |   |
    |   \---[[...]] = class:"phylo" (any additional trees)  
    |
    |
    \---$call* = class:"list" (details of the methods used)
    |   |
    |   \---$subsets = class:"character": some info about the subset type (discrete, continuous, custom)
    |   |
    |   \---more stuff. IGNORE
    |
    \---$subsets* = class:"list" (subsets as a list)
    |   |
    |   \---[[1]]* = class:"list" (first item in subsets list)
    |   |   |
    |   |   \---$elements* = class:"matrix" (one column matrix containing the elements within the first subset)
    |   |   |
    |   |   \---[[2]] = IGNORE (for bootstrap and rarefaction)
    |   |
    |   \---[[2]] = class:"list" (second item in subsets list)
    |   |   |
    |   |   \---$elements* = class:"matrix" (one column matrix containing the elements within the second subset)
    |   |   |
    |   |   \---[[2]] = IGNORE (for bootstrap and rarefaction)
    |   |
    |   \---[[...]] = class:"list" (the following subsets)
    |       |
    |       \---$elements* = class:"matrix" (a one column matrix containing the elements within this subset)
    |       |
    |       \---[[...]] = IGNORE (for bootstrap and rarefaction)
    |
    \--- more stuff. IGNORE
```

Basically most info about the subsets generated by `chrono.subsets` is either in `data$call$subsets` for the meta data and in `data$subsets` for the actually data.
The names of the subsets can be extracted using `name.subsets()`.
What is in each subset is in the `$elements` part, it's a matrix of usually one column with the ID of the elements.

For example:
```
data$subsets[[1]]
    $elements
         [,1]
    [1,]    1
    [2,]    2
    [3,]    3
```
Here the integers 1:3 in the column refer to the row numbers in the data. So you can access the sub-matrix using `data$matrix[[1]][data$subsets[[1]]$elements[, 1]]` or it's rownames using `rownames(data$matrix[[1]][data$subsets[[1]]$elements[, 1]])`.
But no need to worry here. This is handled by some other bits.

# Using these elements to get the sub trees

The idea is to use these elements though to access the parts of the subtree and generate it as an output.
So for example, if elements 1 to 3 are the tips `t4` and `t3` and the node `d` in the simple tree above, it should output the sub tree `(d(t4,t3))`.

I've implementing this in the following files:

 * testing: `dispRity/test/testthat/test-dispRity.utilities.R` l.725 ("get.tree with subsets")
 * code: `dispRity/R/dispRity.utilities.R`: l.539 `get.tree.R`
 * code: `dispRity/R/dispRity.utilities_fun.R`: l.137 `detect.edges`, `get.new.tree`, `get.one.tree.subset`.

The best way to load is to go to the root of the repo and use `devtools::load_all()` to get all the functions linked:

```{r}
setwd("dispRity/")
devtools::load_all()
```

So far I have:

1. find the edges connecting the elements (`detect.edges`) which is used in 2
2. using `get.new.tree` to get the tree using the elements as inputs which is used in 3
3. using `get.one.tree.subset` as a function to `lapply` across all subsets in `get.tree`

The problem though is that:

 * it doesn't cut branch lengths
 * it doesn't take age into account
 * it doesn't really works
 

# Working idea

1. Feed the list of elements (in our case element 9, 2)
2. Feed the boundary ages (1 - 0.2)
    * Edit the age boundaries to that the old age becomes the root of the tree and the young age becomes the cutting point
3. Recreate the subtree spanning from the selected elements (descendant from them)
    * Make sure that the elements names are kept
4. Using `slice.tree` to cut the upper boundary (young age)
5. Return the new subtree