-
Notifications
You must be signed in to change notification settings - Fork 12
/
Copy pathtree_hackathon.Rmd
175 lines (140 loc) · 6.36 KB
/
tree_hackathon.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
# tree hackathon!
## Implementation idea
The idea would be that users do `chrono.subsets` with all their desired options to generate the time bins (this is already implemented and solid), and then they could use the function `get.tree` to extract the trees from the resulting `dispRity` objects.
In the case of the time bins, it would return a list of length number of time bins with each list element containing a `multiPhylo` object that itself contains at least one tree (the one in the time bin).
For example, if we have this tree:
```{r}
## Testing detect edges and get new tree
set.seed(1)
simple_tree <- rtree(5)
simple_tree$edge.length <- rep(1, Nedge(simple_tree))
simple_tree$root.time <- 4
simple_tree$node.label <- letters[1:4]
plot(simple_tree, show.tip.label = FALSE); axisPhylo()
nodelabels()
nodelabels(simple_tree$node.label, adj = -1, col = "blue")
edgelabels()
tiplabels()
tiplabels(simple_tree$tip.label, adj = -1, col = "blue")
abline(v = c(0, 1, 2, 3, 3.8), col = "grey", lty = 2)
```
We can have the pipeline looking something like:
```{r, eval = FALSE}
## Some data (to be ignored)
ignore_data <- matrix(1, ncol = 1, nrow = 9, dimnames = list(c(simple_tree$tip.label, simple_tree$node.label)))
## Using chrono.subsets to do the time bins
my_bins <- chrono.subsets(data = ignore_data, tree = simple_tree, method = "discrete", time = c(0, 1, 2, 3, 3.8))
## Get the trees
my_subtrees <- get.tree(my_bins)
```
You can then measure branch length for each tree using something like:
```{r}
my_branch_lengths_per_bin <- lapply(my_subtrees, lapply, function(x) sum(x$edge.length))
```
Using `chrono.subsets` and `get.tree` allows to generalise the method for any type of time bins (e.g. with tips or nodes spanning across time - FADLAD), and any number of trees (e.g. using a tree distribution rather than a single tree).
### Output format needed
For all that to work smoothly we want `my_subtrees` in this example to be something like:
```
my_subtrees
|
\---<first_bin_name>
| |
| \--<"multiPhylo"> = list of trees in the first bin for the first tree
| (if only one subtree is in that bin, it's a multiPhylo of length 1)
|
\---<second_bin_name>
| |
| \--<"multiPhylo"> = list of trees in the first bin for the first tree
|
\--- etc.
```
Easy!
# What does `chrono.subsets` outputs
(this generalises to `custom.subsets` by the way - but ignore for this part of the implementation).
```
object
|
\---$matrix* = class:"list" (a list containing the orginal matrix/matrices)
| |
| \---[[1]]* = class:"matrix" (the matrix (trait-space))
| |
| \---[[...]] = class:"matrix" (any additional matrices)
|
|
\---$tree* = class:"multiPhylo" (a list containing the attached tree(s) or NULL)
| |
| \---[[1]] = class:"phylo" (the first tree)
| |
| \---[[...]] = class:"phylo" (any additional trees)
|
|
\---$call* = class:"list" (details of the methods used)
| |
| \---$subsets = class:"character": some info about the subset type (discrete, continuous, custom)
| |
| \---more stuff. IGNORE
|
\---$subsets* = class:"list" (subsets as a list)
| |
| \---[[1]]* = class:"list" (first item in subsets list)
| | |
| | \---$elements* = class:"matrix" (one column matrix containing the elements within the first subset)
| | |
| | \---[[2]] = IGNORE (for bootstrap and rarefaction)
| |
| \---[[2]] = class:"list" (second item in subsets list)
| | |
| | \---$elements* = class:"matrix" (one column matrix containing the elements within the second subset)
| | |
| | \---[[2]] = IGNORE (for bootstrap and rarefaction)
| |
| \---[[...]] = class:"list" (the following subsets)
| |
| \---$elements* = class:"matrix" (a one column matrix containing the elements within this subset)
| |
| \---[[...]] = IGNORE (for bootstrap and rarefaction)
|
\--- more stuff. IGNORE
```
Basically most info about the subsets generated by `chrono.subsets` is either in `data$call$subsets` for the meta data and in `data$subsets` for the actually data.
The names of the subsets can be extracted using `name.subsets()`.
What is in each subset is in the `$elements` part, it's a matrix of usually one column with the ID of the elements.
For example:
```
data$subsets[[1]]
$elements
[,1]
[1,] 1
[2,] 2
[3,] 3
```
Here the integers 1:3 in the column refer to the row numbers in the data. So you can access the sub-matrix using `data$matrix[[1]][data$subsets[[1]]$elements[, 1]]` or it's rownames using `rownames(data$matrix[[1]][data$subsets[[1]]$elements[, 1]])`.
But no need to worry here. This is handled by some other bits.
# Using these elements to get the sub trees
The idea is to use these elements though to access the parts of the subtree and generate it as an output.
So for example, if elements 1 to 3 are the tips `t4` and `t3` and the node `d` in the simple tree above, it should output the sub tree `(d(t4,t3))`.
I've implementing this in the following files:
* testing: `dispRity/test/testthat/test-dispRity.utilities.R` l.725 ("get.tree with subsets")
* code: `dispRity/R/dispRity.utilities.R`: l.539 `get.tree.R`
* code: `dispRity/R/dispRity.utilities_fun.R`: l.137 `detect.edges`, `get.new.tree`, `get.one.tree.subset`.
The best way to load is to go to the root of the repo and use `devtools::load_all()` to get all the functions linked:
```{r}
setwd("dispRity/")
devtools::load_all()
```
So far I have:
1. find the edges connecting the elements (`detect.edges`) which is used in 2
2. using `get.new.tree` to get the tree using the elements as inputs which is used in 3
3. using `get.one.tree.subset` as a function to `lapply` across all subsets in `get.tree`
The problem though is that:
* it doesn't cut branch lengths
* it doesn't take age into account
* it doesn't really works
# Working idea
1. Feed the list of elements (in our case element 9, 2)
2. Feed the boundary ages (1 - 0.2)
* Edit the age boundaries to that the old age becomes the root of the tree and the young age becomes the cutting point
3. Recreate the subtree spanning from the selected elements (descendant from them)
* Make sure that the elements names are kept
4. Using `slice.tree` to cut the upper boundary (young age)
5. Return the new subtree