SuppData_Metazoa_2017
---
This repository contains supplemental materials from [citation]:
## Dataset
**alignements_1719genes.tgz**
This archive contains the 1719 alignments as obtained after our complete dataset building protocol.
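As a convenience, the archive can be unpacked with standard tools. The following is a minimal Python sketch, assuming the archive is an ordinary gzip-compressed tar file; the function name `extract_archive` and the destination directory are hypothetical and not part of this repository.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a gzip-compressed tar archive and return the sorted file names it holds."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        # Only regular files are listed; directories are skipped.
        names = sorted(m.name for m in archive.getmembers() if m.isfile())
        archive.extractall(dest)
    return names
```

For example, `extract_archive("alignements_1719genes.tgz", "alignments")` would unpack the files into a hypothetical `alignments` directory and return their names.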
**supermatrix_97sp_401632pos_1719genes.fasta**
This file corresponds to the concatenation of the entire dataset (1719 gene alignments).
**supermatrix_90sp_136618pos_heterop60.puz**
Dataset after the removal of the 60% most heteropecillous sites.
**supermatrix_90sp_102464pos_heterop70.puz**
Dataset after the removal of the 70% most heteropecillous sites.
**supermatrix_90sp_268032pos_dayhoff6.phy**
Dayhoff6 recoding of our dataset (constant sites were removed).
You will probably want to remove the first line of this file when using software other than PhyloBayes.
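Removing the first line can be done with any text tool. The following is a minimal Python sketch; the function name `strip_first_line` and the output file name used below are hypothetical, and the sketch makes no assumption about what the first line contains.

```python
def strip_first_line(src_path, dst_path):
    """Copy src_path to dst_path without its first line; return the number of lines written."""
    written = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        next(src, None)  # skip the first line (no error if the file is empty)
        for line in src:
            dst.write(line)
            written += 1
    return written
```

For example, `strip_first_line("supermatrix_90sp_268032pos_dayhoff6.phy", "dayhoff6_no_first_line.phy")` writes a copy under a hypothetical new name and leaves the original file untouched.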
**partition_401632pos_1719genes.part**
Partitioning scheme corresponding to the boundaries of the 1719 genes.
**supermatrix_whelan2015_81006pos_NoDemo.phy**
Dataset from Whelan et al. 2015 in which demosponges have been removed.
**supermatrix_whelan2015_81006pos_NoDemoCalcHomo.phy**
Dataset from Whelan et al. 2015 in which all sponges except hexactinellids have been removed.
**partition_whelan2015_81006pos.part**
Partitioning scheme corresponding to the boundaries of the 251 genes from Whelan et al. 2015.
## Trees
### Complete dataset analyses
**tree_97sp_CAT.tre**
10 jackknife replicates of 100,000 positions each - CAT+G4 (PhyloBayes)
**tree_90sp_CAT.tre**
100 jackknife replicates of 100,000 positions each - CAT+G4 (PhyloBayes)
**tree_97sp_LGF-PARTITION.tre**
100 bootstraps - partitioning by gene - LG+G4+F (RAxML)
**tree_90sp_LGF-PARTITION.tre**
100 bootstraps - partitioning by gene - LG+G4+F (RAxML)
### Removal of heteropecillous sites
**tree_90sp_CAT_heterop60.tre**
2 independent MCMC chains - CAT+G4 (PhyloBayes)
**tree_90sp_CAT_heterop70.tre**
2 independent MCMC chains - CAT+G4 (PhyloBayes)
### Model comparison when reducing taxonomic sampling
**tree_NoDemo_CAT.tre**
10 jackknife replicates of 100,000 positions each - CAT+G4 (PhyloBayes)
**tree_NoDemo_LGF-PARTITION.tre**
100 bootstraps - partitioning by gene - LG+G4+F (RAxML)
**tree_NoDemoCalcHomo_CAT.tre**
10 jackknife replicates of 100,000 positions each - CAT+G4 (PhyloBayes)
**tree_NoDemoCalcHomo_LGF-PARTITION.tre**
100 bootstraps - partitioning by gene - LG+G4+F (RAxML)
**tree_Whelan2015_NoDemo_CAT.tre**
2 independent MCMC chains - CAT+G4 (PhyloBayes)
**tree_Whelan2015_NoDemo_LGF-PARTITION.tre**
100 bootstraps - partitioning by gene - LG+G4+F (RAxML)
**tree_Whelan2015_NoDemoCalcHomo_CAT.tre**
2 independent MCMC chains - CAT+G4 (PhyloBayes)
**tree_Whelan2015_NoDemoCalcHomo_LGF-PARTITION.tre**
100 bootstraps - partitioning by gene - LG+G4+F (RAxML)
### Pipeline cleaning example: the rpl2 gene
**tree_rpl2_FIFO.pdf**
rpl2 tree when using initial datasets (right after the Filter Focus step - FIFO).
**tree_rpl2_DCC.pdf**
rpl2 tree when using datasets after the De-Cross-Contamination step (DCC).
**tree_rpl2_DC1.pdf**
rpl2 tree when using datasets after the De-Contamination step 1 (DC1).
**tree_rpl2_DC2.pdf**
rpl2 tree when using datasets after the De-Contamination step 2 (DC2).
**tree_rpl2_DC3.pdf**
rpl2 tree when using datasets after the De-Contamination step 3 (DC3).
**tree_rpl2_FINAL.pdf**
Final rpl2 tree after all cleaning steps of our pipeline, as used for supermatrix concatenation.
## Software
**utilities_src.tgz**
These are the sources of the C programs used in our dataset assembly procedure.
Please note that the "De-Cross-Contamination" (DCC) step of our procedure has been reworked
as a dedicated program (named "CroCo"), which is currently under final development and will be published elsewhere.
## Information
**choanoflagellate_names.xls**
This table contains the correspondence between the old names used for choanoflagellate data and their recently published valid names
([Carr et al. 2016](http://www.sciencedirect.com/science/article/pii/S1055790316302743)).
While recently published valid names are used in our article, we make this table available for clarity.