tr2 currently offers two options for species delimitation. The first calculates posterior probability scores for user-specified delimitation hypotheses. The second finds the best delimitation under a guide tree, which specifies a hierarchical structure of species groupings.
The first option is useful for comparing multiple species groupings and finding the best one (for example, morphological species vs. mtDNA groups), while the second can be used without any prior assignments and finds species from the gene trees alone.
Let’s start with the first option. (I assume you have already set up an environment for tr2.)
You need two input files: a gene tree file in Newick format and a tab-delimited text file that specifies the associations between species and individual samples.
In the tree file, each line must contain exactly one gene tree. Trees can have missing taxa, but they must be rooted. (Yes, the program is based on “rooted triplets”, so the trees must be rooted. If you do not have outgroups, midpoint rooting or RAxML’s “-f I” option often works well.)
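If you want to check a tree file before running tr2, the rules above can be approximated with a short script. This is a minimal sketch and not part of tr2: it assumes plain Newick without quoted labels or bracketed comments, and it treats a tree as rooted when its root has exactly two children, which is a common convention rather than a check tr2 is known to perform. The function names are hypothetical.

```python
def root_children(newick):
    """Count the children of the root node in a plain Newick string."""
    s = newick.strip()
    if not s.startswith("("):
        return 0  # not a tree with an internal root node
    depth = 0
    children = 1
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 1:
            # A comma directly inside the outermost parentheses
            # separates two children of the root.
            children += 1
    return children


def find_suspect_lines(lines):
    """Return 1-based numbers of non-empty lines whose root is not bifurcating."""
    suspect = []
    for number, line in enumerate(lines, start=1):
        if line.strip() and root_children(line) != 2:
            suspect.append(number)
    return suspect
```

Passing an open tree file (for example, the result of open("genetrees.tre")) to find_suspect_lines returns the line numbers worth inspecting by hand. A trifurcating root usually indicates an unrooted tree.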
In the association file, the first column contains the sample names, which must be identical to the tip names in the trees. The second and subsequent columns are species groupings. You can include as many columns as you want, and you can use any codes for the species names.
For example, the table below specifies three alternative delimitations of samples 16.4 to 20.4:
19.4 4 B sp5
17.4 4 B sp5
18.4 4 B sp4
16.4 4 A sp4
20.4 4 A sp4
The association file must contain every sample name that appears in the tree file.
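The requirement that every tree tip is listed in the association file can be checked ahead of time. The following is a minimal sketch under stated assumptions, not tr2's own validation: it assumes plain Newick labels (no quotes, spaces, or bracketed comments) and a tab-delimited association file with sample names in the first column. The function names are hypothetical.

```python
import re


def tip_names(newick):
    """Extract tip labels from a plain Newick string.

    A tip label directly follows "(" or ","; internal node labels
    follow ")" and branch lengths follow ":", so neither is captured.
    """
    return set(re.findall(r"[(,]\s*([^():,;\s]+)", newick))


def association_samples(assoc_lines):
    """Collect the first (sample name) column of a tab-delimited table."""
    return {line.split("\t")[0].strip() for line in assoc_lines if line.strip()}


def missing_samples(tree_lines, assoc_lines):
    """Return tip names that occur in the trees but not in the association table."""
    tips = set()
    for line in tree_lines:
        tips |= tip_names(line)
    return sorted(tips - association_samples(assoc_lines))
```

Calling missing_samples with the two opened files (tree file first, association file second) should return an empty list when the association file is complete.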
Once you have a tree file and an association file, simply run the tr2 command as follows.
./run_tr2.py -a sp_association.txt -t genetrees.tre
Some example files are stored in the “sim4sp” folder. To test tr2 with them, run the following command.
./run_tr2.py -a sim4sp/sp.assoc.4sp.txt -t sim4sp/simulated.gene.trees.nex10.4sp.tre
The output of this command should look like the following.
write: <stdout>
model score
null 51391.76
model1 5.73
The score of “model1” is much smaller than that of the “null” model (which assumes all samples belong to a single species). So, you can be quite confident that model1 is a better delimitation.
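When the association file has several columns, it can be convenient to rank the models by score automatically. The following is a minimal sketch, assuming the output is a whitespace-separated table of model names and scores as in the example above, and that a smaller score indicates a better delimitation, as the comparison with the null model suggests. The function name is hypothetical and not part of tr2.

```python
def parse_scores(text):
    """Read 'name score' lines into a dict, skipping anything else."""
    scores = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        name, value = parts
        try:
            scores[name] = float(value)
        except ValueError:
            # Lines such as "write: <stdout>" or the "model score" header
            continue
    return scores


example = """write: <stdout>
model score
null 51391.76
model1 5.73
"""

scores = parse_scores(example)
# Assumption: the smallest score marks the preferred model.
best = min(scores, key=scores.get)
print(best, scores[best])
```

To use it on a real run, capture the standard output of run_tr2.py as text and pass it to parse_scores in place of the example string.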