Multilocus delimitation with tr2: Guide tree approach

The second option of delimitation with the “tr2” is a guide tree approach. A guide tree is a tree which specify a hierarchical structure of species grouping. By using it, you can significantly reduce the number of possible delimitation hypotheses to search.

The tr2 implements an algorithm to find a best position of nodes which define species group under a given guide tree. As the algorithm is reasonably fast, you can search the best delimitation through a tree from each tip is distinct species to all tips are from one single species.

Now, acceptable size of the number of taxa on guide tree is around 100. If you exceed 200 taxa, the memory requirement usually becomes huge and normal desktop computers can not handle. The current limitation of the number of input trees, that is, the limit of number of loci, is ~ 1000. Theoretically, it can be larger, but a problem on numerical calculation now limits the number of loci you can use.

To run tr2 with a guide tree, you need a newick formatted guide tree file, and a gene tree file also in newick format.

Only the first line of the guide tree file is used. In the standard analysis, guide tree tips must contain all taxa found in gene trees. Guide trees can be built any methods such as concatenated ML (eg. RAxML) or coalescent-based species tree methods (eg. ASTRAL). Most importantly , it must be properly rooted. Incorrect rooting often results in over-splitting.

The gene tree file must contain one tree per line. They must be rooted too. Missing taxa are allowed.

Once two files are ready, the command below starts a search algorithm.

./ -g guide.tre -t genetree.tre

To test with bundled example files, use files in “sim4sp” directory.

./ -g sim4sp/4sp.nex10.RTC.tre -t sim4sp/simulated.gene.trees.nex10.4sp.tre

After some intermediate outputs, you will see a tree with delimitation results and a table.

write: <stdout>
species sample
1     12.3
1    11.3
1    15.3
1    14.3
1    13.3
2    7.2
2    10.2

If the tree and table are too large on your console screen, use “-o” option to output them into files.

./ -o out -g sim4sp/4sp.nex10.RTC.tre -t sim4sp/simulated.gene.trees.nex10.4sp.tre

This command ouputs a table and a tree into “out.table.txt” and “out.tre” respectively. You can see the results using any programs.

Now, I use R + “ape” package to check a delimitation result.

tr <- read.tree("./out.tre")
nodelabels(text=substr(tr$node.label,1,6), bg="white")

These R commands plots the guide tree and delimitation like the  below picture.


The numbers on the nodes indicate average differences of posterior probability scores. If they are positive the node has between-species branches. The negative values suggests that nodes are within species. “nan” indicates there are not enough samples to split/merge the node. “*” signs show the best position of delimitation. In this case, there are 4 putative species.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s