I am writing a code to compare topologies of 2 trees for one of my projects. It calculates a triplet distance between 2 trees by counting concordant/discordant partial trees with 3 tips. It runs with the order of n3, where n is the number of tips. This order is not very fast, but still usable for small scale analyses like mine. (Apparently, much faster algorithms with the n*log(n) order was developed recently [link]. Looks intimidatingly difficult, but very interesting.)
While I am writing a code, I came up with a small question. Which pairs of trees have the largest distance?
My initial guess was a pair of trees like a picture below.
They look different. Indeed the distance between them is 1.0, which means all triplets are discordant between them.
The right tree’s branching order is identical to the left one while the order of tips is reversed. It appears that the tip ordering determines the distance on this comb-like tree. For instance,
(1,(2,(3,(4,(5,6))))); – (1,(2,(3,(5,(4,6))))); -> 0.05
(1,(2,(3,(4,(5,6))))); – (1,(3,(2,(5,(4,6))))); -> 0.2
(1,(2,(3,(4,(5,6))))); – (3,(1,(2,(5,(4,6))))); -> 0.4
When you swap more, you have larger distance. Actually, the topology distance is highly correlated with the permutation distance of the tip labels (plot).
(The distance between 2 tip permutations is calculated by counting the number of swaps needed to convert one into another. This measure is called Kendall tau distance which can be calculated in n*log(n) time scale. It could be used as a part of a faster measure of topological distance. However, not all of swaps are equal, eg, swapping 5 with 6 doesn’t change topology. I don’t know how to accommodate this into the algorithm.)
The topology is more different when you have more swaps on tip labels, and the maximum distance is attained when you completely reverse the tip order.
But does this rule apply to other tree shapes? Can I swap the tip labels of the tree, like (((1,2),(3,4)),(5,6));, to have the maximum distance from it?
The answer is no. Any one of 720 tip-permuted trees don’t have the distance 1.0. (In addition, the Kendall tau distance is no longer correlated with topological distance.)
While the “Pectinate(comb-like)” tree have wide range of distance just by swapping tips (right), the “Balanced” tree have a limited number of distance because its symmetrical shape.(left).
So, how about changing the branch ordering of the tree as well as tip ordering? Are there trees which are the most distant from the “Balanced” tree in the whole tree space? Unexpectedly, no. (at least for me it was unexpected.)
Even among the 945 all possible topologies, the maximum topological distance from the tree was 0.9 (left). On the other hand, the comb-like tree had the distance ranging from 0.0 to 1.0 (right).
The maximum attainable distance from a tree depends on the tree’s topology. This might mean you need a bit of consideration when you compare tree topologies , say, comparing between reconstructed trees from multiple loci. The distance measures between the trees might have slightly different meanings.
The code used to generate all permutations of tips is below.
import sys def permutation(L1, L2=): if len(L1) == 0: yield L2 else: for i in range(len(L1)): for p in permutation(L1[0:i]+L1[(i+1):], L2 + [L1[i]]): yield p if __name__=="__main__": N = int(sys.argv) L = [i+1 for i in range(N)] for p in permutation(L): print ",".join(["%d"]*len(p)) % tuple(p) #print "(%d,(%d,(%d,(%d,(%d,%d)))));" % tuple(p) #print out 6tip trees #print "(((%d,%d),(%d,%d)),(%d,%d));" % tuple(p)
And the code for all possible topologies.
import sys class TreeNode: def __init__(self, name="", parent=None): self.name = name self.parent = parent self.left = None self.right = None def __str__(self): if self.is_terminal(): if self.name: str = "%s" % self.name else: str = " " return str else: str = "(" str += self.left.__str__() + "," str += self.right.__str__() str += ")" return (str) def is_terminal(self): return (self.left==None and self.right==None) def is_root(self): return (self.parent==None) def traverse(self): yield (self) if self.left: for node in self.left.traverse(): yield (node) if self.right: for node in self.right.traverse(): yield (node) def clone(self): nod = TreeNode(name=self.name) if self.left: nod.left = self.left.clone() nod.left.parent = nod if self.right: nod.right = self.right.clone() nod.right.parent = nod return nod def insert_node(tr, ntip): nnod = 2*ntip - 1 trs = [tr.clone() for i in range(nnod)] for i in range(nnod): count = 0 for nod in trs[i].traverse(): if count == i: if nod.is_root(): root = TreeNode() nod.parent = root root.left = nod root.right = TreeNode(name=ntip+1, parent=root) trs[i] = root else: innod = TreeNode(parent=nod.parent) innod.left = nod innod.right = TreeNode(name=ntip+1, parent=innod) if nod == nod.parent.left: nod.parent.left = innod elif nod == nod.parent.right: nod.parent.right = innod nod.parent = innod count += 1 return trs def generate_all_topology(N): if N == 1: return [TreeNode(name=1)] if N == 2: root = TreeNode() root.left = TreeNode(name=1, parent=root) root.right = TreeNode(name=2, parent=root) return [root] if N >= 3: root = TreeNode() root.left = TreeNode(name=1, parent=root) root.right = TreeNode(name=2, parent=root) init = [root] ntip = 2 while ntip < N: update =  for tr in init: update.extend(insert_node(tr, ntip)) init = update ntip += 1 return init if __name__=="__main__": for tr in generate_all_topology(int(sys.argv)): print tr.__str__() + ";"