Distance between two trees

I am writing a code to compare topologies of 2 trees for one of my projects. It calculates a triplet distance between 2 trees by counting concordant/discordant partial trees with 3 tips. It runs with the order of n3, where n is the number of tips. This order is not very fast, but still usable for small scale analyses like mine. (Apparently, much faster algorithms with the n*log(n) order was developed recently [link]. Looks intimidatingly difficult, but very interesting.)

While I am writing a code,  I came up with a small question. Which pairs of trees have the largest distance?

My initial guess was a pair of trees like a picture below.

tree.pectinate

They look different. Indeed the distance between them is 1.0, which means all triplets are discordant between them.

The right tree’s branching order is identical to the left one while the order of tips is reversed. It appears that the tip ordering determines the distance on this comb-like tree. For instance,

(1,(2,(3,(4,(5,6))))); – (1,(2,(3,(5,(4,6))))); -> 0.05

(1,(2,(3,(4,(5,6))))); – (1,(3,(2,(5,(4,6))))); -> 0.2

(1,(2,(3,(4,(5,6))))); – (3,(1,(2,(5,(4,6))))); -> 0.4

When you swap more, you have larger distance. Actually, the topology distance is highly correlated with the permutation distance of the tip labels (plot).

(The distance between 2 tip permutations is calculated by counting the number of swaps needed to convert one  into another. This measure is called Kendall tau distance which can be calculated in n*log(n) time scale. It could be used as a part of a faster measure of topological distance. However, not all of swaps are equal, eg, swapping 5 with 6 doesn’t change topology.  I don’t know how to accommodate this into the algorithm.)

The topology is more different when you have more swaps on tip labels, and the maximum distance is attained when you completely reverse the tip order.

But does this rule apply to other tree shapes? Can I swap the tip labels of the tree, like (((1,2),(3,4)),(5,6));, to have the maximum distance from it?

tree.balanced

The answer is no. Any one of 720 tip-permuted trees don’t have the distance 1.0. (In addition, the Kendall tau distance is no longer correlated with topological distance.)

hist.all.perm.dist

While the “Pectinate(comb-like)” tree have wide range of distance just by swapping tips (right), the “Balanced” tree have a limited number of distance because its symmetrical shape.(left).

So, how about changing the branch ordering of the tree as well as tip ordering? Are there trees which are the most distant from the “Balanced” tree in the whole tree space? Unexpectedly, no. (at least for me it was unexpected.)

hist.all.dist

Even among the 945 all possible topologies, the maximum topological distance from the tree was 0.9 (left). On the other hand, the comb-like tree had the distance ranging from 0.0 to 1.0 (right).

The maximum attainable distance from a tree depends on the tree’s topology. This might mean you need a bit of consideration when you compare tree topologies , say, comparing between reconstructed trees from multiple loci. The distance measures between the trees might have slightly different meanings.

The code used to generate all permutations of tips is below.

import sys

def permutation(L1, L2=[]):
	if len(L1) == 0:
		yield L2
	else:
		for i in range(len(L1)):
			for p in permutation(L1[0:i]+L1[(i+1):], L2 + [L1[i]]):
				yield p

if __name__=="__main__":
	N = int(sys.argv[1])
	L = [i+1 for i in range(N)]

	for p in permutation(L):
		print ",".join(["%d"]*len(p)) % tuple(p)
		#print "(%d,(%d,(%d,(%d,(%d,%d)))));" % tuple(p) #print out 6tip trees
		#print "(((%d,%d),(%d,%d)),(%d,%d));" % tuple(p)

And the code for all possible topologies.

import sys

class TreeNode:
	def __init__(self, name="", parent=None):
		self.name = name
		self.parent = parent
		self.left = None
		self.right = None

	def __str__(self):
		if self.is_terminal():
			if self.name:
				str = "%s" % self.name
			else:
				str = " "
			return str
		else:
			str = "("
			str += self.left.__str__() + ","
			str += self.right.__str__()
			str += ")"
			return (str)

	def is_terminal(self):
		return (self.left==None and self.right==None)

	def is_root(self):
		return (self.parent==None)

	def traverse(self):
		yield (self)
		if self.left:
			for node in self.left.traverse():
				yield (node)
		if self.right:
			for node in self.right.traverse():
				yield (node)
	def clone(self):
		nod = TreeNode(name=self.name)
		if self.left:
			nod.left = self.left.clone()
			nod.left.parent = nod
		if self.right:
			nod.right = self.right.clone()
			nod.right.parent = nod
		return nod

def insert_node(tr, ntip):
	nnod = 2*ntip - 1

	trs = [tr.clone() for i in range(nnod)]

	for i in range(nnod):
		count = 0
		for nod in trs[i].traverse():
			if count == i:
				if nod.is_root():
					root = TreeNode()
					nod.parent = root
					root.left = nod
					root.right = TreeNode(name=ntip+1, parent=root)
					trs[i] = root
				else:
					innod = TreeNode(parent=nod.parent)
					innod.left = nod
					innod.right = TreeNode(name=ntip+1, parent=innod)

					if nod == nod.parent.left:
						nod.parent.left = innod
					elif nod == nod.parent.right:
						nod.parent.right = innod
					nod.parent = innod
			count += 1
	return trs

def generate_all_topology(N):
	if N == 1:
		return [TreeNode(name=1)]
	if N == 2:
		root = TreeNode()
		root.left = TreeNode(name=1, parent=root)
		root.right = TreeNode(name=2, parent=root)
		return [root]
	if N >= 3:
		root = TreeNode()
		root.left = TreeNode(name=1, parent=root)
		root.right = TreeNode(name=2, parent=root)
		init = [root]

		ntip = 2
		while ntip < N:
			update = []
			for tr in init:
				update.extend(insert_node(tr, ntip))
			init = update
			ntip += 1
		return init

if __name__=="__main__":
	for tr in generate_all_topology(int(sys.argv[1])):
		print tr.__str__() + ";"
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s