exact.match.pairs.R for showing exactly matched species

A user of the splits package asked me whether there is a function that shows which pairs of species match exactly. The comp.delimit function counts the number of exact matches between two groupings, but it does not show which species matches which. This kind of information is certainly useful.

At first, tracking the pairing information looked complicated to me, but it can actually be done easily with a small modification of the existing code.

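exact.match <- function(x, y) {
	# the two groupings must cover the same number of samples
	if (length(x) != length(y)) {
		return (NA)
	}

	match <- 0
	for (l1 in unique(x)) {
		for (l2 in unique(y)) {
			# indices of the samples assigned to each species
			l1_ids <- sort(which(x == l1))
			l2_ids <- sort(which(y == l2))

			# exact match: both species contain exactly the same samples
			if (length(l1_ids) == length(l2_ids) && all(l1_ids == l2_ids)) {
				match <- match + 1
			}
		}
	}
	return (match)
}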

The code above is the “exact match” function used in comp.delimit. It takes two delimitation vectors, compares every species in the first with every species in the second, and counts the pairs that contain exactly the same samples.

For instance, consider two alternative groupings, x = (A, A, B, C) and y = (1, 1, 2, 2), where the i-th element of each vector is the species assigned to the i-th sample. The function returns 1, because only species A and species 1 contain exactly the same samples (the first and the second).
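The same check can also be written more compactly by grouping sample indices with split(). This is only a sketch of an alternative, not the code used in comp.delimit, and the name exact.match.split is made up for illustration.

```r
# Hypothetical alternative to exact.match(): group sample indices by label,
# then count the groups of x that also appear as a group of y.
exact.match.split <- function(x, y) {
	if (length(x) != length(y)) {
		return (NA)
	}

	# as.character() avoids empty groups from unused factor levels
	gx <- split(seq_along(x), as.character(x))
	gy <- split(seq_along(y), as.character(y))

	# a species of x matches if some species of y has the identical index set
	matched <- vapply(gx, function(g) {
		any(vapply(gy, identical, logical(1), g))
	}, logical(1))

	return (sum(matched))
}

x <- c("A", "A", "B", "C")
y <- c(1, 1, 2, 2)
n <- exact.match.split(x, y)   # 1: only A and 1 share the same samples
```

Because split() returns the indices of each group in increasing order, the two index sets can be compared with identical() without sorting them first.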

To track the names of exactly matching pairs, the match counter in the code above is simply replaced with a matrix that stores the names of the matched pairs.
The modified code is below.

exact.match.pairs <- function(d1, d2) {
	if (nrow(d1) != nrow(d2)) {
		return (NA)
	}

	x <- d1[,1]                        # species of d1, in d1's sample order
	y <- d2[match(d1[,2], d2[,2]), 1]  # species of d2, reordered to d1's sample order

	pair <- c()
	for (l1 in unique(x)) {
		for (l2 in unique(y)) {
			l1_ids <- sort(which(x == l1))
			l2_ids <- sort(which(y == l2))

			if (length(l1_ids) == length(l2_ids) && all(l1_ids == l2_ids)) {
				pair <- rbind(pair, c(l1, l2))
			}
		}
	}
	return (pair)
}

Another small modification is that the function now accepts two delimitation tables instead of two vectors. Lines 6 and 7 of the function were added to line up the two tables by sample name: the species labels of d2 are reordered to follow the sample order of d1. The first column of each table must contain species names, and the second column sample names.
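Here is a small worked example with two toy tables whose rows are in different sample orders. The function is repeated only so that the block runs on its own; the tables and their sample names (s1 to s4) are made up for illustration.

```r
# exact.match.pairs as given above, repeated so this block is self-contained
exact.match.pairs <- function(d1, d2) {
	if (nrow(d1) != nrow(d2)) {
		return (NA)
	}

	x <- d1[,1]
	y <- d2[match(d1[,2], d2[,2]), 1]

	pair <- c()
	for (l1 in unique(x)) {
		for (l2 in unique(y)) {
			l1_ids <- sort(which(x == l1))
			l2_ids <- sort(which(y == l2))

			if (length(l1_ids) == length(l2_ids) && all(l1_ids == l2_ids)) {
				pair <- rbind(pair, c(l1, l2))
			}
		}
	}
	return (pair)
}

# toy tables: column 1 = species name, column 2 = sample name
d1 <- data.frame(species = c("A", "A", "B", "C"),
                 sample  = c("s1", "s2", "s3", "s4"),
                 stringsAsFactors = FALSE)
# same samples as d1, listed in a different row order
d2 <- data.frame(species = c("2", "1", "2", "1"),
                 sample  = c("s4", "s1", "s3", "s2"),
                 stringsAsFactors = FALSE)

res <- exact.match.pairs(d1, d2)
print(res)   # one row: "A" "1"
```

After the alignment step, d2 reads (1, 1, 2, 2) in the sample order of d1, so this is the same case as the vector example: only species A and species 1 match. Note that, as written, the function returns NULL when no pair matches and NA when the tables have different numbers of rows.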

For a gmyc result, res1, and a data frame of delimitation, d, it is used as follows.

> exact.match.pairs(spec.list(res1), d)

For 2 tables of delimitation,

> exact.match.pairs(d1, d2)

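These commands return a matrix in which each row is a pair of exactly matching species: the first column holds the species name from the first table, and the second column the name from the second table.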

     [,1] [,2]
[1,] "1"  "spec1"
[2,] "2"  "spec18"
[3,] "5"  "spec26"
[4,] "12" "spec16"
[5,] "14" "spec25"
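If the result is going to be merged with other tables, it may be handier as a data frame with named columns. The helper below is a minimal sketch of one way to do that; pairs.to.df and the column names grouping1 and grouping2 are hypothetical and are not part of the splits package.

```r
# first two rows of the result shown above, typed in by hand
pairs <- matrix(c("1", "spec1",
                  "2", "spec18"), ncol = 2, byrow = TRUE)

# Hypothetical helper: convert the matrix of pairs into a labelled data frame.
pairs.to.df <- function(pairs, names = c("grouping1", "grouping2")) {
	# exact.match.pairs returns NULL when no pair matches exactly
	if (is.null(pairs)) {
		pairs <- matrix(character(0), ncol = 2)
	}
	# it returns NA when the two tables differ in their number of rows
	if (!is.matrix(pairs)) {
		stop("pairs must be a two-column matrix")
	}

	df <- as.data.frame(pairs, stringsAsFactors = FALSE)
	colnames(df) <- names
	return (df)
}

df <- pairs.to.df(pairs)
```

Treating NULL as an empty table keeps later steps, such as counting rows with nrow(), working even when nothing matched.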
