After you obtain a species delimitation from the GMYC, the first thing you probably want to do is measure its accuracy, or how well it matches other groupings (most commonly, morphological species).
This is done by the function “comp.delimit” (available in version >= 1.0-19).
One naive method for comparison is counting the number of identical groups.
In a call to comp.delimit, “result” is a gmyc object returned by the gmyc function, and “sp” is a data frame specifying the delimitation you want to compare with the GMYC, for instance,
1 spec1 spec1.5
2 spec1 spec1.4
3 spec1 spec1.3
4 spec1 spec1.2
5 spec1 spec1.1
6 spec18 spec18.3
7 spec18 spec18.2
8 spec18 spec18.1
9 spec18 spec18.5
10 spec18 spec18.4
11 spec29 spec29.4
The first column of the data frame must indicate the species, and the second column must contain the tip labels of the input tree. The value returned by comp.delimit is the number of groups that match exactly (i.e. identical size and identical members) between the two delimitations. If you compare the GMYC groups with taxonomic species, this measure shows how many taxonomic species are recovered by the GMYC.
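To make the idea of an exact match concrete, here is a minimal base-R sketch. It does not call comp.delimit and is not claimed to be how that function is implemented; the data frames `sp` and `alt` and the helpers `as_sets` and `count_exact_matches` are hypothetical names used only for illustration.

```r
# Hypothetical reference table: column 1 = species, column 2 = tip label.
sp <- data.frame(
  species = c("spec1", "spec1", "spec1", "spec18", "spec18", "spec29"),
  tip     = c("spec1.1", "spec1.2", "spec1.3", "spec18.1", "spec18.2", "spec29.1"),
  stringsAsFactors = FALSE
)

# Hypothetical alternative grouping of the same tips; spec18 is split in two.
alt <- data.frame(
  group = c(1, 1, 1, 2, 3, 4),
  tip   = sp$tip,
  stringsAsFactors = FALSE
)

# Turn a grouping into a list of sorted member sets, one per group.
as_sets <- function(group, tip) {
  unname(lapply(split(tip, group), sort))
}

# Count the groups in `a` that have an identical member set in `b`.
count_exact_matches <- function(a, b) {
  sum(vapply(a, function(x) any(vapply(b, identical, logical(1), x)), logical(1)))
}

n_match <- count_exact_matches(as_sets(sp$species, sp$tip),
                               as_sets(alt$group, alt$tip))
print(n_match)  # 2: spec1 and spec29 are recovered, spec18 is not
```

Because a group counts only when every member agrees, splitting a single species removes it from the tally entirely.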
One problem with counting matches is that the value depends on the number of species included in your data. Dividing the number of matches by the total number of species is one way to normalize the result, but this is only valid when the total number of species is the same in the two alternative delimitations. (Dividing by the true number of species is an option when you know it.)
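The caveat about the denominator can be seen with a few lines of arithmetic. The counts below are hypothetical values chosen for illustration, not output from comp.delimit.

```r
# Hypothetical counts, for illustration only.
n_match <- 2  # groups that match exactly
n_ref   <- 3  # number of species in the reference delimitation
n_alt   <- 4  # number of groups in the alternative delimitation

# The proportion depends on which total is used as the denominator.
prop_ref <- n_match / n_ref
prop_alt <- n_match / n_alt

print(round(c(reference = prop_ref, alternative = prop_alt), 2))
```

The two proportions differ whenever the delimitations contain different numbers of groups, which is why the normalized value is unambiguous only when the totals agree.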
More sophisticated methods for comparing two clusterings have been developed in the field of machine learning. The comp.delimit function implements one of them, the “normalized mutual information” (NMI). Mutual information measures how much information is shared between two groupings.
>comp.delimit(result, morphsp, method="NMI")
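For readers who want to see what such a score computes, the following base-R sketch calculates an NMI from two label vectors. Several normalizations of mutual information are in use; this sketch assumes the form 2 I(A;B) / (H(A) + H(B)), which may differ from the one used inside comp.delimit. The function names `entropy` and `nmi` and the example vectors are hypothetical.

```r
# Shannon entropy (in nats) of a vector of counts; empty cells are ignored.
entropy <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# NMI of two groupings of the same items, assuming the normalization
# 2 * I(A;B) / (H(A) + H(B)). This choice is an assumption of the sketch.
nmi <- function(a, b) {
  joint <- table(a, b)                 # contingency table of the two groupings
  h_a   <- entropy(rowSums(joint))
  h_b   <- entropy(colSums(joint))
  h_ab  <- entropy(as.vector(joint))
  mi    <- h_a + h_b - h_ab            # mutual information
  if (h_a + h_b == 0) return(1)        # both groupings are a single cluster
  2 * mi / (h_a + h_b)
}

truth     <- c("A", "A", "A", "B", "B", "B")
same      <- c(1, 1, 1, 2, 2, 2)  # identical grouping under different labels
oversplit <- c(1, 1, 1, 2, 2, 3)  # species B split into two groups

print(nmi(truth, same))       # 1
print(nmi(truth, oversplit))  # below 1, but still well above 0
```

In this toy case only one of the two true species is recovered exactly, yet the NMI stays high because the over-split groups are still nested within a true species.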
The number (or proportion) of exact matches is the more stringent measure of accuracy. The exact match against the true species drops rapidly when the assumptions of the GMYC (e.g. species monophyly) are violated.
The NMI is not as sensitive as the exact match. The GMYC is a tree-based delimitation, so even if the threshold time is misplaced, similarity between the true and the GMYC species is retained as long as the tree reconstruction is correct.