Comparing GMYC species with other delimitations

After you obtain a species delimitation from the GMYC, the first thing you will probably want to do is measure its accuracy, or how well it matches other groupings (most commonly, morphological species).

This is done by the function “comp.delimit” (available in version >= 1.0-19).

One naive method for comparison is counting the number of identical groups.

>comp.delimit(result, sp)

Here, “result” is a gmyc object returned by the gmyc function, and “sp” is a data frame specifying the delimitation you want to compare with the GMYC. For instance,

> sp

   species        samplename
1    spec1           spec1.5
2    spec1           spec1.4
3    spec1           spec1.3
4    spec1           spec1.2
5    spec1           spec1.1
6   spec18          spec18.3
7   spec18          spec18.2
8   spec18          spec18.1
9   spec18          spec18.5
10  spec18          spec18.4
11  spec29          spec29.4
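
A table of this shape can often be built directly from the tip labels. The sketch below is only an illustration and assumes that every tip label has the form "species.number", as in the listing above; the vector `tips` is hypothetical (in practice it would typically be the `tip.label` element of the input tree).

```r
# Hypothetical tip labels of the form "<species>.<individual number>"
tips <- c("spec1.5", "spec1.4", "spec1.3", "spec18.3", "spec18.2", "spec29.4")

# Strip the trailing ".<number>" to recover the species name
# (this relies on the assumed naming scheme)
sp <- data.frame(
  species    = sub("\\.[0-9]+$", "", tips),
  samplename = tips,
  stringsAsFactors = FALSE
)

print(sp)
```

The column order matters more than the column names: the species come first and the tip labels second.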

>comp.delimit(result, sp)

15

The first column of the data frame must give the species, and the second column must give the tip labels of the input tree. The value returned by comp.delimit is the number of groups that match exactly (i.e. identical size and identical members) between the two delimitations. If you compare the GMYC groups with taxonomic species, this measure shows how many taxonomic species are recovered by the GMYC.
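
To make the idea of an exact match concrete, here is a minimal base-R sketch of the counting. It is not the code of comp.delimit; the function name `count_exact_matches` and the toy data are hypothetical, and both delimitations are assumed to be data frames with the group in the first column and the sample name in the second.

```r
# Count groups that appear with identical membership in both delimitations.
# Each input: column 1 = group label, column 2 = sample name.
count_exact_matches <- function(a, b) {
  # Turn each grouping into a list of sorted member vectors
  groups_a <- lapply(split(as.character(a[[2]]), as.character(a[[1]])), sort)
  groups_b <- lapply(split(as.character(b[[2]]), as.character(b[[1]])), sort)
  # Collapse the members of each group into one key string for comparison
  keys_a <- vapply(groups_a, paste, character(1), collapse = "\n")
  keys_b <- vapply(groups_b, paste, character(1), collapse = "\n")
  sum(keys_a %in% keys_b)
}

# Toy data: three morphological species, one of which (B) is split in two
morph <- data.frame(
  species    = c("A", "A", "A", "B", "B", "C"),
  samplename = c("s1", "s2", "s3", "s4", "s5", "s6"),
  stringsAsFactors = FALSE
)
gmyc_groups <- data.frame(
  GMYC_spec   = c(1, 1, 1, 2, 3, 4),
  sample_name = c("s1", "s2", "s3", "s4", "s5", "s6")
)

count_exact_matches(morph, gmyc_groups)  # A and C are recovered, B is not: 2
```

Group labels are ignored on purpose: only the membership of each group decides whether it counts as a match.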

One problem with counting matches is that the value depends on the number of species in your data. Dividing the number of matches by the total number of species is one way to normalize the result, but this is only valid when the total number of species is the same in the two alternative delimitations. (Dividing by the true number of species is an option when you know it.)
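
A small worked example shows why the denominator matters. The match count of 15 is the value returned above; the two species totals are hypothetical numbers chosen only for illustration.

```r
n_match <- 15   # exact matches, as returned by comp.delimit above
n_morph <- 20   # hypothetical number of morphological species
n_gmyc  <- 24   # hypothetical number of GMYC species

prop_morph <- n_match / n_morph  # proportion of morphological species recovered
prop_gmyc  <- n_match / n_gmyc   # proportion of GMYC species that match

c(prop_morph = prop_morph, prop_gmyc = prop_gmyc)  # 0.75 versus 0.625
```

When the two totals differ, the same match count gives two different proportions, so the denominator should always be reported with the result.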

More sophisticated methods for comparing two clusterings have been developed in the field of machine learning. The comp.delimit function implements one of them, the “normalized mutual information” (NMI). Mutual information is a measure of how much information is shared between two groupings.

>comp.delimit(result, morphsp, method="NMI")

0.9295553
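
For readers who want to see what such a number is made of, here is a minimal base-R sketch of one common form of the NMI. Several normalizations exist, and the one used inside comp.delimit is not stated in this post; the sketch assumes normalization by the mean of the two entropies. The function names and the toy labels are hypothetical, and the two label vectors must list the same samples in the same order.

```r
# Shannon entropy (natural log) of a vector of group labels
entropy <- function(labels) {
  p <- table(as.character(labels)) / length(labels)
  -sum(p * log(p))
}

# Mutual information between two labelings of the same samples
mutual_info <- function(x, y) {
  joint <- table(as.character(x), as.character(y)) / length(x)  # joint distribution
  expected <- outer(rowSums(joint), colSums(joint))             # product of marginals
  nz <- joint > 0                                               # skip empty cells
  sum(joint[nz] * log(joint[nz] / expected[nz]))
}

# NMI normalized by the mean entropy (an assumed normalization)
nmi <- function(x, y) {
  2 * mutual_info(x, y) / (entropy(x) + entropy(y))
}

true_sp   <- c("A", "A", "A", "B", "B", "C")
identical <- c(1, 1, 1, 2, 2, 3)   # same grouping, different labels
oversplit <- c(1, 1, 1, 2, 3, 4)   # species B split into two groups

nmi(true_sp, identical)  # 1 (up to floating point): the groupings agree fully
nmi(true_sp, oversplit)  # below 1 but still high, roughly 0.9
```

In the oversplit case only two of the three true species are matched exactly, yet the NMI stays close to 1 because the oversplit groups are still nested within the true species.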

The number (or proportion) of exact matches is a more stringent measure of accuracy. The number of exact matches against the true species drops rapidly when the assumptions of the GMYC (e.g. species monophyly) are violated.

[Figure: monophyl.match]

The NMI is not as sensitive as the exact match. The GMYC is a tree-based delimitation, so even if the threshold time is misplaced, similarity between the true and the GMYC species is retained as long as the tree reconstruction is correct.

[Figure: monophyl.nmi]

6 thoughts on “Comparing GMYC species with other delimitations”

  1. Pingback: The first year review | Tomochika Fujisawa's site

  2. cpvcow

    When I try to use the following command,

    comp.delimit(result1, result2)

    I get this error:
    Error in `[.data.frame`(result2, , c(2, 3)) : undefined columns selected???

    result1 was obtained with the single-threshold method, and result2 with the multiple-threshold method.

    1. t.fujisawa Post author

      Hi,

      The comp.delimit function requires a gmyc object and a table as arguments.
      This is because you may want to compare a GMYC result not only with other GMYC results but also with other kinds of delimitation, such as taxonomic species.
      So, if you want to compare two GMYC results, you need to write the following.

      >comp.delimit(result1, spec.list(result2))

      Tomochika

      1. cpvcow

        Dear Tomochika,

        Thanks for your quick reply and solution. I thought two outputs of the same kind could be compared directly.
        I wonder whether comp.delimit could also list the names of the identical species?
        Thank you again!

        Quinn

      2. t.fujisawa Post author

        Hi Quinn,

        Thank you for the suggestion.
        A function listing identical species names may be useful.
        I cannot write it right now as it does not look very straightforward.
        But, I will write a post for it.

        Tomochika

  3. Pingback: exact.match.pairs.R for showing exactly matched species | Tomochika Fujisawa's site
