Comparing GMYC species with other delimitations

After you obtain a species delimitation from the GMYC, the first thing you probably want to do is measure its accuracy, or how well it matches other groupings (most commonly, morphological species).

This is done with the function “comp.delimit” of the splits package (available in version >= 1.0-19).

One naive method for comparison is counting the number of identical groups.

> comp.delimit(result, sp)

Here, “result” is a gmyc object returned by the gmyc function, and “sp” is a data frame specifying the delimitation you want to compare with the GMYC, for instance,

> sp

   species samplename
1    spec1    spec1.5
2    spec1    spec1.4
3    spec1    spec1.3
4    spec1    spec1.2
5    spec1    spec1.1
6   spec18   spec18.3
7   spec18   spec18.2
8   spec18   spec18.1
9   spec18   spec18.5
10  spec18   spec18.4
11  spec29   spec29.4
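If your tip labels follow the same pattern as in the table above (a species name followed by a dot and a sample number), a table of this shape can be built directly from the labels. The following is only a sketch: the vector `tips` is made up for illustration, and the rule for extracting the species name is an assumption about your naming scheme.

```r
# Tip labels as they might appear in the input tree (illustrative values only).
tips <- c("spec1.5", "spec1.4", "spec1.3", "spec18.3", "spec18.2", "spec29.4")

# Assumption: the species name is everything before the final ".<number>" suffix.
sp <- data.frame(
  species    = sub("\\.[0-9]+$", "", tips),  # column 1: species
  samplename = tips,                         # column 2: tip labels
  stringsAsFactors = FALSE
)
sp
```

With a real tree you would take the labels from the tree object rather than typing them in.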

> comp.delimit(result, sp)


The first column of the data frame must give the species, and the second column must contain the tip labels of the input tree. The value returned by comp.delimit is the number of groups that match exactly (i.e. identical size and identical members) between the two delimitations. If you compare the GMYC groups with taxonomic species, this measure shows how many taxonomic species are recovered by the GMYC.
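To make the idea of an exact match concrete, here is a minimal sketch in base R. It is not the code inside comp.delimit; `count_exact_matches` is a hypothetical helper, and both tables are made-up examples in the two-column layout described above.

```r
# Hypothetical helper, not part of splits: counts groups that have exactly
# the same members in two delimitation tables.
# Each table: column 1 = group label, column 2 = tip label.
count_exact_matches <- function(a, b) {
  # Represent each group by its sorted member list collapsed into one string
  # (assumes tip labels never contain the "\r" separator).
  key <- function(d) {
    members <- split(as.character(d[[2]]), as.character(d[[1]]))
    vapply(members, function(m) paste(sort(m), collapse = "\r"), character(1))
  }
  sum(key(a) %in% key(b))
}

tips  <- c("a.1", "a.2", "a.3", "b.1", "b.2", "c.1")
morph <- data.frame(species = c("a", "a", "a", "b", "b", "c"), samplename = tips)
# A made-up alternative delimitation that splits species "a" into two groups.
other <- data.frame(group = c(1, 1, 2, 3, 3, 4), samplename = tips)

# Species "b" and "c" are recovered as identical groups; "a" is not.
count_exact_matches(morph, other)
```

Group labels themselves do not matter here; only the membership of each group is compared.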

One problem with counting matches is that the value depends on the number of species in your data. Dividing the number of matches by the total number of species is one way to normalize the result, but this is only valid when the total number of species is the same in the two alternative delimitations. (Dividing by the true number of species is an option when you know it.)

More sophisticated methods for comparing two clusterings have been developed in the field of machine learning. The comp.delimit function implements one of them, the “normalized mutual information” (NMI). Mutual information measures how much information is shared between two groupings.

> comp.delimit(result, morphsp, method="NMI")
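For readers who want to see what such a measure computes, below is a rough base R sketch of an NMI between two label vectors. It is not taken from comp.delimit: `nmi` is a hypothetical helper, and the normalization used here, 2 I(X;Y) / (H(X) + H(Y)), is one common choice that I am assuming for illustration; the function in the package may normalize differently.

```r
# Hypothetical helper, not the splits implementation.
# x, y: group labels for the same samples, given in the same order.
nmi <- function(x, y) {
  stopifnot(length(x) == length(y))
  pxy <- table(x, y) / length(x)      # joint distribution of the two labelings
  px  <- rowSums(pxy)                 # marginal distribution of x
  py  <- colSums(pxy)                 # marginal distribution of y
  entropy <- function(p) -sum(p[p > 0] * log(p[p > 0]))
  nz <- pxy > 0                       # skip empty cells (0 * log 0 = 0)
  mi <- sum(pxy[nz] * log(pxy[nz] / outer(px, py)[nz]))
  # Assumed normalization; undefined (NaN) if both labelings have one group.
  2 * mi / (entropy(px) + entropy(py))
}

morph <- c("a", "a", "a", "b", "b", "c")   # made-up morphological species
other <- c(1, 1, 2, 3, 3, 4)               # made-up grouping that splits "a"

nmi(morph, morph)   # identical groupings give 1
nmi(morph, other)   # below 1, but still well above 0
```

Unlike the count of exact matches, this value changes gradually as the two groupings diverge.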


The number (or proportion) of exact matches is the more stringent measure of accuracy. The number of exact matches against the true species drops rapidly when the assumptions of the GMYC (e.g. species monophyly) are violated.


The NMI is not as sensitive as the exact match. The GMYC is a tree-based delimitation, so even if the threshold time is wrongly placed, similarity between the true and the GMYC species is retained as long as the tree reconstruction is correct.


10 thoughts on “Comparing GMYC species with other delimitations”

  1. Pingback: The first year review | Tomochika Fujisawa's site

  2. cpvcow

    When I try to use the following command,

    comp.delimit(result1, result2)

    I get this error:
    Error in `[.data.frame`(result2, , c(2, 3)) : undefined columns selected

    result1 was obtained with the single method, and result2 with the multiple method.

    1. t.fujisawa Post author


      The comp.delimit function requires a gmyc object and a table as arguments.
      This is because you may want to compare a GMYC result not only with another GMYC result but also with other kinds of delimitation, such as taxonomic species.
      So, if you want to compare two GMYC results, you need to write the following.

      >comp.delimit(result1, spec.list(result2))


      1. cpvcow

        Dear Tomochika,

        Thanks for your quick reply and solution. I thought two outputs of the same kind could be compared directly.
        Would it be possible for comp.delimit to list the names of the identical species?
        Thank you again!


      2. t.fujisawa Post author

        Hi Quinn,

        Thank you for the suggestion.
        A function listing identical species names may be useful.
        I cannot write it right now as it does not look very straightforward.
        But, I will write a post for it.


  3. Pingback: exact.match.pairs.R for showing exactly matched species | Tomochika Fujisawa's site

  4. Lubos

    How can I compare the results of the single and multiple threshold methods within the splits package?
    May I use the function comp.delimit?

    1. t.fujisawa Post author

      Hi Lubos,

      Yes. You can use comp.delimit to compare two results.
      You have to convert one of them into a delimitation table before comparing.

      > comp.delimit(ress, spec.list(resm))

      Here, ress is the result of the single threshold method and resm is the result of the multiple threshold method.


      1. Lubos

        Hi Tomochika,
        Thank you very much. I have one more question:
        Is there any test that compares whether the single or the multiple threshold is better? I have read your discussion from 2016 about the inappropriateness of the chi-square test, but I did not find any other test.
        In my case, both models fit significantly better than the null model, but the multiple threshold oversplits (as usual). I have interpreted the single threshold results, but I would like to test which one is better.
        My results are as follows:
        method: single
        likelihood of null model: 335.4751
        maximum likelihood of GMYC model: 343.4486
        likelihood ratio: 15.94712
        result of LR test: 0.0003444511***
        method: multiple
        likelihood of null model: 335.4751
        maximum likelihood of GMYC model: 344.4189
        likelihood ratio: 17.8877
        result of LR test: 0.0001305376***

        With regards,

      2. t.fujisawa Post author

        Hi Lubos,

        Sorry for the late response.

        This is really a difficult issue. We now know that the multiple threshold method often oversplits species, but I can’t work out when you should choose the multiple threshold.

        One way to compare the two results is AIC. In this case, the AIC difference is only around 2, so the single threshold model is not really worse than the multiple threshold model.

        It is not a formal test, but it might give you some information.
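As a rough illustration of the comparison discussed in this thread, the sketch below recomputes the likelihood ratios from the log-likelihoods quoted in the question and shows how an AIC difference would follow from them. The number of extra parameters in the multiple threshold model (`extra_k`) is not reported in the output above, so it is left as an argument and any value passed to it is an assumption.

```r
# Log-likelihoods reported in the question above.
lnL_null     <- 335.4751
lnL_single   <- 343.4486
lnL_multiple <- 344.4189

# Likelihood ratio of each GMYC model against the null model.
lr_single   <- 2 * (lnL_single - lnL_null)
lr_multiple <- 2 * (lnL_multiple - lnL_null)

# Gain in fit of the multiple threshold model over the single threshold model.
gain <- 2 * (lnL_multiple - lnL_single)

# AIC = 2k - 2 lnL, so AIC(multiple) - AIC(single) = 2 * extra_k - gain.
# extra_k (additional parameters of the multiple model) is an assumption.
aic_difference <- function(extra_k) 2 * extra_k - gain

c(lr_single = lr_single, lr_multiple = lr_multiple, gain = gain)
```

A positive value of `aic_difference()` would favour the single threshold model; as noted above, this is not a formal test.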

