How to run “gmyc”

Once the splits package was installed, you are ready to run the gmyc to delimit species.

First, load the package, and load your tree,

>library(splits)

>tr <- read.tree(“./youtree.tre”)

The read.tree function reads a tree file written in Newick format. If your tree is in Nexus format, use read.nexus instead of read.tree.

Your tree must be ultrametric and fully bifurcating. A tree with polytomy can be easily converted to a dichotomous tree by the multi2di function. Making ultrametric trees can be done by any dating programs like  BEAST or r8s. The method using the relaxed clock is better. However, the GMYC is not hyper-sensitive to the dating methods. Even dating with the molecular clock often returns the result similar to the relaxed clock (for example, see Monaghan et al. 2009).

Calling gmyc runs the gmyc analysis. The default setting is the single threshold method.

>result <- gmyc(tr)

You will see outputs like below after a few seconds (or minutes/hours, depending on tree size).

node T loglik
2 -0.954603 863.3993
3 -0.9010095 864.1611

148 -0.00011231 861.0263
149 -4.374e-05 860.83


Mon Apr 22 12:45:51 2013
finish.

The analysis is done. Now, you can check the number of delimited species and the  significance of the GMYC model by the summary function.

>summary(result)

Result of GMYC species delimitation


method: single
likelihood of null model: 858.9326
maximum likelihood of GMYC model: 872.3755
likelihood ratio: 26.88581
result of LR test: 6.220952e-06***


number of ML clusters: 33
confidence interval: 31-34


number of ML entities: 46
confidence interval: 42-56


threshold time: -0.05174417

The “number of ML clusters” is the number of delimited “clusters”, groups including more than 2 samples. The “number of ML entities”  is the number of all delimited species including singletons (species represented by a single sample). The GMYC is usually robust to a moderate number of singletons (10-20% of singletons or more). Rare species will be reasonably delimited.

You can plot a lineage through time plot,  a likelihood surface and a tree with delimited species by calling plot.

>plot(result)

plot.gmyc

The red vertical line indicates the threshold time between inter – intraspecific branching. The red branches on the gene tree are delimited species.

If you want detailed associations of samples to species, use spec.list.

>spec.list(result)

GMYC_spec sample_name
1           1     spec1.5
2           1     spec1.4
3           1     spec1.3
4           1     spec1.2

This function returns a dataframe showing the sample-species associations. You may save this result of delimitation by calling the write.table function and use it for other analyses. (spec.list is available in the splits >= 1.0-15.)

Advertisements

16 thoughts on “How to run “gmyc”

  1. Pingback: The first year review | Tomochika Fujisawa's site

  2. Erica

    Hi

    I have a question, I used a ultrametric tree with outgroup and my results it’s were similar with that I had expected (6 clusters) but when I tested a ultrametric tree just with “ingroup” my results were about 14 clusters so which approach would be the best?

    Cheers

    Reply
    1. t.fujisawa Post author

      Dear Erica,

      Whether to remove outgroups really depends on your data. Generally, I think it is good to remove outgroups before running GMYC. They can be biases for waiting time estimation.

      Your result looks quite unstable. Can you first check if the results are significant for both with and without outgroups by “summary” function?

      Tomochika

      Reply
  3. Erica

    Dear Tomochika,

    Thanks for you quick reply, I used the ‘summary’ function and I had significant results just for my data+outgroup. I’m working with a very recent species complex so I believe that is the reason for my “bad” results when I use just the ‘ingroup’. What do you think about that?This conclusion makes sense for you?

    Thanks again

    Cheers

    Reply
    1. t.fujisawa Post author

      Dear Erica,

      I would use the result of ingroup data. The GMYC is not specifically designed to find recently diverged species. So, I think it is reasonable that it favored the single coalescent model over the multiple species model.

      Tomochika

      Reply
    1. t.fujisawa Post author

      Dear Jack,

      You can specify which method to use by method=”multiple” / “single”.

      Please check your results of the multiple threshold carefully. It may not improve the accuracy.

      Tomochika

      Reply
  4. Carlos M.

    Error in gmyc(tr) :
    Your ultrametric tree is not ultrametric, please check

    I don’t understand why I get this error. I selected a consensus tree obtained through Beast (tree annotator) and don’t know why it does not recognize it as ultrametric tree!
    Suggestions to solve this prob?
    Carlos M.

    Reply
    1. t.fujisawa Post author

      Hi Carlos,

      It sounds strange. A BEAST ouput tree must be ultrametric.
      Probably, I can not solve this problem only from this comment. Is it possible to send me the tree file to my email address? (It’s at the bottom of “ABOUT” page)

      Tomochika

      Reply
      1. t.fujisawa Post author

        Hi Andrea,

        In this case, some external branches are shorter than others for unknown reasons, and removing them solved the problem.

        I don’t know if the same problem is happening in your tree. But, I think it is worth checking branch lengths of your tree.

        Tomochika

      2. Andrea

        Hi,
        ok, good to know. I had to say I didn’t investigated very much, I just solved by loading the tree as native output of BEAST/TreeAnnotator (i.e., a nexus tree file) rather than tranforming into a newik file.

        Andrea

      3. t.fujisawa Post author

        Hi,

        Some programs introduce small errors on branch lengths by rounding them when you convert between formats.

        I think it’s safe to use the direct output of TreeAnnotator.

        Tomochika

  5. María José

    Dear Tomochika,
    How can I see the posterior probabilities for each node in GMYC analysis? I selected a consensus tree obtained through Beast (tree annotator) and I run it in R and in http://species.h-its.org/ but in both cases I don´’t see posterior probabilities in any node.
    Thanks.
    Mj

    Reply
    1. t.fujisawa Post author

      Dear Maria,

      The posterior probabilities of BEAST is removed when you read trees by “read.nexus” function.
      So, you cannot simultaneously see delimitation and posteriors of tree reconstruction by default.

      You could use a package called “phyloch” to read your nexus file, which can correctly read outputs of TreeAnnotator.

      Tomochika

      Reply

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s