How to run “gmyc”

Once the splits package was installed, you are ready to run the gmyc to delimit species.

First, load the package, and load your tree,

>library(splits)

>tr <- read.tree(“./youtree.tre”)

The read.tree function reads a tree file written in Newick format. If your tree is in Nexus format, use read.nexus instead of read.tree.

Your tree must be ultrametric and fully bifurcating. A tree with polytomy can be easily converted to a dichotomous tree by the multi2di function. Making ultrametric trees can be done by any dating programs like  BEAST or r8s. The method using the relaxed clock is better. However, the GMYC is not hyper-sensitive to the dating methods. Even dating with the molecular clock often returns the result similar to the relaxed clock (for example, see Monaghan et al. 2009).

Calling gmyc runs the gmyc analysis. The default setting is the single threshold method.

>result <- gmyc(tr)

You will see outputs like below after a few seconds (or minutes/hours, depending on tree size).

node T loglik
2 -0.954603 863.3993
3 -0.9010095 864.1611

148 -0.00011231 861.0263
149 -4.374e-05 860.83


Mon Apr 22 12:45:51 2013
finish.

The analysis is done. Now, you can check the number of delimited species and the  significance of the GMYC model by the summary function.

>summary(result)

Result of GMYC species delimitation


method: single
likelihood of null model: 858.9326
maximum likelihood of GMYC model: 872.3755
likelihood ratio: 26.88581
result of LR test: 6.220952e-06***


number of ML clusters: 33
confidence interval: 31-34


number of ML entities: 46
confidence interval: 42-56


threshold time: -0.05174417

The “number of ML clusters” is the number of delimited “clusters”, groups including more than 2 samples. The “number of ML entities”  is the number of all delimited species including singletons (species represented by a single sample). The GMYC is usually robust to a moderate number of singletons (10-20% of singletons or more). Rare species will be reasonably delimited.

You can plot a lineage through time plot,  a likelihood surface and a tree with delimited species by calling plot.

>plot(result)

plot.gmyc

The red vertical line indicates the threshold time between inter – intraspecific branching. The red branches on the gene tree are delimited species.

If you want detailed associations of samples to species, use spec.list.

>spec.list(result)

GMYC_spec sample_name
1           1     spec1.5
2           1     spec1.4
3           1     spec1.3
4           1     spec1.2

This function returns a dataframe showing the sample-species associations. You may save this result of delimitation by calling the write.table function and use it for other analyses. (spec.list is available in the splits >= 1.0-15.)

Advertisements

26 thoughts on “How to run “gmyc”

  1. Pingback: The first year review | Tomochika Fujisawa's site

  2. Erica

    Hi

    I have a question, I used a ultrametric tree with outgroup and my results it’s were similar with that I had expected (6 clusters) but when I tested a ultrametric tree just with “ingroup” my results were about 14 clusters so which approach would be the best?

    Cheers

    Reply
    1. t.fujisawa Post author

      Dear Erica,

      Whether to remove outgroups really depends on your data. Generally, I think it is good to remove outgroups before running GMYC. They can be biases for waiting time estimation.

      Your result looks quite unstable. Can you first check if the results are significant for both with and without outgroups by “summary” function?

      Tomochika

      Reply
  3. Erica

    Dear Tomochika,

    Thanks for you quick reply, I used the ‘summary’ function and I had significant results just for my data+outgroup. I’m working with a very recent species complex so I believe that is the reason for my “bad” results when I use just the ‘ingroup’. What do you think about that?This conclusion makes sense for you?

    Thanks again

    Cheers

    Reply
    1. t.fujisawa Post author

      Dear Erica,

      I would use the result of ingroup data. The GMYC is not specifically designed to find recently diverged species. So, I think it is reasonable that it favored the single coalescent model over the multiple species model.

      Tomochika

      Reply
      1. gen

        Dear Dr. Fujisawa
        How much time can be considered “recent”?
        I’m studying a group of (currently recognized) 6 species and the earliest cladogenetic event took place 6 mya (but most species diverged ~3 mya). After running gmyc I get 7 clusters (congruent with the currently know species”), even though the results are not significant.
        Can it be related to the divergence times?

      2. t.fujisawa Post author

        Dear gen,

        It depends on types of organisms you are studying. Generally, divergence time of 2*Ne or smaller is considered very “recent” (Ne is the effective population size) because it is unlikely that you have a monophyletic gene tree within this short time.

        The larger species’ Ne is, the longer time you need to have a monophyletic tree. For example, if you assume Ne=10^5 and generation time is 10 years, the time required for monophyly is about 10^6 years, which is 1 million years.

        You could calculate this 2Ne from genetic variation within species and generation time and check if it is larger or smaller than 3-6 million.

        Best regards,
        Tomochika

    1. t.fujisawa Post author

      Dear Jack,

      You can specify which method to use by method=”multiple” / “single”.

      Please check your results of the multiple threshold carefully. It may not improve the accuracy.

      Tomochika

      Reply
  4. Carlos M.

    Error in gmyc(tr) :
    Your ultrametric tree is not ultrametric, please check

    I don’t understand why I get this error. I selected a consensus tree obtained through Beast (tree annotator) and don’t know why it does not recognize it as ultrametric tree!
    Suggestions to solve this prob?
    Carlos M.

    Reply
    1. t.fujisawa Post author

      Hi Carlos,

      It sounds strange. A BEAST ouput tree must be ultrametric.
      Probably, I can not solve this problem only from this comment. Is it possible to send me the tree file to my email address? (It’s at the bottom of “ABOUT” page)

      Tomochika

      Reply
      1. t.fujisawa Post author

        Hi Andrea,

        In this case, some external branches are shorter than others for unknown reasons, and removing them solved the problem.

        I don’t know if the same problem is happening in your tree. But, I think it is worth checking branch lengths of your tree.

        Tomochika

      2. Andrea

        Hi,
        ok, good to know. I had to say I didn’t investigated very much, I just solved by loading the tree as native output of BEAST/TreeAnnotator (i.e., a nexus tree file) rather than tranforming into a newik file.

        Andrea

      3. t.fujisawa Post author

        Hi,

        Some programs introduce small errors on branch lengths by rounding them when you convert between formats.

        I think it’s safe to use the direct output of TreeAnnotator.

        Tomochika

  5. María José

    Dear Tomochika,
    How can I see the posterior probabilities for each node in GMYC analysis? I selected a consensus tree obtained through Beast (tree annotator) and I run it in R and in http://species.h-its.org/ but in both cases I don´’t see posterior probabilities in any node.
    Thanks.
    Mj

    Reply
    1. t.fujisawa Post author

      Dear Maria,

      The posterior probabilities of BEAST is removed when you read trees by “read.nexus” function.
      So, you cannot simultaneously see delimitation and posteriors of tree reconstruction by default.

      You could use a package called “phyloch” to read your nexus file, which can correctly read outputs of TreeAnnotator.

      Tomochika

      Reply
  6. Alicia Rojas

    Hi, I have a very basic question.
    Where does my .tre local file needs to be saved in order to load it into R?

    Reply
    1. t.fujisawa Post author

      Hi Alicia,

      You can save wherever you want. Once you save your tree file, use R command “setwd” to change your working directory to the folder you saved the file.

      Tomochika

      Reply
  7. Maria

    Hi,

    I have a question concerning single threshold GMYC. I have been running sGMYC on a dataset containing > 100 cryptic species. As input tree I used a phylogeny obtained by MrBayes (using a strict clock model). Since the tree was not fully bifurcating, I used the multi2di to resolve polytomies.

    Although sGMYC runs successfully, the summary results look like this:

    Result of GMYC species delimitation

    method: single
    likelihood of null model: 1322.219
    maximum likelihood of GMYC model: NA
    likelihood ratio: NA
    result of LR test: NANA

    number of ML clusters: 54
    confidence interval: NA-NA

    number of ML entities: 130
    confidence interval: NA-NA

    threshold time: -0.03069196

    Do you have any idea what might be causing the ‘NA’ results and what to do about it? The graphs work and I do get a species list using spec.list. When scanning through this list, the delimited entities are more or less what I expected based on knowledge of other single gene automated molecular species delimitation methods.

    Many thanks already!

    Reply
    1. t.fujisawa Post author

      Hi Maria,

      I think that there are NAs in the likelihood values, which lead to the problems of likelhood ratio and confidence interval calculations. You can check it just by,

      >res$likelihood

      , where “res” is your result of GMYC.

      A common reason of the NAs is polytomy at the root node. If you have root polytomy, you can remove it by removing outermost outgroups, and then you can run the gmyc again on the tree without root polytomy.

      Best regards,
      Tomochika

      Reply
      1. Maria

        Dear Tomochika,

        Many thanks! Indeed, there is one NA in the likelihood values.

        However, I checked the input phylogenetic tree (after applying ‘multi2di’ in R) and it seems like there are multiple polytomies. It is thus not clear to me which polytomy is causing the trouble. Also, applying ‘multi2di’ should have solved this problem, so it is not clear to me why it still does not work.

        Is there any other solution rather than randomly removing some taxa of the tree? I would like to test all individuals currently in the tree (I did not include an outgroup). If this is not possible, can I still rely on the other results (species list, graphs, etc.) of GMYC to be correct?

        Many thanks already.

        All the best,
        Maria

      2. t.fujisawa Post author

        Dear Maria,

        Usually, only polytomy at the root node makes this type of problem. Polytomies at other internal nodes are properly handled in the GMYC function.

        Can you check where in the vector of likelihood the NA is? If it is in the second or third element in the vector, it is very likely that it is at the root node.

        As far as the NA is only at the root node, the other results, like number of species, do not change.

        Tomochika

  8. Pedro

    Dear Tomochika,

    I would like to know if It is possible to use a single locus gene tree generated using starbeast2 to perform gmyc and similar species delimitation methods. Would it violate any premises of those algorithms?

    Thanks in advance,

    Pedro

    Reply
    1. t.fujisawa Post author

      Dear Pedro,

      I don’t think using gene trees from starbeast2 violates the assumption of the GMYC.

      If you have collected a multilocus data set to do a starbeast analysis, I think it’s worth trying BP&P too.

      Tomochika

      Reply

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s