Once the splits package was installed, you are ready to run the gmyc to delimit species.
First, load the package, and load your tree,
>library(splits)
>tr <- read.tree(“./youtree.tre”)
The read.tree function reads a tree file written in Newick format. If your tree is in Nexus format, use read.nexus instead of read.tree.
Your tree must be ultrametric and fully bifurcating. A tree with polytomy can be easily converted to a dichotomous tree by the multi2di function. Making ultrametric trees can be done by any dating programs like BEAST or r8s. The method using the relaxed clock is better. However, the GMYC is not hyper-sensitive to the dating methods. Even dating with the molecular clock often returns the result similar to the relaxed clock (for example, see Monaghan et al. 2009).
Calling gmyc runs the gmyc analysis. The default setting is the single threshold method.
>result <- gmyc(tr)
You will see outputs like below after a few seconds (or minutes/hours, depending on tree size).
node T loglik
2 -0.954603 863.3993
3 -0.9010095 864.1611
…
148 -0.00011231 861.0263
149 -4.374e-05 860.83
Mon Apr 22 12:45:51 2013
finish.
The analysis is done. Now, you can check the number of delimited species and the significance of the GMYC model by the summary function.
>summary(result)
Result of GMYC species delimitation
method: single
likelihood of null model: 858.9326
maximum likelihood of GMYC model: 872.3755
likelihood ratio: 26.88581
result of LR test: 6.220952e-06***
number of ML clusters: 33
confidence interval: 31-34
number of ML entities: 46
confidence interval: 42-56
threshold time: -0.05174417
The “number of ML clusters” is the number of delimited “clusters”, groups including more than 2 samples. The “number of ML entities” is the number of all delimited species including singletons (species represented by a single sample). The GMYC is usually robust to a moderate number of singletons (10-20% of singletons or more). Rare species will be reasonably delimited.
You can plot a lineage through time plot, a likelihood surface and a tree with delimited species by calling plot.
>plot(result)
The red vertical line indicates the threshold time between inter – intraspecific branching. The red branches on the gene tree are delimited species.
If you want detailed associations of samples to species, use spec.list.
>spec.list(result)
GMYC_spec sample_name
1 1 spec1.5
2 1 spec1.4
3 1 spec1.3
4 1 spec1.2
…
This function returns a dataframe showing the sample-species associations. You may save this result of delimitation by calling the write.table function and use it for other analyses. (spec.list is available in the splits >= 1.0-15.)
Pingback: The first year review | Tomochika Fujisawa's site
Hi
I have a question, I used a ultrametric tree with outgroup and my results it’s were similar with that I had expected (6 clusters) but when I tested a ultrametric tree just with “ingroup” my results were about 14 clusters so which approach would be the best?
Cheers
Dear Erica,
Whether to remove outgroups really depends on your data. Generally, I think it is good to remove outgroups before running GMYC. They can be biases for waiting time estimation.
Your result looks quite unstable. Can you first check if the results are significant for both with and without outgroups by “summary” function?
Tomochika
Dear Tomochika,
Thanks for you quick reply, I used the ‘summary’ function and I had significant results just for my data+outgroup. I’m working with a very recent species complex so I believe that is the reason for my “bad” results when I use just the ‘ingroup’. What do you think about that?This conclusion makes sense for you?
Thanks again
Cheers
Dear Erica,
I would use the result of ingroup data. The GMYC is not specifically designed to find recently diverged species. So, I think it is reasonable that it favored the single coalescent model over the multiple species model.
Tomochika
Dear Dr. Fujisawa
How much time can be considered “recent”?
I’m studying a group of (currently recognized) 6 species and the earliest cladogenetic event took place 6 mya (but most species diverged ~3 mya). After running gmyc I get 7 clusters (congruent with the currently know species”), even though the results are not significant.
Can it be related to the divergence times?
Dear gen,
It depends on types of organisms you are studying. Generally, divergence time of 2*Ne or smaller is considered very “recent” (Ne is the effective population size) because it is unlikely that you have a monophyletic gene tree within this short time.
The larger species’ Ne is, the longer time you need to have a monophyletic tree. For example, if you assume Ne=10^5 and generation time is 10 years, the time required for monophyly is about 10^6 years, which is 1 million years.
You could calculate this 2Ne from genetic variation within species and generation time and check if it is larger or smaller than 3-6 million.
Best regards,
Tomochika
Dear Tomochika,
May I know what is the command for multi threshold method?
Thanks.
Jack
Dear Jack,
You can specify which method to use by method=”multiple” / “single”.
Please check your results of the multiple threshold carefully. It may not improve the accuracy.
Tomochika
Error in gmyc(tr) :
Your ultrametric tree is not ultrametric, please check
I don’t understand why I get this error. I selected a consensus tree obtained through Beast (tree annotator) and don’t know why it does not recognize it as ultrametric tree!
Suggestions to solve this prob?
Carlos M.
Hi Carlos,
It sounds strange. A BEAST ouput tree must be ultrametric.
Probably, I can not solve this problem only from this comment. Is it possible to send me the tree file to my email address? (It’s at the bottom of “ABOUT” page)
Tomochika
Thanks for the help! Please check your email inbox. Cheers
Hi,
I get exactly the same error. Do you found what’s wrong?
Hi Andrea,
In this case, some external branches are shorter than others for unknown reasons, and removing them solved the problem.
I don’t know if the same problem is happening in your tree. But, I think it is worth checking branch lengths of your tree.
Tomochika
Hi,
ok, good to know. I had to say I didn’t investigated very much, I just solved by loading the tree as native output of BEAST/TreeAnnotator (i.e., a nexus tree file) rather than tranforming into a newik file.
Andrea
Hi,
Some programs introduce small errors on branch lengths by rounding them when you convert between formats.
I think it’s safe to use the direct output of TreeAnnotator.
Tomochika
Dear Tomochika,
How can I see the posterior probabilities for each node in GMYC analysis? I selected a consensus tree obtained through Beast (tree annotator) and I run it in R and in http://species.h-its.org/ but in both cases I don´’t see posterior probabilities in any node.
Thanks.
Mj
Dear Maria,
The posterior probabilities of BEAST is removed when you read trees by “read.nexus” function.
So, you cannot simultaneously see delimitation and posteriors of tree reconstruction by default.
You could use a package called “phyloch” to read your nexus file, which can correctly read outputs of TreeAnnotator.
Tomochika
Hi, I have a very basic question.
Where does my .tre local file needs to be saved in order to load it into R?
Hi Alicia,
You can save wherever you want. Once you save your tree file, use R command “setwd” to change your working directory to the folder you saved the file.
Tomochika
Hi,
I have a question concerning single threshold GMYC. I have been running sGMYC on a dataset containing > 100 cryptic species. As input tree I used a phylogeny obtained by MrBayes (using a strict clock model). Since the tree was not fully bifurcating, I used the multi2di to resolve polytomies.
Although sGMYC runs successfully, the summary results look like this:
Result of GMYC species delimitation
method: single
likelihood of null model: 1322.219
maximum likelihood of GMYC model: NA
likelihood ratio: NA
result of LR test: NANA
number of ML clusters: 54
confidence interval: NA-NA
number of ML entities: 130
confidence interval: NA-NA
threshold time: -0.03069196
Do you have any idea what might be causing the ‘NA’ results and what to do about it? The graphs work and I do get a species list using spec.list. When scanning through this list, the delimited entities are more or less what I expected based on knowledge of other single gene automated molecular species delimitation methods.
Many thanks already!
Hi Maria,
I think that there are NAs in the likelihood values, which lead to the problems of likelhood ratio and confidence interval calculations. You can check it just by,
>res$likelihood
, where “res” is your result of GMYC.
A common reason of the NAs is polytomy at the root node. If you have root polytomy, you can remove it by removing outermost outgroups, and then you can run the gmyc again on the tree without root polytomy.
Best regards,
Tomochika
Dear Tomochika,
Many thanks! Indeed, there is one NA in the likelihood values.
However, I checked the input phylogenetic tree (after applying ‘multi2di’ in R) and it seems like there are multiple polytomies. It is thus not clear to me which polytomy is causing the trouble. Also, applying ‘multi2di’ should have solved this problem, so it is not clear to me why it still does not work.
Is there any other solution rather than randomly removing some taxa of the tree? I would like to test all individuals currently in the tree (I did not include an outgroup). If this is not possible, can I still rely on the other results (species list, graphs, etc.) of GMYC to be correct?
Many thanks already.
All the best,
Maria
Dear Maria,
Usually, only polytomy at the root node makes this type of problem. Polytomies at other internal nodes are properly handled in the GMYC function.
Can you check where in the vector of likelihood the NA is? If it is in the second or third element in the vector, it is very likely that it is at the root node.
As far as the NA is only at the root node, the other results, like number of species, do not change.
Tomochika
Dear Tomochika,
I would like to know if It is possible to use a single locus gene tree generated using starbeast2 to perform gmyc and similar species delimitation methods. Would it violate any premises of those algorithms?
Thanks in advance,
Pedro
Dear Pedro,
I don’t think using gene trees from starbeast2 violates the assumption of the GMYC.
If you have collected a multilocus data set to do a starbeast analysis, I think it’s worth trying BP&P too.
Tomochika
Dear Tomochika,
I run ‘splits’ on R 3.0.2 for windows
When I analyze newick tree obtained by the BEAST I get an error ‘Your ultrametric tree is not ultrametric’ I use data without outgroup.
When I analyze nexus tree directly obtained from the treeAnnotator I get the same error.
What’s wrong with my tree and what kind of tree is better to use?
Hi Artem,
Which version o “splits” do you use? This type of error often happens when you use old “splits” package with recent “ape” package.
The “ape” developers made a change to the precision of test of ultrametricity. With their new function, most supposed ultrametric trees, like BEAST outputs, are not treated as ultrametric.
In the latest version of “splits”, this problem is fixed by using the old precision value.
Tomochika
Dear Tomochika,
Thanks a lot for your reply!
I installed ‘splits’ package by this command from your another post
>install.packages(“splits”, repos=”http://R-Forge.R-project.org”)
The archive has a name ‘splits_1.0-19.zip’
I think that’s ‘splits’ vesion.
Also I used the complicated BEAST model (Chronogram phylogenetic tree + Bayesian Skyline Plot model).
I find the solution to the problem ‘Your ultrametric tree is not ultrametric’ but only on the Linux. I think it’s convenient to create another topic. Many thanks to Liam Revell
His guess is that the problem has to do with numerical precision – that is, the rounding of your edge lengths when they are written to file. http://blog.phytools.org/2016/08/fixing-ultrametric-tree-whose-edges-are.html
What we should to do:
1) Install the next libraries in R: phytools, phangorn. You may use the follow commands:
install.packages(“phytools”, dependencies=T)
install.packages(“phangorn”, dependencies=T)
If you got a lot of error messages related with the RGL packet, you should install that by this command (via the terminal):
sudo apt-get install r-cran-rgl
2) Run the R and enter the next commands:
library(phytools)
library(phangorn)
Import your tree:
tree<-read.tree(file="path-to-your-tree")
tree ## command to check the correct detecting your tree.
Ultrametric checking:
is.ultrametric(tree)
## [1] FALSE
If it has a FALSE value you have to compute the non-negative least-squares edge lengths (NNLS) for ultrametric tree:
nnls<-nnls.tree(cophenetic(tree),tree,rooted=TRUE)
Then ultrametric checking again:
is.ultrametric(nnls)
## [1] TRUE
Finally, export your ultra-ultrametric tree as nexus:
writeNexus(nnls, file="file_name")
Then you can find your nexus tree in the $HOME directory (or use searching)
Phew… That's all
Dr. Fujisawa,
How can I obtain a prune tree cutting the orijinal tree at the time threshold given by GMYC?
Thanks
Hi Nick,
It is a bit tricky to write a code to prune the branch at the threshold.
Maybe, you can get a similar outcome by pruning branches within species to leave just single tips.
res <- gmyc(tree)
l <- spec.list(res)
tr2 <- drop.tip(tree, as.character(l[duplicated(l[,1]),2]))
Best regards,
Tomochika
Dear Dr. Fujisawa,
I got the LR test result as 0***. Is that a meaningful value or shows something I did wrong? In any paper I read, I have not seen a zero likelihood results. That is why I would like to ask.
Thank you.
Hi Mehmet,
Usually, 0*** means the p-value is so small that it is treated as 0.
You could directly check the p-value by the following commands,
res <- gmyc(tree)
1-pchisq(2*(max(res$likelihood)-res$likelihood[1]), 2)
If it is actually very small, "0***" is a reasonable output.
Tomochika
Dear Tomochika
I have run many GMYC analysis (with and without outgroup) and always the LR test is not significant. I am analyzing two species that I think is one. I am using 19 haplotypes to resolve this and sometimes one species as outgroup and other times I am using more species (4). When I analyze only the ingroup (19 haplotypes) the ML cluster and entities are more than two. Any suggestion.
All best
Ricardo
Dear Recardo,
Generally, GMYC doesn’t perform well when used on data with a small number of species.
This paper report a poor performance of GMYC with “species-poor” data sets.
https://academic.oup.com/sysbio/article/64/6/900/1666261
You can generally improve accuracy of delimitation by adding more taxa. If you don’t have many sequence from other species, using Haploweb may be an option.
Best regards,
Tomochika
Dear Tomochika
Thank you
Ricardo
Hello Tomochika,
I am just wondering if there’s a way to export the “gmyc tree” and import it to Tree editing software like the Fig Tree. I would like to edit the node labels and adjust the branches so it would be presentable and the edit the overlapping labels like the support values.
Cheers, Jasper
Hi Jasper,
It’s easy to do just by the follow command:
>write.tree(results$tree, file=”yourtree_file”)
“resuts” can be called another way depending on your previous command making GMYC analysis:
>results<-gmyc(treeFile)
GMYC tree will be written in your work directory if you set that. If you don't, then find your tree in "My Documents" in Windows or "Home" directory in Linux,
Regards,
Artem B.
Hi Jasper,
It’s easy to do just by the follow command:
>write.tree(results$tree, file=”treefile_name”)
“results” can be called another way depending on your previous command runing GMYC analysis:
>results<-gmyc(treeFile)
The GMYC tree will be written in your work directory if you set that. If you don't, then you can find your tree in "My Documents" in Windows or in "Home" directory in Linux,
Regards,
Artem B.
I just realised that we have got the same tree which was imported before 😀 Sorry, might be I got you wrong
HI Artem, thank you for your reply. I’m going to try that script and I hope I get it right. Best regards, Jasper
Hi Artem, I think you are correct. The exported tree is the one as the file I imported for GMYC analysis. I was wondering if there’s a way to export the GMYC tree including the support values. I would like to come up with a presentable tree that I could use in my thesis.
Best regards, Jasper
Hi Jasper, and thank you Artem,
You can use the “node.label” attribute to save the support value in a file.
result <- gmyc(tr)
tr$node.label <- gmyc.support(result)
write.tree(tr, "file.tre")
Then, you can use figtree to show a tree with support values.
This "node.label" can only keep one type of values, which means values like bootstrap support will be lost when you overwrite node.label with gmyc.support.
Tomochika
Hi Jasper,
What kind of support values do you want to get to put that in a presentation? I thought we got all support while tree reconstruction (e.g. bootstrap values). I only know the ML-test value as GMYC support but it is not related to a tree.
Dear Tomochika,
Thank you for replying! I wrote my previous post too long. I didn’t know about GMYC support. I should re-read your article.
Thank you Tomochika,
Those are helpful suggestions. I am a bit confused about the support values as generated by the gmyc.support function in R. Are they bootstrap values or AIC-based support value? Do these terms mean the same? Thanks again!
All best, Jasper
Hi Jasper and Artem,
Sorry, I was confused too. The gmyc.support calculates the AIC-based support value described in the Fujisawa&Barraclough(2013).
(I was too lazy to write a document about it in this website.)
If you want to copy bootstrap supports between two identical trees, you can do something like this,
tr1$node.label <- tr2$node.label
Any values on tree nodes can be stored in node.label and written on a file by write.tree.
Tomochika
Hi Tomochika, Thanks for your replies. I think I’m just interested in knowing what those support values generated by the GMYC are. I do not think my ultrametric trees (generated by BEAST) have bootstrap values but I think they show posterior probabilities. Thanks for clarifying!
Hi Tomochika,
I am just wondering what do gmyc support values mean? What I really want to ask is what gmyc support values are high enough to say that I have high confidence in the species delimitation of a node. Should gmyc support values be equal to 1.0 to indicate reliability of the delimited species?
Thank you for your time.
Regards, Jasper
Hi Jasper,
A rough interpretation of the gmyc support is “a probability that the node is included in the true species delimitation”, according to the original Burnham&Anderson’s argument.
And, yes, I think you can interpret it as a reliability of delimitation. The support values are usually high when you have neither within-species structure nor very close sister species, and you can certainly delimit them.
Tomochika
When I run the following code:
my_tree<-read.nexus('ultrametric.tree')
result 1 and only the first element will be used
2: In if (!is.binary.tree(tr)) { :
the condition has length > 1 and only the first element will be used
Hello,
Im new in R. Hope to get some help.
May i know if this summary result correct or not?
> summary(yule_gmyc)
Length Class Mode
method 1 -none- character
likelihood 171 -none- numeric
parameters 684 -none- numeric
entity 171 -none- numeric
cluster 171 -none- numeric
MRCA 171 -none- list
threshold.time 171 -none- numeric
tree 4 phylo list
After that, why i cannot plot the result in R?
> plot(yule_gmyc)
Error in xy.coords(x, y, xlabel, ylabel, log) :
‘x’ is a list, but does not have components ‘x’ and ‘y’
Thank you in advance
Hi Izwan,
Can you try
>summary.gmyc(yule_gmyc)
instead of summary(yule_gmyc)?
Can you also use plot.gmyc for plotting?
This is a problem on the registration of function names, and I am working to fix it.
Best regards,
Tomochika
I also have problems with R version 4.0+
With R 3.0 I have no issues at all.
Hi Artem,
Thank you for reporting the problem.
I am working on this right now.
Best,
Tomochika