You can’t even begin to understand biology, you can’t understand life, unless you understand what it’s all there for, how it arose - and that means evolution.
— Richard Dawkins
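01. Input file formats
(1). Newick format
```
((t2:0.04,t1:0.34):0.89,(t5:0.37,(t4:0.03,t3:0.67):0.9):0.59);
```
A Newick file always ends with a semicolon (;). Internal nodes are represented by a pair of matched parentheses, and the nodes between the parentheses are their descendants; for example, (t2:0.04, t1:0.34) represents the parent node of t2 and t1. Sibling nodes are separated by commas, and tips are represented by their names. A branch length (from parent to child) is given by a real number after the child node, preceded by a colon. Data associated with internal nodes or branches (for example, bootstrap values) may be encoded as node labels, written as plain text or numbers before the colon.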
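To make these rules concrete, here is a minimal sketch that parses the Newick string above with `ape::read.tree()` and inspects the pieces described in the paragraph. It assumes the ape package is installed; the variable names are illustrative.

```r
library(ape)

# The Newick string from the example above
nwk <- "((t2:0.04,t1:0.34):0.89,(t5:0.37,(t4:0.03,t3:0.67):0.9):0.59);"
tree <- read.tree(text = nwk)

# Tips are identified by their names
print(tree$tip.label)

# Each pair of matched parentheses becomes one internal node
print(Ntip(tree))
print(Nnode(tree))

# The numbers after the colons are stored as branch lengths,
# one per edge (parent -> child)
print(tree$edge.length)
```

The `edge` matrix of the returned `phylo` object (`tree$edge`) holds the parent/child node numbers that the branch lengths refer to.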
(2). NEXUS format
```
#NEXUS
[R-package APE, Wed Nov 9 11:46:32 2016]

BEGIN TAXA;
    DIMENSIONS NTAX = 5;
    TAXLABELS
        t5
        t4
        t1
        t2
        t3
    ;
END;
BEGIN TREES;
    TRANSLATE
        1 t5,
        2 t4,
        3 t1,
        4 t2,
        5 t3
    ;
    TREE * UNTITLED = [&R] (1:0.89,((2:0.59,3:0.37):0.34,
    (4:0.03,5:0.67):0.9):0.04);
END;
```
(3). New Hampshire eXtended format
```
(((ADH2:0.1[&&NHX:S=human], ADH1:0.11[&&NHX:S=human]):0.05[&&NHX:S=primates:D=Y:B=100],
ADHY:0.1[&&NHX:S=nematode],ADHX:0.12[&&NHX:S=insect]):0.1[&&NHX:S=metazoa:D=N],
(ADH4:0.09[&&NHX:S=yeast],ADH3:0.13[&&NHX:S=yeast],
ADH2:0.12[&&NHX:S=yeast],ADH1:0.11[&&NHX:S=yeast]):0.1[&&NHX:S=Fungi])
[&&NHX:D=N];
```
(4). Output formats of other software
- BEAST
```
tree TREE1 = [&R]
(((11[&length=9.4]:9.38,14[&length=6.4]:6.385096430786298)
[&length=25.7]:25.43,4[&length=9.1]:8.821663252749829)
[&length=3.0]:3.10,(12[&length=0.6]:0.56,
(10[&length=1.6]:1.56,(7[&length=5.2]:5.19,
((((2[&length=3.3]:3.26,(1[&length=1.3]:1.32,
(6[&length=0.8]:0.83,13[&length=0.8]:0.8311577761397366)
[&length=2.4]:2.48917886025146)
[&length=0.9]:0.9416178372674331)
[&length=0.4]:0.49,9[&length=1.7]:1.757288031101215)
[&length=2.4]:2.35,8[&length=2.1]:2.1125745387283246)
[&length=0.2]:0.23,(3[&length=3.3]:3.31,
(15[&length=5.2]:5.27,5[&length=3.2]:3.2710481368304585)
[&length=1.0]:1.0409443024626412)
[&length=1.9]:2.0372962536780435)
[&length=2.8]:2.8446835614595685)
[&length=5.3]:5.367459711197171)
[&length=2.0]:2.0037467863383043)
[&length=4.3]:4.360909907798238)[&length=0.0];
```
- MrBayes
```
tree con_all_compat = [&U]
(8[&prob=1.0]:2.94e-1[&length_mean=2.9e-1],10[&prob=1.0]:2.25e-1[&length_mean=2.2e-1],
((((1[&prob=1.0]:1.43e-1[&length_mean=1.4e-1],2[&prob=1.0]:1.92e-1[&length_mean=1.9e-1])
[&prob=1.0]:1.24e-1[&length_mean=1.2e-1],9[&prob=1.0]:2.27e-1[&length_mean=2.2e-1])
[&prob=1.0]:1.72e-1[&length_mean=1.7e-1],12[&prob=1.0]:5.11e-1[&length_mean=5.1e-1])
[&prob=1.0]:1.76e-1[&length_mean=1.7e-1],
(((3[&prob=1.0]:5.46e-2[&length_mean=5.4e-2],
(6[&prob=1.0]:1.03e-2[&length_mean=1.0e-2],7[&prob=1.0]:7.13e-3[&length_mean=7.2e-3])
[&prob=1.0]:6.93e-2[&length_mean=6.9e-2])
[&prob=1.0]:6.03e-2[&length_mean=6.0e-2],
(4[&prob=1.0]:6.27e-2[&length_mean=6.2e-2],5[&prob=1.0]:6.31e-2[&length_mean=6.3e-2])
[&prob=1.0]:6.07e-2[&length_mean=6.0e-2])
[&prob=1.0]:1.80e-1[&length_mean=1.8e-1],11[&prob=1.0]:2.37e-1[&length_mean=2.3e-1])
[&prob=1.0]:4.05e-1[&length_mean=4.0e-1])
[&prob=1.0]:1.16e+000[&length_mean=1.162699558201079e+000])
[&prob=1.0][&length_mean=0];
```
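The PAML (Phylogenetic Analysis by Maximum Likelihood) package, developed by Professor Ziheng Yang, is mainly used for phylogenetic analysis of DNA or protein sequences; BaseML and CodeML are its two main programs. BaseML estimates tree topology, branch lengths, and substitution parameters under a variety of nucleotide substitution models. CodeML mainly estimates synonymous and nonsynonymous substitution rates and performs likelihood ratio tests of positive selection under codon substitution models. The CodeML output includes an mlc file, which contains the tree topology and the estimated synonymous and nonsynonymous substitution rates.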
02. Reading tree files in R
Table 1.1: Parser functions defined in treeio
Parser function | Description |
read.astral | parsing output of ASTRAL |
read.beast | parsing output of BEAST |
read.codeml | parsing output of CodeML (rst and mlc files) |
read.codeml_mlc | parsing mlc file (output of CodeML) |
read.fasta | parsing FASTA format sequence file |
read.hyphy | parsing output of HYPHY |
read.hyphy.seq | parsing ancestral sequences from HYPHY output |
read.iqtree | parsing IQ-Tree newick string, with ability to parse SH-aLRT and UFBoot support values |
read.jplace | parsing jplace file including output of EPA and pplacer |
read.jtree | parsing jtree format |
read.mega | parsing MEGA Nexus output |
read.mega_tabular | parsing MEGA tabular output |
read.mrbayes | parsing output of MrBayes |
read.newick | parsing newick string, with ability to parse node label as support values |
read.nhx | parsing NHX file including output of PHYLDOG and RevBayes |
read.paml_rst | parsing rst file (output of BaseML or CodeML) |
read.phylip | parsing phylip file (phylip alignment + newick string) |
read.phylip.seq | parsing multiple sequence alignment from phylip file |
read.phylip.tree | parsing newick string from phylip file |
read.r8s | parsing output of r8s |
read.raxml | parsing output of RAxML |
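As a small illustration of one row of the table, the sketch below uses `read.newick()` with `node.label = "support"`, which parses internal node labels as support values. The Newick string is made up for this example and is written to a temporary file first; the snippet assumes the treeio package is installed.

```r
library(treeio)

# A made-up Newick string whose internal node labels (90, 75)
# stand for support values
nwk <- "((A:1,B:2)90:1,(C:1,D:1)75:2);"
f <- tempfile(fileext = ".nwk")
writeLines(nwk, f)

# Default behaviour: the labels stay as node labels on a phylo object
phy <- read.newick(f)
print(phy$node.label)

# node.label = "support": the labels are converted to a numeric
# 'support' column stored alongside the tree in a treedata object
td <- read.newick(f, node.label = "support")
print(td@data)
```

Storing support values as numeric data rather than as text labels makes them directly usable as plotting aesthetics or as filter conditions later on.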
```r
library(treeio)   # provides read.beast(), read.mrbayes(), read.paml_rst()
library(ggtree)

file <- system.file("extdata/BEAST", "beast_mcc.tree", package="treeio")
beast <- read.beast(file)
beast
## 'treedata' S4 object that stored information of
##  '/home/ygc/R/library/treeio/extdata/BEAST/beast_mcc.tree'.
##
## ...@ phylo:
## Phylogenetic tree with 15 tips and 14 internal nodes.
##
## Tip labels:
##  A_1995, B_1996, C_1995, D_1987, E_1996, F_1997, ...
##
## Rooted; includes branch lengths.
##
## with the following features available:
##  'height', 'height_0.95_HPD', 'height_median',
##  'height_range', 'length', 'length_0.95_HPD',
##  'length_median', 'length_range', 'posterior', 'rate',
##  'rate_0.95_HPD', 'rate_median', 'rate_range'.
```
```r
file <- system.file("extdata/MrBayes", "Gq_nxs.tre", package="treeio")
read.mrbayes(file)
## 'treedata' S4 object that stored information of
##  '/home/ygc/R/library/treeio/extdata/MrBayes/Gq_nxs.tre'.
##
## ...@ phylo:
## Phylogenetic tree with 12 tips and 10 internal nodes.
##
## Tip labels:
##  B_h, B_s, G_d, G_k, G_q, G_s, ...
##
## Unrooted; includes branch lengths.
##
## with the following features available:
##  'length_0.95HPD', 'length_mean', 'length_median', 'prob',
##  'prob_range', 'prob_stddev', 'prob_percent', 'prob+-sd'.
```
```r
brstfile <- system.file("extdata/PAML_Baseml", "rst", package="treeio")
brst <- read.paml_rst(brstfile)
brst
## 'treedata' S4 object that stored information of
##  '/home/ygc/R/library/treeio/extdata/PAML_Baseml/rst'.
##
## ...@ phylo:
## Phylogenetic tree with 15 tips and 13 internal nodes.
##
## Tip labels:
##  A, B, C, D, E, F, ...
## Node labels:
##  16, 17, 18, 19, 20, 21, ...
##
## Unrooted; includes branch lengths.
##
## with the following features available:
##  'subs', 'AA_subs'.
```
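The "features available" listed in these printouts can also be retrieved programmatically. The following sketch, which assumes treeio and tidytree are installed, reads the BEAST example file again and pulls out the field names, the bare tree, and a per-node table.

```r
library(treeio)
library(tidytree)

file <- system.file("extdata/BEAST", "beast_mcc.tree", package = "treeio")
beast <- read.beast(file)

# Names of the annotation fields attached to the tree
fields <- get.fields(beast)
print(fields)

# The bare phylo object, without the annotations
phy <- as.phylo(beast)

# One row per node: tree structure plus all annotation fields
tbl <- as_tibble(beast)
print(head(tbl))
```

Any of the field names returned by `get.fields()` can be used later as an aesthetic in ggtree, for example `aes(color = rate)`.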
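03. Data integration and filtering
3.1 Converting tree data to a data frame with tidytree
All data parsed from or merged with a tree can be converted into a tidy data frame with the tidytree package, which provides tools for manipulating trees together with their associated data. For example, external data can be linked to a phylogeny, and evolutionary data obtained from different sources can be merged using tidyverse verbs. After manipulation, the tree data can be converted back to a treedata object and exported to a single tree file, analyzed further in R, or visualized with ggtree.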
```r
library(ape)
library(tidytree)
library(dplyr)

set.seed(2017)
tree <- rtree(4)
tree
##
## Phylogenetic tree with 4 tips and 3 internal nodes.
##
## Tip labels:
## [1] "t4" "t1" "t3" "t2"
##
## Rooted; includes branch lengths.

x <- as_tibble(tree)
x
## # A tibble: 7 x 4
##   parent  node branch.length label
##    <int> <int>         <dbl> <chr>
## 1      5     1       0.435   t4
## 2      7     2       0.674   t1
## 3      7     3       0.00202 t3
## 4      6     4       0.0251  t2
## 5      5     5      NA       <NA>
## 6      5     6       0.472   <NA>
## 7      6     7       0.274   <NA>

as.phylo(x)
##
## Phylogenetic tree with 4 tips and 3 internal nodes.
##
## Tip labels:
## [1] "t4" "t1" "t3" "t2"
##
## Rooted; includes branch lengths.
```
```r
d <- tibble(label = paste0('t', 1:4),
            trait = rnorm(4))

y <- full_join(x, d, by = 'label')  # join by tip label (species name)
y
```
```r
## # A tibble: 7 x 5
##   parent  node branch.length label  trait
##    <int> <int>         <dbl> <chr>  <dbl>
## 1      5     1       0.435   t4     0.943
## 2      7     2       0.674   t1    -0.171
## 3      7     3       0.00202 t3     0.570
## 4      6     4       0.0251  t2    -0.283
## 5      5     5      NA       <NA>  NA
## 6      5     6       0.472   <NA>  NA
## 7      6     7       0.274   <NA>  NA
```
```r
as.treedata(y)
## 'treedata' S4 object'.
##
## ...@ phylo:
## Phylogenetic tree with 4 tips and 3 internal nodes.
##
## Tip labels:
## [1] "t4" "t1" "t3" "t2"
##
## Rooted; includes branch lengths.
##
## with the following features available:
##  'trait'.

y %>% as.treedata %>% as_tibble  # convert to treedata and back in one pipeline
## # A tibble: 7 x 5
##   parent  node branch.length label  trait
##    <int> <int>         <dbl> <chr>  <dbl>
## 1      5     1       0.435   t4     0.943
## 2      7     2       0.674   t1    -0.171
## 3      7     3       0.00202 t3     0.570
## 4      6     4       0.0251  t2    -0.283
## 5      5     5      NA       <NA>  NA
## 6      5     6       0.472   <NA>  NA
## 7      6     7       0.274   <NA>  NA
```
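The section above mentions exporting the combined tree and data to a single file. One possible way to do this, sketched here under the assumption that treeio's `write.beast()` and `read.beast()` are available in the installed version, is to write the treedata object to a BEAST-style NEXUS file and read it back. The temporary file name is arbitrary.

```r
library(ape)
library(tidytree)
library(treeio)
library(dplyr)

# Rebuild the small example: a random tree joined with a trait table
set.seed(2017)
tree <- rtree(4)
d <- tibble(label = paste0("t", 1:4), trait = rnorm(4))
td <- as_tibble(tree) %>%
  full_join(d, by = "label") %>%
  as.treedata()

# Export tree and trait together into one annotated NEXUS file
out <- tempfile(fileext = ".tree")
write.beast(td, file = out)

# Read the file back; the trait should come back as an annotation field
td2 <- read.beast(out)
print(get.fields(td2))
```

Because the annotations travel inside the tree file, the exported file can be shared or reloaded without keeping a separate trait table in sync.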
Access related nodes
```r
child(y, 5)
## # A tibble: 2 x 5
##   parent  node branch.length label  trait
##    <int> <int>         <dbl> <chr>  <dbl>
## 1      5     1         0.435 t4     0.943
## 2      5     6         0.472 <NA>  NA
```
```r
parent(y, 2)
## # A tibble: 1 x 5
##   parent  node branch.length label trait
##    <int> <int>         <dbl> <chr> <dbl>
## 1      6     7         0.274 <NA>     NA
```
```r
offspring(y, 5)
## # A tibble: 6 x 5
##   parent  node branch.length label  trait
##    <int> <int>         <dbl> <chr>  <dbl>
## 1      5     1       0.435   t4     0.943
## 2      7     2       0.674   t1    -0.171
## 3      7     3       0.00202 t3     0.570
## 4      6     4       0.0251  t2    -0.283
## 5      5     6       0.472   <NA>  NA
## 6      6     7       0.274   <NA>  NA
```
```r
ancestor(y, 2)
## # A tibble: 3 x 5
##   parent  node branch.length label trait
##    <int> <int>         <dbl> <chr> <dbl>
## 1      5     5        NA     <NA>     NA
## 2      5     6         0.472 <NA>     NA
## 3      6     7         0.274 <NA>     NA
```
```r
sibling(y, 2)
## # A tibble: 1 x 5
##   parent  node branch.length label trait
##    <int> <int>         <dbl> <chr> <dbl>
## 1      7     3       0.00202 t3    0.570
```
```r
MRCA(y, 2, 3)
## # A tibble: 1 x 5
##   parent  node branch.length label trait
##    <int> <int>         <dbl> <chr> <dbl>
## 1      6     7         0.274 <NA>     NA
```
3.2 Data integration
3.2.1 Merging CodeML and BEAST results
```r
library(treeio)   # read.beast(), read.codeml(), merge_tree()

beast_file <- system.file("examples/MCC_FluA_H3.tree", package="ggtree")
rst_file <- system.file("examples/rst", package="ggtree")
mlc_file <- system.file("examples/mlc", package="ggtree")
beast_tree <- read.beast(beast_file)
codeml_tree <- read.codeml(rst_file, mlc_file)

merged_tree <- merge_tree(beast_tree, codeml_tree)
merged_tree
```
```r
## 'treedata' S4 object that stored information of
##  '/home/ygc/R/library/ggtree/examples/MCC_FluA_H3.tree',
##  '/home/ygc/R/library/ggtree/examples/rst',
##  '/home/ygc/R/library/ggtree/examples/mlc'.
##
## ...@ phylo:
## Phylogenetic tree with 76 tips and 75 internal nodes.
##
## Tip labels:
##  A/Hokkaido/30-1-a/2013, A/New_York/334/2004, A/New_York/463/2005, A/New_York/452/1999, A/New_York/238/2005, A/New_York/523/1998, ...
##
## Rooted; includes branch lengths.
##
## with the following features available:
##  'height', 'height_0.95_HPD', 'height_median',
##  'height_range', 'length', 'length_0.95_HPD',
##  'length_median', 'length_range', 'posterior', 'rate',
##  'rate_0.95_HPD', 'rate_median', 'rate_range', 'subs',
##  'AA_subs', 't', 'N', 'S', 'dN_vs_dS', 'dN', 'dS', 'N_x_dN',
##  'S_x_dS'.
```
```r
library(dplyr)
library(ggplot2)   # ggplot(), geom_hex(), facet_wrap()

# Reshape the merged data to long format: one row per (branch, statistic)
df <- merged_tree %>%
  as_tibble() %>%
  select(dN_vs_dS, dN, dS, rate) %>%
  subset(dN_vs_dS >= 0 & dN_vs_dS <= 1.5) %>%
  tidyr::gather(type, value, dN_vs_dS:dS)
df$type[df$type == 'dN_vs_dS'] <- 'dN/dS'
df$type <- factor(df$type, levels=c("dN/dS", "dN", "dS"))
ggplot(df, aes(rate, value)) + geom_hex() +
  facet_wrap(~type, scale='free_y')
```
04. Phylogenetic tree visualization
4.1 Basic syntax
```r
ggplot(tree_object) + geom_tree() + theme_tree()
ggtree(tree_object)   # shorthand for the line above

# Commonly used layers:
#   geom_treescale                  scale legend for branch lengths (genetic distance, divergence time, etc.)
#   geom_range                      uncertainty of branch lengths (confidence interval, range, etc.)
#   geom_tiplab                     taxon (tip) labels
#   geom_tippoint, geom_nodepoint   symbols for tips and internal nodes
#   geom_hilight                    highlight a clade
#   geom_cladelabel                 label a clade
```
```r
library("treeio")
library("ggtree")
library("ggplot2")

nwk <- system.file("extdata", "sample.nwk", package="treeio")
tree <- read.tree(nwk)

ggplot(tree, aes(x, y)) + geom_tree() + theme_tree()
ggtree(tree, color="firebrick", size=2, linetype="dotted")
ggtree(tree, ladderize=FALSE)
ggtree(tree, branch.length="none")
```
4.2 Tree layouts
```r
library(ggtree)
library(ggplot2)   # scale_x_reverse(), coord_flip(), coord_polar() used below
set.seed(2017-02-16)
tree <- rtree(50)
ggtree(tree)
ggtree(tree, layout="slanted")
ggtree(tree, layout="circular")
ggtree(tree, layout="fan", open.angle=120)
ggtree(tree, layout="equal_angle")
ggtree(tree, layout="daylight")
ggtree(tree, branch.length='none')
ggtree(tree, branch.length='none', layout='circular')
ggtree(tree, layout="daylight", branch.length = 'none')
```
```r
ggtree(tree) + scale_x_reverse()
ggtree(tree) + coord_flip()
ggtree(tree) + layout_dendrogram()
print(ggtree(tree), newpage=TRUE, vp=grid::viewport(angle=-30, width=.9, height=.9))
ggtree(tree, layout='slanted') + coord_flip()
ggtree(tree, layout='slanted', branch.length='none') + layout_dendrogram()
ggtree(tree, layout='circular') + xlim(-10, NA)
ggtree(tree) + scale_x_reverse() + coord_polar(theta='y')
ggtree(tree) + scale_x_reverse(limits=c(10, 0)) + coord_polar(theta='y')
```
```r
beast_file <- system.file("examples/MCC_FluA_H3.tree",
                          package="ggtree")
beast_tree <- read.beast(beast_file)
ggtree(beast_tree, mrsd="2013-01-01") + theme_tree2()
```
4.3 Displaying tree components
4.3.1 Tree scale
```r
ggtree(tree) + geom_treescale()

# geom_treescale() supports the following parameters:
#   x and y    for tree scale position
#   width      for the length of the tree scale
#   fontsize   for the size of the text
#   linesize   for the size of the line
#   offset     for relative position of the line and the text
#   color      for color of the tree scale
```
```r
ggtree(tree) + geom_treescale(x=0, y=45, width=1, color='red')
ggtree(tree) + geom_treescale(fontsize=6, linesize=2, offset=1)
ggtree(tree) + theme_tree2()
```
4.3.2 Displaying internal nodes and tips
```r
ggtree(tree) + geom_point(aes(shape=isTip, color=isTip), size=3)

p <- ggtree(tree) + geom_nodepoint(color="#b5e521", alpha=1/4, size=10)
p + geom_tippoint(color="#FDAC4F", shape=8, size=3)
```
4.3.3 Displaying labels
```r
p + geom_tiplab(size=3, color="purple")
ggtree(tree, layout="circular") + geom_tiplab(aes(angle=angle), color='blue')
```
4.3.4 Displaying the root edge
```r
## with root edge = 1
tree1 <- read.tree(text='((A:1,B:2):3,C:2):1;')
ggtree(tree1) + geom_tiplab() + geom_rootedge()

## without root edge
tree2 <- read.tree(text='((A:1,B:2):3,C:2);')
ggtree(tree2) + geom_tiplab() + geom_rootedge()

## setting root edge
tree2$root.edge <- 2
ggtree(tree2) + geom_tiplab() + geom_rootedge()

## specify length of root edge for just plotting
## this will ignore tree$root.edge
ggtree(tree2) + geom_tiplab() + geom_rootedge(rootedge = 3)
```
4.3.5 Coloring the tree
```r
ggtree(beast_tree, aes(color=rate)) +
    scale_color_continuous(low='darkgreen', high='red') +
    theme(legend.position="right")
```
```r
## NOTE: the two file locations below are empty in the source;
## supply the path or URL of the tree file and the trait table
anole.tree <- read.tree("")
svl <- read.csv("", row.names=1)
svl <- as.matrix(svl)[,1]
fit <- phytools::fastAnc(anole.tree, svl, vars=TRUE, CI=TRUE)

## observed trait values at the tips
td <- data.frame(node = nodeid(anole.tree, names(svl)),
                 trait = svl)
## reconstructed ancestral values at the internal nodes
nd <- data.frame(node = names(fit$ace), trait = fit$ace)

d <- rbind(td, nd)
d$node <- as.numeric(d$node)
tree <- full_join(anole.tree, d, by = 'node')

ggtree(tree, aes(color=trait), layout = 'circular',
       ladderize = FALSE, continuous = TRUE, size=2) +
    scale_color_gradientn(colours=c("red", 'orange', 'green', 'cyan', 'blue')) +
    geom_tiplab(hjust = -.1) +
    xlim(0, 1.2) +
    theme(legend.position = c(.05, .85))

ggtree(tree, aes(color=trait), continuous = TRUE, yscale = "trait") +
    scale_color_viridis_c() + theme_minimal()
```
4.3.6 Rescaling the tree
```r
library("treeio")
beast_file <- system.file("examples/MCC_FluA_H3.tree", package="ggtree")
beast_tree <- read.beast(beast_file)
beast_tree

p1 <- ggtree(beast_tree, mrsd='2013-01-01') + theme_tree2() +
    labs(caption="Divergence time")
p2 <- ggtree(beast_tree, branch.length='rate') + theme_tree2() +
    labs(caption="Substitution rate")

mlcfile <- system.file("extdata/PAML_Codeml", "mlc", package="treeio")
mlc_tree <- read.codeml_mlc(mlcfile)
p3 <- ggtree(mlc_tree) + theme_tree2() +
    labs(caption="nucleotide substitutions per codon")
p4 <- ggtree(mlc_tree, branch.length='dN_vs_dS') + theme_tree2() +
    labs(caption="dN/dS tree")

beast_tree2 <- rescale_tree(beast_tree, branch.length='rate')
ggtree(beast_tree2) + theme_tree2()
```
4.3.7 Modifying the theme
```r
set.seed(2019)
x <- rtree(30)
ggtree(x, color="red") + theme_tree("steelblue")
ggtree(x, color="white") + theme_tree("black")
```
4.3.8 Displaying multiple trees
```r
trees <- lapply(c(10, 20, 40), rtree)
class(trees) <- "multiPhylo"
# .id identifies each tree of the multiPhylo object
ggtree(trees) + facet_wrap(~.id, scale="free") + geom_tiplab()
```
```r
btrees <- read.tree(system.file("extdata/RAxML", "RAxML_bootstrap.H3", package="treeio"))
ggdensitree(btrees, alpha=.3, colour='steelblue') +
    geom_tiplab(size=3) + xlim(0, 45)
```
05. Phylogenetic tree annotation
5.1 Annotating the tree
```r
library(treeio)   # read.nhx()
library(ggtree)
treetext = "(((ADH2:0.1[&&NHX:S=human], ADH1:0.11[&&NHX:S=human]):
0.05 [&&NHX:S=primates:D=Y:B=100],ADHY:
0.1[&&NHX:S=nematode],ADHX:0.12 [&&NHX:S=insect]):
0.1[&&NHX:S=metazoa:D=N],(ADH4:0.09[&&NHX:S=yeast],
ADH3:0.13[&&NHX:S=yeast], ADH2:0.12[&&NHX:S=yeast],
ADH1:0.11[&&NHX:S=yeast]):0.1[&&NHX:S=Fungi])[&&NHX:D=N];"
tree <- read.nhx(textConnection(treetext))
ggtree(tree) + geom_tiplab() +
  geom_label(aes(x=branch, label=S), fill='lightgreen') +
  geom_label(aes(label=D), fill='steelblue') +
  geom_text(aes(label=B), hjust=-.5)
```
Layer | Description |
geom_balance | highlights the two direct descendant clades of an internal node |
geom_cladelabel | annotate a clade with bar and text label |
geom_facet | plot associated data in specific panel (facet) and align the plot with the tree |
geom_hilight | highlight a clade with rectangle |
geom_inset | add insets (subplots) to tree nodes |
geom_label2 | modified version of geom_label, with subsetting supported |
geom_nodepoint | annotate internal nodes with symbolic points |
geom_point2 | modified version of geom_point, with subsetting supported |
geom_range | bar layer to present uncertainty of evolutionary inference |
geom_rootpoint | annotate root node with symbolic point |
geom_rootedge | add root edge to a tree |
geom_segment2 | modified version of geom_segment, with subsetting supported |
geom_strip | annotate associated taxa with bar and (optional) text label |
geom_taxalink | associate two related taxa by linking them with a curve |
geom_text2 | modified version of geom_text, with subsetting supported |
geom_tiplab | layer of tip labels |
geom_tippoint | annotate external nodes with symbolic points |
geom_tree | tree structure layer, with multiple layout supported |
geom_treescale | tree branch scale legend |
```r
# show the node number of every internal node
ggtree(tree) + geom_text2(aes(subset=!isTip, label=node), hjust=-.3) + geom_tiplab()
```
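Reading node numbers off a plot works for small trees; for larger ones it can be easier to look them up from tip names. The sketch below uses `ape::getMRCA()` on a toy tree whose tip names are illustrative only, and it assumes the ape package is installed.

```r
library(ape)

# Toy tree; the tip names are made up for this example
tree <- read.tree(text = "((A:1,B:2):3,(C:1,(D:1,E:1):1):2);")

# Node number of the most recent common ancestor of tips D and E
node <- getMRCA(tree, c("D", "E"))
print(node)

# Confirm which tips descend from that node
clade_tips <- extract.clade(tree, node)$tip.label
print(clade_tips)
```

The resulting number can then be passed to layers that take a `node` argument, such as `geom_cladelabel(node = node, label = "some clade")` or `geom_hilight(node = node)`.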
5.2 Annotation layers
5.2.1 Clade labels
```r
set.seed(2015-12-21)
tree <- rtree(30)
p <- ggtree(tree) + xlim(NA, 6)

p + geom_cladelabel(node=45, label="test label") +
    geom_cladelabel(node=34, label="another clade")
```
```r
p + geom_cladelabel(node=45, label="test label", align=TRUE, offset = .2, color='red') +
    geom_cladelabel(node=34, label="another clade", align=TRUE, offset = .2, color='blue')
```
```r
p + geom_cladelabel(node=45, label="test label", align=T, angle=270, hjust='center', offset.text=.5, barsize=1.5) +
    geom_cladelabel(node=34, label="another clade", align=T, angle=45, fontsize=8)
```
```r
p + geom_cladelabel(node=34, label="another clade", align=T, geom='label', fill='lightblue')
```
```r
ggtree(tree, layout="daylight") +
    geom_cladelabel(node=35, label="test label", angle=0,
                    fontsize=8, offset=.5, vjust=.5) +
    geom_cladelabel(node=55, label='another clade',
                    angle=-95, hjust=.5, fontsize=8)
```
```r
p + geom_tiplab() +
    geom_strip('t10', 't30', barsize=2, color='red',
               label="associated taxa", offset.text=.1) +
    geom_strip('t1', 't18', barsize=2, color='blue',
               label = "another label", offset.text=.1)
```
5.2.2 Highlighting clades
```r
nwk <- system.file("extdata", "sample.nwk", package="treeio")
tree <- read.tree(nwk)
ggtree(tree) + geom_hilight(node=21, fill="steelblue", alpha=.6) +
    geom_hilight(node=17, fill="darkgreen", alpha=.6)
```
```r
ggtree(tree, layout="circular") + geom_hilight(node=21, fill="steelblue", alpha=.6) +
    geom_hilight(node=23, fill="darkgreen", alpha=.6)
```
```r
# pg is not defined in the source; it is assumed here to be the daylight-layout
# plot of the 30-tip random tree from section 5.2.1 (nodes 35 and 55 belong to that tree)
set.seed(2015-12-21)
pg <- ggtree(rtree(30), layout="daylight")
pg + geom_hilight(node=55) + geom_hilight(node=35, fill='darkgreen')
```
```r
ggtree(tree) +
    geom_balance(node=16, fill='steelblue', color='white', alpha=0.6, extend=1) +
    geom_balance(node=19, fill='darkgreen', color='white', alpha=0.6, extend=1)
```
5.2.3 Linking taxa
```r
ggtree(tree) + geom_tiplab() + geom_taxalink('A', 'E') +
    geom_taxalink('F', 'K', color='red', linetype = 'dashed',
                  arrow=grid::arrow(length=grid::unit(0.02, "npc")))
```
5.2.4 Uncertainty of divergence time estimates
```r
file <- system.file("extdata/MEGA7", "mtCDNA_timetree.nex", package = "treeio")
x <- read.mega(file)
p1 <- ggtree(x) + geom_range('reltime_0.95_CI', color='red', size=3, alpha=.3)
p2 <- ggtree(x) + geom_range('reltime_0.95_CI', color='red', size=3, alpha=.3, center='reltime')
p3 <- p2 + scale_x_range() + theme_tree2()
```
5.3 Annotating trees with output from evolutionary analysis software
```r
file <- system.file("extdata/BEAST", "beast_mcc.tree", package="treeio")
beast <- read.beast(file)
ggtree(beast, aes(color=rate)) +
    geom_range(range='length_0.95_HPD', color='red', alpha=.6, size=2) +
    geom_nodelab(aes(x=branch, label=round(posterior, 2)), vjust=-.5, size=3) +
    scale_color_continuous(low="darkgreen", high="red") +
    theme(legend.position=c(.1, .8))
```
```r
nwk <- system.file("extdata/HYPHY", "labelledtree.tree",
                   package="treeio")
ancseq <- system.file("extdata/HYPHY", "ancseq.nex",
                      package="treeio")
tipfas <- system.file("extdata", "pa.fas", package="treeio")
hy <- read.hyphy(nwk, ancseq, tipfas)
ggtree(hy) +
    geom_text(aes(x=branch, label=AA_subs),
              size=2, vjust=-.3, color="firebrick")
```
```r
rstfile <- system.file("extdata/PAML_Codeml", "rst",
                       package="treeio")
mlcfile <- system.file("extdata/PAML_Codeml", "mlc",
                       package="treeio")
ml <- read.codeml(rstfile, mlcfile)
ggtree(ml, aes(color=dN_vs_dS), branch.length='dN_vs_dS') +
    scale_color_continuous(name='dN/dS', limits=c(0, 1.5),
                           oob=scales::squish,
                           low='darkgreen', high='red') +
    geom_text(aes(x=branch, label=AA_subs),
              vjust=-.5, color='steelblue', size=2) +
    theme_tree2(legend.position=c(.9, .3))
```
06. Scaling the phylogenetic tree topology
```r
library(ggtree)
nwk <- system.file("extdata", "sample.nwk", package="treeio")
tree <- read.tree(nwk)
p <- ggtree(tree) + geom_tiplab()
viewClade(p, MRCA(p, "I", "L"))
```
```r
tree2 <- groupClade(tree, c(17, 21))
p <- ggtree(tree2, aes(color=group)) + theme(legend.position='none') +
    scale_color_manual(values=c("black", "firebrick", "steelblue"))
scaleClade(p, node=17, scale=.1)
```
```r
p2 <- p %>% collapse(node=21) +
    geom_point2(aes(subset=(node==21)), shape=21, size=5, fill='green')
p2 <- collapse(p2, node=23) +
    geom_point2(aes(subset=(node==23)), shape=23, size=5, fill='red')
print(p2)
expand(p2, node=23) %>% expand(node=21)
```
```r
p2 <- p + geom_tiplab()
node <- 21
collapse(p2, node, 'max') %>% expand(node)
collapse(p2, node, 'min') %>% expand(node)
collapse(p2, node, 'mixed') %>% expand(node)

collapse(p, 21, 'mixed', fill='steelblue', alpha=.4) %>%
    collapse(23, 'mixed', fill='firebrick', color='blue')

scaleClade(p, 23, .2) %>% collapse(23, 'min', fill="darkgreen")
```
```r
data(iris)
rn <- paste0(iris[,5], "_", 1:150)
rownames(iris) <- rn
d_iris <- dist(iris[,-5], method="man")

tree_iris <- ape::bionj(d_iris)
grp <- list(setosa     = rn[1:50],
            versicolor = rn[51:100],
            virginica  = rn[101:150])

p_iris <- ggtree(tree_iris, layout = 'circular', branch.length='none')
groupOTU(p_iris, grp, 'Species') + aes(color=Species) +
    theme(legend.position="right")

tree_iris <- groupOTU(tree_iris, grp, "Species")
ggtree(tree_iris, aes(color=Species), layout = 'circular', branch.length = 'none') +
    theme(legend.position="right")
```
```r
p1 <- p + geom_point2(aes(subset=node==16), color='darkgreen', size=5)
p2 <- rotate(p1, 17) %>% rotate(21)
flip(p2, 17, 21)
```
```r
p3 <- open_tree(p, 180) + geom_tiplab()
print(p3)
```
07. Plotting trees with data
```r
library(ggimage)
library(ggtree)
## NOTE: the host part of this URL is empty in the source
url <- paste0("",
              "metastyle/master/design/viz_targets_exercise/")

x <- read.tree(paste0(url, "tree_boots.nwk"))
info <- read.csv(paste0(url, "tip_data.csv"))

p <- ggtree(x) %<+% info + xlim(-.1, 4)
p2 <- p + geom_tiplab(offset = .6, hjust = .5) +
    geom_tippoint(aes(shape = trophic_habit, color = trophic_habit, size = mass_in_kg)) +
    theme(legend.position = "right") + scale_size_continuous(range = c(3, 10))

d2 <- read.csv(paste0(url, "inode_data.csv"))
p2 %<+% d2 + geom_label(aes(label = vernacularName.y, fill = posterior)) +
    scale_fill_gradientn(colors = RColorBrewer::brewer.pal(3, "YlGnBu"))
```
```r
library(ggtree)
## NOTE: the host part of this URL is empty in the source
remote_folder <- paste0("",
                        "plotTree/master/tree_example_april2015/")

## read the phylogenetic tree
tree <- read.tree(paste0(remote_folder, "tree.nwk"))

## read the sampling information data set
info <- read.csv(paste0(remote_folder, "info.csv"))

## read and process the allele table
snps <- read.csv(paste0(remote_folder, "alleles.csv"), header = F,
                 row.names = 1, stringsAsFactors = F)
snps_strainCols <- snps[1,]
snps <- snps[-1,] # drop strain names
colnames(snps) <- snps_strainCols

gapChar <- "?"
snp <- t(snps)
## TRUE where a strain differs from the first row and neither value is a gap
lsnp <- apply(snp, 1, function(x) {
        x != snp[1,] & x != gapChar & snp[1,] != gapChar
    })
lsnp <- as.data.frame(lsnp)   # apply() returns a matrix; `$<-` and gather() need a data frame
lsnp$pos <- as.numeric(rownames(lsnp))
lsnp <- tidyr::gather(lsnp, name, value, -pos)
snp_data <- lsnp[lsnp$value, c("name", "pos")]

## read the trait data
bar_data <- read.csv(paste0(remote_folder, "bar.csv"))

## visualize the tree
p <- ggtree(tree)

## attach the sampling information data set
## and add symbols colored by location
p <- p %<+% info + geom_tippoint(aes(color=location))

## visualize SNP and Trait data using dot and bar charts,
## and align them based on tree structure
p + geom_facet(panel = "SNP", data = snp_data, geom = geom_point,
               mapping=aes(x = pos, color = location), shape = '|') +
    geom_facet(panel = "Trait", data = bar_data, geom = ggstance::geom_barh,
               aes(x = dummy_bar_value, color = location, fill = location),
               stat = "identity", width = .6) +
    theme_tree2(legend.position=c(.05, .85))
```
ggtree is a truly excellent tool that every phylogenetics researcher should learn. Special thanks to Prof. Guangchuang Yu for developing these excellent R packages.
- LG Wang, TTY Lam, S Xu, Z Dai, L Zhou, T Feng, P Guo, CW Dunn, BR Jones, T Bradley, H Zhu, Y Guan, Y Jiang, G Yu*. treeio: an R package for phylogenetic tree input and output with richly annotated and associated data. Molecular Biology and Evolution. 2019, accepted. doi: 10.1093/molbev/msz240.
- G Yu, TTY Lam, H Zhu, Y Guan. Two methods for mapping and visualizing associated data on phylogeny using ggtree. Molecular Biology and Evolution. 2018, 35(12):3041-3043. doi: 10.1093/molbev/msy194.
- G Yu, DK Smith, H Zhu, Y Guan, TTY Lam*. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution. 2017, 8(1):28-36. doi: 10.1111/2041-210X.12628.