Single-cell transcriptome sequencing (5): a summary of the monocle2 package's functions

Official site: http://cole-trapnell-lab.github.io/monocle-release/docs/

I. Basic processing

Choose a distribution suited to your data (required)

  • The expressionFamily argument
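As a minimal sketch of where expressionFamily is set, the snippet below builds a tiny CellDataSet from simulated UMI counts. The matrix, the Hours column and the gene names are made up for illustration. negbinomial.size() is the family the Monocle documentation suggests for UMI counts; for FPKM/TPM values it suggests tobit() instead.

```r
library(monocle)

set.seed(1)
# Simulated UMI counts: 20 genes x 10 cells (illustrative data only)
umi_matrix <- matrix(rpois(200, lambda = 5), nrow = 20,
                     dimnames = list(paste0("gene", 1:20),
                                     paste0("cell", 1:10)))

# Per-cell and per-gene annotations; row names must match the matrix
pd <- new("AnnotatedDataFrame",
          data = data.frame(Hours = rep(c(0, 24), each = 5),
                            row.names = colnames(umi_matrix)))
fd <- new("AnnotatedDataFrame",
          data = data.frame(gene_short_name = rownames(umi_matrix),
                            row.names = rownames(umi_matrix)))

# The distribution is chosen once, when the object is created
toy_cds <- newCellDataSet(umi_matrix,
                          phenoData = pd,
                          featureData = fd,
                          expressionFamily = negbinomial.size())
```

The family is fixed at construction time, so it is worth deciding which kind of expression values you have before calling newCellDataSet.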

Working with large datasets (recommended)

  • Sparse matrix: newCellDataSet(as(umi_matrix, "sparseMatrix"))
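To show why the sparse coercion is worthwhile, here is a small sketch that uses only the Matrix package. The simulated matrix, with roughly 90% zeros, is an assumption standing in for a real UMI table.

```r
library(Matrix)

set.seed(1)
# Simulated UMI counts where roughly 90% of the entries are zero
dense_umi <- matrix(rbinom(1000 * 200, size = 1, prob = 0.1) *
                      rpois(1000 * 200, lambda = 3),
                    nrow = 1000)

# Coerce to a sparse representation, the same coercion used in the call above
sparse_umi <- as(dense_umi, "sparseMatrix")

# Compare the memory footprint of the two representations
dense_bytes  <- as.numeric(object.size(dense_umi))
sparse_bytes <- as.numeric(object.size(sparse_umi))
print(c(dense = dense_bytes, sparse = sparse_bytes))
```

The saving grows with the fraction of zeros, which is typically high in droplet-based single-cell data.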

Convert TPM/FPKM values to mRNA counts (optional)

  • rpc_matrix <- relative2abs(HSMM, method = "num_genes")

Estimate size factors and dispersions (optional)

HSMM <- estimateSizeFactors(HSMM)
HSMM <- estimateDispersions(HSMM)

Filter low-quality cells (recommended)

HSMM <- detectGenes(HSMM, min_expr = 0.1)

expressed_genes <- row.names(subset(fData(HSMM),
    num_cells_expressed >= 10))
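The snippet above flags expressed genes; filtering the cells themselves is a separate step. Below is a minimal base-R sketch of one common approach: drop cells whose total count lies more than two standard deviations from the mean on a log10 scale. The toy matrix, the two-standard-deviation cutoff and all variable names are assumptions for illustration.

```r
set.seed(1)
# Toy count matrix: 50 genes x 30 cells (illustrative data only)
counts <- matrix(rpois(50 * 30, lambda = 20), nrow = 50,
                 dimnames = list(paste0("gene", 1:50),
                                 paste0("cell", 1:30)))
counts[, "cell1"] <- counts[, "cell1"] * 50L              # oversized library
counts[, "cell2"] <- rbinom(50, counts[, "cell2"], 0.05)  # nearly empty cell

# Total counts per cell, analysed on a log10 scale
total_counts <- colSums(counts)
log_totals <- log10(total_counts)

# Keep cells within two standard deviations of the mean log10 total
lower_bound <- 10^(mean(log_totals) - 2 * sd(log_totals))
upper_bound <- 10^(mean(log_totals) + 2 * sd(log_totals))
keep <- total_counts > lower_bound & total_counts < upper_bound

filtered <- counts[, keep]
print(setdiff(colnames(counts), colnames(filtered)))
```

With a CellDataSet the same logical vector could be used to subset columns, for example HSMM[, keep], assuming the totals were computed from exprs(HSMM).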

II. Classifying and counting cells

Classify cells by type (recommended)

  • HSMM <- classifyCells(HSMM, cth, 0.1)
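The call above assumes that a CellTypeHierarchy object named cth already exists. Here is a minimal sketch of how one might be built, run on a toy dataset so that it is self-contained. The simulated counts and the thresholds are illustrative assumptions; MYF5 and ANPEP are used as myoblast and fibroblast markers, following the Monocle tutorial.

```r
library(monocle)

set.seed(1)
# Toy counts: 10 genes x 12 cells (illustrative data only)
genes <- c("MYF5", "ANPEP", paste0("gene", 1:8))
toy_counts <- matrix(rpois(10 * 12, lambda = 2), nrow = 10,
                     dimnames = list(genes, paste0("cell", 1:12)))
pd <- new("AnnotatedDataFrame",
          data = data.frame(Media = rep(c("GM", "DM"), each = 6),
                            row.names = colnames(toy_counts)))
fd <- new("AnnotatedDataFrame",
          data = data.frame(gene_short_name = genes, row.names = genes))
toy_cds <- newCellDataSet(toy_counts, phenoData = pd, featureData = fd,
                          expressionFamily = negbinomial.size())

# Look up the feature IDs of the two marker genes
MYF5_id  <- row.names(subset(fData(toy_cds), gene_short_name == "MYF5"))
ANPEP_id <- row.names(subset(fData(toy_cds), gene_short_name == "ANPEP"))

# Each cell type is defined by a function of the expression matrix
cth <- newCellTypeHierarchy()
cth <- addCellType(cth, "Myoblast",
                   classify_func = function(x) { x[MYF5_id, ] >= 1 })
cth <- addCellType(cth, "Fibroblast",
                   classify_func = function(x) {
                     x[MYF5_id, ] < 1 & x[ANPEP_id, ] > 1
                   })

toy_cds <- classifyCells(toy_cds, cth, 0.1)
print(table(pData(toy_cds)$CellType))
```

Cells that match no rule, or more than one, are not forced into a type, so it is worth checking the resulting table before moving on.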

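Clustering cells without marker genes (optional)

Subtract the effects of "uninteresting" sources of variation to reduce their impact on the clustering.

  • The residualModelFormulaStr argument: accepts an R model formula string that specifies the effects to subtract before clustering.

For example: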

HSMM <- reduceDimension(HSMM, max_components = 2, num_dim = 2,
            reduction_method = 'tSNE',
            residualModelFormulaStr = "~Media + num_genes_expressed",
            verbose = TRUE)
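To make "subtracting an effect" concrete, here is a conceptual base-R sketch for a single gene. It is not monocle's internal code: it fits the nuisance model by ordinary least squares and removes the fitted covariate terms. The covariate values and the simulated expression are assumptions; only the formula mirrors the call above.

```r
set.seed(1)
n_cells <- 100
# Hypothetical per-cell covariates, named after the formula above
cell_info <- data.frame(
  Media = factor(rep(c("GM", "DM"), each = n_cells / 2)),
  num_genes_expressed = round(rnorm(n_cells, mean = 3000, sd = 500))
)

# One simulated gene (log scale) whose expression depends on both covariates
expr <- 1 + 0.8 * (cell_info$Media == "DM") +
  0.001 * cell_info$num_genes_expressed + rnorm(n_cells, sd = 0.2)

# Fit the nuisance model, then subtract the fitted covariate effects while
# keeping the intercept so the overall expression level is preserved
design <- model.matrix(~ Media + num_genes_expressed, data = cell_info)
fit <- lm.fit(design, expr)
adjusted <- expr - drop(design[, -1, drop = FALSE] %*% fit$coefficients[-1])

# After adjustment the gene no longer tracks the covariates
print(cor(adjusted, cell_info$num_genes_expressed))
print(tapply(adjusted, cell_info$Media, mean))
```

Whatever variation remains after this step is what the dimensionality reduction and clustering get to see.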

Faceted plot

HSMM <- clusterCells(HSMM, num_clusters = 2)
plot_cell_clusters(HSMM, 1, 2, color = "Cluster") +
    facet_wrap(~CellType)

Figure: clusters faceted by cell type

Clustering cells using marker genes (recommended)

marker_diff <- markerDiffTable(HSMM[expressed_genes,],
            cth,
            residualModelFormulaStr = "~Media + num_genes_expressed",
            cores = 1)

candidate_clustering_genes <-
    row.names(subset(marker_diff, qval < 0.01))
marker_spec <-
  calculateMarkerSpecificity(HSMM[candidate_clustering_genes,], cth)
head(selectTopMarkers(marker_spec, 3))

Imputing cell type (optional)

III. Constructing single-cell trajectories

  • Trajectory step 1: choose genes that define a cell's progress
diff_test_res <- differentialGeneTest(HSMM_myo[expressed_genes,],
              fullModelFormulaStr = "~Media")
ordering_genes <- row.names(subset(diff_test_res, qval < 0.01))

HSMM_myo <- setOrderingFilter(HSMM_myo, ordering_genes)
plot_ordering_genes(HSMM_myo)
  • Trajectory step 2: reduce data dimensionality
HSMM_myo <- reduceDimension(HSMM_myo, max_components = 2,
    reduction_method = 'DDRTree')

  • Trajectory step 3: order cells along the trajectory
HSMM_myo <- orderCells(HSMM_myo)

Plotting options for trajectory plots

  • By time
plot_cell_trajectory(HSMM_myo, color_by = "Hours")

Figure: trajectory colored by time

  • By state
plot_cell_trajectory(HSMM_myo, color_by = "State")

Figure: trajectory colored by state

  • By Pseudotime
HSMM_myo <- orderCells(HSMM_myo, root_state = GM_state(HSMM_myo))
plot_cell_trajectory(HSMM_myo, color_by = "Pseudotime")
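GM_state() is not defined anywhere in these notes. The call above assumes a user-defined helper that returns the State containing most of the time-zero cells, so that pseudotime starts there. Below is a minimal sketch of such a helper, written against a plain data frame so it runs without monocle. The name pick_root_state and the toy table are hypothetical.

```r
# Return the State that contains the largest number of cells
# collected at the starting time point (0 hours by default)
pick_root_state <- function(cell_info, start_time = 0) {
  states <- unique(cell_info$State)
  if (length(states) <= 1) {
    return(1)  # only one state, nothing to choose
  }
  at_start <- cell_info$Hours == start_time
  counts <- table(cell_info$State[at_start])
  as.numeric(names(counts)[which.max(counts)])
}

# Toy stand-in for pData(HSMM_myo) after orderCells() has assigned states
toy_info <- data.frame(
  State = factor(c(1, 1, 2, 2, 2, 3, 3, 3)),
  Hours = c(24, 48, 0, 0, 0, 0, 24, 72)
)
root <- pick_root_state(toy_info)
print(root)
```

With a real CellDataSet this could be passed along as orderCells(HSMM_myo, root_state = pick_root_state(pData(HSMM_myo))), assuming the phenotype table has State and Hours columns.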

"Faceted" trajectory plot

plot_cell_trajectory(HSMM_myo, color_by = "State") +
    facet_wrap(~State, nrow = 1)

Figure: trajectory faceted by state

plot_genes_jitter

blast_genes <- row.names(subset(fData(HSMM_myo),
gene_short_name %in% c("CCNB2", "MYOD1", "MYOG")))
plot_genes_jitter(HSMM_myo[blast_genes,],
    grouping = "State",
    min_expr = 0.1)

Figure: jitter plot

plot_genes_in_pseudotime

HSMM_expressed_genes <- row.names(subset(fData(HSMM_myo),
    num_cells_expressed >= 10))
HSMM_filtered <- HSMM_myo[HSMM_expressed_genes,]
my_genes <- row.names(subset(fData(HSMM_filtered),
          gene_short_name %in% c("CDK1", "MEF2C", "MYH3")))
cds_subset <- HSMM_filtered[my_genes,]
plot_genes_in_pseudotime(cds_subset, color_by = "Hours")

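Figure: expression of the selected genes over pseudotime

IV. Differential expression analysis

Basic differential analysis

Finding genes that distinguish cell type or state

Finding genes that change as a function of pseudotime

Clustering genes by pseudotemporal expression pattern (heatmap)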

diff_test_res <- differentialGeneTest(HSMM_myo[marker_genes,],
              fullModelFormulaStr = "~sm.ns(Pseudotime)")
sig_gene_names <- row.names(subset(diff_test_res, qval < 0.1))
plot_pseudotime_heatmap(HSMM_myo[sig_gene_names,],
                num_clusters = 3,
                cores = 1,
                show_rownames = TRUE)
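The formula "~sm.ns(Pseudotime)" models expression as a smooth function of pseudotime using a natural spline. The sketch below illustrates that idea with base R only, and it is not monocle's implementation: it swaps in ordinary least squares with splines::ns and an F-test against an intercept-only model, followed by Benjamini-Hochberg adjustment. All data are simulated, and df = 3 is an arbitrary choice.

```r
library(splines)

set.seed(1)
n_cells <- 200
pseudotime <- sort(runif(n_cells, 0, 20))

# Two simulated genes: one rises then falls along pseudotime, one is flat
expr <- rbind(
  dynamic_gene = 2 + sin(pseudotime / 20 * pi) + rnorm(n_cells, sd = 0.3),
  flat_gene    = 2 + rnorm(n_cells, sd = 0.3)
)

# For each gene, compare a spline model against an intercept-only model
pvals <- apply(expr, 1, function(y) {
  full    <- lm(y ~ ns(pseudotime, df = 3))
  reduced <- lm(y ~ 1)
  anova(reduced, full)[["Pr(>F)"]][2]
})

# Adjust for multiple testing across genes
qvals <- p.adjust(pvals, method = "BH")
print(qvals)
```

Genes whose adjusted value falls below the chosen cutoff are the ones that would be passed on for clustering and plotting.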

Figure: pseudotime heatmap

Multi-factorial differential expression analysis

Analyzing branches in single-cell trajectories