Official documentation: http://cole-trapnell-lab.github.io/monocle-release/docs/
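I. Basic processing
Choosing a distribution for your data (required)
- Set it with the expressionFamily argument of newCellDataSet(), e.g. negbinomial.size() for UMI counts or tobit() for FPKM/TPM values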
Working with large datasets (recommended)
- Store the counts as a sparse matrix: newCellDataSet(as(umi_matrix, "sparseMatrix"), ...)
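Most entries of a UMI matrix are zero, and a sparse representation stores only the non-zero values. The following is a minimal sketch using the Matrix package, which ships with R. The `umi_matrix` here is made-up toy data, not a real dataset.

```r
library(Matrix)

# Toy UMI count matrix: 5 genes x 4 cells, mostly zeros (made-up values)
umi_matrix <- matrix(c(0, 0, 3, 0,
                       1, 0, 0, 0,
                       0, 0, 0, 7,
                       0, 2, 0, 0,
                       0, 0, 0, 0),
                     nrow = 5, byrow = TRUE,
                     dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4)))

# Convert the dense matrix to a sparse representation
sparse_umi <- as(umi_matrix, "sparseMatrix")

# Only the non-zero entries are stored, but the values are unchanged
nnzero(sparse_umi)
all(as.matrix(sparse_umi) == umi_matrix)
```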
Converting TPM/FPKM values into mRNA counts (optional)
- rpc_matrix <- relative2abs(HSMM, method = "num_genes")
Estimating size factors and dispersions (optional)
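```r
HSMM <- estimateSizeFactors(HSMM)   # normalize for differences in mRNA recovered per cell
HSMM <- estimateDispersions(HSMM)   # dispersion estimates used later by differential expression tests
```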
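Size factors rescale each cell so that cells with more captured mRNA become comparable to cells with less. The sketch below is an illustration only. It shows the DESeq-style median-of-ratios estimator in base R on a toy count matrix. Monocle's estimateSizeFactors() may use a different default method, so read this as the concept, not Monocle's implementation. The helper `median_ratio_size_factors()` is hypothetical.

```r
# Toy count matrix: 4 genes x 3 cells (made-up values, no zeros)
counts <- matrix(c(10, 20, 40,
                    5, 10, 20,
                    8, 16, 32,
                    2,  4,  8),
                 nrow = 4, byrow = TRUE)

# Hypothetical helper: median-of-ratios size factors, assuming no zero counts
median_ratio_size_factors <- function(counts) {
  log_counts <- log(counts)
  log_geo_means <- rowMeans(log_counts)   # per-gene log geometric mean
  # For each cell: median ratio of its counts to the per-gene geometric means
  apply(log_counts, 2, function(cell) exp(median(cell - log_geo_means)))
}

sf <- median_ratio_size_factors(counts)
normalized <- sweep(counts, 2, sf, "/")   # divide each cell's counts by its size factor
sf
```

In this toy example the third cell has four times the depth of the first. After normalization every gene has the same value in all three cells.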
Filtering low-quality cells (recommended)
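```r
# Count, per gene, the cells expressing it above min_expr (fData(HSMM)$num_cells_expressed)
# and, per cell, the genes it expresses (pData(HSMM)$num_genes_expressed)
HSMM <- detectGenes(HSMM, min_expr = 0.1)
# Keep genes detected in at least 10 cells
expressed_genes <- row.names(subset(fData(HSMM), num_cells_expressed >= 10))
```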
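Here is a base-R sketch of what this filtering computes on a toy matrix. It counts the cells in which each gene exceeds min_expr, then keeps the genes detected in enough cells. Two details are assumptions for illustration: the strict `> min_expr` comparison, and the threshold of 2 cells (rather than 10) so that it fits the toy data.

```r
# Toy expression matrix: 3 genes x 4 cells (made-up values)
expr <- matrix(c(0.0, 0.5, 2.0, 1.0,
                 0.0, 0.0, 0.2, 0.0,
                 3.0, 0.0, 0.0, 0.0),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:4)))

min_expr <- 0.1
# Per gene: number of cells where it is detected (cf. num_cells_expressed)
num_cells_expressed <- rowSums(expr > min_expr)
# Per cell: number of genes detected (cf. num_genes_expressed)
num_genes_expressed <- colSums(expr > min_expr)

# Keep genes detected in at least 2 cells (10 in the real workflow)
expressed_genes <- names(num_cells_expressed)[num_cells_expressed >= 2]
expressed_genes
```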
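II. Classifying and counting cells
Classifying cells by type (recommended)
- HSMM <- classifyCells(HSMM, cth, 0.1), where cth is a CellTypeHierarchy of marker-gene rules built with newCellTypeHierarchy() and addCellType()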
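classifyCells() assigns each cell a type according to the marker rules stored in cth. The underlying idea can be sketched in base R as threshold gating on marker genes. Several choices here are hypothetical: the MYF5/ANPEP rules, the threshold of 1, and the `classify_by_markers()` helper. In this sketch, a cell that matches several types is labelled "Ambiguous" and a cell that matches none is labelled "Unknown".

```r
# Hypothetical marker rules: a cell is of a type if its marker is >= 1
marker_rules <- list(
  Myoblast   = function(cell) cell["MYF5"]  >= 1,
  Fibroblast = function(cell) cell["ANPEP"] >= 1
)

# Hypothetical helper: assign each cell (column) a type from the rules it satisfies
classify_by_markers <- function(expr, rules) {
  apply(expr, 2, function(cell) {
    hits <- names(rules)[vapply(rules, function(rule) isTRUE(rule(cell)), logical(1))]
    if (length(hits) == 1) hits
    else if (length(hits) > 1) "Ambiguous"
    else "Unknown"
  })
}

# Toy expression matrix: 2 marker genes x 4 cells (made-up values)
expr <- matrix(c(5, 0, 3, 0,
                 0, 2, 4, 0),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("MYF5", "ANPEP"), paste0("cell", 1:4)))

cell_types <- classify_by_markers(expr, marker_rules)
table(cell_types)
```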
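Clustering cells without marker genes (optional)
Subtract the effects of "uninteresting" sources of variation to reduce their influence on the clustering.
- residualModelFormulaStr argument: takes an R model formula string specifying the effects to subtract before clustering
For example: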
```r
HSMM <- reduceDimension(HSMM, max_components = 2, num_dim = 2,
                        reduction_method = 'tSNE',
                        residualModelFormulaStr = "~Media + num_genes_expressed",
                        verbose = TRUE)
```
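In concept, residualModelFormulaStr removes the part of each gene's expression that the listed covariates explain, before dimensionality reduction happens. The base-R sketch below shows that idea: it fits a linear model of every gene on the covariates and keeps the residuals. The toy data and the use of plain `lm()` are assumptions. Monocle's internal implementation may differ in detail.

```r
set.seed(1)
n_cells <- 40
# Toy per-cell covariates (cf. Media and num_genes_expressed)
covariates <- data.frame(
  Media = factor(rep(c("GM", "DM"), each = n_cells / 2)),
  num_genes_expressed = rpois(n_cells, 200)
)
# Toy log-expression for 3 genes x 40 cells, with a Media effect added
expr <- matrix(rnorm(3 * n_cells), nrow = 3)
expr <- expr + outer(rep(2, 3), as.numeric(covariates$Media == "GM"))

# Regress all genes on the covariates at once (matrix response) and keep residuals
fit <- lm(t(expr) ~ Media + num_genes_expressed, data = covariates)
expr_adjusted <- t(residuals(fit))   # genes x cells, covariate effects removed

# After adjustment, both Media groups have (numerically) zero mean per gene
tapply(expr_adjusted[1, ], covariates$Media, mean)
```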
Faceted cluster plot
```r
HSMM <- clusterCells(HSMM, num_clusters = 2)
# facet_wrap() comes from ggplot2: one panel per CellType
plot_cell_clusters(HSMM, 1, 2, color = "Cluster") +
  facet_wrap(~CellType)
```
Figure: cell clusters faceted by cell type
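Faceting by CellType is a visual check of whether the unsupervised clusters line up with the marker-based cell types. The same check can be done as a contingency table. In this base-R sketch, k-means on a toy 2-D embedding stands in for clusterCells(). It is not Monocle's clustering method. The CellType labels are also made up.

```r
set.seed(42)
# Toy 2-D embedding: two well-separated groups of 20 cells each
embedding <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
                   matrix(rnorm(40, mean = 6), ncol = 2))
cell_type <- rep(c("Myoblast", "Fibroblast"), each = 20)   # made-up labels

# k-means with 2 clusters, a stand-in for clusterCells(num_clusters = 2)
cluster <- kmeans(embedding, centers = 2, nstart = 10)$cluster

# Cross-tabulate clusters against cell types: the textual analogue of the facets
table(Cluster = cluster, CellType = cell_type)
```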
Clustering cells using marker genes (recommended)
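```r
# Test which expressed genes differ between the cell types defined by cth,
# subtracting the effects of Media and the number of genes expressed per cell
marker_diff <- markerDiffTable(HSMM[expressed_genes,], cth,
                               residualModelFormulaStr = "~Media + num_genes_expressed",
                               cores = 1)
# Keep genes significant at q < 0.01
candidate_clustering_genes <- row.names(subset(marker_diff, qval < 0.01))
# Score how specific each candidate gene is to each cell type
marker_spec <- calculateMarkerSpecificity(HSMM[candidate_clustering_genes,], cth)
# Show the top 3 markers per cell type
head(selectTopMarkers(marker_spec, 3))
```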
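calculateMarkerSpecificity() scores how specific each gene is to each cell type, and selectTopMarkers() keeps the highest-scoring genes per type. The base-R sketch below shows one such score. It assumes a Jensen-Shannon-based definition: specificity = 1 - sqrt(JSD), computed between a gene's normalized mean-expression profile across types and a perfect marker for that type. The helper names and the toy means are hypothetical.

```r
# Shannon entropy in bits, ignoring zero entries
entropy <- function(p) { p <- p[p > 0]; -sum(p * log2(p)) }

# Jensen-Shannon divergence in bits, bounded in [0, 1]
js_divergence <- function(p, q) {
  m <- (p + q) / 2
  entropy(m) - (entropy(p) + entropy(q)) / 2
}

# Hypothetical score: 1 - sqrt(JSD) between the gene's normalized profile
# and a profile expressed only in the given cell type
specificity <- function(mean_expr, type_index) {
  p <- mean_expr / sum(mean_expr)
  perfect <- replace(numeric(length(p)), type_index, 1)
  1 - sqrt(js_divergence(p, perfect))
}

# Toy mean expression per cell type (rows: genes, columns: types)
means <- rbind(MYF5  = c(Myoblast = 9, Fibroblast = 1),
               ANPEP = c(Myoblast = 1, Fibroblast = 9),
               GAPDH = c(Myoblast = 5, Fibroblast = 5))

spec <- sapply(seq_len(ncol(means)),
               function(j) apply(means, 1, specificity, type_index = j))
colnames(spec) <- colnames(means)

# Top marker per cell type (cf. selectTopMarkers(marker_spec, 1))
top_markers <- apply(spec, 2, function(s) names(which.max(s)))
top_markers
```

A gene expressed evenly in every type, such as GAPDH here, scores lower for each type than a gene concentrated in one type.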
Imputing cell type (optional)
III. Constructing single-cell trajectories
- Trajectory step 1: choose the genes that define the cells' progress
```r
# Genes whose expression differs between culture media
diff_test_res <- differentialGeneTest(HSMM_myo[expressed_genes,],
                                      fullModelFormulaStr = "~Media")
ordering_genes <- row.names(subset(diff_test_res, qval < 0.01))
# Use these genes to order cells and show how they were selected
HSMM_myo <- setOrderingFilter(HSMM_myo, ordering_genes)
plot_ordering_genes(HSMM_myo)
```
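For each gene, differentialGeneTest() compares a full model (here ~Media) against a reduced model, then reports multiple-testing-adjusted q-values. The base-R sketch below shows that idea on simulated counts: a likelihood ratio test per gene, followed by a Benjamini-Hochberg adjustment. A Poisson GLM is swapped in for simplicity. This is an assumption: Monocle fits the expressionFamily of the CellDataSet, e.g. a negative binomial. The helper `lrt_pvalue()` is hypothetical.

```r
set.seed(7)
n_cells <- 60
media <- factor(rep(c("GM", "DM"), each = n_cells / 2))
# Toy counts: gene1 responds strongly to media, gene2 does not
counts <- rbind(
  gene1 = rpois(n_cells, ifelse(media == "GM", 20, 5)),
  gene2 = rpois(n_cells, 10)
)

# Hypothetical helper: likelihood ratio test of ~media vs ~1 for one gene
lrt_pvalue <- function(y) {
  full    <- glm(y ~ media, family = poisson())
  reduced <- glm(y ~ 1,     family = poisson())
  anova(reduced, full, test = "LRT")[2, "Pr(>Chi)"]
}

pval <- apply(counts, 1, lrt_pvalue)
qval <- p.adjust(pval, method = "BH")   # Benjamini-Hochberg adjustment

# Genes passing q < 0.01, as with ordering_genes above
ordering_genes <- names(qval)[qval < 0.01]
ordering_genes
```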
- Trajectory step 2: reduce the dimensionality of the data
```r
HSMM_myo <- reduceDimension(HSMM_myo, max_components = 2,
                            reduction_method = 'DDRTree')
```
- Trajectory step 3: order cells along the trajectory
```r
HSMM_myo <- orderCells(HSMM_myo)
```
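orderCells() places each cell on the learned trajectory and assigns it a pseudotime, which is a distance from a root along that trajectory. The base-R sketch below illustrates the idea. It makes two assumptions. First, a minimum spanning tree over cells in a toy 2-D space stands in for the DDRTree principal graph. Second, pseudotime is taken as the path length along that tree from a chosen root cell. The helpers `mst_edges()` and `tree_pseudotime()` are hypothetical.

```r
# Hypothetical helper: Prim's algorithm, returns MST edges as a two-column matrix
mst_edges <- function(d) {
  n <- nrow(d)
  in_tree <- c(TRUE, rep(FALSE, n - 1))
  edges <- matrix(NA_integer_, nrow = 0, ncol = 2)
  while (!all(in_tree)) {
    # Cheapest edge from a tree vertex to a vertex not yet in the tree
    sub <- d[in_tree, !in_tree, drop = FALSE]
    k <- which(sub == min(sub), arr.ind = TRUE)[1, ]
    from <- which(in_tree)[k[1]]
    to   <- which(!in_tree)[k[2]]
    edges <- rbind(edges, c(from, to))
    in_tree[to] <- TRUE
  }
  edges
}

# Hypothetical helper: path length along the tree from the root (breadth-first)
tree_pseudotime <- function(d, edges, root) {
  n <- nrow(d)
  adj <- lapply(seq_len(n), function(i) c(edges[edges[, 1] == i, 2],
                                          edges[edges[, 2] == i, 1]))
  pt <- rep(NA_real_, n)
  pt[root] <- 0
  queue <- root
  while (length(queue) > 0) {
    v <- queue[1]; queue <- queue[-1]
    for (w in adj[[v]]) {
      if (is.na(pt[w])) {
        pt[w] <- pt[v] + d[v, w]
        queue <- c(queue, w)
      }
    }
  }
  pt
}

# Toy cells lying roughly along a line in a 2-D reduced space (made-up)
cells <- cbind(x = c(0, 1, 2, 3, 4), y = c(0, 0.1, 0, 0.1, 0))
d <- as.matrix(dist(cells))
edges <- mst_edges(d)
pseudotime <- tree_pseudotime(d, edges, root = 1)
order(pseudotime)   # cells ordered from the root
```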
Choosing how to color the trajectory plot
- By time
```r
plot_cell_trajectory(HSMM_myo, color_by = "Hours")
```
Figure: trajectory colored by Hours
- By State
```r
plot_cell_trajectory(HSMM_myo, color_by = "State")
```
Figure: trajectory colored by State
- By Pseudotime
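```r
# GM_state() is a user-defined helper (not part of Monocle, not defined in these notes)
# that picks the State containing most of the time-0 cells as the root
HSMM_myo <- orderCells(HSMM_myo, root_state = GM_state(HSMM_myo))
plot_cell_trajectory(HSMM_myo, color_by = "Pseudotime")
```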
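GM_state() is not part of Monocle, and these notes do not define it. Its intent is to choose as root the State that contains the most cells sampled at the earliest time point (Hours == 0). The base-R sketch below applies that logic to a plain data.frame. The helper name `root_state_from_time()` and the toy cell table are hypothetical. When two states tie, this sketch takes the first.

```r
# Hypothetical helper: the State holding the most cells from time point 0
root_state_from_time <- function(cell_info, time_col = "Hours", time0 = 0) {
  states <- unique(cell_info$State)
  if (length(states) <= 1) return(states[1])
  t0_counts <- table(cell_info$State[cell_info[[time_col]] == time0])
  as.numeric(names(t0_counts)[which.max(t0_counts)])
}

# Toy per-cell table (cf. pData(HSMM_myo)); values are made up
cell_info <- data.frame(
  State = c(1, 1, 2, 2, 2, 3, 3),
  Hours = c(24, 48, 0, 0, 0, 0, 72)
)
root_state_from_time(cell_info)
```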
Faceted trajectory plot
```r
plot_cell_trajectory(HSMM_myo, color_by = "State") +
  facet_wrap(~State, nrow = 1)   # one panel per State
```
Figure: trajectory faceted by State
plot_genes_jitter
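```r
# Look up the three genes by their short names
blast_genes <- row.names(subset(fData(HSMM_myo),
                                gene_short_name %in% c("CCNB2", "MYOD1", "MYOG")))
# Expression of each gene in each State, one jittered point per cell
plot_genes_jitter(HSMM_myo[blast_genes,], grouping = "State", min_expr = 0.1)
```
Figure: jitter plot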
plot_genes_in_pseudotime
```r
# Keep genes detected in at least 10 cells
HSMM_expressed_genes <- row.names(subset(fData(HSMM_myo), num_cells_expressed >= 10))
HSMM_filtered <- HSMM_myo[HSMM_expressed_genes,]
# Select three genes by short name and plot their expression against pseudotime
my_genes <- row.names(subset(fData(HSMM_filtered),
                             gene_short_name %in% c("CDK1", "MEF2C", "MYH3")))
cds_subset <- HSMM_filtered[my_genes,]
plot_genes_in_pseudotime(cds_subset, color_by = "Hours")
```
Figure: output of plot_genes_in_pseudotime
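IV. Differential expression analysis
Basic differential analysis
Finding genes that distinguish cell types or states
Finding genes that change as a function of pseudotime
Clustering genes by their pseudotemporal expression patterns (heatmap)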
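```r
# Test each gene for expression that varies smoothly with pseudotime
# (sm.ns() is a natural-spline term)
diff_test_res <- differentialGeneTest(HSMM_myo[marker_genes,],
                                      fullModelFormulaStr = "~sm.ns(Pseudotime)")
sig_gene_names <- row.names(subset(diff_test_res, qval < 0.1))
# Heatmap of the significant genes, grouped into 3 clusters by expression pattern
plot_pseudotime_heatmap(HSMM_myo[sig_gene_names,], num_clusters = 3,
                        cores = 1, show_rownames = TRUE)
```
Figure: pseudotime heatmap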
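A pseudotime heatmap groups genes whose smoothed expression follows a similar shape along pseudotime. The base-R sketch below runs those steps on simulated genes:

1. A natural-spline fit with splines::ns, standing in for sm.ns.
2. Row-wise z-scoring.
3. Hierarchical clustering, cut into 3 groups.

The toy genes are assumptions of this sketch and do not describe plot_pseudotime_heatmap() internals. So are the parameter choices `df = 3` and Ward linkage.

```r
library(splines)

set.seed(3)
pseudotime <- sort(runif(100, 0, 10))
# Toy genes with three temporal patterns: rising, falling, transient
patterns <- list(up   = function(t) t,
                 down = function(t) 10 - t,
                 peak = function(t) 10 * exp(-(t - 5)^2 / 2))
expr <- t(sapply(rep(names(patterns), each = 4), function(p)
  patterns[[p]](pseudotime) + rnorm(length(pseudotime), sd = 0.5)))
rownames(expr) <- paste0(rep(names(patterns), each = 4), "_", 1:4)

# 1. Smooth each gene along pseudotime with a natural spline
grid <- seq(min(pseudotime), max(pseudotime), length.out = 50)
smoothed <- t(apply(expr, 1, function(y) {
  fit <- lm(y ~ ns(pseudotime, df = 3))
  predict(fit, newdata = data.frame(pseudotime = grid))
}))

# 2. Scale each gene's smoothed curve (z-score per row)
scaled <- t(scale(t(smoothed)))

# 3. Cluster genes by curve shape and cut the tree into 3 groups
gene_clusters <- cutree(hclust(dist(scaled), method = "ward.D2"), k = 3)
table(gene_clusters, sub("_.*", "", names(gene_clusters)))
```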