Download the dataset
https://support.10xgenomics.com/spatial-gene-expression/datasets
I chose: Mouse Brain Section (Coronal)

    $ tar -xvf V1_Adult_Mouse_Brain_fastqs.tar
    $ ls
    V1_Adult_Mouse_Brain_S5_L001_I1_001.fastq.gz  V1_Adult_Mouse_Brain_S5_L001_R2_001.fastq.gz  V1_Adult_Mouse_Brain_S5_L002_R1_001.fastq.gz
    V1_Adult_Mouse_Brain_S5_L001_I2_001.fastq.gz  V1_Adult_Mouse_Brain_S5_L002_I1_001.fastq.gz  V1_Adult_Mouse_Brain_S5_L002_R2_001.fastq.gz
    V1_Adult_Mouse_Brain_S5_L001_R1_001.fastq.gz  V1_Adult_Mouse_Brain_S5_L002_I2_001.fastq.gz

- All of this is sequencing data from a single sample, spread across 2 lanes.
- Because the library is dual-indexed, each lane has 4 FASTQ files: I1, I2, R1 and R2.
- So there are 8 FASTQ files in total.
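To double-check the "2 lanes x 4 files" layout without counting by eye, here is a minimal sketch that groups the file names by lane. The regular expression and the function name `group_fastqs_by_lane` are my own illustration, assuming the usual `<sample>_S<n>_L<lane>_<read>_001.fastq.gz` naming seen in the listing above.

```python
import re
from collections import defaultdict

# Assumed naming scheme: <sample>_S<n>_L<lane>_<read>_001.fastq.gz
FASTQ_RE = re.compile(
    r"^(?P<sample>.+)_S(?P<index>\d+)_L(?P<lane>\d{3})_(?P<read>[IR][12])_001\.fastq\.gz$"
)


def group_fastqs_by_lane(filenames):
    """Return {lane: sorted read types} for names matching the assumed scheme."""
    lanes = defaultdict(list)
    for name in filenames:
        match = FASTQ_RE.match(name)
        if match is None:
            continue  # skip anything that is not a FASTQ named this way
        lanes[match.group("lane")].append(match.group("read"))
    return {lane: sorted(reads) for lane, reads in sorted(lanes.items())}


files = [
    "V1_Adult_Mouse_Brain_S5_L001_I1_001.fastq.gz",
    "V1_Adult_Mouse_Brain_S5_L001_I2_001.fastq.gz",
    "V1_Adult_Mouse_Brain_S5_L001_R1_001.fastq.gz",
    "V1_Adult_Mouse_Brain_S5_L001_R2_001.fastq.gz",
    "V1_Adult_Mouse_Brain_S5_L002_I1_001.fastq.gz",
    "V1_Adult_Mouse_Brain_S5_L002_I2_001.fastq.gz",
    "V1_Adult_Mouse_Brain_S5_L002_R1_001.fastq.gz",
    "V1_Adult_Mouse_Brain_S5_L002_R2_001.fastq.gz",
]
layout = group_fastqs_by_lane(files)
total_files = sum(len(reads) for reads in layout.values())
print(layout, total_files)
```

Each lane should come back with the same four read types; a lane with fewer entries would suggest an incomplete download.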
The corresponding layout is:
image.png
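Run spaceranger count
Here I use the automatic alignment workflow.
Because the server is not connected to the internet, I downloaded the slide file manually: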
https://support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/using/count
    $ spaceranger count --id=V1_Adult_Mouse_Brain \
        --transcriptome=/share/nas1/Data/luohb/Visium/reference/refdata-cellranger-mm10-3.0.0/ \
        --fastqs=/share/nas1/Data/luohb/Visium/test2/V1_Adult_Mouse_Brain_fastqs \
        --sample=V1_Adult_Mouse_Brain \
        --image=/share/nas1/Data/luohb/Visium/test2/V1_Adult_Mouse_Brain_image.tif \
        --slide=V19L01-041 \
        --area=C1 \
        --slidefile=/share/nas1/Data/luohb/Visium/test2/V19L01-041.gpr \
        --localcores=32 \
        --localmem=128
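If the command needs to be launched from a script, one option is to keep the options in a dictionary and assemble the argument list from it. This is only a sketch: `build_count_command` is a hypothetical helper of mine, the flags and values are exactly the ones from the command above, and nothing is executed here.

```python
import shlex


def build_count_command(options):
    """Turn an ordered {flag: value} mapping into a spaceranger count argv list."""
    return ["spaceranger", "count"] + [
        f"--{flag}={value}" for flag, value in options.items()
    ]


base = "/share/nas1/Data/luohb/Visium"
options = {
    "id": "V1_Adult_Mouse_Brain",
    "transcriptome": f"{base}/reference/refdata-cellranger-mm10-3.0.0/",
    "fastqs": f"{base}/test2/V1_Adult_Mouse_Brain_fastqs",
    "sample": "V1_Adult_Mouse_Brain",
    "image": f"{base}/test2/V1_Adult_Mouse_Brain_image.tif",
    "slide": "V19L01-041",
    "area": "C1",
    "slidefile": f"{base}/test2/V19L01-041.gpr",
    "localcores": 32,
    "localmem": 128,
}
argv = build_count_command(options)
# Print the command for review instead of running it.
print(shlex.join(argv))
```

Passing `argv` to `subprocess.run(argv, check=True)` would start the run; I leave that out so the sketch has no side effects.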
The run finished without problems. Because the server was running several other large jobs at the same time, it took nearly 13 hours.
image.png
Inspect the output files

    $ ls
    _cmdline   _finalstate  _jobmode  _mrosource  _perf              _sitecheck              _tags       _uuid                         _vdrkill
    _filelist  _invocation  _log      outs        _perf._truncated_  SPATIAL_RNA_COUNTER_CS  _timestamp  V1_Adult_Mouse_Brain.mri.tgz  _versions

    $ cd outs/
    $ ls
    analysis       filtered_feature_bc_matrix     metrics_summary.csv  possorted_genome_bam.bam      raw_feature_bc_matrix     spatial
    cloupe.cloupe  filtered_feature_bc_matrix.h5  molecule_info.h5     possorted_genome_bam.bam.bai  raw_feature_bc_matrix.h5  web_summary.html

- View web_summary.html
image.png
image.png
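- The count pipeline also outputs several CSV files containing the automated secondary analysis results:

    $ cd analysis/
    $ ls
    clustering  diffexp  pca  tsne  umap

1. PCA dimensionality reduction results:

    $ cd pca/10_components
    $ ls
    components.csv  dispersion.csv  features_selected.csv  projection.csv  variance.csv

Projection of each spot onto the principal components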
    $ head -3 projection.csv
    Barcode,PC-1,PC-2,PC-3,PC-4,PC-5,PC-6,PC-7,PC-8,PC-9,PC-10
    AAACAAGTATCTCCCA-1,-10.281241313083257,-24.67223115562252,-0.19850052930601336,-2.1734929997144388,6.630976878797487,-0.12128746693282366,6.040708059434257,4.657495740394594,16.344239212184327,6.523601903899456
    AAACAATCTACTAGCA-1,17.830458684877186,-27.53526668134934,15.877302377060623,9.74572143694312,-0.7208195934715782,-4.339470398396214,2.5444608437485288,-5.084679351848514,2.9247276185469495,-1.0731021612191327

Components matrix

    $ less -S components.csv
    PC,ENSMUSG00000051951,ENSMUSG00000089699,ENSMUSG00000025900,ENSMUSG00000025902,ENSMUSG00000033845,ENSMUSG00000025903,ENSMUSG00000104217,ENSMUSG00000033813,(truncated ...)
    1,9.807402710059275e-05,-0.0007359419037463138,0.0018506647696503106,0.0019216677830155664,-0.009477278899046813,-0.005003056852125207,0.0,-0.008498306263180
    2,-0.0013017257339919546,0.0015759310908915448,0.0013809836795030965,0.0009513422156874659,0.007418499981929492,0.003222355732773671,0.0,0.00887178686827463,
    3,-0.001920230193482586,0.003378841598139873,-0.00012165106820253075,-0.00024897415838216264,-0.0031447165300072175,-0.007787586978438225,0.0,-0.003148852394
    (truncated ...)

Proportion of total variance explained by each component

    $ head -3 variance.csv
    PC,Proportion.Variance.Explained
    1,0.030645967432188836
    2,0.015067575203691749
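The per-component proportions are easier to judge as a running total. Below is a minimal sketch that accumulates the `Proportion.Variance.Explained` column; the function name `cumulative_variance` is mine, and the sample text contains only the two rows shown by `head -3` above, not the whole file.

```python
import csv
import io
from itertools import accumulate


def cumulative_variance(csv_text):
    """Return the running sum of the Proportion.Variance.Explained column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    proportions = [float(row["Proportion.Variance.Explained"]) for row in reader]
    return list(accumulate(proportions))


# Only the first two components, copied from the head output above.
sample = """PC,Proportion.Variance.Explained
1,0.030645967432188836
2,0.015067575203691749
"""
cumulative = cumulative_variance(sample)
print(cumulative)
```

For the real file, read `variance.csv` into a string (or adapt the function to take a file handle) and look at the last value to see how much variance the 10 components cover together.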
Normalized dispersion

    $ head -3 dispersion.csv
    Feature,Normalized.Dispersion
    ENSMUSG00000051951,0.261762717719762
    ENSMUSG00000089699,-1.5988672040435437

2. t-SNE result files:

    $ cd ../../tsne/2_components/
    $ ls
    projection.csv

    $ head -5 projection.csv
    Barcode,TSNE-1,TSNE-2
    AAACAAGTATCTCCCA-1,-18.47081216664088,7.240054873818881
    AAACAATCTACTAGCA-1,-4.219964329936257,-9.182632464702484
    AAACACCAATAACTGC-1,14.744060324279337,13.360913482080413
    AAACAGAGCGACTCCT-1,-11.72411901642397,-7.924228663324808
3. Clustering results:

    $ cd ../../clustering/
    $ ls
    graphclust          kmeans_2_clusters  kmeans_4_clusters  kmeans_6_clusters  kmeans_8_clusters
    kmeans_10_clusters  kmeans_3_clusters  kmeans_5_clusters  kmeans_7_clusters  kmeans_9_clusters

For each clustering, spaceranger produces a cluster assignment for every spot.
Let's open the K-means result with 3 clusters:

    $ cd kmeans_3_clusters
    $ ls
    clusters.csv
    $ head -5 clusters.csv
    Barcode,Cluster
    AAACAAGTATCTCCCA-1,1
    AAACAATCTACTAGCA-1,3
    AAACACCAATAACTGC-1,2
    AAACAGAGCGACTCCT-1,1
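A quick way to see how the spots are distributed across clusters is to tally the `Cluster` column. This is a minimal sketch using only the four rows printed above; `spots_per_cluster` is a name I made up for the example.

```python
import csv
import io
from collections import Counter


def spots_per_cluster(csv_text):
    """Count how many barcodes (spots) were assigned to each cluster."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(int(row["Cluster"]) for row in reader)


# The four assignments shown by head -5 above, not the full file.
sample = """Barcode,Cluster
AAACAAGTATCTCCCA-1,1
AAACAATCTACTAGCA-1,3
AAACACCAATAACTGC-1,2
AAACAGAGCGACTCCT-1,1
"""
counts = spots_per_cluster(sample)
print(sorted(counts.items()))
```

The same function should work unchanged on the `clusters.csv` of any other K or of `graphclust`, assuming they share this two-column layout.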
4. Differential expression analysis:

    $ cd ../../diffexp/
    $ ls
    graphclust          kmeans_2_clusters  kmeans_4_clusters  kmeans_6_clusters  kmeans_8_clusters
    kmeans_10_clusters  kmeans_3_clusters  kmeans_5_clusters  kmeans_7_clusters  kmeans_9_clusters

This time let's look at the overall table from the graph-based clustering:

    $ cd graphclust
    $ ls
    differential_expression.csv
    $ head -3 differential_expression.csv
    Feature ID,Feature Name,Cluster 1 Mean Counts,Cluster 1 Log2 fold change,Cluster 1 Adjusted p value,Cluster 2 Mean Counts,Cluster 2 Log2 fold change,Cluster 2 Adjusted p value,Cluster 3 Mean Counts,Cluster 3 Log2 fold change,Cluster 3 Adjusted p value,Cluster 4 Mean Counts,Cluster 4 Log2 fold change,Cluster 4 Adjusted p value,Cluster 5 Mean Counts,Cluster 5 Log2 fold change,Cluster 5 Adjusted p value,Cluster 6 Mean Counts,Cluster 6 Log2 fold change,Cluster 6 Adjusted p value,Cluster 7 Mean Counts,Cluster 7 Log2 fold change,Cluster 7 Adjusted p value,Cluster 8 Mean Counts,Cluster 8 Log2 fold change,Cluster 8 Adjusted p value,Cluster 9 Mean Counts,Cluster 9 Log2 fold change,Cluster 9 Adjusted p value
    ENSMUSG00000051951,Xkr4,0.09115907843838432,0.15688013442205495,0.9130108472807676,0.08789156406190936,0.094226986457139,1.0,0.059424476860418934,-0.5579910544947899,0.4792687534164091,0.09747791035014447,0.270272692975412,0.7950049780312995,0.08717356987748102,0.14776402072440886,1.0,0.05406634025868632,-0.6310298603360582,0.7980928917515894,0.15030400022885756,0.9570457266970553,0.22931236900985477,0.0606581027791399,-0.4319057525382224,1.0,0.10761817731957228,0.4400508833584902,1.0
    ENSMUSG00000089699,Gm1992,0.0016574377897888059,1.3866145310996707,0.8220253607506287,0.0,0.423008752385563,1.0,0.0,0.22991150489664136,1.0,0.0033613072534532575,2.5793194965660433,0.5338242296758853,0.0,2.3542148981918345,1.0,0.003180372956393313,2.490599584065473,0.8676482778053517,0.0,1.5959470345290159,1.0,0.0,1.4568374963600368,1.0,0.0,2.146642828481177,1.0
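The table is wide: every cluster contributes three columns. A minimal sketch for reshaping it into one record per (gene, cluster) and filtering is given below. The helper names and the thresholds (adjusted p <= 0.05, log2 fold change >= 1) are my own assumptions, and the sample holds only the Xkr4 row, cut down to clusters 1 and 2 for brevity.

```python
import csv
import io
import re

# Matches headers such as "Cluster 3 Log2 fold change".
COLUMN_RE = re.compile(
    r"^Cluster (\d+) (Mean Counts|Log2 fold change|Adjusted p value)$"
)


def to_long_format(csv_text):
    """Reshape the wide table into one dict per (feature, cluster)."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        per_cluster = {}
        for column, value in row.items():
            match = COLUMN_RE.match(column)
            if match:
                cluster, field = int(match.group(1)), match.group(2)
                per_cluster.setdefault(cluster, {})[field] = float(value)
        for cluster, stats in sorted(per_cluster.items()):
            records.append({
                "feature_id": row["Feature ID"],
                "feature_name": row["Feature Name"],
                "cluster": cluster,
                **stats,
            })
    return records


def significant(records, max_padj=0.05, min_log2fc=1.0):
    """Keep records passing the (assumed, adjustable) thresholds."""
    return [
        r for r in records
        if r["Adjusted p value"] <= max_padj and r["Log2 fold change"] >= min_log2fc
    ]


# Xkr4 row from the output above, truncated to clusters 1 and 2.
sample = (
    "Feature ID,Feature Name,Cluster 1 Mean Counts,Cluster 1 Log2 fold change,"
    "Cluster 1 Adjusted p value,Cluster 2 Mean Counts,Cluster 2 Log2 fold change,"
    "Cluster 2 Adjusted p value\n"
    "ENSMUSG00000051951,Xkr4,0.09115907843838432,0.15688013442205495,"
    "0.9130108472807676,0.08789156406190936,0.094226986457139,1.0\n"
)
records = to_long_format(sample)
hits = significant(records)
print(len(records), len(hits))
```

With the adjusted p values shown above (0.91 and 1.0), Xkr4 passes the filter in neither cluster, which matches what the raw numbers suggest.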
5. Feature-Barcode Matrices
Each element of the matrix is the number of UMIs associated with a feature (row) and a barcode (column).

    $ cd /share/nas1/Data/luohb/Visium/test2/V1_Adult_Mouse_Brain/outs
    $ ls
    analysis       filtered_feature_bc_matrix     metrics_summary.csv  possorted_genome_bam.bam      raw_feature_bc_matrix     spatial
    cloupe.cloupe  filtered_feature_bc_matrix.h5  molecule_info.h5     possorted_genome_bam.bam.bai  raw_feature_bc_matrix.h5  web_summary.html
    $ tree filtered_feature_bc_matrix
    filtered_feature_bc_matrix
    ├── barcodes.tsv.gz
    ├── features.tsv.gz
    └── matrix.mtx.gz

    0 directories, 3 files
    $ tree raw_feature_bc_matrix
    raw_feature_bc_matrix
    ├── barcodes.tsv.gz
    ├── features.tsv.gz
    └── matrix.mtx.gz

    0 directories, 3 files

    $ gzip -cd filtered_feature_bc_matrix/features.tsv.gz | head -3
    ENSMUSG00000051951    Xkr4       Gene Expression
    ENSMUSG00000089699    Gm1992     Gene Expression
    ENSMUSG00000102343    Gm37381    Gene Expression

Where:

    Column 1      Column 2     Column 3
    Feature ID    Gene name    Feature type
Try loading the matrix into R

    library(Matrix)

    # matrix_dir must end with "/" because the paths are built with paste0()
    matrix_dir <- "/share/nas1/Data/luohb/Visium/test2/V1_Adult_Mouse_Brain/outs/filtered_feature_bc_matrix/"
    barcode.path <- paste0(matrix_dir, "barcodes.tsv.gz")
    features.path <- paste0(matrix_dir, "features.tsv.gz")
    matrix.path <- paste0(matrix_dir, "matrix.mtx.gz")

    # Sparse count matrix: features in rows, barcodes in columns
    mat <- readMM(file = matrix.path)
    feature.names <- read.delim(features.path, header = FALSE, stringsAsFactors = FALSE)
    barcode.names <- read.delim(barcode.path, header = FALSE, stringsAsFactors = FALSE)
    colnames(mat) <- barcode.names$V1
    rownames(mat) <- feature.names$V1

    dim(mat)
    # [1] 31053  2698

Try loading the matrix into Python

    import csv
    import gzip
    import os
    import scipy.io

    matrix_dir = "/share/nas1/Data/luohb/Visium/test2/V1_Adult_Mouse_Brain/outs/filtered_feature_bc_matrix"
    # Sparse count matrix: features in rows, barcodes in columns
    mat = scipy.io.mmread(os.path.join(matrix_dir, "matrix.mtx.gz"))

    # Open in text mode ("rt") so csv.reader gets str, and split on tabs (the files are TSV)
    features_path = os.path.join(matrix_dir, "features.tsv.gz")
    feature_ids = [row[0] for row in csv.reader(gzip.open(features_path, "rt"), delimiter="\t")]
    gene_names = [row[1] for row in csv.reader(gzip.open(features_path, "rt"), delimiter="\t")]
    feature_types = [row[2] for row in csv.reader(gzip.open(features_path, "rt"), delimiter="\t")]

    barcodes_path = os.path.join(matrix_dir, "barcodes.tsv.gz")
    barcodes = [row[0] for row in csv.reader(gzip.open(barcodes_path, "rt"), delimiter="\t")]
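Since every matrix entry is a UMI count for one (feature, barcode) pair, summing a column gives the total UMIs of that spot. The sketch below does this directly on the Matrix Market coordinate text with the standard library only, assuming integer counts. The 3 x 2 toy matrix is invented for illustration and is not taken from this dataset, and `umis_per_barcode` is my own helper name.

```python
import io


def umis_per_barcode(mtx_text):
    """Sum counts per column (barcode) of a Matrix Market coordinate file."""
    # Lines starting with "%" are the header and comments.
    lines = (ln for ln in io.StringIO(mtx_text) if not ln.startswith("%"))
    # First non-comment line: number of rows, columns and stored entries.
    n_rows, n_cols, n_entries = (int(x) for x in next(lines).split())
    totals = [0] * n_cols
    for line in lines:
        if not line.strip():
            continue
        row, col, value = line.split()
        totals[int(col) - 1] += int(value)  # indices in the file are 1-based
    return totals


# Made-up toy matrix: 3 features x 2 barcodes, 4 non-zero entries.
toy = """%%MatrixMarket matrix coordinate integer general
3 2 4
1 1 5
3 1 2
2 2 1
3 2 7
"""
totals = umis_per_barcode(toy)
print(totals)
```

For the real data, `gzip.open(path, "rt").read()` on `matrix.mtx.gz` would supply the text, although for a matrix of this size the scipy route above is the more practical one.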
6. Images

    $ cd spatial/
    $ ls
    aligned_fiducials.jpg  detected_tissue_image.jpg  scalefactors_json.json  tissue_hires_image.png  tissue_lowres_image.png  tissue_positions_list.csv

tissue_hires_image.png: the higher-resolution brightfield image
image.png
tissue_lowres_image.png: the lower-resolution brightfield image
image.png
aligned_fiducials.jpg (same dimensions as tissue_hires_image.png): used to verify that the fiducial alignment succeeded
image.png
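The corresponding pixel-coordinate conversion file: scalefactors_json.json

    $ cat scalefactors_json.json
    {"spot_diameter_fullres": 89.44476048022638, "tissue_hires_scalef": 0.17011142, "fiducial_diameter_fullres": 144.48769000651953, "tissue_lowres_scalef": 0.05

PS: this step is somewhat like the ST_spot_detector step of the old workflow.
Where:
- tissue_hires_scalef: the scaling factor that converts pixel positions in the original full-resolution image to pixel positions in tissue_hires_image.png.
- tissue_lowres_scalef: the scaling factor that converts pixel positions in the original full-resolution image to pixel positions in tissue_lowres_image.png.
- fiducial_diameter_fullres: the number of pixels that span the diameter of a fiducial spot in the original full-resolution image.
- spot_diameter_fullres: the number of pixels that span the diameter of a tissue spot in the original full-resolution image.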
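In practice the conversion described above is a single multiplication. Here is a minimal sketch, assuming the scale factors printed above are used exactly as shown (the `tissue_lowres_scalef` value may be cut off in my terminal output) and using (1252, 1211) as an example full-resolution coordinate pair. `scale_point` is a hypothetical helper name.

```python
import json


def scale_point(point, scalef):
    """Scale a full-resolution pixel coordinate pair into a downsampled image."""
    first, second = point
    return first * scalef, second * scalef


# Values copied as displayed in the cat output above.
scalefactors = json.loads(
    '{"spot_diameter_fullres": 89.44476048022638,'
    ' "tissue_hires_scalef": 0.17011142,'
    ' "fiducial_diameter_fullres": 144.48769000651953,'
    ' "tissue_lowres_scalef": 0.05}'
)

fullres_point = (1252, 1211)  # example full-resolution pixel coordinates
hires_point = scale_point(fullres_point, scalefactors["tissue_hires_scalef"])
# The spot diameter scales by the same factor as the coordinates.
hires_spot_diameter = (
    scalefactors["spot_diameter_fullres"] * scalefactors["tissue_hires_scalef"]
)
print(hires_point, hires_spot_diameter)
```

The same call with `tissue_lowres_scalef` gives positions in tissue_lowres_image.png, which is handy when overlaying spots on the small image.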
detected_tissue_image.jpg: the image showing the detected tissue
image.png
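tissue_positions_list.csv:

    $ head -2 tissue_positions_list.csv
    ACGCCTGACACGCGCT-1,0,0,0,1252,1211
    TACCGATCCAACACTT-1,0,1,1,1372,1280

The columns are:
- barcode: the sequence of the barcode associated with the spot.
- in_tissue: binary flag indicating whether the spot lies inside (1) or outside (0) the tissue.
- array_row: the row coordinate of the spot in the array, from 0 to 77. The array has 78 rows.
- array_col: the column coordinate of the spot in the array. To express the orange-crate (staggered) arrangement of the spots, this column index uses the even numbers from 0 to 126 for even rows and the odd numbers from 1 to 127 for odd rows. Note that each row (even or odd) has 64 spots.
- pxl_col_in_fullres: the column pixel coordinate of the spot center in the full-resolution image.
- pxl_row_in_fullres: the row pixel coordinate of the spot center in the full-resolution image.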
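The staggered layout described above implies that `array_row` and `array_col` always share the same parity. A minimal sketch that parses the header-less file and checks this rule is shown below. The helper names are mine, the two sample rows are the ones printed by `head -2`, and I keep the last two fields as an unnamed pixel pair rather than committing to which one is the row and which the column.

```python
import csv
import io


def read_positions(csv_text):
    """Parse the header-less positions file into a list of dicts."""
    spots = []
    for rec in csv.reader(io.StringIO(csv_text)):
        if not rec:
            continue
        spots.append({
            "barcode": rec[0],
            "in_tissue": int(rec[1]),
            "array_row": int(rec[2]),
            "array_col": int(rec[3]),
            "pixels": (int(rec[4]), int(rec[5])),  # full-resolution pixel pair
        })
    return spots


def layout_violations(spots):
    """Return barcodes whose array coordinates break the staggered layout."""
    bad = []
    for spot in spots:
        in_range = 0 <= spot["array_row"] <= 77 and 0 <= spot["array_col"] <= 127
        same_parity = spot["array_row"] % 2 == spot["array_col"] % 2
        if not (in_range and same_parity):
            bad.append(spot["barcode"])
    return bad


sample = (
    "ACGCCTGACACGCGCT-1,0,0,0,1252,1211\n"
    "TACCGATCCAACACTT-1,0,1,1,1372,1280\n"
)
spots = read_positions(sample)
violations = layout_violations(spots)
n_in_tissue = sum(spot["in_tissue"] for spot in spots)
print(violations, n_in_tissue)
```

Run on the full file, `n_in_tissue` should agree with the number of barcodes in the filtered matrix, if I understand the filtering correctly.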
7. BAM: Barcoded BAM

    $ cd outs/
    $ samtools view possorted_genome_bam.bam | head -5
    A00984:21:HMKLFDMXX:2:2117:10357:1235 16 1 3000100 255 25M199730N72M23S * 0 0 TTTTTTTTTTTTTTTTTTTTTTTTGCAAGAAAAAAAATCAGATAACCGAGGAAAATTATTCATTATGAAGTACTACTTTCCACTTCATTTCATCCCATGTACTCTGCGTTGATACCACTG F:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF NH:i:1 HI:i:1 AS:i:83 nM:i:1 RE:A:I xf:i:0 ts:i:21 li:i:0 BC:Z:ACCAGACAAC QT:Z:FFFFFFFFFF CR:Z:GACGACGATCCGCGTT CY:Z:FFFFFFFFFFFFFFFF CB:Z:GACGACGATCCGCGTT-1 UR:Z:CCTGTTTGTTGT UY:Z:FFFFFFFFFFFF UB:Z:CCTGTTTGTTGT RG:Z:V1_Adult_Mouse_Brain:0:1:HMKLFDMXX:2
    A00984:21:HMKLFDMXX:1:1306:5041:10034 16 1 3000100 255 25M199611N95M * 0 0 TTTTTTTTTTTTTTTTTTTTTTTTGAAATGACCACAGTGTACTTTATTTAATGATTTTTGTACTTTGTGTTGCAATAAAATAAAAAAAAAATCTACAAAATTCAAATATATAAAATTTCA FFFF:FFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:108 nM:i:0 RE:A:I xf:i:0 li:i:0 BC:Z:ACCAGACAAC QT:Z:FFFFFFFFFF CR:Z:TGGTCTGTTGGGCGTA CY:Z:FFFFFFFFFFFFFFFF CB:Z:TGGTCTGTTGGGCGTA-1 UR:Z:GTTACCCTATGT UY:Z:FFFFFFFFFFFF UB:Z:GTTACCCTATGT RG:Z:V1_Adult_Mouse_Brain:0:1:HMKLFDMXX:1
    A00984:21:HMKLFDMXX:2:2345:21206:5087 16 1 3010019 255 98M22S * 0 0 ATAGTGTCCCAGATTTCCTGGCTGTTTCTTGTTAGGATTTTTTTAGATTTAACATTTCTGTCATAGATTAATCTATTTTGCAGATGTAATCCCATGTACTCTGCGTTGATACCACTGCTT F:FFFFFFFFFFF::FFF:FFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF:FFFFFF NH:i:1 HI:i:1 AS:i:90 nM:i:3 RE:A:I xf:i:0 ts:i:30 li:i:0 BC:Z:ACCAGACAAC QT:Z:FFFFFFFFFF CR:Z:ACGGTCACCGAGACCC CY:Z:FFFFFFFFFFFFF,F: CB:Z:ACGGTCACCGAGAACA-1 UR:Z:TCGATCTCGTAA UY:Z:FFFFFFFFFFFF UB:Z:TCGATCTCGTAA RG:Z:V1_Adult_Mouse_Brain:0:1:HMKLFDMXX:2
    A00984:21:HMKLFDMXX:1:1164:15980:17738 16 1 3013014 255 17M186702N103M * 0 0 TTTTTTTTTTTTTTTGTTTAAAATGACCACAGTGTACTTTATTTAATGATTTTTGTACTTTGTGTTGCAATAAAATAAAAAAAAAATCTACAAAATTCAAATATATAAAATTTCAAGTTT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:108 nM:i:0 RE:A:I xf:i:0 li:i:0 BC:Z:ACCAGACAAC QT:Z:FFF,FFFFFF CR:Z:TCAAGGTTACTACACC CY:Z:FFFFFFFFFFF:FFFF CB:Z:TCAAGGTTACTACACC-1 UR:Z:CCGGGCAGTTAT UY:Z:FFFFFFFFFFFF UB:Z:CCGGGCAGTTAT RG:Z:V1_Adult_Mouse_Brain:0:1:HMKLFDMXX:1
    A00984:21:HMKLFDMXX:1:1451:3477:33912 16 1 3013014 255 17M186702N103M * 0 0 TTTTTTTTTTTTTTTGTTTAAAATGACCACAGTGTACTTTATTTAATGATTTTTGTACTTTGTGTTGCAATAAAATAAAAAAAAAATCTACAAAATTCAAATATATAAAATTTCAAGTTT FFFFFFFFFFFFFFFF:FF:FFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:108 nM:i:0 RE:A:I xf:i:0 li:i:0 BC:Z:ACCAGACAAC QT:Z:FFFFFFFFFF CR:Z:TCAAGGTTACTACACC CY:Z:FFFFFFFFFFF:F,FF CB:Z:TCAAGGTTACTACACC-1 UR:Z:CCGGGCAGTTAT UY:Z:FFFFFFFFFFFF UB:Z:CCGGGCAGTTAT RG:Z:V1_Adult_Mouse_Brain:0:1:HMKLFDMXX:1

I could not seem to find what the official site describes here.
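To pull the barcode and UMI out of a record, the optional fields after the 11 mandatory SAM columns can be split as TAG:TYPE:VALUE. Below is a minimal sketch. `parse_sam_tags` is my own helper, the example line is rebuilt from the second record above with tabs restored, SEQ and QUAL replaced by `*`, and only a few tags kept to stay short.

```python
def parse_sam_tags(sam_line):
    """Return the optional TAG:TYPE:VALUE fields of one SAM record as a dict."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # the first 11 columns are mandatory
        # Split at most twice: Z values such as RG contain ":" themselves.
        tag, value_type, value = field.split(":", 2)
        tags[tag] = int(value) if value_type == "i" else value
    return tags


# Second record from the output above, abbreviated (SEQ/QUAL set to "*").
record = "\t".join([
    "A00984:21:HMKLFDMXX:1:1306:5041:10034", "16", "1", "3000100", "255",
    "25M199611N95M", "*", "0", "0", "*", "*",
    "NH:i:1",
    "CR:Z:TGGTCTGTTGGGCGTA",
    "CB:Z:TGGTCTGTTGGGCGTA-1",
    "UB:Z:GTTACCCTATGT",
    "RG:Z:V1_Adult_Mouse_Brain:0:1:HMKLFDMXX:1",
])
tags = parse_sam_tags(record)
print(tags["CB"], tags["UB"])
```

Here CR/UR are the barcode and UMI as sequenced, CB/UB the corrected versions, and CY/UY the matching base qualities. Piping `samtools view` output line by line into this function would be one way to tabulate reads per spot barcode.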
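Downstream analysis in R
Since there is currently no ready-made R package for 10x Visium spatial transcriptomics, I can only refer to the R code on the official site.
Official site: https://support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/rkit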
Downstream analysis with Loupe Browser 4.0.0
- Open Xftp, then open cloupe.cloupe
image.png
- View the t-SNE
image.png
- UMAP
image.png
- Feature Plot
image.png
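The Feature Plot view lets you visualize the expression levels of one or two genes for each spot. This view makes it easy to threshold groups of spots based on the expression levels of one or two genes. Features (genes, in this case) can be entered in the text boxes at the top of the Y axis or to the right of the X axis. These selectors also contain a control for switching the axis between linear and log scale.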
image.png