本文描述了分子细胞科学卓越中心生物信息平台所开发的scRNA-seq分析流程的比对子模块,该模块基于kb_python软件,支持很多单细胞测序技术如10X genomic(V1-V3)、CELSEQ(V1-V3)、DROPSEQ、INDROPS、SCRUBSEQ、SURECELL。
wget -c http://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_v3/pbmc_1k_v3_fastqs.tar
mv pbmc_1k_v3_fastqs.tar scRNAseq/HumanPbmc3k/Fastq/pbmc_1k_v3_fastqs.tar
tar xvf scRNAseq/HumanPbmc3k/Fastq/pbmc_1k_v3_fastqs.tar
print(system("ls -la scRNAseq/HumanPbmc1k/Fastq/pbmc_1k_v3_fastqs", intern = TRUE)[c(-1,-2,-3)])
library("Matrix")
library("ggplot2")
library("scico")
library("tibble")
library("DropletUtils")
library("Seurat")
read_count_output <- function(dir, name) {
dir <- normalizePath(dir, mustWork = TRUE)
m <- readMM(paste0(dir, "/", name, ".mtx"))
m <- Matrix::t(m)
m <- as(m, "dgCMatrix")
# The matrix read has cells in rows
ge <- ".genes.txt"
genes <- readLines(file(paste0(dir, "/", name, ge)))
barcodes <- readLines(file(paste0(dir, "/", name, ".barcodes.txt")))
colnames(m) <- barcodes
rownames(m) <- genes
return(m)
}
如上文述所,我们推荐使用Kallisto|bustools进行定量,该方法快速准确,是scRNAseq数据基因定量的首选。
kb count -i /sibcb2/bioinformatics/iGenome/Kallisto/GENECODE_hg38/transcriptome.idx \
-g /sibcb2/bioinformatics/iGenome/Kallisto/GENECODE_hg38/tx2g_symbol.txt \
-x 10XV3 \
-o scRNAseq/HumanPbmc1k/kb \
--filter bustools \
-t 2 \
scRNAseq/HumanPbmc1k/Fastq/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz \
scRNAseq/HumanPbmc1k/Fastq/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz \
scRNAseq/HumanPbmc1k/Fastq/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz \
scRNAseq/HumanPbmc1k/Fastq/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz
运行时间大约需要10分钟,结果文件目录大致如下:
|-- 10xv3_whitelist.txt
|-- counts_filtered
| |-- cells_x_genes
| |-- cells_x_genes.barcodes.txt
| |-- cells_x_genes.genes.txt
| `-- cells_x_genes.mtx
|-- counts_unfiltered
| |-- cells_x_genes
| |-- cells_x_genes.barcodes.txt
| |-- cells_x_genes.genes.txt
| `-- cells_x_genes.mtx
|-- filter_barcodes.txt
|-- inspect.json
|-- matrix.ec
|-- output.bus
|-- output.filtered.bus
|-- output.unfiltered.bus
|-- run_info.json
`-- transcripts.txt
产生的结果文件大致如下:
res_mat <- read_count_output("scRNAseq/HumanPbmc1k/kb/counts_unfiltered", name = "cells_x_genes")
dim(res_mat)