RNA-seq数据通路富集分析

分析流程示意图

该模块需要输入差异基因列表,分别得到上下调基因的GO、Pathway富集结果。当GO term 数量大于50时,我们会对GO term将功能相近的条目归为一类,然后给出GO term 的相似性热图,否则会直接给出富集气泡图,并且给出富集的表格;KEGG 通路富集结果则给出富集气泡图和表格,如下图所示。

Drawing

软件包准备

In [5]:
suppressPackageStartupMessages(library("clusterProfiler"))
suppressPackageStartupMessages(library("org.Hs.eg.db"))
suppressPackageStartupMessages(library("msigdbr"))

options(repr.plot.width=4, repr.plot.height=3, repr.plot.res = 200)

DGEtable = "RNAseq/DGE/WPS_Cancer_vs_Normal_DESeq2.txt"
outFolder = "RNAseq/Pathway"

if(!dir.exists(outFolder))
    dir.create(outFolder)

GO富集分析

In [7]:
Tx = read.table(file = DGEtable, row.names = 1, header = TRUE, sep = "\t")
head(Tx)
A data.frame: 6 x 6
treatment_TPMcontrol_TPMlog2FoldChangepvaluepadjDGE
<dbl><dbl><dbl><dbl><dbl><fct>
ENSG00000000003.15 TSPAN620.0400011.7700 0.9682494 1.887758e-27 1.549429e-26UP
ENSG00000000005.6 TNMD 0.03598 0.2433-1.9451103 6.022261e-06 1.476815e-05DN
ENSG00000000419.12 DPM173.9900048.8200 0.8308568 2.719630e-30 2.509226e-29UP
ENSG00000000457.14 SCYL3 7.25400 7.1880 0.1868544 5.718560e-03 1.003688e-02UP
ENSG00000000460.17 C1orf11210.13000 3.3990 1.76263031.641644e-1101.511235e-108UP
ENSG00000000938.13 FGR 7.9900065.7600-2.7090187 2.587081e-56 5.884496e-55DN
In [11]:
Symbol = sapply(strsplit(rownames(Tx), " "), function(z) strsplit(z, "\\.")[[1]][1])
UP = unique(Symbol[Tx$DGE == "UP"])
DN = unique(Symbol[Tx$DGE == "DN"])
PG = unique(Symbol)
In [12]:
upEnrich <- enrichGO(gene = UP, keyType = "ENSEMBL", universe = PG, OrgDb = org.Hs.eg.db, ont = "BP", 
                    pAdjustMethod = "BH", pvalueCutoff  = 0.01, qvalueCutoff  = 0.05,readable = TRUE)
In [19]:
upEnrich[1:4, 1:7]
A data.frame: 4 x 7
IDDescriptionGeneRatioBgRatiopvaluep.adjustqvalue
<chr><chr><chr><chr><dbl><dbl><dbl>
GO:0034470GO:0034470ncRNA processing 294/7344375/180832.316332e-511.466238e-471.286174e-47
GO:0042254GO:0042254ribosome biogenesis 233/7344279/180836.283834e-501.988834e-461.744591e-46
GO:0022613GO:0022613ribonucleoprotein complex biogenesis341/7344478/180833.601947e-437.600108e-406.666761e-40
GO:0140053GO:0140053mitochondrial gene expression 146/7344159/180831.672677e-422.647011e-392.321939e-39

为了方便作图,我们只展示其中最显著的20个Pathway。

In [30]:
options(repr.plot.width=8, repr.plot.height=6, repr.plot.res = 200)
dotplot(upEnrich,showCategory=20)
In [ ]:
上面分析了在上调中富集的Pathway,下面是在下调中富集的Pathway
In [ ]:
dnEnrich <- enrichGO(gene = DN, keyType = "ENSEMBL", universe = PG, OrgDb = org.Hs.eg.db, ont = "BP", 
                    pAdjustMethod = "BH", pvalueCutoff  = 0.01, qvalueCutoff  = 0.05,readable = TRUE)
In [41]:
options(repr.plot.width=8, repr.plot.height=4, repr.plot.res = 200)
dotplot(dnEnrich,showCategory=4)

KEGG富集分析

In [51]:
xG  <- as.list(org.Hs.egENSEMBL2EG)
xUP <- unique(unlist(xG[UP]))
xDN <- unique(unlist(xG[DN]))
xPG <- unique(unlist(xG[PG]))
In [58]:
upKEGG <- enrichKEGG(xUP, organism = "hsa", keyType = "kegg", pvalueCutoff = 0.01, 
                     pAdjustMethod = "BH", universe = xPG, minGSSize = 10, maxGSSize = 500, qvalueCutoff = 0.05, use_internal_data = FALSE)
options(repr.plot.width=8, repr.plot.height=6, repr.plot.res = 200)
dotplot(upKEGG,showCategory=20)

上面是上调的KEGG通路,再看下调的。

In [59]:
dnKEGG <- enrichKEGG(xDN, organism = "hsa", keyType = "kegg", pvalueCutoff = 0.01, 
                     pAdjustMethod = "BH", universe = xPG, minGSSize = 10, maxGSSize = 500, qvalueCutoff = 0.05, use_internal_data = FALSE)
options(repr.plot.width=8, repr.plot.height=6, repr.plot.res = 200)
dotplot(dnKEGG,showCategory=20)