Processing bisulfite sequencing with cpgtools

甲基化数据经过Bismark或者Bsmap处理后,一般生成bedGraph文件。CpGtools是处理甲基化数据很方便的工具。本文将做简单示例,更复杂的例子见CpGtools文档

文件格式

head dnaMethylation/BSMAP/SRX1869508/mCall/SRX1869508_CpG.bedGraph

chr1    10496   10497   91  561 50
chr1    10497   10498   0   0   5
chr1    10524   10525   90  552 59
chr1    10525   10526   90  664 68
chr1    10541   10542   95  584 27
chr1    10542   10543   94  698 38
chr1    10562   10563   82  502 109
chr1    10563   10564   80  595 141
chr1    10570   10571   92  568 43

我们可以写一个简单的脚本将bedGraph转换成CpGtools识别的格式:

python /sibcb2/bioinformatics/code/bedGraph2BED6.py \
    -I dnaMethylation/BSMAP/SRX1869508/mCall/SRX1869508_CpG.bedGraph \
    -A /sibcb2/bioinformatics/iGenome/Bismark/hg19/hg19.fa \
    -O dnaMethylation/CpGtools/SRX1869508_CpG_BED6.bed

head dnaMethylation/CpGtools/SRX1869508_CpG_BED6.bed

chr1    10496   10497   chr1:10496-10497    0.9182  +
chr1    10497   10498   chr1:10497-10498    0.0 -
chr1    10524   10525   chr1:10524-10525    0.9034  +
chr1    10525   10526   chr1:10525-10526    0.9071  -
chr1    10541   10542   chr1:10541-10542    0.9558  +
chr1    10542   10543   chr1:10542-10543    0.9484  -
chr1    10562   10563   chr1:10562-10563    0.8216  +
chr1    10563   10564   chr1:10563-10564    0.8084  -
chr1    10570   10571   chr1:10570-10571    0.9296  +
chr1    10571   10572   chr1:10571-10572    0.9633  -

Region profile

CpGtools的beta_profile_region.py程序可以计算一组片段上的平均甲基化水平:

beta_profile_region.py \
    -r /sibcb2/bioinformatics/software/cpgtools/hg19/hg19.RefSeq.union.1Kpromoter.bed.gz \
    -i dnaMethylation/CpGtools/SRX1869508_CpG_BED6.bed \
    -o dnaMethylation/CpGtools/SRX1869508_promoters
In [6]:
options(repr.plot.width=6, repr.plot.height=4, repr.plot.res = 200)
source("dnaMethylation/CpGtools/SRX1869508_promoters.r")

Gene-centered profile

同样的我们可以计算甲基化和基因结构的关系:

beta_profile_gene_centered.py \
    -r /sibcb2/bioinformatics/software/cpgtools/hg19/hg19.RefSeq.union.bed.gz \
    -i dnaMethylation/CpGtools/SRX1869508_CpG_BED6.bed \
    -o dnaMethylation/CpGtools/SRX1869508_gene
In [8]:
options(repr.plot.width=10, repr.plot.height=4, repr.plot.res = 200)
source("dnaMethylation/CpGtools/SRX1869508_gene.r")