ChIP-seq是研究转录因子和DNA结合的实验,分析ChIP-seq最为流行的方法是MACS2 (Model-based Analysis of ChIP-seq)。下图来自Wilbanks and Faccioti, PLoS One 2010, 它描述了MACS2分析ChIP-seq的基本原理。
下文以GSM3112103 (PU.1 ChIP)和GSM1233959 (H3K27Ac ChIP)作为例子,简要列出了主要分析代码。
单端比对:
bwa mem -t 6 /sibcb2/bioinformatics/iGenome/BWA/GENECODE/GRCh38 xChIPseq/Fastq/GSM3112103.fastq.gz | samtools view -@ 6 -bS -o GSM3112103_raw.bam
samtools view -h -q 10 -F 1796 | samtools sort -@ 6 -o GSM3112103.bam GSM3112103_raw.bam
双端比对:
bwa mem -t 6 /sibcb2/bioinformatics/iGenome/BWA/GENECODE/GRCh38 ChIPseq/Fastq/GSM1233978_R1.fastq.gz /sibcb2/bioinformatics/iGenome/BWA/GENECODE/GRCh38 ChIPseq/Fastq/GSM1233978_R2.fastq.gz | samtools view -@ 6 -bS -o GSM1233978_raw.bam
samtools view -h -q 10 -F 1796 | samtools sort -@ 6 -o GSM1233978.bam GSM1233978_raw.bam
单端Peak calling:
macs2 callpeak --SPMR -B -q 0.05 --keep-dup 1 -g 2.7e9 -n GSM3112103 -f AUTO \
-t GSM3112103/GSM3112103.bam \
-c GSM3112107.bam
产生信号文件:
cat GSM3112103_control_lambda.bdg | sort -k1,1 -k2,2n > GSM3112103_control_lambda_sorted.bdg
bedGraphToBigWig GSM3112103_control_lambda_sorted.bdg /sibcb2/bioinformatics/iGenome/BWA/GENECODE/GRCh38_chrlen.txt GSM3112103_control_lambda.bw
cat GSM3112103_treat_pileup.bdg | sort -k1,1 -k2,2n > GSM3112103_treat_pileup_sorted.bdg
bedGraphToBigWig GSM3112103_treat_pileup_sorted.bdg /sibcb2/bioinformatics/iGenome/BWA/GENECODE/GRCh38_chrlen.txt GSM3112103_treat_pileup_sorted.bw
rm *.bdg
MACS2还会产生ChIP-seq QC用于质控:
options(repr.plot.width=7, repr.plot.height=5, repr.plot.res = 200)
source("ChIPseq/GSM3112103_model.r")
双端Peak Calling:
macs2 callpeak --SPMR -B -q 0.05 --keep-dup 1 -g 2.7e9 --nomodel -n GSM1233959 -f BAMPE \
-t /sibcb2/bioinformatics/TBC/ChIPseq/BAM/GSM1233959/GSM1233959.bam \
-c /sibcb2/bioinformatics/TBC/ChIPseq/BAM/GSM1233959/GSM1233972.bam
宽峰(如H3K27me3 and H3K36me3)需要加上–broad –broad-cutoff 0.05
参数。