ATAC-Seq Analysis

ATAC-seq is a widely used experimental method for profiling chromatin accessibility. This document describes the analysis pipeline we use.

Pipeline

Data preprocessing

Preprocessing covers data download, the reference genome, and quality control with adapter trimming.

Data retrieval

The SRA Toolkit downloads the raw runs and converts them to FASTQ:

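# Download the run (.sra) into the local cache
prefetch <SRA accession>
# Convert to gzipped FASTQ; paired-end runs are split into _1 and _2 files
fastq-dump --split-e --gzip <SRA accession or .sra file> -O <output_directory>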
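With --split-e, a paired-end run is written as <accession>_1.fastq.gz and <accession>_2.fastq.gz. Before trimming, it can be worth confirming that both mate files hold the same number of reads. The sketch below assumes standard four-line FASTQ records; count_reads and check_pair are hypothetical helper names, not SRA Toolkit commands.

```bash
# Count reads in a gzipped FASTQ file (assumes 4 lines per record).
count_reads() {
  gzip -cd "$1" | awk 'END { print NR / 4 }'
}

# Print the pair count if both mate files hold the same number of reads;
# otherwise report the mismatch on stderr and return a non-zero status.
check_pair() {
  local n1 n2
  n1=$(count_reads "$1")
  n2=$(count_reads "$2")
  if [ "$n1" = "$n2" ]; then
    echo "OK: $n1 read pairs"
  else
    echo "MISMATCH: $1 has $n1 reads, $2 has $n2 reads" >&2
    return 1
  fi
}
```

For example, check_pair SRR4032269_1.fastq.gz SRR4032269_2.fastq.gz prints the pair count, or reports a mismatch and fails.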

Reference genome (mouse GRCm38, Ensembl release 101) and gene annotation:

wget ftp://ftp.ensembl.org/pub/release-101/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
wget ftp://ftp.ensembl.org/pub/release-101/gtf/mus_musculus/Mus_musculus.GRCm38.101.gtf.gz
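Ensembl names the mouse chromosomes 1 to 19, X, Y and MT without a chr prefix, and the read-filtering step below selects chromosomes by exactly these names. A quick way to confirm the names in the downloaded FASTA is to list its headers; list_contigs is a hypothetical helper, and it assumes the first word of each > header is the sequence name.

```bash
# Print the sequence names (first word of each '>' header) in a gzipped FASTA.
list_contigs() {
  gzip -cd "$1" | awk '/^>/ { print substr($1, 2) }'
}
```

For example, list_contigs Mus_musculus.GRCm38.dna.primary_assembly.fa.gz | head shows the first names; decompressing the whole genome takes a little while.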

Adapter trimming

Trim Galore removes adapter sequences and low-quality bases:

trim_galore --paired --fastqc -q 10 --length 30 --stringency 3 \
-o <output_directory> <READ1> <READ2>

Parameters:

Parameter        Description
(default)        Adapter sequence is auto-detected
--paired         Input files are paired-end
--fastqc         Run FastQC in default mode on the FASTQ files once trimming is complete
-q 10            Trim low-quality bases from the 3' end of reads, using a Phred quality cutoff of 10
--length 30      Discard reads that become shorter than 30 bp after trimming (with --paired, both mates must pass)
--stringency 3   Require at least 3 bp of overlap with the adapter sequence before trimming it from the 3' end

Alignment

Reads are aligned with Bowtie2; the index is built once, then the trimmed pairs are aligned:

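# Build the Bowtie2 index from the reference FASTA (once per genome)
bowtie2-build -f <reference_genome> <bt2-idx>
# Align trimmed pairs; -X 2000 allows fragments up to 2 kb, and
# --no-mixed/--no-discordant report only concordantly aligned pairs
bowtie2 -x <bt2-idx> -1 <trimmed_1> -2 <trimmed_2> \
    -t -q -N 1 -L 25 -X 2000 --no-mixed --no-discordant -p 10 -S <sam>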
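With --no-mixed and --no-discordant, Bowtie2 only reports pairs that align concordantly; pairs that fail are still written to the SAM file as unaligned records unless --no-unal is set. A rough way to count the concordant pairs directly from the SAM file is to test FLAG bits: 0x2 (properly paired), 0x40 (first mate), 0x100 (secondary) and 0x800 (supplementary). POSIX awk has no bitwise operators, so the sketch below uses integer arithmetic; count_concordant is a hypothetical helper name.

```bash
# Count read pairs flagged as properly paired in a SAM file.
# Only primary (non-secondary, non-supplementary) first mates are counted,
# so each pair contributes once.
count_concordant() {
  awk '
    # Return 1 if bit value b (a power of two) is set in flag f.
    function has(f, b) { return int(f / b) % 2 }
    !/^@/ && has($2, 2) && has($2, 64) && !has($2, 256) && !has($2, 2048) { n++ }
    END { print n + 0 }
  ' "$1"
}
```

For large files, samtools flagstat gives a "properly paired" count directly (in reads rather than pairs).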

Filtering unwanted reads

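# Convert SAM to BAM, sort by coordinate and index
samtools view -@ 10 -b -o <Bam> <Sam>
samtools sort -@ 4 -o $BAM/<head1>_sorted.bam <Bam>
samtools index -@ 4 $BAM/<head1>_sorted.bam

# Keep alignments with MAPQ >= 10 on chromosomes 1-19, X and Y (Ensembl names);
# this drops reads on the mitochondrial genome (MT) and on unplaced scaffolds
samtools view -b -q 10 $BAM/<head1>_sorted.bam \
1 2 3 4 5 6 7 8 9 10 11 12 13 \
14 15 16 17 18 19 \
X Y | samtools sort -o $BAM/<head1>_q10.bam -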
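ATAC-seq libraries often carry a large fraction of mitochondrial reads, so it is worth knowing how much the chromosome filter removes. samtools idxstats on the indexed, sorted BAM prints one line per reference (name, length, mapped, unmapped); the sketch below, with the hypothetical helper name mt_fraction, reports the share of mapped reads on MT from that output.

```bash
# Fraction of mapped reads on the mitochondrial genome, from
# `samtools idxstats` output (columns: name, length, mapped, unmapped).
# $1: idxstats file (or - for stdin); $2: mitochondrial contig name (default MT).
mt_fraction() {
  awk -v mt="${2:-MT}" '
    { total += $3 }
    $1 == mt { chrm += $3 }
    END { if (total > 0) printf "%.4f\n", chrm / total; else print "NA" }
  ' "$1"
}
```

Usage: samtools idxstats $BAM/<head1>_sorted.bam | mt_fraction -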

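# Remove PCR duplicates, then index the result (bamCoverage needs a .bai index)
java -Xmx2g -jar picard.jar MarkDuplicates \
 I=$BAM/<head1>_q10.bam \
 O=duplicates_removed.bam \
 REMOVE_DUPLICATES=true \
 M=marked_dup_metrics.txt
samtools index duplicates_removed.bam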
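With REMOVE_DUPLICATES=true, duplicates are dropped from the output instead of only being flagged, and the metrics file written via M= reports the duplication level in its PERCENT_DUPLICATION column (a fraction between 0 and 1, despite the name). A sketch for pulling that value out by column name, assuming the usual layout of a Picard metrics file (a "## METRICS CLASS" line followed by a tab-separated header row and one row per library); dup_rate is a hypothetical helper.

```bash
# Print LIBRARY and PERCENT_DUPLICATION for each library in a Picard
# MarkDuplicates metrics file.
dup_rate() {
  awk -F '\t' '
    /^## METRICS CLASS/ { in_metrics = 1; col = 0; next }
    in_metrics && col == 0 {
      # Header row: locate the PERCENT_DUPLICATION column by name.
      for (i = 1; i <= NF; i++) if ($i == "PERCENT_DUPLICATION") col = i
      next
    }
    in_metrics && col > 0 {
      if ($0 == "") { in_metrics = 0; next }   # a blank line ends the table
      print $1 "\t" $col
    }
  ' "$1"
}
```

Usage: dup_rate marked_dup_metrics.txt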

Checking the insert size distribution

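# H= writes the histogram PDF (requires R); M=0.5 drops read-orientation
# categories that hold fewer than 50% of the reads
java -Xmx2g -jar picard.jar CollectInsertSizeMetrics \
 I=duplicates_removed.bam \
 O=insert_size_metrics.txt \
 H=insert_size_histogram.pdf \
 M=0.5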
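In ATAC-seq the insert-size histogram is expected to show a nucleosome-free peak below about 100 bp and further peaks at mono- and di-nucleosome sizes, as described by Buenrostro et al. (reference 1); a histogram without this periodicity can indicate a library problem. The insert size of each pair is stored in the SAM TLEN field (column 9), positive for one mate and negative for the other. For a quick check without Picard, the sketch below takes the median of the positive TLEN values, counting each pair once; median_insert is a hypothetical helper and does not reproduce Picard's per-orientation statistics.

```bash
# Median insert size from SAM text, using positive TLEN (column 9) values
# so that each pair is counted once. Header lines (@...) are skipped.
median_insert() {
  awk '!/^@/ && $9 > 0 { print $9 }' "$1" | sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) { print "NA"; exit }
      if (NR % 2) print v[(NR + 1) / 2]
      else print (v[NR / 2] + v[NR / 2 + 1]) / 2
    }'
}
```

Usage: samtools view duplicates_removed.bam | median_insert -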

Peak calling

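# -B: write bedGraph pileups (used by bdgdiff below); --nomodel --shift -100
# --extsize 200: centre a 200 bp window on each read's 5' end (Tn5 cut site);
# --nolambda: use the genome-wide background instead of a local lambda
macs2 callpeak -t <Bam> -B --nomodel \
--shift -100 --extsize 200 \
--nolambda -g mm -n <prefix>_peakcall --outdir <output_directory> -f BAM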
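With --nomodel, MACS2 moves each read's 5' end by --shift along the read's own 5'-to-3' direction (negative values move it the other way) and then extends it by --extsize in the 5'-to-3' direction. With --shift -100 --extsize 200, a plus-strand read whose 5' end is at p covers p-100 to p+100, and a minus-strand read whose 5' end (its rightmost base) is at p covers the same window, so the pileup is centred on the cut site on both strands. The MACS2 documentation suggests this pairing (shift = -1/2 × extsize) for cutting-site data such as DNase-seq. A small arithmetic illustration; macs_interval is a hypothetical name, and coordinates are treated as plain half-open integers without MACS2's exact off-by-one conventions.

```bash
# Pileup interval for one read under --nomodel.
# $1: strand (+ or -), $2: 5' end position, $3: shift, $4: extsize.
macs_interval() {
  local strand="$1" five="$2" shift_bp="$3" ext="$4" start end
  if [ "$strand" = "+" ]; then
    # Plus strand: move the 5' end by the shift, then extend rightwards.
    start=$((five + shift_bp))
    end=$((start + ext))
  else
    # Minus strand: 5'->3' runs leftwards, so shift and extension are mirrored.
    end=$((five - shift_bp))
    start=$((end - ext))
  fi
  echo "$start $end"
}
```

For example, macs_interval + 1000 -100 200 and macs_interval - 1000 -100 200 both print 900 1100.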

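Call differential peaks between the DUX-positive and DUX-negative samples with macs2 bdgdiff, using the bedGraph files written by callpeak -B: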

macs2 bdgdiff --t1 Mouse_DUX_Pos_peakcall_du_treat_pileup.bdg \
            --t2 Mouse_DUX_Neg_peakcall_du_treat_pileup.bdg \
            --c1 Mouse_DUX_Pos_peakcall_du_control_lambda.bdg \
            --c2 Mouse_DUX_Neg_peakcall_du_control_lambda.bdg \
            -l 500 \
            -g 250 \
            -o POS_specific Neg_specific common
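macs2 bdgdiff compares the treatment and control pileups of the two conditions and writes three region files, in the order given to -o: regions specific to condition 1, regions specific to condition 2, and regions common to both. -g 250 merges nearby differential stretches separated by small gaps, and -l 500 discards regions shorter than 500 bp. The awk sketch below shows only that merge-then-filter step on a coordinate-sorted BED file; merge_and_filter is a hypothetical name, whether a gap of exactly -g bp is merged is an assumption here, and none of MACS2's likelihood scoring is reproduced.

```bash
# Merge BED intervals separated by <= maxgap bp, then keep merged regions
# at least minlen bp long. Input must be sorted by chromosome and start.
# $1: BED file, $2: maxgap (like -g), $3: minlen (like -l).
merge_and_filter() {
  awk -v OFS='\t' -v maxgap="$2" -v minlen="$3" '
    # Print the region being built if it passes the length filter.
    function flush() { if (have && e - s >= minlen + 0) print c, s, e }
    /^(track|browser|#)/ { next }       # skip header lines
    {
      if (have && $1 == c && $2 - e <= maxgap + 0) {
        if ($3 + 0 > e) e = $3 + 0      # gap small enough: extend the region
      } else {
        flush()                          # otherwise close it and start a new one
        c = $1; s = $2 + 0; e = $3 + 0; have = 1
      }
    }
    END { flush() }
  ' "$1"
}
```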

Visualization

This covers genome browser tracks and heatmaps.

Track

bamCoverage -b <Bam> -o <BigWig> \
-bs 10 \
--normalizeUsing RPKM
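With RPKM normalization and 10 bp bins, deepTools reports each bin as reads per kilobase per million mapped reads: reads in the bin / (bin length in kb × mapped reads in millions). A one-line check of that arithmetic, written as a hypothetical helper rpkm:

```bash
# RPKM for one bin: reads / (bin length in kb * mapped reads in millions),
# rearranged as reads * 1e9 / (bin_bp * mapped) to avoid fractional intermediates.
# $1: reads in bin, $2: bin length (bp), $3: total mapped reads.
rpkm() {
  awk -v n="$1" -v bp="$2" -v total="$3" 'BEGIN { printf "%g\n", n * 1e9 / (bp * total) }'
}
```

For example, a 10 bp bin with 50 reads in a library of 20 million mapped reads gives rpkm 50 10 20000000 = 250.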

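make_tracks_file --trackFiles \
SRR4032269.bw SRR4032270.bw SRR4032271.bw SRR4032272.bw \
-o atac_track.txt

# The region must use the chromosome names in the bigWig files
# (files built from the Ensembl genome use 2 rather than chr2)
pyGenomeTracks --tracks atac_track.txt --region chr2:10,000,000-11,000,000 \
--outFileName GSE85624_tracks.pdf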

Heatmap

computeMatrix reference-point \
-S GSE85624_mDUX-GFPpos_ATAC.bw GSE85624_mDUX-GFPneg_ATAC.bw \
-R POS_specific Neg_specific common \
--referencePoint center -a 5000 -b 5000 \
-o $OUT/matrix.mat.gz
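With reference-point mode, --referencePoint center and -a 5000 -b 5000, each region in the BED files is reduced to its midpoint and a 10 kb window around it is scored; with computeMatrix's default bin size of 10 bp this gives 1,000 columns per bigWig. The sketch below computes that window and column count for one region; center_window is a hypothetical helper, and the integer (rounded-down) midpoint is an assumption.

```bash
# Window scored around a region's centre, and the number of matrix columns.
# $1: region start, $2: region end, $3: -b (upstream), $4: -a (downstream),
# $5: bin size (defaults to 10, computeMatrix's default).
center_window() {
  local mid=$(( ($1 + $2) / 2 ))
  echo "$((mid - $3)) $((mid + $4)) $(( ($3 + $4) / ${5:-10} ))"
}
```

For example, center_window 10000 10600 5000 5000 prints 5300 15300 1000.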

plotHeatmap -m $OUT/matrix.mat.gz \
 --colorMap Reds \
 --whatToShow 'heatmap and colorbar' \
 --refPointLabel Peak \
 --zMin 0 \
 --zMax 3 \
 --dpi 1000 \
 --boxAroundHeatmaps no \
 -o $OUT/Heatmap_diff_peak.pdf

Results

Mapping statistics.

Peak distribution.

Peak overlap.

References

1. Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods. 2013 Dec;10(12):1213-8.

2. SRA Toolkit. https://github.com/ncbi/sra-tools

3. Trim Galore. https://github.com/FelixKrueger/TrimGalore

4. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357-9.

5. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078-9.

6. Picard Toolkit. Broad Institute, GitHub repository; 2019. http://broadinstitute.github.io/picard/

7. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, Liu XS. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9(9):R137.

8. Ramírez F, Dündar F, Diehl S, Grüning BA, Manke T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 2014;42(W1):W187-91.

9. Hendrickson PG, Doráis JA, Grow EJ, Whiddon JL, Lim JW, Wike CL, Weaver BD, Pflueger C, Emery BR, Wilcox AL, Nix DA, Peterson CM, Tapscott SJ, Carrell DT, Cairns BR. Conserved roles of mouse DUX and human DUX4 in activating cleavage-stage genes and MERVL/HERVL retrotransposons. Nat Genet. 2017 Jun;49(6):925-34.

10. Delisle L, Doyle M, Heyl F. ATAC-Seq data analysis (Galaxy Training Materials). 2020. /training-material/topics/epigenetics/tutorials/atac-seq/tutorial.html. Online; accessed Mon Oct 12 2020.

11. Batut B, et al. Community-Driven Data Analysis Training for Biology. Cell Syst. 2018. doi:10.1016/j.cels.2018.05.012