SRA Toolkit can be used to download and convert the data:
prefetch <SRA accession>
fastq-dump --split-3 --gzip <input file> -O <output_directory>
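When several runs have to be fetched, the two commands above can be generated in a loop. The following is a minimal dry-run sketch: the function name `print_download_cmds` and the output directory `fastq` are hypothetical, the accessions are the SRR identifiers used later in this post, and `--split-3` is the documented fastq-dump option for splitting paired reads. The script only prints the commands, so it can be checked before anything is downloaded.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the prefetch / fastq-dump commands for several runs.
# The function name and the output directory are illustrative assumptions.
set -euo pipefail

# print_download_cmds <output_directory> <accession>...
print_download_cmds() {
    local outdir=$1
    shift
    local acc
    for acc in "$@"; do
        echo "prefetch ${acc}"
        echo "fastq-dump --split-3 --gzip ${acc} -O ${outdir}"
    done
}

# Print the commands only; nothing is downloaded at this point.
print_download_cmds fastq SRR4032269 SRR4032270
```

To actually run the downloads, the printed lines can be piped to `bash` once they look right.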
Reference genome:
wget ftp://ftp.ensembl.org/pub/release-101/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
wget ftp://ftp.ensembl.org/pub/release-101/gtf/mus_musculus/Mus_musculus.GRCm38.101.gtf.gz
Trim Galore removes adapters and low-quality bases:
trim_galore --paired --fastqc -q 10 --length 30 --stringency 3 \
-o <output_directory> <READ1> <READ2>
The parameters are as follows:
Parameter        Description
(default)        auto-detection of the adapter sequence
--paired         paired-end input files
--fastqc         run FastQC in the default mode on the FastQ file once trimming is complete
-q 10            removes base calls with a Phred score of 10 or lower
--length 30      discards reads that become shorter than 30 bp after trimming
--stringency 3   requires at least 3 bp of overlap with the adapter sequence before trimming the 3' end of a read
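The trimmed files are the input for the alignment step, so it helps to know what they will be called. In paired-end mode Trim Galore names the validated outputs `<basename>_val_1.fq.gz` and `<basename>_val_2.fq.gz`. The sketch below derives those names; the helper `trimmed_name` is hypothetical, and it assumes gzipped input (uncompressed input gives uncompressed `.fq` output unless `--gzip` is set).

```shell
#!/usr/bin/env bash
# Sketch: derive the names Trim Galore gives to paired-end output files.
# Assumes gzipped FASTQ input; trimmed_name is a hypothetical helper.
set -euo pipefail

# trimmed_name <input_fastq> <mate_number> -> validated output file name
trimmed_name() {
    local base
    base=$(basename "$1")
    # Strip the compression and FASTQ extensions, whichever are present
    base=${base%.gz}
    base=${base%.fastq}
    base=${base%.fq}
    echo "${base}_val_$2.fq.gz"
}

trimmed_name SRR4032269_1.fastq.gz 1   # SRR4032269_1_val_1.fq.gz
trimmed_name SRR4032269_2.fastq.gz 2   # SRR4032269_2_val_2.fq.gz
```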
Align the reads with Bowtie2:
bowtie2-build -f <reference_genome> <bt2-idx>
bowtie2 -x <bt2-idx> -1 <trimmed_1> -2 <trimmed_2> \
-t -q -N 1 -L 25 -X 2000 --no-mixed --no-discordant -p 10 -S <sam>
samtools view -@ 10 -bS <sam> -o <Bam>
samtools sort -@ 4 <Bam> -o <Sorted_Bam>
samtools index -b -@ 4 $BAM/<head1>_sorted.bam
samtools view -b -q 10 $BAM/<head1>_sorted.bam \
1 2 3 4 5 6 7 8 9 10 11 12 13 \
14 15 16 17 18 19 \
X Y|samtools sort -o $BAM/<head1>_q10.bam -
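The filter above keeps reads with MAPQ of at least 10 on the autosomes and sex chromosomes, which drops mitochondrial reads and unplaced scaffolds. Rather than typing the chromosome names by hand, the list can be generated. This is a small sketch assuming Ensembl-style names (no `chr` prefix); the function `mouse_chroms` and the file name `sample_sorted.bam` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build the list of mouse chromosomes to keep (1..19, X, Y).
# Assumes Ensembl naming without a "chr" prefix; MT is deliberately left out.
set -euo pipefail

mouse_chroms() {
    # seq prints one number per line; tr joins them with spaces
    echo "$(seq 1 19 | tr '\n' ' ')X Y"
}

chroms=$(mouse_chroms)
# Print the resulting filter command instead of running it
echo "samtools view -b -q 10 sample_sorted.bam ${chroms}"
```

Region arguments only work on a coordinate-sorted, indexed BAM, which is why the index step comes first.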
java -Xmx2g -jar picard.jar MarkDuplicates \
I=input.bam \
O=duplicates_removed.bam \
REMOVE_DUPLICATES=true \
M=marked_dup_metrics.txt
java -Xmx2g -jar picard.jar CollectInsertSizeMetrics \
I=duplicates_removed.bam \
O=insert_size_metrics.txt \
H=insert_size_histogram.pdf \
M=0.5
macs2 callpeak -t <Bam> -B --nomodel \
--shift -100 --extsize 200 \
--nolambda -g mm -n <prefix>_peakcall --outdir <output_directory> -f BAM
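The combination `--shift -100 --extsize 200` is easier to follow with numbers. In this simplified model, the 5' end of each read (the Tn5 cut site) is moved 100 bp upstream and then extended 200 bp in the read direction, which gives a 200 bp window centred on the cut site for both strands. The function `cut_site_window` is a hypothetical illustration of that arithmetic, not MACS2 code, and it ignores coordinate-system details.

```shell
#!/usr/bin/env bash
# Sketch of the --shift / --extsize arithmetic (simplified model, not MACS2 itself).
set -euo pipefail

# cut_site_window <five_prime_pos> <strand> <shift> <extsize>
cut_site_window() {
    local pos=$1 strand=$2 shift_bp=$3 ext=$4 start end
    if [ "$strand" = "+" ]; then
        # Forward read: shift along the read direction, then extend rightwards
        start=$((pos + shift_bp))
        end=$((start + ext))
    else
        # Reverse read: the read direction is leftwards, so the signs flip
        end=$((pos - shift_bp))
        start=$((end - ext))
    fi
    echo "${start}-${end}"
}

cut_site_window 1000 + -100 200   # 900-1100
cut_site_window 1000 - -100 200   # 900-1100
```

Both strands yield the same window around position 1000, which is why the shift is set to minus half of the extension size.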
macs2 bdgdiff --t1 Mouse_DUX_Pos_peakcall_du_treat_pileup.bdg \
--t2 Mouse_DUX_Neg_peakcall_du_treat_pileup.bdg \
--c1 Mouse_DUX_Pos_peakcall_du_control_lambda.bdg \
--c2 Mouse_DUX_Neg_peakcall_du_control_lambda.bdg \
-l 500 \
-g 250 \
-o POS_specific Neg_specific common
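A quick sanity check after `bdgdiff` is to count how many regions ended up in each of the three output files. The sketch below assumes the outputs are BED-like text files whose header lines start with `track`; the helper `count_regions` and the demo file contents are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count the regions in a BED-like file, skipping "track" header lines.
# The demo file below is made-up data used only to show the call.
set -euo pipefail

# count_regions <bed_like_file>
count_regions() {
    # grep -c exits non-zero when the count is 0, so tolerate that
    grep -vc '^track' "$1" || true
}

demo=$(mktemp)
printf 'track type=bedGraph name="demo"\n1\t100\t600\tcond1_1\t5.2\n2\t300\t900\tcond1_2\t7.9\n' > "$demo"
count_regions "$demo"   # prints 2
rm -f "$demo"
```

Running it on `POS_specific`, `Neg_specific` and `common` in turn gives a first impression of how the peaks split between the two conditions.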
This includes tracks, heatmaps, and so on.
bamCoverage -b <Bam> -o <BigWig> \
-bs 10 \
--normalizeUsing RPKM
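With several samples, the bigWig conversion can be scripted so that each BAM file gets a matching `.bw` file. This is a dry-run sketch: the function `bigwig_cmds` and the BAM file names are hypothetical, and it assumes deepTools 3 or later, where the normalization is written `--normalizeUsing RPKM`.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one bamCoverage command per BAM file.
# Assumes deepTools >= 3 option syntax; the file names are illustrative.
set -euo pipefail

# bigwig_cmds <bam>...
bigwig_cmds() {
    local bam
    for bam in "$@"; do
        # ${bam%.bam} strips the .bam suffix to build the bigWig name
        echo "bamCoverage -b ${bam} -o ${bam%.bam}.bw -bs 10 --normalizeUsing RPKM"
    done
}

bigwig_cmds SRR4032269.bam SRR4032270.bam
```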
make_tracks_file --trackFiles \
SRR4032269.bw SRR4032270.bw SRR4032271.bw SRR4032272.bw \
-o atac_track.txt
pyGenomeTracks -tracks atac_track.txt --region chr2:10,000,000-11,000,000 \
--outFileName GSE85624_tracks.pdf
computeMatrix reference-point \
-S GSE85624_mDUX-GFPpos_ATAC.bw GSE85624_mDUX-GFPneg_ATAC.bw \
-R POS_specific Neg_specific common \
--referencePoint center -a 5000 -b 5000 \
-o $OUT/matrix.mat.gz
plotHeatmap -m $OUT/matrix.mat.gz \
--colorMap Reds \
--whatToShow 'heatmap and colorbar' \
--refPointLabel Peak \
--zMin 0 \
--zMax 3 \
--dpi 1000 \
--boxAroundHeatmaps no \
-o $OUT/Heatmap_diff_peak.pdf