Signal visualisation
Learning outcomes
Using deepTools
- to visualise ChIP signal in relation to annotated TSS
Signal visualisation with deepTools
One more thing that may come in useful when analysing ChIP-seq data is visualising the ChIP signal in relation to annotated features.
One class of features relevant for transcription factors (TFs) is transcription start sites (TSS). In this exercise we use annotations for chromosomes 1 and 2. To do so we will:
- convert bedgraph to bigWig using UCSC utilities
- calculate scores per genome region using, among other inputs, the bigWig file
- plot a heatmap of scores associated with genomic regions
Assuming the same file structure as in the main data processing tutorial, create a separate directory in ~/chipseq/analysis and navigate to it. Copy or link the files needed for this exercise:
cd ~/chipseq/analysis/
mkdir vis
cd vis
cp ../../hg19/chrom.sizes.hg19 chrom.sizes.hg19
ln -s ../../data/ENCFF000PED.cov.norm1x.bedgraph
To calculate scores per genome region with deepTools computeMatrix we need a bigWig file, which we can obtain by converting the bedgraph using UCSC utilities:
module load ucsc-utilities/v398
bedGraphToBigWig ENCFF000PED.cov.norm1x.bedgraph chrom.sizes.hg19 hela_1.bw
module unload ucsc-utilities
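bedGraphToBigWig expects the bedgraph to be sorted by chromosome and then by start position. Below is a minimal sketch of how one might check this before converting. It uses a toy file rather than the tutorial data; the temporary file and its contents are made up for illustration.

```shell
# Toy bedgraph in a temporary file (hypothetical data, not the tutorial file)
demo=$(mktemp)
printf 'chr1\t0\t100\t1.5\nchr1\t100\t200\t0.7\nchr2\t0\t50\t2.0\n' > "$demo"

# sort -c only checks the order: it exits non-zero if the file is not
# already sorted by chromosome (column 1), then numerically by start (column 2)
if LC_ALL=C sort -c -k1,1 -k2,2n "$demo" 2>/dev/null; then
    sorted_status="sorted"
else
    sorted_status="unsorted"
fi
echo "$sorted_status"

rm -f "$demo"
```

If a real file turns out to be unsorted, running `LC_ALL=C sort -k1,1 -k2,2n` on it and writing the result to a new file is the usual fix before conversion.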
We can now compute the matrix of scores for visualisation using computeMatrix. This tool calculates scores per genome region and prepares an intermediate file that can be used with plotHeatmap and plotProfile. Typically, the genome regions are genes, but any other regions defined in a BED file can be used. computeMatrix accepts multiple score files (bigWig format) and multiple region files (BED format). This tool can also be used to filter and sort regions according to their score.
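To make the BED regions concrete: a TSS annotation can be expressed as a BED file in which each gene is reduced to a single strand-aware position (the start for genes on the + strand, the last base for genes on the - strand). The sketch below builds such a file from toy gene coordinates. The gene names and coordinates are made up, and this is not necessarily how the refGene TSS file used in this exercise was prepared.

```shell
# Toy gene annotation in BED6 format (hypothetical genes, 0-based half-open)
genes=$(mktemp)
printf 'chr1\t1000\t5000\tgeneA\t0\t+\nchr1\t8000\t9500\tgeneB\t0\t-\n' > "$genes"

# Reduce each gene to a 1-bp TSS interval, depending on strand
tss_bed=$(awk 'BEGIN { FS = OFS = "\t" }
    $6 == "+" { print $1, $2, $2 + 1, $4, $5, $6 }   # TSS = first base of the gene
    $6 == "-" { print $1, $3 - 1, $3, $4, $5, $6 }   # TSS = last base of the gene
' "$genes")
printf '%s\n' "$tss_bed"

rm -f "$genes"
```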
We will need a BED file with the positions of TSS, which we can copy to the working directory before running computeMatrix, e.g.
module load deepTools/3.3.2
cp ../../hg19/refGene_hg19_TSS_chr12_sorted_corr.bed ./
computeMatrix reference-point -S hela_1.bw \
-R refGene_hg19_TSS_chr12_sorted_corr.bed -b 5000 -a 5000 \
--outFileName matrix.tss.dat --outFileNameMatrix matrix.tss.txt \
--referencePoint=TSS -p 5
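As a rough guide to the size of the resulting matrix, the number of score columns per region follows from the flanking distances and the bin size. The sketch below assumes the computeMatrix default bin size of 10 bp, since the command above does not set --binSize; the variable names are only for illustration.

```shell
upstream=5000      # value passed to -b
downstream=5000    # value passed to -a
bin_size=10        # assumption: default --binSize, not set in the command above

# Number of bins (matrix columns) per region for a single bigWig file
n_bins=$(( (upstream + downstream) / bin_size ))
echo "$n_bins"
```

With several bigWig files, each file would contribute its own set of columns of this width.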
We can now create a heatmap of the scores associated with genomic regions, i.e. plot the binding profile around TSS:
plotHeatmap --matrixFile matrix.tss.dat \
--outFileName tss.hela_1.pdf \
--sortRegions descend --sortUsing mean
Have a look at tss.hela_1.pdf. What do you think?
This is a very basic plot. We can build on it, for example by clustering genes based on the signal profile around TSS. For more possibilities, please check plotHeatmap.
plotHeatmap --matrixFile matrix.tss.dat \
--outFileName tss.hela_rep1_k3_.pdf \
--sortRegions descend --sortUsing mean \
--kmeans 3