Signal visualisation

Learning outcomes

Using deepTools - to visualise ChIP signal in relation to annotated TSS

Signal visualisation with deepTools

One more thing that may come useful when analysing ChIP-seq data is visualising ChIP signal in relation to annotated features.

One such kind of features relevant for TFs are transcription start sites (TSS). In this exersice we use annotations for chromosomes 1 and 2. To do so we will:

  • convert bedgraph to bigWig using UCSC utilities

  • calculate scores per genome regions using among others the bigWig file

  • plot a heatmap of scores associated with genomic regions

Assuming the same files structure as in the main data processing tutorial, create a separate directory in ~/chipseq/analysis and navigate to it. Copy the files needed for this exercise.

cd ~/chipseq/analysis/
mkdir vis
cd vis

cp ../../hg19/chrom.sizes.hg19 chrom.sizes.hg19
ln -s ../../data/ENCFF000PED.cov.norm1x.bedgraph

To calculate scores per genome with deepTools computeMatrix we need bigWig file that we can obtain by converting bedgraph using UCSC utilities:

module load ucsc-utilities/v398

bedGraphToBigWig ENCFF000PED.cov.norm1x.bedgraph chrom.sizes.hg19 hela_1.bw

module unload ucsc-utilities

We can now compute the matrix of scores for visualisation using computeMatrix. This tool calculates scores per genome regions and prepares an intermediate file that can be used with plotHeatmap and plotProfiles. Typically, the genome regions are genes, but any other regions defined in a BED file can be used. computeMatrix accepts multiple score files (bigWig format) and multiple regions files (BED format). This tool can also be used to filter and sort regions according to their score.

We will need a BED file with positions of TSS that we can copy to the working directory before running computeMatrix e.g.

module load deepTools/3.3.2

cp ../../hg19/refGene_hg19_TSS_chr12_sorted_corr.bed ./

computeMatrix reference-point -S hela_1.bw \
-R refGene_hg19_TSS_chr12_sorted_corr.bed -b 5000 -a 5000 \
--outFileName matrix.tss.dat --outFileNameMatrix matrix.tss.txt \
--referencePoint=TSS -p 5

We can now create a heatmap for scores associated with genomic regions, i.e. plot the binding profile around TSS

plotHeatmap --matrixFile matrix.tss.dat \
--outFileName tss.hela_1.pdf \
--sortRegions descend --sortUsing mean

Have a look at the tss.hela_rep1.pdf. What do you think?

This is a very basic plot. We can add on to it, for example we can cluster genes based on the signal profile around TSS. For more possibilities please check plotHetmap.

plotHeatmap --matrixFile matrix.tss.dat \
--outFileName tss.hela_rep1_k3_.pdf \
--sortRegions descend --sortUsing mean \
--kmeans 3


Appendix

The plots generated in this tutorial.

Figure 1. Signal profiles and heatmaps centered on TSS for REST ChIP-seq in HeLa, replicate 1.

non-clustered

clustered (kmeans, k=3)

../../../_images/tss.hela_1.png ../../../_images/tss.hela_rep1_k3.png