Analysis of CUT&Tag H3K27me3 dataset
Background
H3K27me3 is a histone modification associated with facultative heterochromatin, i.e. chromatin that is silenced in some but not all cell types within a body. It is written by Polycomb Repressive Complex 2. Important PRC2 targets are HOX gene cluster which delineate the body plan during embryogenesis. HOX genes are silenced in pluripotent stem cells before the come into action during early development.
Datasets
GEO Accession |
Sample |
Reference |
---|---|---|
GSM3536498 |
hESC_H1_H3K27me3_CnT |
Kaya-Okur et al. 2019 [1] |
GSM3677833 |
NBIS_H1_H3K27me3_CnR |
Meers et al. 2019 [2] |
hESC_H1_H3K27me3_ChIP |
||
GSM1436879 |
WIBR2_H3K27me3_ChIP |
Theunissen et al. 2019 [3] |
H3K27me3-enriched regions from Court et al. 2019 [4]: Bivalent_primed.hg38.bed
(lift-overed from hg19).
Data preprocessing
Primary analysis of the initial FASTQ files was performed beforehand. Reads were mapped with bowtie2 with default parameters. Resulting BAM files were deduplicated using Picard and blacklisted regions were removed. BigWig files were generated with deepTools to 1x coverage. Resulting files are available under /sw/courses/epigenomics/quantitative_chip_simon/hESC_H3K27me3
.
You can download bigWig and peak annotations. Most of what we are going to do can be done locally on a regular laptop. When this is not the case, Uppmax-specific instructions will be given. In case something does not work properly, the output of most of these is available also in the workshop folder and in this documentation.
When running things on Uppmax, copy the files to your home directory:
cd
mkdir -p cnt_chip/bw
cd cnt_chip/bw
cp /sw/courses/epigenomics/quantitative_chip_simon/hESC_H3K27me3/bw/*.bw .
# Or you can create symlinks for the bigWig files instead:
for i in /sw/courses/epigenomics/quantitative_chip_simon/hESC_H3K27me3/bw/*.bw; do ln -s ${i}; done
Manual inspection of bigWig files using IGV
HOXA locus.
Smaller H3K27me3 domains.
Q: What are your first impressions browsing the H3K27me3 data? Consider the Y axis used for each dataset and what it means regarding signal/noise ratio.
Let’s look more systematically at known H3K27me3 peaks. We are using a published BED file of bivalent domains (Court et al. 2017 [4] ).
Warning
These plots were generated in the past using seqplots
. Unfortunately, this package is now deprecated. You can still
use older versions of R and this package, or just generate plots like this using another tool, like deepTools
. You can see more in their documentation. And there are also examples in other tutorials, such as the MINUTE-ChIP data tutorial in this section.
Q: The heatmaps show a similar colorscale but the signal to noise varies a lot (note the scale of CUT&TAG 0-1000!). Could there be a problem with CUT&Tag?
A common critique about the in-situ methods is that they may be generating false-positive peaks at accessible sites of the genome, since MNase likes to cut open chromatin and Tn5 also is known to integrate best in open chromatin (ATAC-Seq!).
So we could also plot the signal over active enhancers which are 1) known to have open chromatin and 2) are decorated by H3K27ac, thus cannot have H3K27me3.
Q: Is the aforementioned concern about false-positive peaks justified (consider the scale used here and above)?