4.4. Finding the subset of SNPs within DHSes

In this example, we would like to identify the set of SNPs that are within a DHS, printing out both the SNP element and the DHS it is contained within.

4.4.1. BEDOPS tools in use

We use bedmap to answer this question, as it traverses a reference BED file (in this example, SNPs), and identifies overlapping elements from the mapping BED file (in this example, DHSs).

4.4.2. Script

SNPs are in a BED-formatted file called SNPs.bed sorted lexicographically with sort-bed. The DNase-hypersensitive sites are stored in a sorted BED-formatted file called DHSs.bed. These two files are available in the Downloads section.

bedmap --skip-unmapped --echo --echo-map SNPs.bed DHSs.bed \
  > subsetOfSNPsWithinAssociatedDHS.bed

4.4.3. Discussion

The output of this bedmap statement might look something like this:

chr1    10799576    10799577    rs12.4.378  Systolic_blood_pressure Cardiovascular|chr1 10799460    10799610    MCV-1   9.18063

The output is delimited by pipe symbols (|), showing the reference element (SNP) and the mapped element (DHS).

If multiple elements are mapped onto a single reference element, the mapped elements are further separated by semicolons, by default.

4.4.4. Downloads

The bedmap tool can operate directly on Starch-formatted archives. Alternatively, use the unstarch tool to decompress Starch data files to sorted BED format.