6.3.3.12. wig2bed
The wig2bed script converts both variable- and fixed-step, 1-based, closed [start, end] UCSC Wiggle format (WIG) to sorted, 0-based, half-open [start-1, end) extended BED data.
When WIG data are sourced from bigWigToWig or other tools that generate 0-based, half-open [start-1, end) WIG, the --zero-indexed option generates coordinate output without any re-indexing.
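The coordinate shift described above can be illustrated with a small awk sketch. This is not the wig2bed implementation: the function name wig_variable_to_bed and its single argument are hypothetical, it handles only variableStep sections, and it does no sorting. It assumes the WIG default of span=1 when no span is declared.

```shell
# Sketch only: mimics the 1-based closed -> 0-based half-open shift for
# variableStep records. wig_variable_to_bed is a hypothetical helper name.
# Usage: wig_variable_to_bed [1] < input.wig   (pass 1 for zero-indexed input)
wig_variable_to_bed() {
    awk -v zero="${1:-0}" '
        BEGIN { OFS = "\t"; span = 1 }
        /^(track|browser|#)/ { next }           # header lines are dropped
        /^variableStep/ {
            span = 1                            # WIG default span
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == "chrom") chrom = kv[2]
                if (kv[1] == "span")  span  = kv[2]
            }
            next
        }
        NF == 2 {
            # shift the start down by one unless the input is already 0-based
            start = (zero ? $1 : $1 - 1)
            print chrom, start, start + span, "id-" (++n), sprintf("%.6f", $2)
        }
    '
}
```

With 1-based input, a record at position 147971109 with span=50 becomes the BED interval 147971108 to 147971158. Passing 1 as the argument leaves the start coordinate untouched, which mirrors what the --zero-indexed option is described as doing.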
For convenience, we also offer wig2starch, which performs the extra step of creating a Starch-formatted archive.
The utility also supports multiple embedded WIG sections in a single file. With the --multisplit option, these sections are written to the BED output with modified ID fields.
6.3.3.12.1. Source
The wig2bed script requires convert2bed. The wig2starch script requires starch. Both dependencies are part of a typical BEDOPS installation.
6.3.3.12.2. Usage
The wig2bed script parses WIG from standard input and prints sorted BED to standard output. The wig2starch script uses an extra step to parse WIG to a compressed BEDOPS Starch-formatted archive, which is also directed to standard output.
The header data of a WIG file are discarded by default. If you add the --keep-header option, BED elements are created from these data, using the chromosome name _header to denote content. Line numbers are specified in the start and stop coordinates, and unmodified header data are placed in the fourth column (ID field).
If the input data contain WIG elements with a start position of 0, the default use of wig2bed and wig2starch will exit early with an EINVAL error. Add the --zero-indexed option to denote that the input WIG data are zero-indexed, and re-run the conversion tool to print unmodified output coordinates.
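To avoid finding out about zero start positions only when the conversion exits, the input can be scanned first. The following is a rough sketch, not part of BEDOPS: wig_has_zero_start is a hypothetical helper that looks for start=0 on fixedStep declaration lines and for a position of 0 on variableStep data lines.

```shell
# Sketch only: exits with status 0 if any WIG record starts at position 0.
# wig_has_zero_start is a hypothetical helper, not a BEDOPS tool.
wig_has_zero_start() {
    awk '
        /^fixedStep/ {
            for (i = 2; i <= NF; i++)
                if ($i == "start=0") found = 1
            next
        }
        /^(track|variableStep|browser|#)/ { next }   # skip other declarations
        NF == 2 && $1 == 0 { found = 1 }             # variableStep data line
        END { exit (found ? 0 : 1) }
    '
}

# Possible use, assuming foo.wig exists and BEDOPS is installed:
#   if wig_has_zero_start < foo.wig; then
#       wig2bed --zero-indexed < foo.wig > foo.bed
#   else
#       wig2bed < foo.wig > foo.bed
#   fi
```

A start position of 0 shows that the input is zero-indexed, but its absence does not prove the input is 1-based, so this check complements the recommendation in the tip that follows and does not replace it.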
Tip
If your WIG input is potentially zero-indexed, e.g., if derived from bigWigToWig, where the bigWig data are themselves sourced from BAM- or bedGraph-formatted data, then we recommend using the --zero-indexed option as a safety measure.
If your data contain multiple WIG sections, use the --multisplit <basename> option to split sections out to BED elements with modified ID fields. This option can be used in conjunction with the --keep-header option to preserve metadata.
Tip
By default, all conversion scripts now output sorted BED data ready for use with BEDOPS utilities. If you do not want to sort converted output, use the --do-not-sort option. Run the script with the --help option for more details.
Tip
If sorting converted data larger than system memory, use the --max-mem option to limit sort memory usage to a reasonable fraction of available memory, e.g., --max-mem 2G or similar. See --help for more details.
6.3.3.12.3. Example
To demonstrate these scripts, we use a sample multi-section WIG input called foo.wig (see the Downloads section to grab this file). We can convert WIG to sorted BED data in the following manner:
$ wig2bed < foo.wig
chr1 147971108 147971158 id-1 -0.590000
chr1 147971146 147971196 id-2 0.120000
chr1 147971184 147971234 id-3 0.110000
chr1 147971222 147971272 id-4 -0.760000
...
Note
Even though our WIG input foo.wig has multiple sections, we can omit the use of --multisplit, because conversion and sorting puts everything into one sorted BED file. However, the header data of the WIG file are discarded.
If we want to preserve the header data, we can add the --keep-header option. In this case, BED elements are created from these data, using the chromosome name _header to denote content. Line numbers are specified in the start and stop coordinates, and unmodified header data are placed in the fourth column (ID field).
In the case of the sample input foo.wig, we will also need to add the --multisplit option, as header BED elements from each section will otherwise be collated in a nonsensical way. Adding --multisplit ensures that header and data elements carry an ID prefix that identifies the WIG section they came from.
To demonstrate, we next repeat the above conversion, adding the --keep-header and --multisplit options:
$ wig2bed --multisplit bar --keep-header < foo.wig > foo.bed
Conversion of this two-section WIG input results in output with modified ID fields to denote their section association:
$ more foo.bed
_header 0 1 bar.1 track type=wiggle_0 name=foo description=foo
_header 1 2 bar.2 track type=wiggle_0 name=testfixed
_header 2 3 bar.2 fixedStep chrom=chrX start=100 step=10 span=5
chr1 147971108 147971158 bar.1-id-1 -0.590000
chr1 147971146 147971196 bar.1-id-2 0.120000
chr1 147971184 147971234 bar.1-id-3 0.110000
chr1 147971222 147971272 bar.1-id-4 -0.760000
chrX 99 104 bar.2-id-11 1.900000
chrX 109 114 bar.2-id-12 2.300000
chrX 119 124 bar.2-id-13 -0.100000
chrX 129 134 bar.2-id-14 1.100000
chrX 139 144 bar.2-id-15 4.100000
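The chrX intervals above follow from the fixedStep declaration start=100 step=10 span=5. The expansion can be sketched in awk as shown below. This is an illustration under stated assumptions, not the wig2bed code: wig_fixed_to_bed is a hypothetical helper, it assumes 1-based input, and it numbers IDs from 1 with no section prefix, whereas the real output continues the ID counter across sections.

```shell
# Sketch only: expands fixedStep records into 0-based, half-open BED intervals.
# wig_fixed_to_bed is a hypothetical helper; IDs here are placeholders.
wig_fixed_to_bed() {
    awk '
        BEGIN { OFS = "\t" }
        /^fixedStep/ {
            step = 1; span = 1                  # WIG defaults
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == "chrom") chrom = kv[2]
                if (kv[1] == "start") pos   = kv[2]
                if (kv[1] == "step")  step  = kv[2]
                if (kv[1] == "span")  span  = kv[2]
            }
            next
        }
        NF == 1 {
            # 1-based position pos covers [pos - 1, pos - 1 + span) in BED terms
            print chrom, pos - 1, pos - 1 + span, "id-" (++n), sprintf("%.6f", $1)
            pos += step                         # advance to the next record
        }
    '
}
```

Feeding this function the declaration line followed by the values 1.9, 2.3 and -0.1 gives the intervals 99 to 104, 109 to 114 and 119 to 124, matching the first three chrX coordinates in the output above.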
Note
Note the conversion from 1- to 0-based coordinate indexing in the transition from WIG to BED. For example, the fixedStep section declared with start=100 and span=5 yields a first BED element of chrX 99 104. While BEDOPS supports 0- and 1-based coordinate indexing, the coordinate change made here is believed to be convenient for most end users.
In the case where the WIG data contain elements that have a start position of 0, the default use of wig2bed and wig2starch will exit early with an EINVAL error. Add the --zero-indexed option to denote that the WIG input is zero-indexed and re-run to convert without any coordinate shift.
Note
Multiple WIG sections in the input file are merged together by the default wig2bed behavior. When using the --multisplit option, each WIG section instead receives its own ID prefix.
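Because each section carries its own ID prefix, the combined output can be separated into per-section files afterwards if that is useful. The sketch below is one possible approach, not a BEDOPS feature: split_by_section and the output file naming are hypothetical, and the code assumes data IDs of the form <prefix>-id-<number> and header IDs that consist of the prefix alone, as in the example output above.

```shell
# Sketch only: writes each section of a --multisplit BED file to its own file.
# split_by_section and the <outdir>/<prefix>.bed naming are hypothetical.
# Usage: split_by_section foo.bed outdir
split_by_section() {
    awk -v outdir="$2" '
        {
            prefix = $4
            sub(/-id-[0-9]+$/, "", prefix)      # "bar.1-id-3" -> "bar.1"
            out = outdir "/" prefix ".bed"
            print > out                         # line is written unmodified
        }
    ' "$1"
}
```

Each output file keeps the relative order of the input lines, so a sorted input stays sorted within each section.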