7. Summary
These tables summarize BEDOPS utilities by option, file inputs and BED column requirements.
7.1. Set operation and statistical utilities
7.1.1. bedextract
- Efficiently extracts features from BED input.
- BEDOPS bedextract documentation.
option | description | min. file inputs | max. file inputs | min. BED columns |
---|---|---|---|---|
--list-chr | Print every chromosome found in input.bed | 1 | 1 | 3 |
<chromosome> | Retrieve all rows for specified chromosome, e.g. bedextract chr8 input.bed | 1 | 1 | 3 |
<query> <reference> | Grab elements of query that overlap elements in reference. Same as bedops -e -1 query reference, except that this option fails when query contains fully-nested BED elements. May use - to indicate stdin for reference only. | 2 | 2 | 3 |
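
Because the `<query> <reference>` mode fails on fully-nested query elements, it can help to screen a query file first. The following is a minimal Python sketch of such a check, not BEDOPS code. It assumes the input is already sorted by chromosome, start and stop, and it assumes "fully nested" means an element lying strictly inside an earlier element (larger start, smaller stop). The function name `has_fully_nested` is hypothetical.

```python
def has_fully_nested(elements):
    """Return True if any element lies strictly inside an earlier one.

    `elements` is a list of (chrom, start, stop) tuples, assumed to be
    sorted by chromosome, then start, then stop.
    """
    current_chrom = None
    max_stop = -1        # largest stop seen so far on the current chromosome
    max_stop_start = -1  # start of the element that first reached max_stop
    for chrom, start, stop in elements:
        if chrom != current_chrom:
            # New chromosome: reset the running maximum.
            current_chrom, max_stop, max_stop_start = chrom, stop, start
            continue
        # Assumption: strict containment on both edges counts as nested.
        if start > max_stop_start and stop < max_stop:
            return True
        if stop > max_stop:
            max_stop, max_stop_start = stop, start
    return False
```

If the check returns True, `bedops -e -1 query reference` is the alternative named in the table above.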
7.1.2. bedmap
- Maps source signals from map-file onto qualified target regions from ref-file. Calculates an output for every ref-file element.
- BEDOPS bedmap documentation.
option | description | min. file inputs | max. file inputs | min. BED columns |
---|---|---|---|---|
--bases | Reports the total number of bases from map-file that overlap the ref-file's element. | 1 | 2 | 3 |
--bases-uniq | Reports the number of distinct bases from ref-file's element overlapped by elements in map-file. | 1 | 2 | 3 |
--bases-uniq-f | Reports the fraction of distinct bases from ref-file's element overlapped by elements in map-file. | 1 | 2 | 3 |
--bp-ovr <int> | Require <int> bases of overlap between elements of input files. | 1 | 2 | 3 |
--chrom <chromosome> | Process data for given <chromosome> only. | 1 | 2 | 3 |
--count | Reports the number of overlapping elements in map-file. | 1 | 2 | 3 |
--cv | Reports the Coefficient of Variation: the result of --stdev divided by the result of --mean. | 1 | 2 | 5 |
--ec | Error-check all input files (slower). | 1 | 2 | 3 |
--echo | Echo each line from ref-file. | 1 | 2 | 3 |
--echo-map | Reports the overlapping elements found in map-file. | 1 | 2 | 3 |
--echo-map-id | Reports the IDs (4th column) from overlapping map-file elements. | 1 | 2 | 4 |
--echo-map-id-uniq | List unique IDs from overlapping map-file elements. | 1 | 2 | 4 |
--echo-map-range | Reports the genomic range of overlapping elements from map-file. | 1 | 2 | 3 |
--echo-map-score | Reports the scores (5th column) from overlapping map-file elements. | 1 | 2 | 5 |
--echo-map-size | Calculates difference between start and stop coordinates (or size) of each mapped element. | 1 | 2 | 3 |
--echo-overlap-size | Calculates size of overlap between each mapped element and its reference element. | 1 | 2 | 3 |
--echo-ref-name | Reports the first 3 fields of ref-file element in chrom:start-end format. | 1 | 2 | 3 |
--echo-ref-size | Reports the length of the ref-file element. | 1 | 2 | 3 |
--faster | (Advanced) Strong input assumptions are made. Review the documentation before use. Compatible with --bp-ovr and --range overlap options only. | 1 | 2 | 5 |
--fraction-ref <val> | The fraction of the element's size from ref-file that must overlap the element in map-file. Expects 0 < val <= 1. | 1 | 2 | 5 |
--fraction-map <val> | The fraction of the element's size from map-file that must overlap the element in ref-file. Expects 0 < val <= 1. | 1 | 2 | 5 |
--fraction-both <val> | Both --fraction-ref <val> and --fraction-map <val> must be true to qualify as overlapping. Expects 0 < val <= 1. | 1 | 2 | 5 |
--fraction-either <val> | Either --fraction-ref <val> or --fraction-map <val> must be true to qualify as overlapping. Expects 0 < val <= 1. | 1 | 2 | 5 |
--exact | Shorthand for --fraction-both 1. First three fields from map-file must be identical to ref-file element. | 1 | 2 | 5 |
--indicator | Reports the presence of one or more overlapping elements in map-file as a binary value (0 or 1). | 1 | 2 | 3 |
--kth <val> | Reports the value at the kth fraction. A generalized median-like calculation, where --kth 0.5 is the median. (0 < val <= 1) | 1 | 2 | 5 |
--mad <mult=1> | Reports the 'median absolute deviation' of overlapping elements in map-file, multiplied by <mult>. | 1 | 2 | 5 |
--max | Reports the highest score from overlapping elements in map-file. | 1 | 2 | 5 |
--max-element | The lexicographically "smallest" element with the highest score from overlapping elements in map-file. If no overlapping element exists, NAN is reported (unless --skip-unmapped is used). | 1 | 2 | 5 |
--max-element-rand | A randomly chosen element with the highest score from overlapping elements in map-file. If no overlapping element exists, NAN is reported (unless --skip-unmapped is used). | 1 | 2 | 5 |
--mean | Reports the average score from overlapping elements in map-file. | 1 | 2 | 5 |
--median | Reports the median score from overlapping elements in map-file. | 1 | 2 | 5 |
--min | Reports the lowest score from overlapping elements in map-file. | 1 | 2 | 5 |
--min-element | The lexicographically "smallest" element with the lowest score from overlapping elements in map-file. If no overlapping element exists, NAN is reported (unless --skip-unmapped is used). | 1 | 2 | 5 |
--min-element-rand | A randomly chosen element with the lowest score from overlapping elements in map-file. If no overlapping element exists, NAN is reported (unless --skip-unmapped is used). | 1 | 2 | 5 |
--skip-unmapped | Omits printing reference elements which do not associate with any mapped elements. | 1 | 2 | 3 |
--stdev | Reports the square root of the result of --variance. | 1 | 2 | 5 |
--sum | Reports the accumulated value from scores of overlapping elements in map-file. | 1 | 2 | 5 |
--sweep-all | Reads through entire map-file dataset to avoid early termination that may cause SIGPIPE or other I/O errors. | 1 | 2 | 3 |
--tmean <low> <hi> | Reports the mean score from overlapping elements in map-file, after ignoring the bottom <low> and top <hi> fractions of those scores. (0 <= low <= 1, 0 <= hi <= 1, low + hi <= 1). | 1 | 2 | 5 |
--variance | Reports the variance of scores from overlapping elements in map-file. | 1 | 2 | 5 |
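
The overlap criteria in the table (--bp-ovr, --fraction-ref, --fraction-map, --fraction-both, --fraction-either and --exact) differ only in how they threshold the shared bases between a ref-file element and a map-file element. The Python sketch below illustrates those rules as the table describes them. It is not bedmap's implementation: it assumes half-open (start, stop) coordinates on a single chromosome and non-empty elements, and the function names `overlap_bases` and `qualifies` are hypothetical.

```python
def overlap_bases(ref, mapped):
    """Number of bases shared by two half-open (start, stop) intervals."""
    return max(0, min(ref[1], mapped[1]) - max(ref[0], mapped[0]))


def qualifies(ref, mapped, criterion="bp-ovr", val=1):
    """Decide whether `mapped` counts as overlapping `ref`.

    Assumes both elements sit on the same chromosome and have stop > start.
    """
    shared = overlap_bases(ref, mapped)
    ref_ok = shared / (ref[1] - ref[0]) >= val           # --fraction-ref
    map_ok = shared / (mapped[1] - mapped[0]) >= val     # --fraction-map
    if criterion == "bp-ovr":
        return shared >= val
    if criterion == "fraction-ref":
        return ref_ok
    if criterion == "fraction-map":
        return map_ok
    if criterion == "fraction-both":
        return ref_ok and map_ok
    if criterion == "fraction-either":
        return ref_ok or map_ok
    if criterion == "exact":
        # Shorthand for fraction-both with val = 1.
        return qualifies(ref, mapped, "fraction-both", 1)
    raise ValueError(f"unknown criterion: {criterion}")
```

For a reference element (100, 200) and a mapped element (150, 350), 50 bases are shared: half of the reference but only a quarter of the mapped element. With `val=0.5`, this pair qualifies under fraction-ref and fraction-either but not under fraction-map or fraction-both.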
7.1.3. bedops
- Offers set and multiset operations for files in BED format.
- BEDOPS bedops documentation.
option | description | min. file inputs | max. file inputs | min. BED columns |
---|---|---|---|---|
--chrom <chromosome> | Process data for given chromosome only. | 1 | No imposed limit | 3 |
--complement, -c | Reports the intervening intervals between the input coordinate segments. | 1 | No imposed limit | 3 |
--chop, -w | Breaks up merged regions into fixed-size chunks, optionally anchored on start coordinates a fixed distance apart. | 1 | No imposed limit | 3 |
--difference, -d | Reports the intervals found in the first file that are not present in any other input file. | 2 | No imposed limit | 3 |
--ec | Error-check input files (slower). | 1 | No imposed limit | 3 |
--element-of, -e | Reports rows from the first file that overlap, by a specified percentage or number of base pairs, the merged segments from all other input files. | 2 | No imposed limit | 3 |
--header | Accept headers (VCF, GFF, SAM, BED, WIG) in any input file. | 1 | No imposed limit | 3 |
--intersect, -i | Reports the intervals common to all input files. | 2 | No imposed limit | 3 |
--merge, -m | Reports intervals from all input files, after merging overlapping and adjoining segments. | 1 | No imposed limit | 3 |
--not-element-of, -n | Reports exactly everything that --element-of does not, given the same overlap criterion. | 2 | No imposed limit | 3 |
--partition, -p | Reports all disjoint intervals from all input files. Overlapping segments are cut up into pieces at all segment boundaries. | 1 | No imposed limit | 3 |
--range L:R | Add L bases to all start coordinates and R bases to end coordinates. Either value may be positive or negative to grow or shrink regions, respectively. With the -e or -n operation, the first (reference) file is not padded, unlike all other files. | 1 | No imposed limit | 3 |
--range S | Pad input file(s) coordinates symmetrically by S bases. This is shorthand for --range -S:S. | 1 | No imposed limit | 3 |
--symmdiff, -s | Reports the intervals found in exactly one input file. | 2 | No imposed limit | 3 |
--everything, -u | Reports the intervals from all input files in sorted order. Duplicates are retained in the output. | 1 | No imposed limit | 3 |
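
To make a few of these operations concrete, here is a small Python sketch of --range, --merge and --complement as the table describes them. It is an illustration under simplifying assumptions, not the bedops implementation: intervals are half-open (start, stop) pairs on a single chromosome, all input files are pooled into one list, and the function names `pad`, `merge` and `complement` are hypothetical.

```python
def pad(intervals, left, right):
    """--range L:R: add `left` to every start and `right` to every stop.

    A symmetric pad of S bases (--range S) is pad(intervals, -S, S).
    """
    return [(start + left, stop + right) for start, stop in intervals]


def merge(intervals):
    """--merge: fuse overlapping and adjoining intervals."""
    merged = []
    for start, stop in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
        else:
            merged.append((start, stop))
    return merged


def complement(intervals):
    """--complement: the gaps between the merged input intervals."""
    merged = merge(intervals)
    return [(a[1], b[0]) for a, b in zip(merged, merged[1:])]
```

For example, `merge([(1, 5), (5, 8), (20, 30), (3, 4)])` fuses the first, second and fourth intervals into `(1, 8)` and leaves `(20, 30)` alone, so `complement` of the same input is the single gap `(8, 20)`.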
7.1.4. closest-features
- For every element in input-file, find those elements in query-file nearest to its left and right edges.
- BEDOPS closest-features documentation.
option | description | min. file inputs | max. file inputs | min. BED columns |
---|---|---|---|---|
(no option) | NA | 2 | 2 | 3 |
--chrom <chromosome> | Process data for given <chromosome> only. | 2 | 2 | 3 |
--dist | Output includes the signed distances between the input-file element and the closest elements in query-file. | 2 | 2 | 3 |
--ec | Error-check all input files (slower). | 2 | 2 | 3 |
--no-overlaps | Do not consider elements that overlap. Overlapping elements, otherwise, have highest precedence. | 2 | 2 | 3 |
--no-ref | Do not echo elements from input-file. | 2 | 2 | 3 |
--closest | Choose the nearest element from query-file only. Ties go to the leftmost closest element. | 2 | 2 | 3 |
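
The left/right search and the --closest tie rule can be sketched in Python as follows. This is a simplified illustration, not the closest-features algorithm: it ignores overlapping query elements (as with --no-overlaps), works on half-open (start, stop) pairs from one chromosome, and measures distance as the gap between facing edges, negative to the left and positive to the right. That distance convention and the names `nearest_neighbors`, `signed_gap` and `closest` are assumptions made for the example, so the values need not match what --dist prints.

```python
def nearest_neighbors(ref, queries):
    """Return (left, right): nearest non-overlapping query elements."""
    left = max((q for q in queries if q[1] <= ref[0]),
               key=lambda q: q[1], default=None)
    right = min((q for q in queries if q[0] >= ref[1]),
                key=lambda q: q[0], default=None)
    return left, right


def signed_gap(ref, query):
    """Gap between facing edges: <= 0 on the left, >= 0 on the right."""
    if query[1] <= ref[0]:
        return query[1] - ref[0]
    return query[0] - ref[1]


def closest(ref, queries):
    """Pick the nearer neighbor; ties go to the left element."""
    left, right = nearest_neighbors(ref, queries)
    if left is None or right is None:
        return left if right is None else right
    if abs(signed_gap(ref, left)) <= abs(signed_gap(ref, right)):
        return left
    return right
```

With a reference element (100, 200) and query elements (60, 90) and (210, 260), both neighbors sit 10 bases away, so `closest` returns the left one, (60, 90).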
7.2. Sorting
7.2.1. sort-bed
- Sorts input BED file(s) into the order required by other utilities. Loads all input data into memory.
- BEDOPS sort-bed documentation.
option | description | min. file inputs | max. file inputs | min. BED columns |
---|---|---|---|---|
(no option) | NA | 1 | 1000 | 3 |
--max-mem <val> | <val> specifies the maximum memory usage for the sort-bed process, which is useful for very large BED inputs. For example, --max-mem may be 8G, 8000M, or 8000000000 to specify 8 GB of memory. | 1 | 1000 | 3 |
--unique | Report unique elements (those which only occur once) in output. | 1 | 1000 | 3 |
--duplicates | Report duplicate elements (those which occur 2+ times) in output. | 1 | 1000 | 3 |
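
The table does not spell out the sort order itself. As a rough illustration, the Python sketch below orders tab-delimited BED lines lexicographically by chromosome name, then numerically by start and stop coordinates. That ordering is stated here as an assumption about what "the order required by other utilities" means, and the function name `sort_bed_lines` is hypothetical. The sketch keeps everything in memory and does not model --max-mem, --unique or --duplicates.

```python
def sort_bed_lines(lines):
    """Sort tab-delimited BED lines by chromosome, start, then stop.

    Assumed order: chromosome names compare as plain strings (so "chr10"
    sorts before "chr2"); start and stop compare as integers.
    """
    def key(line):
        fields = line.rstrip("\n").split("\t")
        return (fields[0], int(fields[1]), int(fields[2]))

    return sorted(lines, key=key)
```

Note that a plain string comparison of chromosome names places chr10 between chr1 and chr2, which differs from a "natural" numeric ordering.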
7.3. Compression and extraction
7.3.1. starch
- Lossless compression of any BED file.
- BEDOPS starch documentation.
option | description | min. file inputs | max. file inputs | min. BED columns |
---|---|---|---|---|
(no option) | NA | 1 | 1 | 3 |
--bzip2 or --gzip | The internal compression method. The default --bzip2 method favors storage efficiency, while --gzip favors compression and extraction time performance. | 1 | 1 | 3 |
--note="foo bar..." | Append note to output archive metadata (optional). | 1 | 1 | 3 |
--report-progress=N | Write progress to standard error stream for every N input elements. | 1 | 1 | 3 |
7.3.2. unstarch
- Extraction of a starch archive or attributes.
- BEDOPS unstarch documentation.
option | description | min. file inputs | max. file inputs | min. BED columns |
---|---|---|---|---|
(no option) | NA | 1 | 1 | NA |
--archive-type | Show archive's compression type (either bzip2 or gzip). | 1 | 1 | NA |
--archive-version | Show archive version (at this time, either 1.x or 2.x). | 1 | 1 | NA |
--archive-timestamp | Show archive creation timestamp (ISO 8601 format). | 1 | 1 | NA |
--bases <chromosome> | Show total, non-unique base counts for optional <chromosome> (omitting <chromosome> shows total non-unique base count). | 1 | 1 | NA |
--bases-uniq <chromosome> | Show unique base counts for optional <chromosome> (omitting <chromosome> shows total, unique base count). | 1 | 1 | NA |
<chromosome> | Decompress information for a single <chromosome> only. | 1 | 1 | NA |
--duplicatesExist or --duplicatesExistAsString with <chromosome> | Report if optional <chromosome> or chromosomes contain duplicate elements as 0/1 numbers or false/true strings. | 1 | 1 | NA |
--elements <chromosome> | Show element count for optional <chromosome> (omitting <chromosome> shows total element count). | 1 | 1 | NA |
--elements-max-string-length | Show element maximum string length for optional <chromosome> (omitting <chromosome> shows maximum string length over all chromosomes). | 1 | 1 | NA |
--is-starch | Test if the <starch-file> is a valid starch archive, returning 0/1 for a false/true result. | 1 | 1 | NA |
--list or --list-json | Print the metadata for a starch file, either in tabular form or with JSON formatting. | 1 | 1 | NA |
--list-chr or --list-chromosomes | List all chromosomes in starch archive (similar to bedextract --list-chr). | 1 | 1 | NA |
--nestedsExist or --nestedsExistAsString with <chromosome> | Report if optional <chromosome> or chromosomes contain nested elements as 0/1 numbers or false/true strings. | 1 | 1 | NA |
--note | Show descriptive note (if originally added to archive). | 1 | 1 | NA |
--signature with <chromosome> | Show SHA-1 signature of specified chromosome (Base64-encoded) or all signatures if chromosome is not specified. | 1 | 1 | NA |
--verify-signature with <chromosome> | Compare SHA-1 signature of specified chromosome with signature that is stored in the archive metadata, reporting an error if mismatched. | 1 | 1 | NA |
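
The --signature and --verify-signature rows describe a Base64-encoded SHA-1 digest that is stored in the archive and later compared. The Python sketch below shows that general pattern with the standard library only. Exactly which bytes unstarch hashes for a chromosome is not stated in this summary, so the `data` argument is a placeholder, and the function names `sha1_base64` and `verify_signature` are hypothetical.

```python
import base64
import hashlib


def sha1_base64(data: bytes) -> str:
    """Return the SHA-1 digest of `data`, Base64-encoded as ASCII text."""
    digest = hashlib.sha1(data).digest()  # 20 raw bytes
    return base64.b64encode(digest).decode("ascii")


def verify_signature(data: bytes, stored: str) -> bool:
    """Recompute the signature and compare it with the stored value."""
    return sha1_base64(data) == stored
```

A 20-byte SHA-1 digest always encodes to 28 Base64 characters, which is a quick sanity check on any stored signature string.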
7.3.3. starchcat
- Merge multiple starch archive inputs into one starch archive output.
- BEDOPS starchcat documentation.
option | description | min. file inputs | max. file inputs | min. BED columns |
---|---|---|---|---|
(no option) | NA | 1 | No imposed limit | NA |
--bzip2 or --gzip | The internal compression method. The default --bzip2 method favors storage efficiency, while --gzip favors compression and extraction time performance. | 1 | No imposed limit | NA |
--note="foo bar..." | Append note to output archive metadata (optional). | 1 | No imposed limit | NA |
--report-progress=N | Write progress to standard error stream for every N input elements. | 1 | No imposed limit | NA |
7.3.4. starchstrip
- Extract or filter a starch archive by one or more specified chromosome names.
- BEDOPS starchstrip documentation.
option | description | min. file inputs | max. file inputs | min. BED columns |
---|---|---|---|---|
(no option) | NA | 1 | No imposed limit | NA |
--include or --exclude with <chromosomes> | Writes output with inclusion or exclusion of specified chromosome name records (comma-delimited string). | NA | No imposed limit | NA |
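
The include/exclude behavior amounts to filtering records by membership in a comma-delimited list of chromosome names. A minimal Python sketch of that filter follows. It operates on plain (chrom, start, stop) tuples rather than on a starch archive, and the function name `filter_chromosomes` and its `mode` parameter are hypothetical.

```python
def filter_chromosomes(records, names, mode="include"):
    """Keep or drop records by chromosome name.

    `names` is a comma-delimited string such as "chr1,chrX";
    `mode` is either "include" or "exclude".
    """
    selected = set(names.split(","))
    if mode == "include":
        return [r for r in records if r[0] in selected]
    if mode == "exclude":
        return [r for r in records if r[0] not in selected]
    raise ValueError(f"unknown mode: {mode}")
```

Names are matched exactly, so "chr1" selects neither "chr10" nor "chr1_random".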