Set operations on BED files are an important part of bioinformatics analysis. Because the output of different set operations can vary widely, this article records the basic behavior of common operations.

The test environment for this article is bedtools v2.31.1. The examples use Bash process substitution, so no temporary test files need to be created. The general form is: intersectBed -a <(echo -e "chr1\t100\t200") -b <(echo -e "chr1\t150\t300"). intersectBed is the legacy name for bedtools intersect; the two are interchangeable.

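1. Default operation: output only the portion of each A interval that overlaps B. An A interval that overlaps several B intervals produces one row per overlap.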

Bash
# Command
intersectBed -a <(echo -e "chr1\t100\t200\tgene1\nchr1\t1000\t1200\tgene2") -b <(echo -e "chr1\t50\t130\tgene3\nchr1\t150\t300\tgene4\nchr1\t500\t800\tgene5")
# Result
chr1	100	130	gene1
chr1	150	200	gene1

2. Keep the complete original A record, even when only part of it overlaps.
Parameter: -wa (write original A entry)

Bash
# Command
intersectBed -a <(echo -e "chr1\t100\t200\tgene1\nchr1\t1000\t1200\tgene2") -b <(echo -e "chr1\t50\t130\tgene3\nchr1\t150\t300\tgene4\nchr1\t500\t800\tgene5") -wa
# Result
chr1	100	200	gene1
chr1	100	200	gene1

3. Output the overlapping portion of each A interval and append the complete overlapping B record.
Parameter: -wb

Bash
# Command
intersectBed -a <(echo -e "chr1\t100\t200\tgene1\nchr1\t1000\t1200\tgene2") -b <(echo -e "chr1\t50\t130\tgene3\nchr1\t150\t300\tgene4\nchr1\t500\t800\tgene5") -wb
# Result
chr1	100	130	gene1	chr1	50	130	gene3
chr1	150	200	gene1	chr1	150	300	gene4

4. Keep the complete original A record and the complete overlapping B record.
Parameters: -wa -wb

Bash
# Command
intersectBed -a <(echo -e "chr1\t100\t200\tgene1\nchr1\t1000\t1200\tgene2") -b <(echo -e "chr1\t50\t130\tgene3\nchr1\t150\t300\tgene4\nchr1\t500\t800\tgene5") -wa -wb
# Result
chr1	100	200	gene1	chr1	50	130	gene3
chr1	100	200	gene1	chr1	150	300	gene4

5. Keep the complete original A record and the complete overlapping B record, and output the overlap length.
Parameter: -wo (write overlap)

Bash
# Command
intersectBed -a <(echo -e "chr1\t100\t200\tgene1\nchr1\t1000\t1200\tgene2") -b <(echo -e "chr1\t50\t130\tgene3\nchr1\t150\t300\tgene4\nchr1\t500\t800\tgene5") -wo
# Result
chr1	100	200	gene1	chr1	50	130	gene3	30
chr1	100	200	gene1	chr1	150	300	gene4	50
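
The clipped coordinates in the default output and the length reported by -wo both follow from the half-open BED coordinate convention: the overlap starts at the larger of the two starts and ends at the smaller of the two ends. A minimal shell sketch of that arithmetic is shown here; overlap_len is a hypothetical helper written for illustration and is not part of bedtools.

```shell
# overlap_len A_START A_END B_START B_END
# Hypothetical helper (not part of bedtools): prints the overlap length in bp
# of two half-open intervals on the same chromosome, or 0 if they do not overlap.
overlap_len() {
    ov_start=$(( $1 > $3 ? $1 : $3 ))   # larger of the two starts
    ov_end=$(( $2 < $4 ? $2 : $4 ))     # smaller of the two ends
    echo $(( ov_end > ov_start ? ov_end - ov_start : 0 ))
}

overlap_len 100 200 50 130      # gene1 vs gene3: prints 30
overlap_len 100 200 150 300     # gene1 vs gene4: prints 50
overlap_len 1000 1200 500 800   # gene2 vs gene5: prints 0
```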

6. Report every A record. Overlapping A records are paired with the complete overlapping B record; A records with no overlap in B are still reported, with “.” and “-1” in place of the B fields. The overlap length is appended as the last column.
Parameter: -wao (Write All and Overlap)

Bash
# Command
intersectBed -a <(echo -e "chr1\t100\t200\tgene1\nchr1\t1000\t1200\tgene2") -b <(echo -e "chr1\t50\t130\tgene3\nchr1\t150\t300\tgene4\nchr1\t500\t800\tgene5") -wao
# Result
chr1	100	200	gene1	chr1	50	130	gene3	30
chr1	100	200	gene1	chr1	150	300	gene4	50
chr1	1000	1200	gene2	.	-1	-1	.	0

7. Report each overlapping A record exactly once, however many B records it overlaps. The output is the same as with -wa, but with duplicate records removed.
Parameter: -u (write original A entry once if any overlap)

Bash
# Command
intersectBed -a <(echo -e "chr1\t100\t200\tgene1\nchr1\t1000\t1200\tgene2") -b <(echo -e "chr1\t50\t130\tgene3\nchr1\t150\t300\tgene4\nchr1\t500\t800\tgene5") -u
# Result: same as -wa, but duplicate rows are removed
chr1	100	200	gene1

8. Output only intervals in A that do not overlap B.
Parameter: -v (invert)

Bash
# Command
intersectBed -a <(echo -e "chr1\t100\t200\tgene1\nchr1\t1000\t1200\tgene2") -b <(echo -e "chr1\t50\t130\tgene3\nchr1\t150\t300\tgene4\nchr1\t500\t800\tgene5") -v
# Result
chr1	1000	1200	gene2

9. Filter by overlap fraction.
Parameters: -f (minimum overlap as a fraction of the A record), -F (minimum overlap as a fraction of the B record)

Bash
# Command
intersectBed -a <(echo -e "chr1\t100\t200\tgene1\nchr1\t1000\t1200\tgene2") -b <(echo -e "chr1\t50\t130\tgene3\nchr1\t150\t300\tgene4\nchr1\t500\t800\tgene5") -f 0.5
# Result: only intervals where the overlap covers >=50% of A are output
chr1	150	200	gene1
# Command
intersectBed -a <(echo -e "chr1\t100\t200\tgene1\nchr1\t1000\t1200\tgene2") -b <(echo -e "chr1\t50\t130\tgene3\nchr1\t150\t300\tgene4\nchr1\t500\t800\tgene5") -F 0.35
# Result: only intervals where the overlap covers >=35% of B are output
chr1	100	130	gene1
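
The two thresholds divide the same overlap length by different denominators: -f uses the length of the A record and -F the length of the B record. The sketch below reproduces that arithmetic for gene1 with awk. overlap_frac is a hypothetical helper, not a bedtools command, and the overlap lengths (30 and 50 bp) are the ones reported by -wo in example 5.

```shell
# overlap_frac OVERLAP_BP START END
# Hypothetical helper (not part of bedtools): overlap length divided by the
# length of the interval [START, END), printed with three decimals.
overlap_frac() {
    LC_ALL=C awk -v o="$1" -v s="$2" -v e="$3" 'BEGIN { printf "%.3f\n", o / (e - s) }'
}

# Fraction of A (gene1, 100-200) that is covered: compared against -f
overlap_frac 30 100 200    # overlap with gene3: 0.300, fails -f 0.5
overlap_frac 50 100 200    # overlap with gene4: 0.500, passes -f 0.5

# Fraction of B that is covered: compared against -F
overlap_frac 30 50 130     # gene3 (50-130): 0.375, passes -F 0.35
overlap_frac 50 150 300    # gene4 (150-300): 0.333, fails -F 0.35
```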

10. Count the number of B intervals that overlap each interval in A.
Parameter: -c (count overlaps)

Bash
# Command
intersectBed -a <(echo -e "chr1\t100\t200\tgene1\nchr1\t1000\t1200\tgene2") -b <(echo -e "chr1\t50\t130\tgene3\nchr1\t150\t300\tgene4\nchr1\t500\t800\tgene5") -c
# Result: each interval in A is output with the overlap count
chr1	100	200	gene1	2
chr1	1000	1200	gene2	0
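
The counting rule can be sketched as a plain overlap test: two half-open intervals overlap when each one starts before the other ends. The sketch below assumes a single chromosome and reads the B intervals as start and end columns from standard input; count_overlaps is a hypothetical helper, not a bedtools command.

```shell
# count_overlaps A_START A_END
# Hypothetical helper: counts the B intervals on stdin ("start end" per line)
# that overlap the half-open interval [A_START, A_END).
count_overlaps() {
    awk -v a_start="$1" -v a_end="$2" '
        $1 < a_end && $2 > a_start { n++ }   # B starts before A ends and ends after A starts
        END { print n + 0 }                  # n + 0 prints 0 when nothing matched
    '
}

b_intervals='50 130
150 300
500 800'

printf '%s\n' "$b_intervals" | count_overlaps 100 200     # gene1: prints 2
printf '%s\n' "$b_intervals" | count_overlaps 1000 1200   # gene2: prints 0
```

A count greater than 0 corresponds to the records that -u reports, and a count of 0 to the records that -v reports.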

bedtools merge combines overlapping or adjacent genomic intervals into continuous, non-overlapping intervals. The input must be sorted by chromosome and then by start position.

1. Merge overlapping and adjacent intervals.

Bash
# Command
bedtools merge -i <(echo -e "chr1\t50\t130\tgene3\nchr1\t100\t200\tgene1\nchr1\t150\t300\tgene4\nchr1\t500\t800\tgene5\nchr1\t1000\t1200\tgene2")
# Result
chr1	50	300
chr1	500	800
chr1	1000	1200

2. Set the maximum allowed gap length for merging.
Parameter: -d

Bash
# Command
bedtools merge -i <(echo -e "chr1\t50\t130\tgene3\nchr1\t100\t200\tgene1\nchr1\t150\t300\tgene4\nchr1\t500\t800\tgene5\nchr1\t1000\t1200\tgene2") -d 200
# Result: intervals with gaps <=200 bp are merged into one interval
chr1	50	1200
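
The merging rule behind examples 1 and 2 can be sketched as a single pass over sorted input: a record joins the current merged interval when its start is no more than d bp past the current end (d is 0 by default, which covers overlapping and adjacent records). This is an illustrative awk sketch under that assumption, not how bedtools is implemented; merge_sketch is a hypothetical name.

```shell
# merge_sketch [D]
# Hypothetical helper: reads BED records sorted by chromosome and start from
# stdin and merges records whose gap to the current interval is <= D bp.
merge_sketch() {
    awk -v d="${1:-0}" 'BEGIN { OFS = "\t" }
        NR > 1 && $1 == chrom && $2 <= end_pos + d {   # overlap, adjacency, or gap <= d
            if ($3 > end_pos) end_pos = $3             # extend the current interval
            next
        }
        NR > 1 { print chrom, start_pos, end_pos }     # close the previous interval
        { chrom = $1; start_pos = $2; end_pos = $3 }   # open a new interval
        END { if (NR > 0) print chrom, start_pos, end_pos }
    '
}

bed=$(printf 'chr1\t50\t130\nchr1\t100\t200\nchr1\t150\t300\nchr1\t500\t800\nchr1\t1000\t1200')

printf '%s\n' "$bed" | merge_sketch        # three intervals: 50-300, 500-800, 1000-1200
printf '%s\n' "$bed" | merge_sketch 200    # one interval: 50-1200
```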

3. Preserve column information during merging.
Parameters: -c -o

Bash
# Command
bedtools merge -i <(echo -e "chr1\t50\t130\tgene3\nchr1\t100\t200\tgene1\nchr1\t150\t300\tgene4\nchr1\t500\t800\tgene5\nchr1\t1000\t1200\tgene2") -c 4 -o distinct
# Result: the name column is preserved
chr1	50	300	gene1,gene3,gene4
chr1	500	800	gene5
chr1	1000	1200	gene2

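-c selects the columns to report and -o selects the operation applied to them, so the two are normally given together (when -o is omitted, bedtools falls back to sum). Common -o operations include sum, min, max, mean, median, collapse, distinct, count, count_distinct, first, and last.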

4. Strand-specific merge.
Parameter: -s

Bash
# Command
bedtools merge -i <(echo -e "chr1\t50\t130\tgene3\t.\t+\nchr1\t100\t200\tgene1\t.\t+\nchr1\t150\t300\tgene4\t.\t-\nchr1\t500\t800\tgene5\t.\t+\nchr1\t1000\t1200\tgene2\t.\t+") -s -c 4,6 -o distinct
# Result: intervals are merged separately for each strand
chr1	50	200	gene1,gene3	+
chr1	150	300	gene4	-
chr1	500	800	gene5	+
chr1	1000	1200	gene2	+

bedtools multiinter analyzes overlaps across multiple BED files: it identifies the genomic regions shared by different files and reports, for each region, which files cover it.

1. Basic usage.

Bash
# Command
bedtools multiinter -i <(echo -e "chr1\t100\t200\tgene1\nchr1\t1000\t1200\tgene2") <(echo -e "chr1\t50\t130\tgene3\nchr1\t150\t300\tgene4\nchr1\t500\t800\tgene5") -names A B
# Result
chr1	50	100	1	B	0	1
chr1	100	130	2	A,B	1	1
chr1	130	150	1	A	1	0
chr1	150	200	2	A,B	1	1
chr1	200	300	1	B	0	1
chr1	500	800	1	B	0	1
chr1	1000	1200	1	A	1	0

1. Default operation of bedtools subtract: remove the portions of A that overlap B. A records with no overlap are output unchanged.

Bash
# Command
bedtools subtract -a <(echo -e "chr1\t100\t200\tgene1\nchr1\t1000\t1200\tgene2") -b <(echo -e "chr1\t50\t130\tgene3\nchr1\t150\t300\tgene4\nchr1\t500\t800\tgene5")
# Result
chr1	130	150	gene1
chr1	1000	1200	gene2
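
The subtraction can be sketched as walking along one A interval and emitting the pieces that no B interval covers. The sketch assumes a single chromosome and B intervals sorted by start, given as start and end columns on standard input; subtract_sketch is a hypothetical helper, not a bedtools command.

```shell
# subtract_sketch A_START A_END
# Hypothetical helper: prints the parts of [A_START, A_END) that are not
# covered by the B intervals on stdin ("start end" per line, sorted by start).
subtract_sketch() {
    awk -v pos="$1" -v a_end="$2" '
        $2 > pos && $1 < a_end {                 # B overlaps what is left of A
            if ($1 > pos) print pos "\t" $1      # piece of A before this B interval
            pos = $2                             # continue after this B interval
        }
        END { if (pos < a_end) print pos "\t" a_end }   # remainder of A, if any
    '
}

b_sorted='50 130
150 300
500 800'

printf '%s\n' "$b_sorted" | subtract_sketch 100 200     # gene1: prints 130 150
printf '%s\n' "$b_sorted" | subtract_sketch 1000 1200   # gene2: prints 1000 1200
```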