Bowtie

From CLAB

Bowtie is an ultrafast, memory-efficient short-read aligner.

Availability

  • klab.ist.unomaha.edu 0.11.3

You can build your own Bowtie indexes from any sequences you like. We have already built some indexes, which you are free to use:

 /clab_bdb/bowtie
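
Building your own index is done with the bowtie-build program that ships with Bowtie. The following is a minimal sketch, not a record of how the indexes in /clab_bdb/bowtie were made. The file name my_sequences.fa, the index name my_index, and the two toy sequences are placeholders chosen for illustration.

```shell
# Write a tiny FASTA reference (placeholder data, for illustration only).
cat > my_sequences.fa <<'EOF'
>seq1
ACGTACGTTAGCCGATAGGCTTAACCGGATCGATCGGATTACAGGCTTAA
>seq2
TTGACCGGTAAGCTAGCTAGGATCCGATTACGGCATTAGCCGGATATCGA
EOF

# Build the index only if bowtie-build is on the PATH.
# Usage: bowtie-build <reference_in> <index_basename>
if command -v bowtie-build >/dev/null 2>&1; then
    bowtie-build my_sequences.fa my_index
else
    echo "bowtie-build not found; skipping index build" >&2
fi
```

The index is then passed to bowtie by its base name (my_index here), the same way the pre-built indexes are referred to by base name.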

Getting Started

The Getting Started document explains how to use Bowtie. For example, the Bowtie source and binary packages come with a pre-built index of the E. coli genome and a set of 1,000 35-bp reads simulated from that genome. Aligning those reads against the index produces output like this:

$ bowtie e_coli reads/e_coli_1000.fq 
r0	-	gi|110640213|ref|NC_008253.1|	3658049	ATGCTGGAATGGCGATAGTTGGGTGGGTATCGTTC	45567778999:9;;<===>?@@@@AAAABCCCDE	0	32:T>G,34:G>A
r1	-	gi|110640213|ref|NC_008253.1|	1902085	CGGATGATTTTTATCCCATGAGACATCCAGTTCGG	45567778999:9;;<===>?@@@@AAAABCCCDE	0	
r2	-	gi|110640213|ref|NC_008253.1|	3989609	CATAAAGCAACAGTGTTATACTATAACAATTTTGA	45567778999:9;;<===>?@@@@AAAABCCCDE	0	
...
r995	+	gi|110640213|ref|NC_008253.1|	2879570	TGGCACCTGCCGTTTGCTGTGCGACGAATCAACGC	EDCCCBAAAA@@@@?>===<;;9:99987776554	0	33:A>G
r996	-	gi|110640213|ref|NC_008253.1|	4769855	ATCCACATCAGGNCGAAGTGCCACAGTAACGCACC	45567778999:9;;<===>?@@@@AAAABCCCDE	0	22:G>N
r997	+	gi|110640213|ref|NC_008253.1|	2824573	AACCAACACGCCAAGCATCGCTTCACGGCTGACTC	EDCCCBAAAA@@@@?>===<;;9:99987776554	0	30:C>G,31:G>A,33:G>T
# reads processed: 1000
# reads with at least one reported alignment: 699 (69.90%)
# reads that failed to align: 301 (30.10%)
Reported 699 alignments to 1 output stream(s)
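
Each alignment line is tab-delimited. In Bowtie's default output format, the columns are, as best we understand it: read name, strand, reference name, 0-based offset of the alignment, read sequence, qualities, a count column, and a comma-separated list of mismatch descriptors (empty for an exact match). Check the manual for the installed version before relying on this layout. The sketch below uses awk to tally hits per strand and count reads with at least one mismatch. The file name sample.map and the helper row are hypothetical; the three lines are copied from the output above.

```shell
# row: hypothetical helper that prints its 8 arguments as one tab-delimited line.
row() { printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' "$@"; }

# Three alignment lines copied from the run above.
{
    row r0 - 'gi|110640213|ref|NC_008253.1|' 3658049 \
        ATGCTGGAATGGCGATAGTTGGGTGGGTATCGTTC '45567778999:9;;<===>?@@@@AAAABCCCDE' 0 '32:T>G,34:G>A'
    row r1 - 'gi|110640213|ref|NC_008253.1|' 1902085 \
        CGGATGATTTTTATCCCATGAGACATCCAGTTCGG '45567778999:9;;<===>?@@@@AAAABCCCDE' 0 ''
    row r995 + 'gi|110640213|ref|NC_008253.1|' 2879570 \
        TGGCACCTGCCGTTTGCTGTGCGACGAATCAACGC 'EDCCCBAAAA@@@@?>===<;;9:99987776554' 0 '33:A>G'
} > sample.map

# Column 2 is the strand; column 8 holds the mismatch descriptors.
summary=$(awk -F'\t' '
    /^#/      { next }            # ignore summary lines if they were captured
    $2 == "+" { plus++ }
    $2 == "-" { minus++ }
    $8 != ""  { mismatched++ }
    END { printf "plus=%d minus=%d with_mismatches=%d\n", plus, minus, mismatched }
' sample.map)
echo "$summary"
```

For a real run, the same awk program can be pointed at a file of saved bowtie output instead of sample.map.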

As another example, you could create a FASTA query file named query.fasta like this:

>id1
AAGGCTGTAACCATAGGA
>id2
TAACTGACCAGGCCTTAC
>id3
TCAGGCATCCAGTGTCAC
>id4
TACTGCTGAATTGCTGCT
>id5
AAGGCAGAGAATGATGTT
>id6
TGGTTCCAAGTTTACTTC
>id7
TGGATGGTTTAGCTTAGC
>id8
GGCCAGGCACCGTGATGT
>id9
AAATTCAAATGTTGGAAG

Then run bowtie against our local mouse mm9 index, using -f to indicate that the query file is in FASTA format:

$ bowtie -f /clab_bdb/bowtie/mouse_mm9 query.fasta
id1	+	chr1	3017715	AAGGCTGTAACCATAGGA	IIIIIIIIIIIIIIIIII	0	
id2	+	chr1	3051429	TAACTGACCAGGCCTTAC	IIIIIIIIIIIIIIIIII	0	
id3	+	chr1	3051746	TCAGGCATCCAGTGTCAC	IIIIIIIIIIIIIIIIII	0	
id4	+	chr1	3064110	TACTGCTGAATTGCTGCT	IIIIIIIIIIIIIIIIII	0	
id5	+	chr1	3111648	AAGGCAGAGAATGATGTT	IIIIIIIIIIIIIIIIII	0	
id6	+	chr1	3122521	TGGTTCCAAGTTTACTTC	IIIIIIIIIIIIIIIIII	0	
id7	+	chr1	3187784	TGGATGGTTTAGCTTAGC	IIIIIIIIIIIIIIIIII	0	
id8	+	chr1	3190325	GGCCAGGCACCGTGATGT	IIIIIIIIIIIIIIIIII	0	
id9	+	chr1	3195403	AAATTCAAATGTTGGAAG	IIIIIIIIIIIIIIIIII	0	
# reads processed: 9
# reads with at least one reported alignment: 9 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 9 alignments to 1 output stream(s)

Bowtie needs a lot of memory when searching large indexes. In the run above, it spent the first few minutes loading most of the 2.9 GB mouse_mm9 index into memory. After that, it ran very quickly and exited.

You may therefore want to reserve genome searches for server machines with large amounts of memory, perhaps nodes of the sapling cluster.
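
One way to judge whether a machine suits a given index is to compare the index size on disk with the machine's memory before starting a run. The sketch below assumes the index is stored as a set of files that share a base name and end in .ebwt, which is the usual layout for Bowtie indexes. The helper index_size_bytes is a hypothetical name introduced here, and the free command is assumed to be available only on Linux.

```shell
# index_size_bytes BASENAME
# Hypothetical helper: prints the combined size, in bytes, of all files
# matching BASENAME*.ebwt (prints 0 if none are found).
index_size_bytes() {
    total=0
    for f in "$1"*.ebwt; do
        [ -f "$f" ] || continue      # skip the literal pattern when nothing matches
        size=$(wc -c < "$f")
        total=$((total + size))
    done
    echo "$total"
}

# Size of the local mouse index, in bytes.
index_size_bytes /clab_bdb/bowtie/mouse_mm9

# On Linux, show total and available memory in bytes for comparison.
if command -v free >/dev/null 2>&1; then
    free -b
fi
```

If the index size is close to or larger than the available memory, the search is probably better run on a machine with more memory.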