Bowtie
Bowtie is an ultrafast, memory-efficient short read aligner.
Availability
- klab.ist.unomaha.edu 0.11.3
You can build your own Bowtie indexes from whatever sequences you like. We have built some indexes already which you are free to use:
/clab_bdb/bowtie
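To see which indexes are already built, something like the following sketch could be used. It assumes the usual Bowtie 1 file layout, where each index is a set of files named basename.1.ebwt, basename.rev.1.ebwt and so on; the function name list_bowtie_indexes is our own, not part of Bowtie.

```python
from pathlib import Path


def list_bowtie_indexes(index_dir):
    """Return the sorted basenames of Bowtie indexes found in index_dir.

    Assumption: each index has a file called <basename>.1.ebwt.
    """
    names = set()
    for path in Path(index_dir).glob("*.1.ebwt"):
        name = path.name[: -len(".1.ebwt")]
        # <basename>.rev.1.ebwt belongs to the mirror index, not a new index.
        if name.endswith(".rev"):
            continue
        names.add(name)
    return sorted(names)


if __name__ == "__main__":
    for name in list_bowtie_indexes("/clab_bdb/bowtie"):
        print(name)
```

Each name printed can be appended to /clab_bdb/bowtie/ and passed to bowtie as the index argument.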
Getting Started
The Getting Started document explains how to use Bowtie. For example, the Bowtie source and binary packages come with a pre-built index of the E. coli genome and a set of 1,000 35-bp reads simulated from that genome. Running Bowtie on them looks like this:
$ bowtie e_coli reads/e_coli_1000.fq
r0 - gi|110640213|ref|NC_008253.1| 3658049 ATGCTGGAATGGCGATAGTTGGGTGGGTATCGTTC 45567778999:9;;<===>?@@@@AAAABCCCDE 0 32:T>G,34:G>A
r1 - gi|110640213|ref|NC_008253.1| 1902085 CGGATGATTTTTATCCCATGAGACATCCAGTTCGG 45567778999:9;;<===>?@@@@AAAABCCCDE 0
r2 - gi|110640213|ref|NC_008253.1| 3989609 CATAAAGCAACAGTGTTATACTATAACAATTTTGA 45567778999:9;;<===>?@@@@AAAABCCCDE 0
...
r995 + gi|110640213|ref|NC_008253.1| 2879570 TGGCACCTGCCGTTTGCTGTGCGACGAATCAACGC EDCCCBAAAA@@@@?>===<;;9:99987776554 0 33:A>G
r996 - gi|110640213|ref|NC_008253.1| 4769855 ATCCACATCAGGNCGAAGTGCCACAGTAACGCACC 45567778999:9;;<===>?@@@@AAAABCCCDE 0 22:G>N
r997 + gi|110640213|ref|NC_008253.1| 2824573 AACCAACACGCCAAGCATCGCTTCACGGCTGACTC EDCCCBAAAA@@@@?>===<;;9:99987776554 0 30:C>G,31:G>A,33:G>T
# reads processed: 1000
# reads with at least one reported alignment: 699 (69.90%)
# reads that failed to align: 301 (30.10%)
Reported 699 alignments to 1 output stream(s)
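If you want to post-process alignment lines like the ones above, a small parser along these lines may help. It is only a sketch: it assumes the columns are tab-separated in the real output and appear in the order shown (read name, strand, reference name, offset, sequence, qualities, a numeric column, and an optional comma-separated mismatch list). The names Alignment and parse_alignment are ours, and the column meanings should be checked against the manual for the Bowtie version you run.

```python
from typing import List, NamedTuple


class Alignment(NamedTuple):
    read: str              # read name, e.g. "r0"
    strand: str            # "+" or "-"
    ref: str               # reference sequence name
    offset: int            # position of the alignment on the reference
    seq: str               # read sequence as printed by Bowtie
    quals: str             # quality string as printed by Bowtie
    other: int             # seventh column (0 in all the examples above)
    mismatches: List[str]  # e.g. ["32:T>G", "34:G>A"]; empty if none


def parse_alignment(line):
    """Parse one tab-separated alignment line into an Alignment."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 7:
        raise ValueError("expected at least 7 tab-separated fields: %r" % line)
    # The mismatch column is missing or empty for exact matches.
    has_mismatches = len(fields) > 7 and fields[7] != ""
    mismatches = fields[7].split(",") if has_mismatches else []
    return Alignment(fields[0], fields[1], fields[2], int(fields[3]),
                     fields[4], fields[5], int(fields[6]), mismatches)


if __name__ == "__main__":
    example = "\t".join([
        "r0", "-", "gi|110640213|ref|NC_008253.1|", "3658049",
        "ATGCTGGAATGGCGATAGTTGGGTGGGTATCGTTC",
        "45567778999:9;;<===>?@@@@AAAABCCCDE", "0", "32:T>G,34:G>A",
    ])
    print(parse_alignment(example))
```

Lines starting with "#" and the final "Reported ..." line are summaries, not alignments, so filter them out before calling the parser.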
As another example, you could create a FASTA query file (query.fasta) like this:
>id1
AAGGCTGTAACCATAGGA
>id2
TAACTGACCAGGCCTTAC
>id3
TCAGGCATCCAGTGTCAC
>id4
TACTGCTGAATTGCTGCT
>id5
AAGGCAGAGAATGATGTT
>id6
TGGTTCCAAGTTTACTTC
>id7
TGGATGGTTTAGCTTAGC
>id8
GGCCAGGCACCGTGATGT
>id9
AAATTCAAATGTTGGAAG
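If your query sequences come out of a script rather than a text editor, a file in this layout can be written with a few lines of Python. This is a minimal sketch; write_fasta is a helper name we made up, and it writes each sequence on a single line without wrapping.

```python
def write_fasta(records, path):
    """Write (name, sequence) pairs to path, one FASTA record per pair."""
    with open(path, "w") as handle:
        for name, seq in records:
            # A FASTA record is a ">" header line followed by the sequence.
            handle.write(">%s\n%s\n" % (name, seq))


if __name__ == "__main__":
    write_fasta(
        [("id1", "AAGGCTGTAACCATAGGA"), ("id2", "TAACTGACCAGGCCTTAC")],
        "query.fasta",
    )
```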
Then run bowtie against our local mouse mm9 index like so:
$ bowtie -f /clab_bdb/bowtie/mouse_mm9 query.fasta
id1 + chr1 3017715 AAGGCTGTAACCATAGGA IIIIIIIIIIIIIIIIII 0
id2 + chr1 3051429 TAACTGACCAGGCCTTAC IIIIIIIIIIIIIIIIII 0
id3 + chr1 3051746 TCAGGCATCCAGTGTCAC IIIIIIIIIIIIIIIIII 0
id4 + chr1 3064110 TACTGCTGAATTGCTGCT IIIIIIIIIIIIIIIIII 0
id5 + chr1 3111648 AAGGCAGAGAATGATGTT IIIIIIIIIIIIIIIIII 0
id6 + chr1 3122521 TGGTTCCAAGTTTACTTC IIIIIIIIIIIIIIIIII 0
id7 + chr1 3187784 TGGATGGTTTAGCTTAGC IIIIIIIIIIIIIIIIII 0
id8 + chr1 3190325 GGCCAGGCACCGTGATGT IIIIIIIIIIIIIIIIII 0
id9 + chr1 3195403 AAATTCAAATGTTGGAAG IIIIIIIIIIIIIIIIII 0
# reads processed: 9
# reads with at least one reported alignment: 9 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 9 alignments to 1 output stream(s)
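The same command can be driven from a script. The sketch below only rebuilds the command line shown above (the -f flag followed by the index and the query file) and hands it to subprocess; bowtie_command and run_bowtie are hypothetical helper names, and it assumes bowtie is on your PATH and Python 3.7 or later.

```python
import subprocess


def bowtie_command(index, query, fasta=True, bowtie="bowtie"):
    """Build the argument list for a bowtie run like the one shown above."""
    cmd = [bowtie]
    if fasta:
        cmd.append("-f")  # the query file is FASTA, as in the mm9 example
    cmd += [index, query]
    return cmd


def run_bowtie(index, query, fasta=True):
    """Run bowtie and return the lines it printed on standard output."""
    result = subprocess.run(
        bowtie_command(index, query, fasta),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.splitlines()
```

For example, run_bowtie("/clab_bdb/bowtie/mouse_mm9", "query.fasta") would repeat the search above and return its output for further processing.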
Bowtie needs a lot of memory when searching large indexes. In the example above, it spent the first few minutes of runtime loading most of the 2.9 GB mouse_mm9 index into memory. After that, the search itself ran very quickly and Bowtie exited.
So you may want to reserve genome searches for server machines with large amounts of memory. Perhaps nodes of the sapling cluster?
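Before starting a search, you can get a rough idea of how much memory an index might need by adding up the sizes of its files on disk. This is only a sketch and a loose estimate: it assumes the index files are named basename.*.ebwt (the usual Bowtie 1 layout), that on-disk size roughly tracks what Bowtie loads, and the function name index_size_bytes is ours.

```python
from pathlib import Path


def index_size_bytes(index_basename):
    """Sum the on-disk sizes of all .ebwt files belonging to one index."""
    base = Path(index_basename)
    # Matches <basename>.1.ebwt, <basename>.rev.1.ebwt, and so on.
    pattern = base.name + ".*ebwt"
    return sum(p.stat().st_size for p in base.parent.glob(pattern) if p.is_file())


if __name__ == "__main__":
    size = index_size_bytes("/clab_bdb/bowtie/mouse_mm9")
    print("mouse_mm9 index: %.1f GB on disk" % (size / 1e9))
```

If the total is close to or larger than the free memory on your machine, that is a good sign the search belongs on a bigger server.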

