MpiBLAST
mpiBLAST is a freely available, open-source, parallel implementation of NCBI BLAST. By efficiently utilizing distributed computational resources through database fragmentation, query segmentation, intelligent scheduling, and parallel I/O, mpiBLAST improves NCBI BLAST performance by several orders of magnitude while scaling to hundreds of processors. mpiBLAST is also portable across many different platforms and operating systems. Lastly, a renewed focus and consolidation of the many codebases has positioned mpiBLAST to continue to be of high utility to the bioinformatics community.
Availability
- sapling cluster nodes: v1.5.0-pio
Usage
This is a walk-through of how to run mpiBLAST on the sapling cluster (sapling.gds.unomaha.edu).
Create a temporary directory, and cd into it. Create a .pbs file to test that you are properly configured to run programs on the cluster (test.pbs):
#!/bin/bash
cd $PBS_O_WORKDIR
echo "Started `date`"
cat $PBS_NODEFILE
NPROCS=`wc -l < $PBS_NODEFILE`
echo $NPROCS
sleep 10
echo "Ended `date`"
Tell qsub to run test.pbs (on 22 nodes, for example):
qsub -q sapling -l nodes=22 test.pbs
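The queue and node count can also be recorded inside the script itself as `#PBS` directive comments, so that a plain `qsub test.pbs` submits it the same way. The following is a sketch, not a tested sapling configuration: the job name `mpitest` and the `-j oe` option (merge standard error into standard output) are illustrative choices, and the `:-` fallbacks exist only so the script can be dry-run outside PBS.

```shell
#!/bin/bash
# Scheduler directives: same queue and node count as the qsub line above.
#PBS -q sapling
#PBS -l nodes=22
# Hypothetical job name and merged stdout/stderr (illustrative only).
#PBS -N mpitest
#PBS -j oe

# Fall back to the current directory and an empty node list when the
# PBS variables are unset (that is, when dry-running outside the scheduler).
cd "${PBS_O_WORKDIR:-.}" || exit 1
NODEFILE="${PBS_NODEFILE:-/dev/null}"

echo "Started $(date)"
cat "$NODEFILE"
# The node file has one line per allocated processor slot.
NPROCS=$(wc -l < "$NODEFILE")
echo "$NPROCS"
echo "Ended $(date)"
```

Options given on the qsub command line should still take precedence over the directives, so the same script can be resubmitted with a different node count.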
You can monitor the job's progress using qstat:
qstat
qstat -n
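If you'd rather not re-run qstat by hand, a small polling loop can wait for the job to leave the queue. This is a minimal sketch: `wait_for_job` is a made-up helper name, the 30-second interval is arbitrary, and it assumes qstat exits non-zero for a job ID it no longer knows about. On servers that keep completed jobs listed for a while, the loop will simply run a little longer.

```shell
#!/bin/bash
# Poll qstat until the scheduler no longer reports the job.
# Assumption: qstat exits non-zero for a job ID it does not know about,
# which is typical once a finished job has left the queue.
wait_for_job() {
    local job_id=$1
    local interval=${2:-30}   # seconds between polls (arbitrary default)
    while qstat "$job_id" > /dev/null 2>&1; do
        sleep "$interval"
    done
    echo "Job $job_id is no longer in the queue"
}

# Example (qsub prints the new job's ID on standard output):
#   wait_for_job "$(qsub -q sapling -l nodes=22 test.pbs)"
```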
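Verify that the job finished normally, with the expected output and no errors. PBS normally writes the job's standard output and standard error to files in the directory you submitted from, named after the script (for example, test.pbs.o<jobid> and test.pbs.e<jobid>).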
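The check can be scripted. The sketch below assumes PBS has written the job's standard output and standard error to two files (typically something like test.pbs.o1234 and test.pbs.e1234, where the job number 1234 is made up here) and that the script printed its "Started" and "Ended" lines. `check_job_output` is a hypothetical helper, not part of PBS or mpiBLAST.

```shell
#!/bin/bash
# Check one finished job's output files.
# Arguments: path to the stdout file, path to the stderr file.
check_job_output() {
    local out_file=$1 err_file=$2
    [ -f "$out_file" ] || { echo "missing $out_file"; return 1; }
    # The test script brackets its work with these two lines.
    grep -q '^Started ' "$out_file" || { echo "no Started line"; return 1; }
    grep -q '^Ended ' "$out_file" || { echo "no Ended line"; return 1; }
    # A non-empty stderr file usually signals a problem worth reading.
    if [ -s "$err_file" ]; then
        echo "errors reported in $err_file"
        return 1
    fi
    echo "ok"
}

# Example:
#   check_job_output test.pbs.o1234 test.pbs.e1234
```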
Create a directory in your home directory for your mpiBLAST run. In the examples below, I'm using /home/jhannah/RT780/ncbi-blastdb. Use only alphanumerics in your directory name, no spaces or special characters.
Create an ~/.ncbirc file. Here's an example:
[NCBI]
Data=/home/apps/ncbi-data

[BLAST]
BLASTDB=/home/jhannah/RT780/ncbi-blastdb
BLASTMAT=/home/apps/ncbi-data

[mpiBLAST]
Shared=/home/jhannah/RT780/ncbi-blastdb
Local=/home/jhannah/RT780/ncbi-blastdb
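Since the run directory appears three times in that file, it can be less error-prone to generate it. Here's a minimal sketch: `make_ncbirc` is a hypothetical helper, and its default data directory is simply the /home/apps/ncbi-data path used in the example above.

```shell
#!/bin/bash
# Print an .ncbirc that points BLAST and mpiBLAST at one run directory.
# Arguments: run directory, then (optionally) the NCBI data directory.
make_ncbirc() {
    local run_dir=$1
    local data_dir=${2:-/home/apps/ncbi-data}
    cat <<EOF
[NCBI]
Data=$data_dir

[BLAST]
BLASTDB=$run_dir
BLASTMAT=$data_dir

[mpiBLAST]
Shared=$run_dir
Local=$run_dir
EOF
}

# Example:
#   make_ncbirc /home/jhannah/RT780/ncbi-blastdb > ~/.ncbirc
```

Passing a second argument lets you point Data and BLASTMAT at your own copy of the configuration files.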
Note that /home/apps/ncbi-data is full of default BLAST configuration files. Feel free to create your own custom configuration by creating a copy in your home directory, modifying it as you wish, and changing your .ncbirc file to point to that directory instead.
(See RT 780 for more details.)
Create a .pbs file to run mpiformatdb on one node. For example, here we split the FASTA database file into 22 fragments because we expect to run mpiBLAST on 22 nodes. Here's what mpiformatdb.pbs looks like:
#!/bin/bash
cd $PBS_O_WORKDIR
echo "Started `date`"
cat $PBS_NODEFILE
mpiformatdb -N 22 -p F -o T -i gbpln.fasta
echo "Ended `date`"
Now tell qsub to run mpiformatdb.pbs on 1 node:
qsub -q sapling -l nodes=1 mpiformatdb.pbs
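Before submitting the formatting job, it can be worth a quick sanity check that the FASTA file has at least as many sequences as the number of fragments you ask for with -N, since a file with fewer records can't be split into that many non-empty pieces. The helper names below are made up for illustration.

```shell
#!/bin/bash
# Count FASTA records: each record starts with a '>' header line.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Succeed only if the file has at least as many records as fragments.
check_fragment_count() {
    local fasta=$1 nfrags=$2
    local nseqs
    nseqs=$(count_fasta_records "$fasta")
    if [ "$nseqs" -lt "$nfrags" ]; then
        echo "only $nseqs sequences for $nfrags fragments"
        return 1
    fi
    echo "$nseqs sequences, $nfrags fragments"
}

# Example:
#   check_fragment_count gbpln.fasta 22
```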
Now create a .pbs file which will run mpiBLAST. Here's an example (gbpln.pbs):
#!/bin/bash
cd $PBS_O_WORKDIR
echo "Started `date`"
cat $PBS_NODEFILE
NPROCS=`wc -l < $PBS_NODEFILE`
mpirun -np $NPROCS -machinefile $PBS_NODEFILE /usr/local/mpiblast/bin/mpiblast \
    -p blastn -d gbpln.fasta -i query.fasta -o gbpln.blast
echo "Ended `date`"
Tell qsub to run gbpln.pbs (on 22 nodes, for example):
qsub -q sapling -l nodes=22 gbpln.pbs
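The formatting job and the search job can also be submitted together, with the search held until formatting succeeds. This sketch assumes the cluster's scheduler accepts Torque-style `-W depend=afterok:<jobid>` dependencies, which has not been verified on sapling; `submit_pipeline` is a hypothetical wrapper around the two qsub commands from this page.

```shell
#!/bin/bash
# Submit the formatting job, then the search job held until it succeeds.
# Arguments: queue name, number of nodes for the search job.
# Assumption: the scheduler supports Torque-style "afterok" dependencies.
submit_pipeline() {
    local queue=$1 nodes=$2
    local format_id search_id
    # qsub prints the new job's ID on standard output.
    format_id=$(qsub -q "$queue" -l nodes=1 mpiformatdb.pbs) || return 1
    search_id=$(qsub -q "$queue" -l nodes="$nodes" \
        -W depend=afterok:"$format_id" gbpln.pbs) || return 1
    echo "format job: $format_id"
    echo "search job: $search_id"
}

# Example:
#   submit_pipeline sapling 22
```

If the formatting job fails, the dependent search job should never start, which avoids searching a half-built database.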
Installation
1. Download and extract the latest mpiBLAST release from their site.
2. cd mpiBLAST-x.y.z-pio
3. ./configure --prefix=/usr/local/mpiblast/ --without-X11
4. make
5. make install
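After the install, a quick check that the two programs used on this page ended up under the prefix can save a failed job later. This is only a sketch: `check_mpiblast_install` is a made-up helper, and it assumes `make install` places both executables in the prefix's bin/ directory.

```shell
#!/bin/bash
# Verify that an install prefix contains the two mpiBLAST programs
# used on this page. Argument: install prefix (defaults to the one
# passed to ./configure above).
check_mpiblast_install() {
    local prefix=${1:-/usr/local/mpiblast}
    local prog
    for prog in mpiblast mpiformatdb; do
        if [ ! -x "$prefix/bin/$prog" ]; then
            echo "missing: $prefix/bin/$prog"
            return 1
        fi
    done
    echo "found mpiblast and mpiformatdb in $prefix/bin"
}

# Example:
#   check_mpiblast_install /usr/local/mpiblast
```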

