MpiBLAST

From CLAB

mpiBLAST is a freely available, open-source, parallel implementation of NCBI BLAST. By efficiently utilizing distributed computational resources through database fragmentation, query segmentation, intelligent scheduling, and parallel I/O, mpiBLAST improves NCBI BLAST performance by several orders of magnitude while scaling to hundreds of processors. mpiBLAST is also portable across many different platforms and operating systems. Lastly, a renewed focus and consolidation of the many codebases has positioned mpiBLAST to continue to be of high utility to the bioinformatics community.

Availability

  • sapling cluster nodes v1.5.0-pio

Usage

This is a walk-through of how to run mpiBLAST on the sapling cluster (sapling.gds.unomaha.edu).

Create a temporary directory and cd into it. Create a .pbs file (test.pbs) to test that you are properly configured to run programs on the cluster:

#!/bin/bash
cd $PBS_O_WORKDIR
echo "Started `date`"
cat $PBS_NODEFILE
NPROCS=`wc -l < $PBS_NODEFILE`
echo $NPROCS
sleep 10
echo "Ended `date`"

Tell qsub to run test.pbs (on 22 nodes, for example):

qsub -q sapling -l nodes=22 test.pbs

You can monitor the job's progress using qstat:

qstat
qstat -n

Verify that the job finished normally: its output should show the Started line, the list of allocated nodes, the process count, and the Ended line, with no errors.
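
One way to make this check repeatable is a small shell helper. The sketch below is only an illustration: the function name check_job_output is hypothetical, and it assumes the usual PBS behaviour of writing the job's standard output and standard error to files such as test.pbs.o<jobid> and test.pbs.e<jobid> in the submission directory.

```shell
#!/bin/bash
# Hypothetical helper: check_job_output OUT_FILE ERR_FILE
# Succeeds only if the error file is empty and the output file
# contains the "Ended" line that test.pbs prints last.
check_job_output() {
    local out_file=$1 err_file=$2
    if [ ! -f "$out_file" ]; then
        echo "missing output file: $out_file"
        return 1
    fi
    if [ -s "$err_file" ]; then
        echo "errors reported in $err_file"
        return 1
    fi
    if ! grep -q '^Ended' "$out_file"; then
        echo "job did not reach its final line"
        return 1
    fi
    echo "job finished normally"
}
```

Call it with the two files your job produced, for example check_job_output test.pbs.o12345 test.pbs.e12345, where 12345 is a placeholder for your job number.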

Create a directory in your home directory for your mpiBLAST run. In the examples below, I'm using /home/jhannah/RT780/ncbi-blastdb. Use only alphanumerics in your directory name, no spaces or special characters.

Create an ~/.ncbirc file. Here's an example:

[NCBI]
Data=/home/apps/ncbi-data

[BLAST]
BLASTDB=/home/jhannah/RT780/ncbi-blastdb
BLASTMAT=/home/apps/ncbi-data

[mpiBLAST]
Shared=/home/jhannah/RT780/ncbi-blastdb
Local=/home/jhannah/RT780/ncbi-blastdb

Note that /home/apps/ncbi-data is full of default BLAST configuration files. Feel free to create your own custom configuration by creating a copy in your home directory, modifying it as you wish, and changing your .ncbirc file to point to that directory instead.

(See RT 780 for more details.)
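
If you switch between database directories often, generating the file can be less error-prone than editing it by hand. The following is a minimal sketch, not part of mpiBLAST: write_ncbirc is a hypothetical helper that writes the same three sections as the example above, taking the database directory, the configuration directory (defaulting to /home/apps/ncbi-data) and the target file (defaulting to ~/.ncbirc) as arguments.

```shell
#!/bin/bash
# Hypothetical helper: write_ncbirc DB_DIR [DATA_DIR] [TARGET]
# Writes an .ncbirc with the [NCBI], [BLAST] and [mpiBLAST] sections,
# using DB_DIR for BLASTDB, Shared and Local.
write_ncbirc() {
    local db_dir=$1
    local data_dir=${2:-/home/apps/ncbi-data}
    local target=${3:-$HOME/.ncbirc}
    cat > "$target" <<EOF
[NCBI]
Data=$data_dir

[BLAST]
BLASTDB=$db_dir
BLASTMAT=$data_dir

[mpiBLAST]
Shared=$db_dir
Local=$db_dir
EOF
}
```

For the example in this walk-through, write_ncbirc /home/jhannah/RT780/ncbi-blastdb would reproduce the file shown above. Note that it overwrites the target file, so keep a copy of any existing ~/.ncbirc first.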

Create a .pbs file to run mpiformatdb on one node. In this example we split the database FASTA file into 22 fragments because we expect to run mpiBLAST on 22 nodes. Here's what mpiformatdb.pbs looks like:

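#!/bin/bash
cd $PBS_O_WORKDIR
echo "Started `date`"
cat $PBS_NODEFILE
# -N 22: split the database into 22 fragments
# -p F:  the input is nucleotide, not protein
# -o T:  parse SeqIds and build indexes
# -i:    the input FASTA file
mpiformatdb -N 22 -p F -o T -i gbpln.fasta
echo "Ended `date`"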

Now tell qsub to run mpiformatdb.pbs on 1 node:

qsub -q sapling -l nodes=1 mpiformatdb.pbs
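
Before submitting the formatting job, it can be worth confirming that the FASTA file actually contains at least as many sequences as the number of fragments you are asking for. The sketch below is an illustration only; both function names are hypothetical, and it simply counts header lines (those starting with ">").

```shell
#!/bin/bash
# Hypothetical helper: count_fasta_records FASTA_FILE
# Prints the number of header lines (lines beginning with ">").
count_fasta_records() {
    grep -c '^>' "$1"
}

# Hypothetical helper: check_fragment_count FASTA_FILE NFRAGS
# Fails if the file has fewer sequences than the requested fragments.
check_fragment_count() {
    local fasta=$1 nfrags=$2 nseqs
    nseqs=$(count_fasta_records "$fasta")
    if [ "$nseqs" -lt "$nfrags" ]; then
        echo "only $nseqs sequences for $nfrags fragments"
        return 1
    fi
    echo "$nseqs sequences, $nfrags fragments"
}
```

For the example above you would run check_fragment_count gbpln.fasta 22 in the database directory.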

Now create a .pbs file that runs mpiBLAST. Here's an example (gbpln.pbs):

#!/bin/bash
cd $PBS_O_WORKDIR
echo "Started `date`"
cat $PBS_NODEFILE
NPROCS=`wc -l < $PBS_NODEFILE`
mpirun -np $NPROCS -machinefile $PBS_NODEFILE /usr/local/mpiblast/bin/mpiblast \
   -p blastn -d gbpln.fasta -i query.fasta -o gbpln.blast
echo "Ended `date`"

Tell qsub to run gbpln.pbs (on 22 nodes, for example):

qsub -q sapling -l nodes=22 gbpln.pbs
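
The process count passed to mpirun comes straight from the node file, so a job that is granted very few nodes will still try to start. The sketch below adds a guard. It rests on an assumption you should verify for the installed version: that mpiBLAST wants at least three processes (one to schedule, one to write output, and at least one to search). The helper name require_min_procs and its default of 3 are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: require_min_procs NODEFILE [MIN]
# Prints the number of lines in NODEFILE, or fails if it is below MIN.
# The default MIN of 3 is an assumption about mpiBLAST, not a verified fact.
require_min_procs() {
    local nodefile=$1 min=${2:-3} nprocs
    # Arithmetic expansion strips any padding that wc may add.
    nprocs=$(( $(wc -l < "$nodefile") ))
    if [ "$nprocs" -lt "$min" ]; then
        echo "need at least $min processes, got $nprocs" >&2
        return 1
    fi
    echo "$nprocs"
}
```

In gbpln.pbs this could replace the NPROCS line with NPROCS=$(require_min_procs "$PBS_NODEFILE") || exit 1, so the job stops early instead of launching an under-sized run.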

Installation

1. Download and extract the latest mpiBLAST release from the mpiBLAST site.
2. cd mpiBLAST-x.y.z-pio
3. ./configure --prefix=/usr/local/mpiblast/ --without-X11
4. make
5. make install
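
After the steps above, a quick check that the binaries landed where the job scripts expect them can save a failed cluster run. The sketch below is an illustration: check_mpiblast_install is a hypothetical helper, and it assumes that make install places both mpiblast and mpiformatdb in the bin directory under the configured prefix.

```shell
#!/bin/bash
# Hypothetical helper: check_mpiblast_install [PREFIX]
# Reports whether mpiblast and mpiformatdb are executable under PREFIX/bin.
# PREFIX defaults to the --prefix used in the configure step above.
check_mpiblast_install() {
    local prefix=${1:-/usr/local/mpiblast} status=0 tool
    for tool in mpiblast mpiformatdb; do
        if [ -x "$prefix/bin/$tool" ]; then
            echo "found $prefix/bin/$tool"
        else
            echo "missing $prefix/bin/$tool"
            status=1
        fi
    done
    return $status
}
```

Running check_mpiblast_install with no arguments checks /usr/local/mpiblast; pass a different prefix if you installed elsewhere.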