Bioinformatics - HPC Portal
Institute for Massively Parallel Applications and Computing Technologies (IMPACT)
Pyramid System Overview
The High Performance Computing Laboratory (HPCL) operates a Sun Opteron cluster (PYRAMID) running the ROCKS cluster distribution. Pyramid contains 1048 CPUs based on AMD quad-core technology and has an aggregate memory capacity of 1TB. Its aggregate disk capacity is more than 64TB. The system is interconnected with a QDR InfiniBand network, one of the fastest networks available, with a bandwidth of 40Gbit/s and a latency of ≈1μs. The main purpose of the cluster is to support the computing needs of the faculty, staff and students of the Institute for Massively Parallel Applications and Computing Technologies (IMPACT). The cluster is also made available to the CNMC-GWU Clinical and Translational Science Institute (CNMC-GWU CTSI) faculty and researchers.
More details about Pyramid are available on the Pyramid Wiki.
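As a rough illustration of what the interconnect figures quoted above imply, the sketch below estimates one-way message transfer time with a simple latency-plus-bandwidth model. The model and the function name are assumptions made for illustration only. Achievable throughput on a real system is lower than the 40Gbit/s signalling rate because of encoding and protocol overhead.

```python
def transfer_time_seconds(message_bytes, bandwidth_gbit_s=40.0, latency_us=1.0):
    """Estimate one-way transfer time as latency + size / bandwidth.

    Defaults are the nominal figures quoted for Pyramid's interconnect;
    the linear model itself is a simplifying assumption.
    """
    bits = message_bytes * 8
    return latency_us * 1e-6 + bits / (bandwidth_gbit_s * 1e9)


if __name__ == "__main__":
    # Small messages are dominated by latency, large ones by bandwidth.
    for size in (8, 1024, 1024 ** 2, 1024 ** 3):
        microseconds = transfer_time_seconds(size) * 1e6
        print(f"{size:>12} bytes: {microseconds:14.2f} microseconds")
```

Under this model, messages of a few bytes cost about as much as the latency alone, which is why parallel codes tend to batch many small messages into fewer large ones.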
Access to Pyramid
In order to access an HPCL machine, you need to create an HPCL account.
Steps to create an account are available at: HPCL Account Creation
After creating an HPCL account, you can follow the steps available on the Pyramid Wiki to log in.
Available Software Resources/Packages
HMMER - From Washington University in St. Louis
Description: HMMER is used for searching sequence databases for
homologs of protein sequences, and for making protein sequence
alignments. It implements methods using probabilistic models called
“profile hidden Markov models” (profile HMMs).
Compared to BLAST, FASTA, and other sequence alignment and database
search tools based on older scoring methodology, HMMER aims to be
significantly more accurate and more able to detect remote homologs
because of the strength of its underlying mathematical models. In the
past, this strength came at significant computational expense, but in
the new HMMER3 project, HMMER is now essentially as fast as
BLAST.
More information: http://hmmer.janelia.org
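To give a feel for position-specific scoring, the sketch below builds a heavily simplified profile from a toy alignment and slides it along a sequence. It is not HMMER's algorithm: it has match columns only (no insert or delete states), uses a uniform background, and the alignment, pseudocount, and function names are all hypothetical.

```python
import math
from collections import Counter


def build_profile(alignment, alphabet, pseudocount=1.0):
    """Return per-column log-odds scores relative to a uniform background."""
    background = 1.0 / len(alphabet)
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(alphabet)
        profile.append({
            residue: math.log(((counts[residue] + pseudocount) / total) / background)
            for residue in alphabet
        })
    return profile


def best_window(profile, sequence):
    """Slide the profile along the sequence; return (best_score, start)."""
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        score = sum(profile[i][sequence[start + i]] for i in range(width))
        best = max(best, (score, start))
    return best


if __name__ == "__main__":
    toy_alignment = ["ACGT", "ACGA", "ACCT", "TCGT"]  # hypothetical family
    profile = build_profile(toy_alignment, alphabet="ACGT")
    print(best_window(profile, "GGGACGTGGG"))
```

Because each column carries its own residue preferences, a conserved column contributes more to the score than a variable one, which a single substitution matrix applied uniformly cannot express.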
|
NCBI BLAST - From the National Center for Biotechnology Information
Description: The Basic Local Alignment Search Tool (BLAST) finds
regions of local similarity between sequences. The program compares
nucleotide or protein sequences to sequence databases and calculates
the statistical significance of matches. BLAST can be used to infer
functional and evolutionary relationships between sequences as well as
help identify members of gene families.
More information: http://www.ncbi.nlm.nih.gov/BLAST
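The notion of "local similarity" can be illustrated with the Smith-Waterman dynamic programming recurrence, which computes an exact best local alignment score. BLAST itself does not run this exhaustively; it uses a faster seed-and-extend heuristic. The scoring values below are illustrative assumptions, not BLAST defaults.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b."""
    cols = len(b) + 1
    previous = [0] * cols  # scores for the previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * cols
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(
                0,                               # start a new local alignment
                previous[j - 1] + substitution,  # align a[i-1] with b[j-1]
                previous[j] + gap,               # gap in b
                current[j - 1] + gap,            # gap in a
            )
            best = max(best, current[j])
        previous = current
    return best


if __name__ == "__main__":
    print(smith_waterman("TTACGTTT", "GGACGTGG"))
```

The `0` option in the recurrence is what makes the alignment local: a poorly matching prefix is discarded rather than dragging down the score of a good region.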
|
mpiBLAST - From Los Alamos National Laboratory
Description: mpiBLAST is a freely available, open-source, parallel
implementation of NCBI BLAST. mpiBLAST takes advantage of distributed
computational resources, i.e., a cluster, through explicit MPI
communication and thereby utilizes all available resources, unlike
standard NCBI BLAST, which can only take advantage of shared-memory
multi-processors (SMPs).
The primary advantage of using mpiBLAST over traditional NCBI BLAST
is performance. mpiBLAST can increase performance by several orders of
magnitude while still producing results identical to the output of
NCBI BLAST.
More Information: http://www.mpiblast.org
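One common way to spread a database search over cluster nodes is to split the database into fragments, search each fragment independently, and merge the hits. The sketch below illustrates only that idea in a single process. It uses neither MPI nor mpiBLAST, and the 3-mer "search" function, record names, and sequences are hypothetical stand-ins.

```python
def split_into_fragments(records, n_fragments):
    """Distribute (name, sequence) records round-robin across fragments."""
    fragments = [[] for _ in range(n_fragments)]
    for index, record in enumerate(records):
        fragments[index % n_fragments].append(record)
    return fragments


def search_fragment(query, fragment):
    """Toy search: score each record by the number of 3-mers shared with the query."""
    query_words = {query[i:i + 3] for i in range(len(query) - 2)}
    hits = []
    for name, sequence in fragment:
        words = {sequence[i:i + 3] for i in range(len(sequence) - 2)}
        hits.append((len(query_words & words), name))
    return hits


def merged_search(query, records, n_fragments):
    """Search every fragment independently, then merge and rank all hits."""
    all_hits = []
    for fragment in split_into_fragments(records, n_fragments):
        # On a cluster, each fragment would be handled by a separate worker.
        all_hits.extend(search_fragment(query, fragment))
    return sorted(all_hits, key=lambda hit: (-hit[0], hit[1]))


if __name__ == "__main__":
    database = [("s1", "ACGTACGT"), ("s2", "TTTTTTTT"),
                ("s3", "ACGTTTTT"), ("s4", "GGGGACGT")]
    print(merged_search("ACGTAC", database, n_fragments=3))
```

Because each record is scored independently of the others, the merged ranking is the same regardless of how many fragments are used; only the elapsed time changes.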
|
Biopython
Description: The Biopython Project is an international association of
developers of freely available Python (http://www.python.org) tools for
computational molecular biology. The web site http://www.biopython.org
provides an online resource for modules, scripts, and web links for
developers of Python-based software for life science research.
More information: http://www.biopython.org
|
ClustalW - From the European Bioinformatics Institute
Description: ClustalW2 is a general purpose multiple sequence alignment
program for DNA or proteins. It produces biologically meaningful
multiple sequence alignments of divergent sequences. It calculates the
best match for the selected sequences, and lines them up so that the
identities, similarities and differences can be seen. Evolutionary
relationships can be seen by viewing cladograms or phylograms.
More information: http://www.ebi.ac.uk/clustalw/
|
MrBayes - From the School of Computational Science at Florida State University
Description: MrBayes is a program for the Bayesian estimation of
phylogeny. Bayesian inference of phylogeny is based upon a quantity
called the posterior probability distribution of trees, which is the
probability of a tree conditioned on the observations. The conditioning
is accomplished using Bayes's theorem. The posterior probability
distribution of trees is impossible to calculate analytically; instead,
MrBayes uses a simulation technique called Markov chain Monte Carlo (or
MCMC) to approximate the posterior probabilities of trees.
More Information: http://mrbayes.csit.fsu.edu/
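The idea of approximating a posterior distribution by simulation can be shown on a much smaller problem than phylogeny. The sketch below uses a basic random-walk Metropolis sampler to approximate the posterior of a coin's bias. It is a toy with a single parameter and a uniform prior; it does not sample trees and is not MrBayes's implementation. All names and settings are illustrative assumptions.

```python
import math
import random


def log_posterior(theta, heads, tails):
    """Unnormalised log posterior of a coin's bias under a uniform prior."""
    if not 0.0 < theta < 1.0:
        return float("-inf")
    return heads * math.log(theta) + tails * math.log(1.0 - theta)


def metropolis(heads, tails, steps=20000, step_size=0.1, seed=0):
    """Random-walk Metropolis sampler; returns the visited theta values."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, tails)
    samples = []
    for _ in range(steps):
        proposal = theta + rng.uniform(-step_size, step_size)
        candidate = log_posterior(proposal, heads, tails)
        # Accept with probability min(1, posterior ratio).
        if candidate >= current or rng.random() < math.exp(candidate - current):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


if __name__ == "__main__":
    draws = metropolis(heads=30, tails=10)[2000:]  # discard burn-in
    print(sum(draws) / len(draws))  # compare with the exact mean 31/42
```

For this toy the exact posterior is known (a Beta distribution with mean 31/42), so the sampler's estimate can be checked. For trees no such closed form exists, which is why simulation is needed.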
|
T-Coffee - From Information Genomique et Structurale at the Centre
National de la Recherche Scientifique
Description: T-Coffee is a multiple sequence alignment package. You can
use T-Coffee to align sequences or to combine the output of your
favorite alignment methods (Clustal, Mafft, Probcons, Muscle...) into
one unique alignment (M-Coffee).
T-Coffee can align Protein, DNA and RNA sequences. It is also able to
combine sequence information with protein structural information
(3D-Coffee/Expresso), profile information (PSI-Coffee) or RNA secondary
structures.
More Information: http://www.tcoffee.org/homepage.html
|
EMBOSS - From European Molecular Biology Institute
Description: EMBOSS is "The European Molecular Biology Open Software
Suite". EMBOSS is a free Open Source software analysis package
specially developed for the needs of the molecular biology (e.g.
EMBnet) user community. The software automatically copes with data in a
variety of formats and even allows transparent retrieval of sequence
data from the web. Also, as extensive libraries are provided with the
package, it is a platform to allow other scientists to develop and
release software in true open source spirit. EMBOSS also integrates a
range of currently available packages and tools for sequence analysis
into a seamless whole. EMBOSS breaks the historical trend towards
commercial software packages.
More Information: http://emboss.sourceforge.net/
|
PHYLIP - From the Dept. of Biology at the University of Washington
Description: PHYLIP (the PHYLogeny Inference Package) is a package of
programs for inferring phylogenies (evolutionary trees). It is
available free over the Internet, and written to work on as many
different kinds of computer systems as possible. The source code is
distributed (in C), and executables are also distributed.
PHYLIP is probably the most widely-distributed phylogeny package. It is
the third most frequently cited phylogeny package, after PAUP* and
MrBayes, and ahead of MEGA. PHYLIP is also the oldest
widely-distributed package. It has been in distribution since October,
1980, and has over 28,000 registered users. It is regularly updated.
More Information: http://evolution.genetics.washington.edu/phylip.html
|
FASTA - From the University of Virginia
Description: The FASTA programs find regions of local or global (new)
similarity between Protein or DNA sequences, either by searching
Protein or DNA databases, or by identifying local duplications within a
sequence. Other programs provide information on the statistical
significance of an alignment. Like BLAST, FASTA can be used to infer
functional and evolutionary relationships between sequences as well as
help identify members of gene families.
More Information: http://fasta.bioch.virginia.edu/
|
Glimmer - From the Center for Bioinformatics and Computational Biology
at the University of Maryland
Description: Glimmer is a system for finding genes in microbial DNA,
especially the genomes of
bacteria, archaea, and viruses. Glimmer (Gene Locator and Interpolated
Markov ModelER) uses interpolated Markov models (IMMs) to identify the
coding regions and distinguish them from noncoding DNA. The IMM
approach uses a combination of Markov models from 1st through
8th-order, weighting each model according to its predictive power.
Glimmer uses 3-periodic nonhomogeneous Markov models in its IMMs.
More Information: http://www.cbcb.umd.edu/software/glimmer/
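The interpolation idea described above can be sketched as follows: estimates from contexts of increasing length are blended, with better-observed contexts trusted more. The count-based weight used here is an illustrative stand-in for "predictive power" and is an assumption, as are the function names and training strings. Glimmer's actual weighting scheme and its 3-periodic structure are not reproduced.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_counts(sequences, max_order):
    """Count next-base occurrences for every context of length 0..max_order."""
    counts = defaultdict(lambda: defaultdict(int))
    for sequence in sequences:
        for position, base in enumerate(sequence):
            for order in range(min(max_order, position) + 1):
                context = sequence[position - order:position]
                counts[context][base] += 1
    return counts


def interpolated_probability(counts, context, base, prior_strength=20):
    """Blend estimates from short to long contexts, favouring well-observed ones."""
    probability = 1.0 / len(ALPHABET)  # start from a uniform estimate
    for order in range(len(context) + 1):
        suffix = context[len(context) - order:]
        seen = counts.get(suffix)
        if not seen:
            break  # longer contexts were never observed either
        total = sum(seen.values())
        weight = total / (total + prior_strength)  # illustrative, not Glimmer's
        probability = weight * (seen.get(base, 0) / total) + (1.0 - weight) * probability
    return probability


def log_likelihood(counts, sequence, max_order):
    """Sum log-probabilities of each base given up to max_order preceding bases."""
    score = 0.0
    for position, base in enumerate(sequence):
        context = sequence[max(0, position - max_order):position]
        score += math.log(interpolated_probability(counts, context, base))
    return score


if __name__ == "__main__":
    coding_like = train_counts(["ATG" * 30], max_order=2)  # hypothetical training data
    other = train_counts(["CG" * 45], max_order=2)
    query = "ATGATGATG"
    print(log_likelihood(coding_like, query, 2) > log_likelihood(other, query, 2))
```

Comparing the log-likelihood of a candidate region under a model trained on coding sequence with one trained on noncoding sequence gives a simple log-odds criterion for classifying the region.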
|
TIGR Assembler - From the J. Craig Venter Institute
Description: TIGR Assembler is a new tool for assembling large shotgun
sequencing projects. It enabled the first published whole-genome
assembly of a free-living organism in 1995.
More Information: http://www.jcvi.org/cms/research/software/
|
POY - Phylogenetic Analysis of DNA and other data using dynamic homology
Description: POY is a phylogenetic analysis program that supports
multiple kinds of data (e.g. morphology, nucleotides, genes and gene
regions, chromosomes, whole genomes). POY is distinctive in that it
can perform true alignment and phylogeny inference (i.e. input
sequences need not be prealigned). Insertions, deletions, and
rearrangements can then be included in the overall tree score (under
Maximum Parsimony), or in the model (under Maximum Likelihood). A
variety of heuristic algorithms have been developed for this purpose
and are implemented in POY.
More Information: http://research.amnh.org/scicomp/scripts/download.php
|
BioPerl
Description: BioPerl is a toolkit of Perl modules useful in building
bioinformatics solutions in Perl. It is built in an object-oriented
manner so that many modules depend on each other to achieve a task. The
collection of modules in the bioperl-live repository constitutes the
core functionality of BioPerl. Additionally, auxiliary modules for
creating graphical interfaces (bioperl-gui), persistent storage in an
RDBMS (bioperl-db), running and parsing the results from hundreds of
bioinformatics applications (Run package), and software to automate
bioinformatic analyses (bioperl-pipeline) are all available as Git
modules in our repository.
Packages installed: perl-bioperl, perl-bioperl-run, perl-bioperl-gui,
perl-bioperl-db
More Information: http://www.bioperl.org/wiki/Main_Page
|
The R Project for Statistical Computing
Description: R is a language and
environment for statistical computing and graphics. It provides a wide
variety of statistical (linear and nonlinear modelling, classical
statistical tests, time-series analysis, classification, clustering, ...)
and graphical techniques, and is highly extensible. R is similar to
the S language and environment which was developed at Bell Laboratories
by John Chambers and colleagues. The S language is often the vehicle of
choice for research in statistical methodology, and R provides an Open
Source route to participation in that activity.
To run R on Pyramid, the Rmpi package must be used. Rmpi is an
interface (wrapper) to MPI (Message-Passing Interface). The main goal
of Rmpi is to port low-level MPI functions into R so that users do not
have to know C or Fortran.
More Information:
R Project homepage
Rmpi homepage
R and Rmpi on Pyramid: User Guide
Rmpi Tutorial (ACADIA University): Tutorial homepage
|
NAMD - Scalable Molecular Dynamics
Description: NAMD, recipient of a 2002 Gordon Bell Award, is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Based on Charm++ parallel objects, NAMD scales to hundreds of processors on high-end parallel platforms and tens of processors on commodity clusters using gigabit ethernet. NAMD uses the popular molecular graphics program VMD for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR.
More Information: NAMD homepage
NAMD on Pyramid: User Guide
|