http://bioinfo.uqam.ca/armadillo
armadillo.workflow@gmail.com


Armadillo, a simple approach to automating comparative genomics tasks using workflows.

Etienne Lord, Mickael Leclercq, Alix Boc, Abdoulaye Baniré Diallo, Vladimir Makarenkov
Department of Computer Science, Universite du Quebec a Montreal, Montreal, PO Box 8888
Downtown Station, Montreal, QC, H3C 3P8


The Armadillo workflow platform was first developed as an educational tool to help students understand the basic phylogenetic inference process using the PHYLIP package. It now also helps life science researchers rapidly prototype data flows and automate multi-species phylogenetic analyses on a workstation. The current version (version 1.0) includes custom interfaces for software used in phylogenetic analysis, covering the search for orthologous genes, multiple sequence alignment (MSA), evolutionary model determination, and phylogeny inference using maximum likelihood or maximum parsimony. The workflows and their associated data are saved to a single local database file, which makes it easy to share a workflow, its annotations, analyses, and pipeline execution results with other researchers. Since the file is itself a database, it can also be queried using the standard Structured Query Language (SQL). Additionally, this open-source software allows the scientific community to develop their own modules or to integrate other software into the application for rapid study prototyping.
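
Because the workflow file is a database, it can be inspected outside Armadillo. The following is a minimal sketch in Python, assuming the file is an SQLite database (an inference from the bundled SQLite JDBC driver, not a documented guarantee). The table layout is not described here, so the sketch only lists the tables it finds through SQLite's own catalogue, sqlite_master. The function name list_tables is illustrative.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path):
    """Return the table names found in an SQLite database file."""
    # Assumption: the workflow file is a plain SQLite database.
    # Open it read-only so the file cannot be modified by accident.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Calling list_tables("my_workflow.db"), where the file name is hypothetical, returns the table names in alphabetical order. Any further query should be written against the schema actually found in the file.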


Sample workflow created with the Armadillo workflow platform. Multiple protein sequences are first aligned with Muscle, then a phylogenetic tree is inferred with PhyML, or with ProtTest (PhyML) under different evolutionary models. Finally, a tree distance is computed with the Robinson and Foulds method.
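
The last step of this workflow compares trees by their Robinson and Foulds distance, which counts the bipartitions (splits) present in one tree but not in the other. The sketch below illustrates the idea only; it is not the code used by Armadillo or T-REX. It assumes unrooted trees written as nested Python tuples of leaf names, a representation chosen here for brevity, and the names splits and rf_distance are illustrative.

```python
def _clades(tree, found):
    """Return the leaf set under `tree`, recording every internal clade."""
    if isinstance(tree, str):
        return frozenset([tree])
    below = frozenset().union(*(_clades(child, found) for child in tree))
    found.add(below)
    return below


def splits(tree):
    """Return (leaf set, set of non-trivial bipartitions) for a tree."""
    found = set()
    leaves = _clades(tree, found)
    # Keep only splits with at least two leaves on each side, and store
    # each one as an unordered pair so that the rooting does not matter.
    nontrivial = {
        frozenset((clade, leaves - clade))
        for clade in found
        if 1 < len(clade) < len(leaves) - 1
    }
    return leaves, nontrivial


def rf_distance(tree_a, tree_b):
    """Count the splits found in exactly one of the two trees."""
    leaves_a, splits_a = splits(tree_a)
    leaves_b, splits_b = splits(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("both trees must have the same leaf set")
    return len(splits_a ^ splits_b)
```

For example, rf_distance((("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))) gives 2, because each tree contains one split that the other lacks, while two different rootings of the same topology give 0.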



Tutorials

Tutorials can be found on the website: http://bioinfo.uqam.ca/armadillo


Armadillo makes use of the following libraries (and their associated packages):

Library                    Version    Link
Processing                 1.2.1      http://processing.org/
BioJava                    1.7.1      http://biojava.org/wiki/Main_Page
ReadSeq                    2.1.27     http://iubio.bio.indiana.edu/soft/molbio/readseq/java/
Xerial SQLite JDBC Driver  3.6.20.1   http://www.xerial.org/trac/Xerial/wiki/SQLiteJDBC



Armadillo includes the following executables in its distribution:

Software  Version  Type  References  Website
Alignment information 2.11 Alignment Suchard MA and Redelings BD BAli-Phy: simultaneous Bayesian inference of alignment and phylogeny, Bioinformatics, 22:2047-2048, 2006. http://www.biomath.ucla.edu/msuchard/bali-phy/
BAli-phy 2.11 Alignment Suchard MA and Redelings BD BAli-Phy: simultaneous Bayesian inference of alignment and phylogeny, Bioinformatics, 22:2047-2048, 2006. http://www.biomath.ucla.edu/msuchard/bali-phy/
ClustalW 1.83 Alignment Thompson JD, Higgins DG, Gibson TJ. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673-4680. http://www.clustal.org/
ClustalW2 2.1 Alignment Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG. (2007). Clustal W and Clustal X version 2.0. Bioinformatics, 23, 2947-2948. http://www.clustal.org/
GBlocks 0.91b Alignment Talavera, G., and Castresana, J. (2007). Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Systematic Biology 56, 564-577. Castresana, J. (2000). Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Molecular Biology and Evolution 17, 540-552. http://molevol.cmima.csic.es/castresana/Gblocks.html
Kalign 2.03 Alignment Timo Lassmann and Erik LL Sonnhammer (2005) Kalign - an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics 2005, 6:298. doi:10.1186/1471-2105-6-298 http://www.ebi.ac.uk/Tools/msa/kalign/
Muscle 3.8.31 Alignment Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32(5):1792-1797. doi:10.1093/nar/gkh340 Edgar, R.C. (2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics, (5) 113. doi:10.1186/1471-2105-5-113 http://www.drive5.com/muscle/
Probcons 1.12 Alignment Do CB, Mahabhashyam MS, Brudno M, Batzoglou S (2005) ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res. 15: 330-40. http://probcons.stanford.edu/
AncestorCC 1.0 Ancestral Reconstruction Ancestors 1.0: a web server for ancestral sequence reconstruction. Diallo AB, Makarenkov V, Blanchette M. Bioinformatics. 2010 Jan 1;26(1):130-1. Epub 2009 Oct 22. Exact and heuristic algorithms for the Indel Maximum Likelihood Problem. Diallo AB, Makarenkov V, Blanchette M. J Comput Biol. 2007 May;14(4):446-61. http://ancestors.bioinfo.uqam.ca/ancestorWeb/
Create Local BlastDB 2.2.25 Blast BLAST+: architecture and applications. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BMC Bioinformatics. 2009 Dec 15;10:421 http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download
Local Blast 2.2.25 Blast BLAST+: architecture and applications. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BMC Bioinformatics. 2009 Dec 15;10:421 http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download
jModelTest (Nucleic Acid) 0.1 Evolutionary Model Testing Selection of models of DNA evolution with jModelTest. Posada D. Methods Mol Biol. 2009;537:93-112. http://darwin.uvigo.es/software/jmodeltest.html
ProtTest (Amino Acid) 2.4 Evolutionary Model Testing Abascal F, Zardoya R, Posada, D. 2005. ProtTest: Selection of best-fit models of protein evolution. Bioinformatics: 21(9):2104-2105. http://darwin.uvigo.es/software/prottest.html
HGT Detector 3.2 3.2 Horizontal Gene Transfer Inferring and validating horizontal gene transfer events using bipartition dissimilarity. Boc A, Philippe H, Makarenkov V. Syst Biol. 2010 Mar;59(2):195-211. trex.bioinfo.uqam.ca
LatTrans 2003 Horizontal Gene Transfer Louigi Addario-Berry, Michael T. Hallett, and Jens Lagergren, Towards Identifying Lateral Gene Transfer Events. Pacific Symposium on Biocomputing 2003: 279-290 http://www.math.mcgill.ca/louigi/
PhyloNet - RiataHGT 2.4 Horizontal Gene Transfer L. Nakhleh, D. Ruths, and L. S. Wang. RIATA-HGT: A Fast and Accurate Heuristic for Reconstruction Horizontal Gene Proceedings of the 11th International Computing and Combinatorics Conference (COCOON 05). LNCS #3595 (L. Wang, editor), 84-93. 2005. http://bioinfo.cs.rice.edu/phylonet/
Random Tree Generator Random Sequences or Trees Kuhner, M., and J. Felsenstein. 1994. A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Mol. Biol. Evol. 11:459-468. Guindon, S., and O. Gascuel. 2002. Efficient biased estimation of evolutionary distances when substitution rates vary across sites. Mol. Biol. Evol. 19:534-543. http://trex.labunix.uqam.ca/index.php?action=randomtreegenerator&project=trex
Random Tree Random Sequences or Trees N. Bray and L. Pachter, MAVID: Constrained ancestral alignment of multiple sequences, Genome Research, 14:693-699 (2004). http://bio.math.berkeley.edu/mavid/
Seq-Gen 1.3.2 Random Sequences or Trees Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Rambaut A, Grassly NC. Comput Appl Biosci. 1997 Jun;13(3):235-8. http://tree.bio.ed.ac.uk/software/seqgen/
PaML (baseml) 4.4 Selective Pressure Yang Z (2007) PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 24: 1586-1591. http://abacus.gene.ucl.ac.uk/software/paml.html
PaML (codeml) 4.4 Selective Pressure Yang Z (2007) PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 24: 1586-1591. http://abacus.gene.ucl.ac.uk/software/paml.html
PaML (yn00) 4.4 Selective Pressure (A) Nei-Gojobori (1986) method Nei M, Gojobori T (1986) Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418-426 (B) Yang & Nielsen (2000) method Yang Z, Nielsen R (2000) Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 17:32-43 (C) LWL85, LPB93 & LWLm methods Li W.-H., C.-I. Wu, Luo (1985) A new method for estimating synonymous and nonsynonymous rates of nucleotide substitutions considering the relative likelihood of nucleotide and codon changes. Mol. Biol. Evol. 2: 150-174. Li W-H (1993) Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J. Mol. Evol. 36:96-99 Pamilo P, Bianchi NO (1993) Evolution of the Zfx and Zfy genes - rates and interdependence between the genes. Mol. Biol. Evol. 10:271-281 Yang Z (2006) Computational Molecular Evolution. Oxford University Press, Oxford, England http://abacus.gene.ucl.ac.uk/software/paml.html
Q function 2011 Selective Pressure A whole genome study and identification of specific carcinogenic regions of the human papilloma viruses. Diallo AB, Badescu D, Blanchette M, Makarenkov V. J Comput Biol. 2009 Oct;16(10):1461-73. PMID: 19754274
fastDNAml 1.2.2 Tree fastDNAmL: a tool for construction of phylogenetic trees of DNA sequences using maximum likelihood. Olsen GJ, Matsuda H, Hagstrom R, Overbeek R. Comput Appl Biosci. 1994 Feb;10(1):41-8. http://iubio.bio.indiana.edu/soft/molbio/evolve/fastdnaml/fastDNAml.html
Archaeopteryx - Viewer 0.957 Tree Han M.V. and Zmasek C.M. (2009). phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics, 10:356. http://www.phylosoft.org/archaeopteryx/
Garli 2.0 Tree Zwickl, D. J., 2006. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. Ph.D. dissertation, The University of Texas at Austin. https://www.nescent.org/wg_garli/Main_Page
MrBayes 3.1.2 Tree MrBayes 3: Bayesian phylogenetic inference under mixed models. Ronquist F, Huelsenbeck JP. Bioinformatics. 2003 Aug 12;19(12):1572-4. Altekar, G., S. Dwarkadas, J. P. Huelsenbeck, and F. Ronquist. 2004. Parallel Metropolis-coupled Markov chain Monte Carlo for Bayesian phylogenetic inference. Bioinformatics 20:407-415. http://molecularevolution.org/software/phylogenetics/mrbayes
PhyML 3.0.1 Tree "A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood." Guindon S, Gascuel O. Systematic Biology. 2003 52(5):696-704. http://code.google.com/p/phyml/
RootTree (MidPoint) Tree N. Bray and L. Pachter, MAVID: Constrained ancestral alignment of multiple sequences, Genome Research, 14:693-699 (2004). http://bio.math.berkeley.edu/mavid/
Scriptree - Viewer 17 Tree ScripTree: scripting phylogenetic graphics. Chevenet F, Croce O, Hebrard M, Christen R, Berry V. Bioinformatics. 2010 Apr 15;26(8):1125-6. http://lamarck.lirmm.fr/scriptree/
Robinson & Foulds Tree - Distance Robinson, D.F. and L.R. Foulds. (1981). Comparison of phylogenetic trees. Math. Biosci., volume 53, pages 131-147. trex.bioinfo.uqam.ca
TreeDist (Phylip) 3.69 Tree - Distance Felsenstein, J. 1989. PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics 5: 164-166. http://evolution.genetics.washington.edu/phylip/
CONSENSE (Phylip) 3.69 Tree - Phylip Felsenstein, J. 1989. PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics 5: 164-166. http://evolution.genetics.washington.edu/phylip.html
DNADIST (Phylip) 3.69 Tree - Phylip Felsenstein, J. 1989. PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics 5: 164-166. http://evolution.genetics.washington.edu/phylip.html
DNAML (Phylip) 3.69 Tree - Phylip Felsenstein, J. 1989. PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics 5: 164-166. http://evolution.genetics.washington.edu/phylip.html
DNAML-Erate 1.0 Tree - Phylip Probabilistic Phylogenetic Inference with Insertions and Deletions. E. Rivas, S. R. Eddy. PLoS Comput. Biol., 4:e1000172, 2008. http://selab.janelia.org/software.html
DNAPARS (Phylip) 3.69 Tree - Phylip Felsenstein, J. 1989. PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics 5: 164-166. http://evolution.genetics.washington.edu/phylip.html
NEIGHBOR (Phylip) 3.69 Tree - Phylip Felsenstein, J. 1989. PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics 5: 164-166. http://evolution.genetics.washington.edu/phylip.html
PROTML (Phylip) 3.69 Tree - Phylip Felsenstein, J. 1989. PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics 5: 164-166. http://evolution.genetics.washington.edu/phylip.html
PRODIST (Phylip) 3.69 Tree - Phylip Felsenstein, J. 1989. PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics 5: 164-166. http://evolution.genetics.washington.edu/phylip.html
PROPARS (Phylip) 3.69 Tree - Phylip Felsenstein, J. 1989. PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics 5: 164-166. http://evolution.genetics.washington.edu/phylip.html
RETREE (Phylip) 3.69 Tree - Phylip Felsenstein, J. 1989. PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics 5: 164-166. http://evolution.genetics.washington.edu/phylip.html
SEQBOOT (Phylip) 3.69 Tree - Phylip Felsenstein, J. 1989. PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics 5: 164-166. http://evolution.genetics.washington.edu/phylip/



Supported by grants from:
                     

Licence information:
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.