ABSTRACT
The pace at which scientific knowledge is produced and shared today has never
been faster. Different areas of science are moving closer to each other and
giving rise to new disciplines. Bioinformatics is one such newly emerging field,
which applies computer science, mathematics and statistics to molecular biology
to archive, retrieve, and analyse biological data. Although still in its
infancy, it has become one of the fastest growing fields and has quickly
established itself as an integral component of biological research. It is
gaining popularity because it can analyse huge amounts of biological data
quickly and cost-effectively. Bioinformatics helps a biologist extract valuable
information from biological data by providing various web- and/or computer-based
tools, the majority of which are freely available. The present review gives a
comprehensive summary of some of the tools available to a life scientist for
analysing biological data. It focuses on those areas of biological research
that can be greatly assisted by such tools: analysing DNA and protein sequences
to identify various features, predicting the 3D structure of protein molecules,
studying molecular interactions, and performing simulations that mimic a
biological phenomenon. The functioning and specificity of tools such as ENTREZ,
I-TASSER, GENSCAN, ORF finder and Modeller, along with some other software and
tools, are discussed in this review.
Bioinformatics is an interdisciplinary science that emerged from the combination
of various other disciplines, such as biology, mathematics, computer science,
and statistics, to develop methods for the storage, retrieval and analysis of
biological data. Paulien Hogeweg, a Dutch systems biologist, was the first
person to use the term “Bioinformatics”, in 1970, referring to the use of
information technology for studying biological systems. The launch of
user-friendly interactive automated modelling, along with the creation of the
SWISS-MODEL server around 18 years ago [4], resulted in massive growth of this
discipline. Since then, it has become an essential part of the biological
sciences, processing biological data at a much faster rate with databases and
informatics working at the back end.
Bioinformatics tools are routinely used for characterization of genes,
determining structural and physicochemical properties of proteins, phylogenetic
analyses, and performing simulations to study how biomolecules interact in a
living cell. Although these tools cannot generate information as reliable as
experimentation, which is expensive, time-consuming and tedious, in silico
analyses can still support an informed decision before a costly experiment is
conducted. For example, a druggable molecule must have certain ADMET
(absorption, distribution, metabolism, excretion, and toxicity) properties to
pass through clinical trials. If a compound does not have the required ADMET
properties, it is likely to be rejected. To avoid such failures, different
bioinformatics tools have been developed to predict ADMET properties, which
allow researchers to screen a large number of compounds and select the most
druggable molecules before clinical trials are launched. A number of reviews on
various specialized aspects of bioinformatics have been written earlier.
However, none of these articles is suited to a scientist who does not work in
computational biology. Here, we take the opportunity to introduce various
bioinformatics tools to a non-specialist reader to help extract useful
information for his or her project. We have therefore selected only those areas
where these tools can be highly valuable for obtaining information from
biological data. These areas include analyses of DNA/protein sequences,
phylogenetic studies, prediction of 3D structures of protein molecules,
molecular interactions and simulations, and drug design. Each section starts
with a simple overview of the area, followed by key reports from the literature
and, where necessary, a tabulated summary of related tools towards the end of
the section.
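The idea of screening many compounds in silico before any costly experiment can be illustrated with a small sketch. It does not reproduce any ADMET prediction tool mentioned in the literature; as a stand-in it applies Lipinski's rule of five, a widely used drug-likeness heuristic related mainly to absorption. The `Compound` class, the function names and the descriptor values in `library` are all hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    """Hypothetical record holding precomputed descriptors for one compound."""
    name: str
    mol_weight: float       # molecular weight in daltons
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(c: Compound) -> int:
    """Count how many of Lipinski's four criteria the compound violates."""
    checks = [
        c.mol_weight > 500,
        c.log_p > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ]
    return sum(checks)


def prescreen(compounds, max_violations=1):
    """Keep compounds with at most `max_violations` rule-of-five violations."""
    return [c for c in compounds if rule_of_five_violations(c) <= max_violations]


# Made-up descriptor values, for illustration only.
library = [
    Compound("cmpd_A", 320.4, 2.1, 2, 5),
    Compound("cmpd_B", 712.9, 6.3, 7, 12),
    Compound("cmpd_C", 540.0, 4.2, 3, 8),
]

kept = [c.name for c in prescreen(library)]
print(kept)  # ['cmpd_A', 'cmpd_C']
```

A real ADMET workflow would rely on dedicated predictors for each property; a simple filter of this kind only shows how a large library can be narrowed down cheaply before experimental work begins.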
I-TASSER (Iterative Threading ASSEmbly Refinement) is a bioinformatics method
for predicting three-dimensional structure models of protein molecules from
amino acid sequences. It detects structure templates from the Protein Data Bank
by a technique called fold recognition or threading. The full-length structure
models are constructed by reassembling structural fragments from threading
templates using replica exchange Monte Carlo simulations. I-TASSER is one of the
most successful protein structure prediction methods in the community-wide CASP
experiments. I-TASSER has been extended to structure-based protein function
prediction, providing annotations on ligand-binding sites, gene ontology and
enzyme commission by structurally matching structural models of the target
protein to the known proteins in protein function databases. An online server,
built in the Yang Zhang Lab at the University of Michigan, Ann Arbor, allows
users to submit sequences and obtain structure and function predictions. A
standalone package of I-TASSER is available for download at the I-TASSER website.
The I-TASSER server allows users to generate protein structure and function
predictions automatically.
Input:
• Amino acid sequence with length from 10 to 1,500 residues
• Optional restraints and templates to assist I-TASSER modelling:
• Inclusion of special templates
• Exclusion of special templates
Output:
• Top 10 threading alignments from LOMETS
• Top 5 full-length atomic models (ranked based on cluster density)
• Top 10 proteins in PDB which are structurally closest to the predicted models
• Estimated accuracy of the predicted models (including a confidence score of all models, predicted TM-score and RMSD for the first model, and per-residue error of all models)
• Enzyme Classification (EC) and the confidence score
• Gene Ontology (GO) terms and the confidence score
• Ligand-binding sites and the confidence score
• An image of the predicted ligand-binding sites
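Before a sequence is submitted, the stated length limit of 10 to 1,500 residues can be checked locally. The following is a minimal sketch of such a check and is not part of I-TASSER itself: the function names are hypothetical, and restricting the input to the 20 standard one-letter amino acid codes is an assumption, since the text does not say how the server treats ambiguous symbols.

```python
# Assumption: only the 20 standard one-letter amino acid codes are accepted.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
MIN_LENGTH, MAX_LENGTH = 10, 1500  # residue limits quoted in the text


def read_sequence(text):
    """Return the first sequence in FASTA or raw text as one uppercase string."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if lines and lines[0].startswith(">"):
        lines = lines[1:]            # drop the FASTA header of the first record
    residues = []
    for ln in lines:
        if ln.startswith(">"):       # stop at the start of a second record
            break
        residues.append(ln)
    return "".join(residues).upper()


def check_sequence(seq):
    """Return a list of problems; an empty list means the sequence passes."""
    problems = []
    if not MIN_LENGTH <= len(seq) <= MAX_LENGTH:
        problems.append(
            f"length {len(seq)} is outside {MIN_LENGTH}-{MAX_LENGTH} residues"
        )
    unknown = sorted(set(seq) - STANDARD_AMINO_ACIDS)
    if unknown:
        problems.append("non-standard symbols: " + ", ".join(unknown))
    return problems


fasta = ">query\nMKTAYIAKQR\nQISFVKSHFS\n"
sequence = read_sequence(fasta)
print(len(sequence), check_sequence(sequence))  # 20 []
```

Returning a list of problems rather than raising on the first one lets a user see every issue with a sequence in a single pass.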
Bioinformatics is a comparatively young discipline that has progressed very fast
in the last few years. It has made it possible to test hypotheses virtually,
which allows a better-informed decision before launching costly experiments.
Although more and more tools for analysing genomes and proteomes, predicting
structures, rational drug design and molecular simulations are being developed,
none of them is ‘perfect’. Therefore, the search for better packages for solving
these problems will continue. It is clear that future research will be guided
largely by the availability of databases, which may be either generic or
specific. It can also be safely assumed, based on developments in the field,
that bioinformatics tools and software packages will give more accurate results
and thus support more reliable interpretations. Prospects in the field include
its future contribution to the functional understanding of the human genome,
leading to enhanced discovery of drug targets and individualized therapy. Thus,
bioinformatics and other scientific disciplines have to move hand in hand to
flourish for the welfare of humanity.
Tools and software
• ORF finder (NCBI ORF finder)
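To give a non-specialist reader a feel for what an open reading frame (ORF) search does, here is a minimal sketch. It is not the NCBI ORF finder algorithm: it simply scans the three reading frames of each strand for a stretch that begins with ATG and ends at the first in-frame stop codon, assuming the standard genetic code. The function names and the `min_codons` threshold are hypothetical.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}   # standard genetic code assumed
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def orfs_in_strand(dna, min_codons=3):
    """Find ATG...stop stretches in the three reading frames of one strand.

    Returns (start, end, sequence) tuples with 0-based, end-exclusive
    coordinates on the strand that was scanned. The stop codon is included
    in the sequence and in the codon count.
    """
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i                     # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    found.append((start, i + 3, dna[start:i + 3]))
                start = None                  # look for the next ORF
    return found


def find_orfs(dna, min_codons=3):
    """Scan both strands; minus-strand coordinates refer to the reverse complement."""
    dna = dna.upper()
    hits = [("+", s, e, seq) for s, e, seq in orfs_in_strand(dna, min_codons)]
    hits += [("-", s, e, seq)
             for s, e, seq in orfs_in_strand(reverse_complement(dna), min_codons)]
    return hits


print(find_orfs("ccatgaaatggtaacc"))
# [('+', 2, 14, 'ATGAAATGGTAA')]
```

Production tools offer further options, such as alternative genetic codes and start codons, which this sketch deliberately leaves out.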
REFERENCES
1. Mount DW (2004) Sequence and genome analysis. New York: Cold Spring.
2. Hesper B, Hogeweg P (1970) Bioinformatica: een werkconcept. Kameleon 1: 28-9.
3. Hogeweg P (2011) The roots of bioinformatics in theoretical biology. PLoS Comput Biol 7: e1002021.
4. Peitsch MC (1996) ProMod and Swiss-Model: Internet-based tools for automated comparative protein modelling. Biochem Soc Trans 24: 274-279.
5. Dibyajyoti S, Bin ET, Swati P (2013) Bioinformatics: The effects on the cost of drug discovery. Galle Med J 18: 44-50.
6. Ouzounis CA, Valencia A (2003) Early bioinformatics: the birth of a discipline–a personal view. Bioinformatics 19: 2176-2190.
7. Molatudi M, Molotja N, Pouris A study of bioinformatics research in South Africa. Scientometrics 81: 47-59.
8. Ouzounis CA (2012) Rise and demise of bioinformatics? Promise and progress. PLoS Comput Biol 8: e1002487.
9. Geer RC, Sayers EW (2003) Entrez: making use of its power. Brief Bioinform 4: 179-184.
10. Parmigiani G, Garrett ES, Irizarry RA, Zeger SL (2003) The analysis of gene expression data: an overview of methods and software, Springer, New York.