What is Bioinformatics?
The genomic era
                        
                        
                        
The genomic era has seen a massive explosion in the amount of biological
                          information available due to huge advances in the fields of molecular
                          biology and genomics.
Bioinformatics is the application of
                          computer technology to the management and analysis of biological data.
                          The result is that computers are being used to gather,
                          store, analyse and merge biological data.
Bioinformatics is an interdisciplinary research area at the interface between the biological and computational sciences. The ultimate goal of bioinformatics is to uncover the wealth of biological information hidden in the mass of data and obtain a clearer insight into the fundamental biology of organisms. This new knowledge could have profound impacts on fields as varied as human health, agriculture, the environment, energy and biotechnology.
Why is bioinformatics important?
The greatest challenge facing the molecular biology 
                  community today is to make sense of the wealth of 
                  data that has been produced by the genome sequencing projects.
                   Traditionally, molecular biology research was 
                   carried out entirely at the experimental laboratory 
                   bench but the huge increase in the scale of data 
                   being produced in this genomic era has seen a 
                   need to incorporate computers into this research 
                   process. 
Sequence generation, and its subsequent storage, interpretation and analysis, are entirely computer-dependent tasks. However, the molecular biology of an organism is a very complex issue, with research being carried out at different levels including the genome, proteome, transcriptome and metabolome levels. Following on from the explosion in the volume of genomic data, similar increases in data have been observed in the fields of proteomics, transcriptomics and metabolomics.
       
       
The first challenge facing the bioinformatics community today is the intelligent and 
        efficient storage of this mass of data. It is then their responsibility to provide 
        easy and reliable access to this data. The data itself is meaningless before analysis 
        and the sheer volume present makes it impossible for even a trained biologist to begin 
        to interpret it manually. Therefore, incisive computer tools must be developed to allow the  
        extraction of meaningful biological information. 
There are three central biological processes 
        around which bioinformatics tools must be developed:
                          
                        
                      
- DNA sequence determines protein sequence
- Protein sequence determines protein structure
- Protein structure determines protein function
The integration of information learned about these key biological processes should allow 
        us to achieve the long term goal of the complete understanding of the biology of organisms.
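The first of the three processes above, DNA sequence determining protein sequence, can be illustrated with a minimal Python sketch. It assumes the standard genetic code, a coding sequence that starts in reading frame 1, and no introns. The names `CODON_TABLE` and `translate` are illustrative and do not come from any EBI tool.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering
# of the three codon positions ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence, stopping at the first stop codon.

    Codons containing characters other than A, C, G or T become "X".
    A trailing partial codon (one or two bases) is ignored.
    """
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[pos:pos + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTAA"))  # prints "MA"
```

The later steps (sequence to structure, structure to function) have no comparably simple lookup, which is why they need the more elaborate tools described in the following sections.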
Biological databases
Biological databases are archives of consistent data that are stored in a uniform and efficient manner. These databases contain data from a broad spectrum of molecular biology areas. Primary or archived databases contain information and annotation of DNA and protein sequences, DNA and protein structures and DNA and protein expression profiles.
Secondary or derived databases are so called because they contain the 
            results of analysis on the primary resources including information on sequence 
            patterns or motifs, variants and mutations and evolutionary relationships. 
            Information from the literature is contained in bibliographic databases, such as Medline.
It is essential that these databases are easily accessible and that an intuitive query system 
      is provided to allow researchers to obtain very specific information on a particular biological 
      subject. The data should be provided in a clear, consistent manner with some visualization tools 
      to aid biological interpretation. 
Specialist databases have been set up for particular subjects, for example the EMBL database for nucleotide sequence data, the UniProtKB/Swiss-Prot protein database and PDBe, a 3D protein structure database.
Scientists also need to be able to integrate the information 
      obtained from the underlying heterogeneous databases in a sensible manner in order to be able 
      to get a clear overview of their biological subject. SRS (Sequence Retrieval System) 
      is a powerful querying tool provided by the EBI that links information from more than 150 heterogeneous 
      resources. 
 Biological applications
Once all of the biological data is stored consistently and is easily 
            available to the scientific community, the requirement is then to 
            provide methods for  extracting the meaningful information from 
            the mass of data. Bioinformatic tools are software programs that are 
            designed to carry out this analysis step. 
            
Factors that must be taken 
            into consideration when designing these tools are:
                        
- The end user (the biologist) may not be a frequent user of computer technology
- These software tools must be made available over the internet given the global distribution of 
      the scientific research community
The EBI provides a wide range of biological data analysis tools that fall into the following four 
      major categories:
Similarity Searching Tools 
Homologous sequences are sequences that are related by divergence from a common ancestor. Thus the degree of similarity between two sequences can be measured, while their homology is a case of being either true or false. This set of tools can be used to identify similarities between novel query sequences of unknown structure and function and database sequences whose structure and function have been elucidated.
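To make "degree of similarity" concrete, the sketch below globally aligns two sequences with the Needleman-Wunsch algorithm and reports percent identity. This is a teaching example only. Database search tools rely on faster heuristics and on substitution matrices, and the scoring values used here (match +1, mismatch -1, gap -2) are arbitrary assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b; return (aligned_a, aligned_b, score)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair_score(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic-programming matrix.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                   # gap in b
                score[i][j - 1] + gap,                   # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(aligned_a)), "".join(reversed(aligned_b)), score[-1][-1]


def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns holding the same residue in both rows."""
    if not aligned_a:
        return 0.0
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


row_a, row_b, total = needleman_wunsch("ACGT", "AGT")
print(row_a, row_b, total, percent_identity(row_a, row_b))  # ACGT A-GT 1 75.0
```

Note that the output is a similarity measure. Whether the two sequences are homologous remains an inference drawn from that measure, not something the alignment itself returns.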
        
        
                        
Protein Function Analysis 
This group of programs allows you to compare your protein sequence to the secondary (or derived) protein databases that contain information on motifs, signatures and protein domains. Highly significant hits against these different pattern databases allow you to approximate the biochemical function of your query protein.
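As a small illustration of motif matching, the sketch below scans a protein sequence for the N-glycosylation site pattern, written N-{P}-[ST]-{P} in PROSITE notation, by translating it by hand into a Python regular expression. The function name and the example sequence are made up for illustration. Real pattern databases hold many signatures and also use profiles, which this sketch does not attempt.

```python
import re

# PROSITE-style pattern N-{P}-[ST]-{P}: Asn, anything but Pro, Ser or Thr,
# anything but Pro. The lookahead lets overlapping sites be reported.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_n_glycosylation_sites(protein: str):
    """Return (1-based position, matched residues) for each candidate site."""
    protein = protein.upper()
    return [(m.start() + 1, m.group(1)) for m in N_GLYCOSYLATION.finditer(protein)]


# Hypothetical sequence, chosen only to show a hit, a rejected site and a hit.
print(find_n_glycosylation_sites("MKNGSALNPTQNVTE"))
# prints [(3, 'NGSA'), (12, 'NVTE')]
```

A short pattern like this matches by chance in many proteins, so a hit is a hint about function and not proof of it.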
        
        
Structural Analysis  
This set of tools allows you to compare structures with the known structure databases. The function of a protein is more directly a consequence of its structure than of its sequence, with structural homologs tending to share functions. The determination of a protein's 2D/3D structure is crucial in the study of its function.
Sequence Analysis
This set of tools allows you to carry out further, more detailed analysis on your query 
        sequence including evolutionary analysis, identification of mutations, hydropathy regions, 
        CpG islands and compositional biases. These and other biological 
        properties are all clues that aid the search to elucidate the specific function of your sequence.
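Two of the simplest compositional measures behind CpG island detection can be sketched as follows: the GC fraction of a sequence, and the ratio of observed to expected CpG dinucleotides. The function names are illustrative, and the thresholds and window sizes that real tools apply on top of these values vary and are not shown.

```python
def gc_content(dna: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    dna = dna.upper()
    if not dna:
        return 0.0
    return (dna.count("G") + dna.count("C")) / len(dna)


def cpg_observed_expected(dna: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count)."""
    dna = dna.upper()
    c_count, g_count = dna.count("C"), dna.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    return dna.count("CG") * len(dna) / (c_count * g_count)


print(gc_content("ACGT"), cpg_observed_expected("ACGT"))  # prints 0.5 4.0
```

In practice these values are computed over sliding windows along a genomic sequence, so that regions with unusually high values stand out against the background.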
Real world applications of bioinformatics
The science of bioinformatics has many beneficial uses in the modern world. These include the following:
1. Molecular medicine
The human genome sequence will have profound effects on the fields of biomedical research and clinical medicine. Every disease has a genetic component. This may be inherited (as is the case with an estimated 3000-4000 hereditary diseases, including cystic fibrosis and Huntington's disease) or a result of the body's response to an environmental stress which causes alterations in the genome (e.g. cancers, heart disease, diabetes).
                          
                                                    The completion of the human genome means that we can 
search for the genes directly associated with different diseases and begin to understand the 
molecular basis of these diseases more clearly. This new knowledge of the molecular mechanisms 
of disease will enable better treatments, cures and even preventative tests to be developed.
                          
                        
    
                        
1.1 More drug targets
                          At present all drugs on the market target only about 500 proteins. With 
        an improved understanding of disease mechanisms and using computational 
        tools to identify and validate new drug targets, more specific medicines 
        that act on the cause, not merely the symptoms, of the disease can be 
        developed. These highly specific drugs promise to have fewer side effects 
        than many of today's medicines.
                          
                        
1.2 Personalised medicine
Clinical medicine will become more personalised with the development of the field of pharmacogenomics. This is the study of how an individual's genetic inheritance affects the body's response to drugs. At present, some drugs fail to make it to the market because a small percentage of the clinical patient population show adverse effects to a drug due to sequence variants in their DNA.
                          
                                                    As a result, potentially life saving drugs never make it to 
        the marketplace. Today, doctors have to use trial and error to find the 
        best drug to treat a particular patient as those with the same clinical 
        symptoms can show a wide range of responses to the same treatment. In the 
        future, doctors will be able to analyse a patient's genetic profile and 
        prescribe the best available drug therapy and dosage from the beginning. 
                          
                        
1.3 Preventative medicine
                          
With the specific details of the genetic mechanisms of diseases being unravelled, the development of diagnostic tests to measure a person's susceptibility to different diseases may become a distinct reality. Preventative actions, such as a change of lifestyle or having treatment at the earliest possible stages when it is more likely to be successful, could result in huge advances in our struggle to conquer disease.
                          
    
                        
1.4 Gene therapy
In the not too distant future, the potential for using genes themselves to treat disease may become a reality. Gene therapy is the approach used to treat, cure or even prevent disease by changing the expression of a person's genes. Currently, this field is in its infancy, with clinical trials for many different types of cancer and other diseases ongoing.
                        
2. Microbial genome applications
Microorganisms are ubiquitous, that is they are found everywhere. They have 
      been found surviving and thriving in extremes of heat, cold, radiation, salt, 
      acidity and pressure. They are present in the environment, our bodies, the air, 
      food and water. 
                          
Traditionally, use has been made of a variety of microbial properties in the baking, brewing and food industries. The arrival of complete genome sequences, and their potential to provide a greater insight into the microbial world and its capacities, could have broad and far-reaching implications for environment, health, energy and industrial applications. For these reasons, in 1994, the US Department of Energy (DOE) initiated the MGP (Microbial Genome Program) to sequence genomes of bacteria useful in energy production, environmental cleanup, industrial processing and toxic waste reduction.
                          
                          By studying the genetic material of these organisms, scientists can begin 
        to understand these microbes at a very fundamental level and isolate the 
        genes that give them their unique abilities to survive under extreme conditions.
                          
                        
    
                        
2.1 Waste cleanup
Deinococcus radiodurans is known as the world's toughest bacterium and is the most radiation-resistant organism known. Scientists are interested in this organism because of its potential usefulness in cleaning up waste sites that contain radiation and toxic chemicals.

Microbial Genome Program (MGP) scientists are determining the DNA sequence of the genome of Caulobacter crescentus, one of the organisms responsible for sewage treatment.
                          
    
                        
2.2 Climate change
                          Increasing levels of carbon dioxide emission, mainly through the expanding 
        use of fossil fuels for energy, are thought to contribute to global climate 
        change. Recently, the DOE (Department of Energy, USA) launched a 
        program to decrease atmospheric carbon dioxide levels. One method of doing 
        so is to study the genomes of microbes that use carbon dioxide as their sole 
        carbon source.
                        
2.3 Alternative energy sources  
                          Scientists are studying the genome of the microbe Chlorobium tepidum which 
        has an unusual capacity for generating energy from light.
                          
                        
2.4 Biotechnology
The archaeon Archaeoglobus fulgidus and the bacterium Thermotoga maritima have potential for practical applications in industry and government-funded environmental remediation. These microorganisms thrive in water temperatures above the boiling point and therefore may provide the DOE, the Department of Defence, and private companies with heat-stable enzymes suitable for use in industrial processes.

Other industrially useful microbes include Corynebacterium glutamicum, which is of high industrial interest as a research object because it is used by the chemical industry for the biotechnological production of the amino acid lysine. Lysine is one of the essential amino acids in animal nutrition. Biotechnologically produced lysine is added to feed concentrates as a source of protein, and is an alternative to soybeans or meat and bonemeal.
                          
                          Xanthomonas campestris pv.  is grown commercially to produce the 
        exopolysaccharide xanthan gum, which is used as a viscosifying and stabilising 
        agent in many industries.
                          
Lactococcus lactis is one of the most important micro-organisms involved in the dairy industry. It is a non-pathogenic bacterium that is critical for manufacturing dairy products like buttermilk, yogurt and cheese. This bacterium, Lactococcus lactis ssp., is also used to prepare pickled vegetables, beer, wine, some breads and sausages and other fermented foods. Researchers anticipate that understanding the physiology and genetic make-up of this bacterium will prove invaluable for food manufacturers as well as the pharmaceutical industry, which is exploring the capacity of L. lactis to serve as a vehicle for delivering drugs.
                          
                        
                        
2.5 Antibiotic resistance
Scientists have been examining the genome of Enterococcus faecalis, a leading cause of bacterial infection among hospital patients. They have discovered a virulence region made up of a number of antibiotic-resistance genes that may contribute to the bacterium's transformation from a harmless gut bacterium to a menacing invader. The discovery of the region, known as a pathogenicity island, could provide useful markers for detecting pathogenic strains and help to establish controls to prevent the spread of infection in wards.
                          
                        
2.6 Forensic analysis of microbes
Scientists used their genomic tools to help distinguish the strain of Bacillus anthracis that was used in the 2001 terrorist attack in Florida from closely related anthrax strains.
                          
                        
2.7 The reality of bioweapon creation
Scientists have recently built poliovirus, the virus that causes poliomyelitis, using entirely artificial means. They did this using genomic data available on the Internet and materials from a mail-order chemical supply. The research was financed by the US Department of Defence as part of a biowarfare response program to prove to the world the reality of bioweapons. The researchers also hope their work will discourage officials from ever relaxing programs of immunisation. This project has been met with very mixed feelings.
                          
                        
    
                        
2.8 Evolutionary studies
The sequencing of genomes from all three domains of life (eukaryota, bacteria and archaea) means that evolutionary studies can be performed in a quest to determine the tree of life and the last universal common ancestor.
For more interesting stories, check the archive at the Genome News Network (GNN).

For information on structural, functional and comparative analysis of genomes and genes from a wide variety of organisms, see The Institute for Genomic Research (TIGR).
                        
3. Agriculture
The sequencing of the genomes of plants and animals should have enormous benefits for the agricultural community. Bioinformatic tools can be used to search for the genes within these genomes and to elucidate their functions. This specific genetic knowledge could then be used to produce stronger crops that are more resistant to drought, disease and insects, and to improve the quality of livestock, making them healthier, more disease resistant and more productive.
                        
3.1 Crops
                          
Comparative genetics of the plant genomes has shown that the organisation of their genes has remained more conserved over evolutionary time than was previously believed. These findings suggest that information obtained from the model crop systems can be used to suggest improvements to other food crops. Arabidopsis thaliana (thale cress) and Oryza sativa (rice) are examples of available complete plant genomes.
                        
3.2 Insect resistance
                          
                          Genes from Bacillus thuringiensis that can control a number of serious pests 
        have been successfully transferred to cotton, maize and potatoes. This new 
        ability of the plants to resist insect attack means that the amount of 
        insecticides being used can be reduced and hence the nutritional quality 
        of the crops is increased.
    
                        
3.3 Improve nutritional quality
                           
Scientists have recently succeeded in transferring genes into rice to increase levels of Vitamin A, iron and other micronutrients. This work could have a profound impact in reducing occurrences of blindness and anaemia caused by deficiencies in Vitamin A and iron respectively.

Scientists have inserted a gene from yeast into the tomato, and the result is a plant whose fruit stays longer on the vine and has an extended shelf life.
 
    
                        
3.4 Grow in poorer soils and drought resistant
                          
                          Progress has been made in developing cereal varieties that have a greater 
        tolerance for soil alkalinity, free aluminium and iron toxicities. These 
        varieties will allow agriculture to succeed in poorer soil areas, thus 
        adding more land to the global production base. Research is also in progress 
        to produce crop varieties capable of tolerating reduced water conditions.
 
    
                        
4. Animals
                          
                          Sequencing projects of many farm animals including cows, pigs and sheep 
        are now well under way in the hope that a better understanding of the 
        biology of these organisms will have huge impacts for improving the 
        production and health of livestock and ultimately have benefits for 
        human nutrition.
                        
5. Comparative studies
                        
                          Analysing and comparing the genetic material 
        of different species  is an important method for studying the 
        functions of genes, the mechanisms of inherited diseases and species 
        evolution. Bioinformatics tools can be used to make comparisons between 
        the numbers, locations and biochemical functions of genes in different 
        organisms. 
                        
Organisms that are suitable for use in experimental research are termed model organisms. They have a number of properties that make them ideal for research purposes: they have short life spans, reproduce rapidly, are easy to handle and inexpensive to keep, and can be manipulated at the genetic level.

An example of a model organism for humans is the mouse. Mouse and human are very closely related (>98%) and for the most part we see a one-to-one correspondence between genes in the two species. Manipulation of the mouse at the molecular level and genome comparisons between the two species can reveal, and are revealing, detailed information on the functions of human genes, the evolutionary relationship between the two species and the molecular mechanisms of many human diseases.