Introduction to Bioinformatics
E-mail: hqzhu@pku.edu.cn

What is Bioinformatics / Computational Biology?
NIH definitions:
Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
Computational Biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.

Walter Gilbert (1932- ), Harvard University, Biological Laboratories
W. Gilbert, Towards a Paradigm Shift in Biology, Nature, 349 (1991) 99.

Class time: 16:50-18:40, Room 305
Hours/credits: 2/2 (2 hours × 16 weeks = 32 hours; 2 credits)
E-mail: hqzhu@pku.edu.cn
Tel: 6276-7261 (office)
URL: http://www.coe.pku.edu.cn/subpaget.asp?id=25

~ 4 ~ 6 50% 50% 50%

Textbooks and references:
1. Westhead et al., Bioinformatics, 2003.
2. Durbin, Biological Sequence Analysis, 2002.
3. Brown, 2002.
4. 2002.
5. Pevsner, 2006.
6. Boogerd, 2008.
7. Klipp, 2007.
8. Google: http://www.google.cn/intl/zh-cn/; Wikipedia: http://en.wikipedia.org/wiki/main_page

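1.1 The Human Genome Project (HGP)
The HGP ranks among the great scientific projects of the 20th century, alongside the Manhattan Project (1942-46) and the Apollo moon-landing program (1961-69), which Kennedy proposed in 1961 and promoted in his 1962 speech at Rice University. The Human Genome Project (HGP) ran from 1990 to 2003.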

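Genome: the complete DNA of an organism. The human genome is about 3.2 × 10^9 bp.
DNA: composed of four bases, A, C, G, T.
Gene: a segment of DNA.

Origins of the HGP:
December 1984
March 1986: Dulbecco, writing in Science, proposes sequencing the human genome
1987: DOE, NIH
1989: Watson
October 1990: the HGP is formally launched
Watson, 1990, Science. Key figures: James Watson, Walter Gilbert.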

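The HGP was planned as a 15-year project (1990-2005).
Milestones:
1995: Haemophilus influenzae (H. inf), the first complete genome of a free-living organism
1997: Escherichia coli (E. coli)
1998: Celera Genomics founded
June 26, 2000: HGP and Celera jointly announce the working draft
February 15, 2001: the HGP publishes in Nature
February 16, 2001: Celera publishes in Science
Other genomes sequenced in this period: Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana.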

February 15, 2001: Nature; February 16, 2001: Science.
At the White House on June 26, 2000, Francis Collins (r), Director of the National Human Genome Research Institute, President Clinton, and J. Craig Venter, President of Celera Genomics, lauded the thousands of scientists who contributed to the genome sequence.
Later dates: August 26, 2001; 2002; April 14, 2003, completion of the HGP announced by the six participating countries; October 2003; October 2004.

Completed genome sequences, about 5,700 in total (source: http://www.ebi.ac.uk/genomes/, February 2009):
Archaea: 58
Bacteria: 778
Eukaryota: 87
Viruses: 1,773
Viroids: 48
Phages: 502
Organelles: 1,764
Plasmids: 739

Gene-swapping collectives; microbiome; metagenomics (the DNA of a microbiome).
Figure: Bacterial genome dynamics. There are three main forces that shape bacterial genomes: gene gain, gene loss and gene change. All three of these can take place in a single bacterium. Some of the changes that result from the interplay of these forces are shown. (Pallen & Wren, Nature, 2007)

HGP 1 2 3 1998 Celera 4 1989 2 1/3 1990 6 3 1990 7 1995 7 1999 9 1 5
Freeman Dyson, The Sun, the Genome, and the Internet: Tools of Scientific Revolutions

HGP: Pandora's Box

1.2


Dr. Hwa A. Lim coined the term Bioinformatics in 1987 (compare the French bio-informatique). Lim is an Adjunct Professor at the University of Texas at Dallas; his earlier affiliations include Imperial College, London University (1981) and Rochester University (1986).
D'Trends: Biotechnology, Information, Nano, Bioinformatics.

"When I give talks to young scientists seeking advice about areas of future intense scientific excitement, computational biology is my number one recommendation."
Francis Collins, Director of HGP at NIH

"The next step in the project is the interpretation phase. That is really the fun part of the whole project because then we finally have the complete order of all layers of genetic codes and we have to discover what it all means."
J. Craig Venter, Head of Celera Genomics Inc.

1.3

Data mining, knowledge discovery: extracts the hidden patterns from huge quantities of experimental data, and forms hypotheses as a result.
Simulation-based analysis: tests hypotheses with in silico experiments, providing predictions to be tested by in vitro and in vivo studies.

1.3.1 Data mining, knowledge discovery

Examples: McCulloch-Pitts; RNA; hidden Markov models (HMM); motifs.
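As a toy illustration of "extracting hidden patterns" from sequence data, the sketch below counts k-mers (words of length k) across a few DNA sequences; a word that recurs in many sequences is a candidate motif. This is a minimal sketch under our own assumptions, not a method from this course: the function names `count_kmers` and `top_kmers` and the example sequences are hypothetical, and practical motif finders (for example, HMM-based or statistical methods) also allow mismatches and compare against background base frequencies.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every length-k substring (k-mer) across a list of DNA sequences.

    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows containing anything other than A, C, G, T (e.g. N).
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def top_kmers(sequences, k, n=3):
    """Return the n most frequent k-mers as (kmer, count) pairs."""
    return count_kmers(sequences, k).most_common(n)


if __name__ == "__main__":
    # Hypothetical toy sequences that happen to share the word TATAAT.
    reads = ["GCTATAATGCA", "CCGTATAATCG", "ATTATAATGGC"]
    for kmer, count in top_kmers(reads, k=6, n=1):
        print(kmer, count)  # prints: TATAAT 3
```

Counting exact words keeps the idea visible: with three sequences, the only 6-mer seen in all of them is TATAAT, so it rises to the top of the frequency table.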

1.3.2 Simulation-based analysis
Predict the dynamics of systems so that the validity of the underlying assumptions can be tested.
[Figure: Computation; System Analysis; New Biotechnologies; Systems Biology; Petri nets; Experimental Data]

System: nodes and edges.

1.4
[Figure: Systems Biology; General Biology; Experimental, Theoretical, Computational Biology; Biotechnology]

"Like it or not, big biology is here to stay."
2006-2020

What is life?

Topics: Markov chains; hidden Markov models (HMM); artificial neural networks (ANN); DNA; RNA; BLAST.