04 Models of Amino Acid and Codon Substitution


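Nei and Gojobori (1986)
ATG GTC ACT CAT TTA ATA AAT CGG ATA TAA
 M   V   T   H   L   I   N   R   I   *
ATG GTT ACG CAA CTC ATG ACG AGG ATT TGA
 M   V   T   Q   L   M   T   R   I   *
For codon pairs that differ at two positions, the two possible mutational pathways are weighted equally:
AAT (N) vs ACG (T): Pathway I: AAT (N) → ACT (T) → ACG (T); Pathway II: AAT (N) → AAG (K) → ACG (T)
TTA (L) vs CTC (L): Pathway I: TTA (L) → CTA (L) → CTC (L); Pathway II: TTA (L) → TTC (F) → CTC (L)
Number of synonymous substitutions: 5.5
Number of nonsynonymous substitutions: 4.5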
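The pathway averaging above can be sketched in Python. This is a minimal illustration, assuming the standard genetic code and the usual convention of discarding pathways that pass through a stop codon (no such pathway arises in this example); `codon_differences` and `count_differences` are hypothetical helper names, not part of MEGA or any other package.

```python
from itertools import permutations

# Standard genetic code in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codon_differences(c1, c2):
    """Return (synonymous, nonsynonymous) differences between two codons,
    averaged over every order in which the differing positions can change.
    Pathways passing through a stop codon are discarded (assumed convention)."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    pathways = []
    for order in permutations(positions):
        current, syn, nonsyn, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = nxt
        if valid:
            pathways.append((syn, nonsyn))
    if not pathways:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    n = len(pathways)
    return sum(s for s, _ in pathways) / n, sum(m for _, m in pathways) / n


def count_differences(codons1, codons2):
    """Sum the averaged differences over aligned codons, skipping stop codons."""
    total_syn = total_nonsyn = 0.0
    for c1, c2 in zip(codons1, codons2):
        if CODE[c1] == "*" or CODE[c2] == "*":
            continue  # the terminal TAA/TGA pair is not compared
        syn, nonsyn = codon_differences(c1, c2)
        total_syn += syn
        total_nonsyn += nonsyn
    return total_syn, total_nonsyn


SEQ1 = "ATG GTC ACT CAT TTA ATA AAT CGG ATA TAA".split()
SEQ2 = "ATG GTT ACG CAA CTC ATG ACG AGG ATT TGA".split()
print(count_differences(SEQ1, SEQ2))  # (5.5, 4.5)
```

The two-position pairs contribute (1, 1) for TTA/CTC and (0.5, 1.5) for AAT/ACG, matching the pathway lists.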

Nei and Gojobori (1986)
Synonymous and nonsynonymous sites of the codon ATA (I). Its nine single-nucleotide neighbors, by changed position:
1st position: TTA (L), CTA (L), GTA (V)
2nd position: ACA (T), AAA (K), AGA (R)
3rd position: ATT (I), ATC (I), ATG (M)
Number of synonymous sites: 2/9 × 3 = 2/3
Number of nonsynonymous sites: 7/9 × 3 = 7/3

Nei and Gojobori (1986)
Synonymous (syn) and nonsynonymous (nonsyn) sites of each codon:
Sequence 1:  ATG  GTC  ACT  CAT  TTA  ATA  AAT  CGG  ATA  TAA
  syn         0    1    1   1/3  2/3  2/3  1/3  4/3  2/3   *
  nonsyn      3    2    2   8/3  7/3  7/3  8/3  5/3  7/3   *
Sequence 2:  ATG  GTT  ACG  CAA  CTC  ATG  ACG  AGG  ATT  TGA
  syn         0    1    1   1/3   1    0    1   2/3  2/3   *
  nonsyn      3    2    2   8/3   2    3    2   7/3  7/3   *
Number of synonymous sites: (18/3 + 17/3) / 2 = 35/6
Number of nonsynonymous sites: (63/3 + 64/3) / 2 = 127/6
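A sketch of the site counting, under the assumption (which reproduces the per-codon values above, e.g. 2/3 synonymous sites for TTA) that changes to a stop codon are counted as nonsynonymous rather than excluded. Exact fractions are used so the output can be compared with the slide directly; `synonymous_sites` and `average_sites` are hypothetical names.

```python
from fractions import Fraction

# Standard genetic code in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def synonymous_sites(codon):
    """Nei-Gojobori synonymous sites of one codon: for each position, the
    fraction of the three possible single-base changes that keep the amino
    acid. Changes to a stop codon count as nonsynonymous."""
    aa = CODE[codon]
    sites = Fraction(0)
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == aa:
                sites += Fraction(1, 3)
    return sites


def average_sites(codons1, codons2):
    """Synonymous and nonsynonymous sites averaged over the two sequences,
    skipping stop codons."""
    syn_totals, nonsyn_totals = [], []
    for codons in (codons1, codons2):
        coding = [c for c in codons if CODE[c] != "*"]
        syn = sum((synonymous_sites(c) for c in coding), Fraction(0))
        syn_totals.append(syn)
        nonsyn_totals.append(3 * len(coding) - syn)
    return sum(syn_totals) / 2, sum(nonsyn_totals) / 2


SEQ1 = "ATG GTC ACT CAT TTA ATA AAT CGG ATA TAA".split()
SEQ2 = "ATG GTT ACG CAA CTC ATG ACG AGG ATT TGA".split()
print(average_sites(SEQ1, SEQ2))  # (Fraction(35, 6), Fraction(127, 6))
```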

Nei and Gojobori (1986)
Number of synonymous substitutions: 5.5
Number of nonsynonymous substitutions: 4.5
Number of synonymous sites: (18/3 + 17/3) / 2 = 35/6
Number of nonsynonymous sites: (63/3 + 64/3) / 2 = 127/6
p-distance (synonymous): 5.5 / (35/6) = 0.942857
p-distance (nonsynonymous): 4.5 / (127/6) = 0.212598

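Jukes and Cantor's (1969) one-parameter model
[Figure: an ancestral sequence diverging into Sequence 1 and Sequence 2, each along a branch of length t]
At time t, the probability that the nucleotide at a site is the same in both sequences is
$$I(t) = \frac{1}{4} + \frac{3}{4}e^{-8\alpha t}$$
The probability that the two sequences are different at a site at time t is
$$D = 1 - I(t) = \frac{3}{4}\left(1 - e^{-8\alpha t}\right), \qquad\text{so}\qquad 8\alpha t = -\ln\left(1 - \frac{4}{3}D\right)$$
The actual number of substitutions per site since the divergence between the two sequences is K = 2(3αt), so
$$K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}D\right)$$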

Nei and Gojobori (1986)
p-distance (synonymous): 5.5 / (35/6) = 0.942857
p-distance (nonsynonymous): 4.5 / (127/6) = 0.212598
$$K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}D\right)$$
JC69 one-parameter distance (synonymous): undefined, because 1 − (4/3)(0.942857) < 0
JC69 one-parameter distance (nonsynonymous): 0.249996
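The JC69 correction can be applied to both p-distances in a few lines of Python. The sketch below (hypothetical function name `jc69`) returns `None` when 1 − 4p/3 ≤ 0, which is exactly the situation of the synonymous p-distance here; the last line applies it to the modified Nei and Gojobori nonsynonymous p-distance as well.

```python
import math


def jc69(p):
    """Jukes-Cantor (1969) correction K = -3/4 ln(1 - 4p/3).
    Returns None when p >= 3/4, where the logarithm is undefined."""
    arg = 1 - 4 * p / 3
    if arg <= 0:
        return None
    return -0.75 * math.log(arg)


p_s = 5.5 / (35 / 6)   # synonymous p-distance, about 0.943
p_n = 4.5 / (127 / 6)  # nonsynonymous p-distance, about 0.213
print(jc69(p_s))                             # None
print(round(jc69(p_n), 6))                   # 0.249996
print(round(jc69(4.5 / (167 / 8)), 6))       # 0.254153
```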

Modified Nei and Gojobori (1986)
Transitions are weighted by the transition/transversion rate ratio α/β:
Ts/Tv = 1/2 corresponds to α/β = 1 (unweighted: the nine neighbors of ATA);
Ts/Tv = 1 corresponds to α/β = 2 (each transition counted twice, giving 12 weighted neighbors of ATA):
1st position: TTA, CTA, GTA, GTA
2nd position: ACA, ACA, AAA, AGA
3rd position: ATT, ATC, ATG, ATG
Number of synonymous sites: 2/12 × 3 = 0.5
Number of nonsynonymous sites: 10/12 × 3 = 2.5

Modified Nei and Gojobori (1986), α/β = 2
Sequence 1:  ATG  GTC  ACT  CAT  TTA  ATA  AAT  CGG  ATA  TAA
  syn         0    1    1   1/2   1   1/2  1/2  5/4  1/2   *
  nonsyn      3    2    2   5/2   2   5/2  5/2  7/4  5/2   *
Sequence 2:  ATG  GTT  ACG  CAA  CTC  ATG  ACG  AGG  ATT  TGA
  syn         0    1    1   1/2   1    0    1   3/4  3/4   *
  nonsyn      3    2    2   5/2   2    3    2   9/4  9/4   *
Number of synonymous sites: (25/4 + 6) / 2 = 49/8
Number of nonsynonymous sites: (83/4 + 21) / 2 = 167/8
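The weighting can be added to the site count by giving each transition weight κ = α/β and each transversion weight 1. This is a sketch under the same assumptions as an unweighted count (standard genetic code, changes to stop codons counted as nonsynonymous); with κ = 1 it reduces to the original Nei and Gojobori count. All names are hypothetical.

```python
from fractions import Fraction

# Standard genetic code in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
PURINES = set("AG")


def is_transition(x, y):
    """True when x -> y (x != y) is purine-purine or pyrimidine-pyrimidine."""
    return (x in PURINES) == (y in PURINES)


def weighted_synonymous_sites(codon, kappa):
    """Synonymous sites of one codon when each transition is weighted by
    kappa (= alpha/beta) and each transversion by 1."""
    aa = CODE[codon]
    sites = Fraction(0)
    for pos in range(3):
        syn = total = Fraction(0)
        for base in BASES:
            if base == codon[pos]:
                continue
            weight = Fraction(kappa) if is_transition(codon[pos], base) else Fraction(1)
            total += weight
            if CODE[codon[:pos] + base + codon[pos + 1:]] == aa:
                syn += weight
        sites += syn / total
    return sites


def average_weighted_sites(codons1, codons2, kappa):
    """Weighted synonymous and nonsynonymous sites averaged over two sequences."""
    syn_totals, nonsyn_totals = [], []
    for codons in (codons1, codons2):
        coding = [c for c in codons if CODE[c] != "*"]
        syn = sum((weighted_synonymous_sites(c, kappa) for c in coding), Fraction(0))
        syn_totals.append(syn)
        nonsyn_totals.append(3 * len(coding) - syn)
    return sum(syn_totals) / 2, sum(nonsyn_totals) / 2


SEQ1 = "ATG GTC ACT CAT TTA ATA AAT CGG ATA TAA".split()
SEQ2 = "ATG GTT ACG CAA CTC ATG ACG AGG ATT TGA".split()
print(average_weighted_sites(SEQ1, SEQ2, kappa=2))  # (Fraction(49, 8), Fraction(167, 8))
```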

Modified Nei and Gojobori (1986)
Number of synonymous substitutions: 5.5
Number of nonsynonymous substitutions: 4.5
Number of synonymous sites: (25/4 + 6) / 2 = 49/8
Number of nonsynonymous sites: (83/4 + 21) / 2 = 167/8
p-distance (synonymous): 5.5 / (49/8) = 0.897959
p-distance (nonsynonymous): 4.5 / (167/8) = 0.215569

Modified Nei and Gojobori (1986)
p-distance (synonymous): 5.5 / (49/8) = 0.897959
p-distance (nonsynonymous): 4.5 / (167/8) = 0.215569
$$K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}D\right)$$
JC69 one-parameter distance (synonymous): undefined, because 1 − (4/3)(0.897959) < 0
JC69 one-parameter distance (nonsynonymous): 0.254153

Li, Wu and Luo (1985)
Nondegenerate site: all possible changes at the site are nonsynonymous.
Twofold degenerate site: one of the three possible changes is synonymous.
Fourfold degenerate site: all possible changes at the site are synonymous.
First, count the numbers of the three types of sites in each of the two sequences compared, and then compute the averages, denoting them by L0 (nondegenerate), L2 (twofold) and L4 (fourfold), respectively.

Li, Wu and Luo (1985)
[Figure: a protein-coding sequence with its sites labeled nondegenerate, twofold degenerate and fourfold degenerate]

Li, Wu and Luo (1985)
Label each codon with the degeneracy (0, 2 or 4) of its first, second and third positions (e.g., 002):
ATG GTC ACT CAT TTA ATA AAT CGG ATA TAA
 M   V   T   H   L   I   N   R   I   *
ATG GTT ACG CAA CTC ATG ACG AGG ATT TGA
 M   V   T   Q   L   M   T   R   I   *
L0 (nondegenerate) = ?
L2 (twofold) = ?
L4 (fourfold) = ?

Li, Wu and Luo (1985)
Sequence 1:  ATG  GTC  ACT  CAT  TTA  ATA  AAT  CGG  ATA  TAA
             000  004  004  002  202  002  002  204  002   *
Sequence 2:  ATG  GTT  ACG  CAA  CTC  ATG  ACG  AGG  ATT  TGA
             000  004  004  002  004  000  004  202  002   *
L0 (nondegenerate) = 36/2 = 18
L2 (twofold) = 11/2 = 5.5
L4 (fourfold) = 7/2 = 3.5
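Classifying positions can be automated. The rule below is an assumption chosen to reproduce the labels above: a position with no synonymous change is nondegenerate, one where all three changes are synonymous is fourfold, and anything else (including the isoleucine third position, which has two synonymous changes) is twofold. It is a sketch, not the full Li, Wu and Luo procedure; `degeneracy` and `site_classes` are hypothetical names.

```python
from fractions import Fraction

# Standard genetic code in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def degeneracy(codon):
    """Degeneracy class (0, 2 or 4) of each codon position under the assumed
    rule: no synonymous change -> 0, all three synonymous -> 4, else 2."""
    aa = CODE[codon]
    classes = []
    for pos in range(3):
        n_syn = sum(
            CODE[codon[:pos] + base + codon[pos + 1:]] == aa
            for base in BASES
            if base != codon[pos]
        )
        classes.append(0 if n_syn == 0 else 4 if n_syn == 3 else 2)
    return classes


def site_classes(codons1, codons2):
    """Average numbers of nondegenerate, twofold and fourfold sites."""
    counts = {0: 0, 2: 0, 4: 0}
    for codons in (codons1, codons2):
        for codon in codons:
            if CODE[codon] == "*":
                continue  # stop codons are not counted
            for d in degeneracy(codon):
                counts[d] += 1
    return {f"L{d}": Fraction(n, 2) for d, n in counts.items()}


SEQ1 = "ATG GTC ACT CAT TTA ATA AAT CGG ATA TAA".split()
SEQ2 = "ATG GTT ACG CAA CTC ATG ACG AGG ATT TGA".split()
print(site_classes(SEQ1, SEQ2))  # L0 = 18, L2 = 11/2, L4 = 7/2

# Replacing the stop codons by ten CGG codons adds 10 sites to each class.
padded1 = SEQ1[:-1] + ["CGG"] * 10
padded2 = SEQ2[:-1] + ["CGG"] * 10
print(site_classes(padded1, padded2))
```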

Li, Wu and Luo (1985)
The stop codons are replaced by ten CGG (R) codons in both sequences (CGG×10, degeneracy 204):
Sequence 1:  ATG  GTC  ACT  CAT  TTA  ATA  AAT  CGG  ATA  CGG×10
             000  004  004  002  202  002  002  204  002  204
Sequence 2:  ATG  GTT  ACG  CAA  CTC  ATG  ACG  AGG  ATT  CGG×10
             000  004  004  002  004  000  004  202  002  204
L0 (nondegenerate) = 36/2 + 10 = 28
L2 (twofold) = 11/2 + 10 = 15.5
L4 (fourfold) = 7/2 + 10 = 13.5

Li, Wu and Luo (1985)
After L0, L2 and L4 are obtained, the nucleotide differences in each class are further classified into transitional (Si) and transversional (Vi) differences (i = 0, 2, 4).
At fourfold degenerate sites all differences (S4, V4) are synonymous, and at nondegenerate sites all differences (S0, V0) are nonsynonymous; at twofold degenerate sites, transitions (S2) are mostly synonymous and transversions (V2) mostly nonsynonymous.

Li, Wu and Luo (1985)
L0 (nondegenerate) = 28, L2 (twofold) = 15.5, L4 (fourfold) = 13.5
Example: GTC and GTT differ by a C↔T transition at a fourfold degenerate third position, so this difference is S4 in both sequences.
S0 = ?   S2 = ?   S4 = ?
V0 = ?   V2 = ?   V4 = ?

Li, Wu and Luo (1985)
[Figure: the protein-coding sequence diagram, with transitional (S2) and transversional (V2) differences marked at twofold degenerate sites]

Li, Wu and Luo (1985)
Differences classified so far: GTC/GTT: S4 in both sequences; ATA/ATG (third position): V2 in sequence 1, S0 in sequence 2.
L0 (nondegenerate) = 28, L2 (twofold) = 15.5, L4 (fourfold) = 13.5
S0 = 0.5   S2 = ?   S4 = ?
V0 = ?     V2 = 0.5   V4 = ?

Li, Wu and Luo (1985)
Classification of each difference, codon by codon:
Sequence 1: GTC S4; ACT V4; CAT V2; TTA S2 (1st position), V2 (3rd position); ATA V2; AAT V0 (2nd position), V2 (3rd position); CGG S2; ATA S2
Sequence 2: GTT S4; ACG V4; CAA V2; CTC S0 (1st position), V4 (3rd position); ATG S0; ACG V0 (2nd position), V4 (3rd position); AGG S2; ATT S2
L0 (nondegenerate) = 28, L2 (twofold) = 15.5, L4 (fourfold) = 13.5
S0 = 2/2 = 1   S2 = 5/2 = 2.5   S4 = 2/2 = 1
V0 = 2/2 = 1   V2 = 5/2 = 2.5   V4 = 4/2 = 2

Li, Wu and Luo (1985)
Proportions of transitional (ts) and transversional (tv) differences in each class:
ts0 = S0 / L0 = ?   ts2 = S2 / L2 = ?   ts4 = S4 / L4 = ?
tv0 = V0 / L0 = ?   tv2 = V2 / L2 = ?   tv4 = V4 / L4 = ?

Li, Wu and Luo (1985)
ts0 = S0 / L0 = 0.0357   ts2 = S2 / L2 = 0.1613   ts4 = S4 / L4 = 0.0741
tv0 = V0 / L0 = 0.0357   tv2 = V2 / L2 = 0.1613   tv4 = V4 / L4 = 0.1481

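Kimura's (1980) two-parameter model
[Figure: an ancestral sequence diverging into Sequence 1 and Sequence 2, each along a branch of length t]
At time t, the probability that the nucleotide at a site is the same in both sequences is
$$I(t) = \frac{1}{4} + \frac{1}{4}e^{-8\beta t} + \frac{1}{2}e^{-4(\alpha+\beta)t}$$
and the expected proportions of transitional and transversional differences are
$$ts(t) = \frac{1}{4} + \frac{1}{4}e^{-8\beta t} - \frac{1}{2}e^{-4(\alpha+\beta)t}, \qquad tv(t) = \frac{1}{2} - \frac{1}{2}e^{-8\beta t}$$
Solving these gives
$$8\beta t = \ln\frac{1}{1 - 2tv}, \qquad 4(\alpha+\beta)t = \ln\frac{1}{1 - 2ts - tv}$$
so the numbers of transitional (A) and transversional (B) substitutions per site, and their sum K, are
$$A = 2\alpha t = \frac{1}{2}\ln\frac{1}{1 - 2ts - tv} - \frac{1}{4}\ln\frac{1}{1 - 2tv}$$
$$B = 4\beta t = \frac{1}{2}\ln\frac{1}{1 - 2tv}$$
$$K = A + B = 2(\alpha + 2\beta)t = \frac{1}{2}\ln\frac{1}{1 - 2ts - tv} + \frac{1}{4}\ln\frac{1}{1 - 2tv}$$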
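Kimura's formulas translate directly into code. The sketch below (hypothetical name `kimura2p`) takes the proportions ts and tv and returns A, B and K; it raises an error when either logarithm argument is non-positive, where the distance is undefined.

```python
import math


def kimura2p(ts, tv):
    """Kimura (1980) two-parameter estimates from the proportions of
    transitional (ts) and transversional (tv) differences.
    Returns (A, B, K): transitional, transversional and total substitutions per site."""
    w1 = 1 - 2 * ts - tv
    w2 = 1 - 2 * tv
    if w1 <= 0 or w2 <= 0:
        raise ValueError("distance undefined for these proportions")
    a = 0.5 * math.log(1 / w1) - 0.25 * math.log(1 / w2)
    b = 0.5 * math.log(1 / w2)
    return a, b, a + b


# Nondegenerate-site proportions from the slides: ts0 = tv0 = 1/28
print([round(x, 4) for x in kimura2p(1 / 28, 1 / 28)])  # [0.0381, 0.0371, 0.0752]
```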

Li, Wu and Luo (1985)
Apply Kimura's two-parameter formulas to each class i = 0, 2, 4, using tsi and tvi:
A0 = ?   A2 = ?   A4 = ?
B0 = ?   B2 = ?   B4 = ?
K0 = ?   K2 = ?   K4 = ?

Li, Wu and Luo (1985)
A0 = 0.0381   A2 = 0.2333   A4 = 0.0878
B0 = 0.0371   B2 = 0.1947   B4 = 0.1757
K0 = 0.0752   K2 = 0.4281   K4 = 0.2635

Li, Wu and Luo (1985)
A twofold degenerate site is counted as one third synonymous and two thirds nonsynonymous:
$$K_S = \frac{L_2 A_2 + L_4 K_4}{L_2/3 + L_4}, \qquad K_A = \frac{L_2 B_2 + L_0 K_0}{2L_2/3 + L_0}$$

Li, Wu and Luo (1985)
With L0 = 28, L2 = 15.5, L4 = 13.5 and the A, B and K values above:
$$K_S = \frac{L_2 A_2 + L_4 K_4}{L_2/3 + L_4} = 0.3844, \qquad K_A = \frac{L_2 B_2 + L_0 K_0}{2L_2/3 + L_0} = 0.1337$$
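The Li, Wu and Luo (1985) combination can be checked numerically with the L, S and V values from the slides. This sketch recomputes tsi, tvi and the Kimura quantities for each class before combining them; `lwl85` and `kimura2p` are hypothetical names.

```python
import math


def kimura2p(ts, tv):
    """Kimura (1980): returns (A, B, K) from transition/transversion proportions."""
    w1, w2 = 1 - 2 * ts - tv, 1 - 2 * tv
    if w1 <= 0 or w2 <= 0:
        raise ValueError("distance undefined for these proportions")
    a = 0.5 * math.log(1 / w1) - 0.25 * math.log(1 / w2)
    b = 0.5 * math.log(1 / w2)
    return a, b, a + b


def lwl85(L, S, V):
    """Li, Wu and Luo (1985) Ks and Ka. L, S and V map each degeneracy class
    (0, 2, 4) to its average numbers of sites, transitions and transversions."""
    A, B, K = {}, {}, {}
    for i in (0, 2, 4):
        A[i], B[i], K[i] = kimura2p(S[i] / L[i], V[i] / L[i])
    ks = (L[2] * A[2] + L[4] * K[4]) / (L[2] / 3 + L[4])
    ka = (L[2] * B[2] + L[0] * K[0]) / (2 * L[2] / 3 + L[0])
    return ks, ka


L = {0: 28, 2: 15.5, 4: 13.5}
S = {0: 1, 2: 2.5, 4: 1}
V = {0: 1, 2: 2.5, 4: 2}
ks, ka = lwl85(L, S, V)
print(round(ks, 4), round(ka, 4))  # 0.3844 0.1337
```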

Li (1993) and Pamilo and Bianchi (1993)
$$K_S = \frac{L_2 A_2 + L_4 A_4}{L_2 + L_4} + B_4, \qquad K_A = A_0 + \frac{L_0 B_0 + L_2 B_2}{L_0 + L_2}$$

Li (1993) and Pamilo and Bianchi (1993)
With the same L, A and B values:
$$K_S = \frac{L_2 A_2 + L_4 A_4}{L_2 + L_4} + B_4 = 0.3413, \qquad K_A = A_0 + \frac{L_0 B_0 + L_2 B_2}{L_0 + L_2} = 0.1314$$
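The same inputs give the Li (1993) and Pamilo and Bianchi (1993) estimates; only the final combination differs from the 1985 formulas. Again a sketch, with hypothetical function names.

```python
import math


def kimura2p(ts, tv):
    """Kimura (1980): returns (A, B, K) from transition/transversion proportions."""
    w1, w2 = 1 - 2 * ts - tv, 1 - 2 * tv
    if w1 <= 0 or w2 <= 0:
        raise ValueError("distance undefined for these proportions")
    a = 0.5 * math.log(1 / w1) - 0.25 * math.log(1 / w2)
    b = 0.5 * math.log(1 / w2)
    return a, b, a + b


def li93_pb93(L, S, V):
    """Ks = (L2 A2 + L4 A4)/(L2 + L4) + B4 and Ka = A0 + (L0 B0 + L2 B2)/(L0 + L2)."""
    A, B = {}, {}
    for i in (0, 2, 4):
        A[i], B[i], _ = kimura2p(S[i] / L[i], V[i] / L[i])
    ks = (L[2] * A[2] + L[4] * A[4]) / (L[2] + L[4]) + B[4]
    ka = A[0] + (L[0] * B[0] + L[2] * B[2]) / (L[0] + L[2])
    return ks, ka


L = {0: 28, 2: 15.5, 4: 13.5}
S = {0: 1, 2: 2.5, 4: 1}
V = {0: 1, 2: 2.5, 4: 2}
ks, ka = li93_pb93(L, S, V)
print(round(ks, 4), round(ka, 4))  # 0.3413 0.1314
```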

Homework: Calculate S0, S2, S4, V0, V2 and V4 between the human HLA-A and HLA-B genes for the first 240 nucleotides.

Estimation of distance between two protein sequences

[Figure: a protein sequence, M T S L A V T P L I K H S V A S Y T L P C N T, with 15 asterisks marking substitutions at its sites]
The actual number of substitutions per site, K, is 15/23 in this case.
We subdivide t into n pieces. The number of substitutions per site per unit time (one piece) is
$$p = \frac{K}{n}$$
For each site, the probability that we find x substitutions is a binomial probability:
$$b(x; n, p) = \frac{n!}{x!\,(n-x)!}\,p^x (1-p)^{n-x}$$

National Chiao Tung University, Institute of Bioinformatics and Systems Biology, instructor 林勇欣
For each site, the probability that we find x substitutions is a binomial probability; substituting p = K/n,
$$b(x; n, p) = \frac{n!}{x!\,(n-x)!}\left(\frac{K}{n}\right)^x\left(1 - \frac{K}{n}\right)^{n-x} = \frac{K^x}{x!}\cdot\frac{n(n-1)\cdots(n-x+1)}{n^x}\cdot\left(1 - \frac{K}{n}\right)^{n-x}$$
When n → ∞, the middle factor tends to 1 and (1 − K/n)^(n−x) tends to e^(−K), so
$$b(x; n, p) \to \frac{K^x e^{-K}}{x!},$$
which is expressed as p(x; K), the Poisson probability.
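The limit can be checked numerically: as n grows, b(x; n, K/n) approaches p(x; K). K = 0.5 is an arbitrary illustrative value, not one taken from the slides, and the function names are hypothetical.

```python
import math


def binomial_prob(x, n, K):
    """b(x; n, p) with p = K/n: probability of x substitutions at a site
    when the time span is cut into n pieces."""
    p = K / n
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_prob(x, K):
    """p(x; K) = K^x e^(-K) / x!."""
    return K**x * math.exp(-K) / math.factorial(x)


K = 0.5  # illustrative value
for n in (10, 100, 10_000):
    print(n, [round(binomial_prob(x, n, K), 4) for x in range(3)])
print("Poisson", [round(poisson_prob(x, K), 4) for x in range(3)])
```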

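For each site, the Poisson probability that we find x substitutions is
$$p(x; K) = \frac{K^x e^{-K}}{x!}$$
The probability of no substitution at a site is I = p(0; K) = e^(−K), so the proportion of differing sites is D = 1 − I = 1 − e^(−K), and
$$K = -\ln(1 - D)$$
For comparison, the JC69 one-parameter model for nucleotide sequences gives
$$K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}D\right)$$
and the corresponding formula for amino acid sequences (20 states) is
$$K = -\frac{19}{20}\ln\left(1 - \frac{20}{19}D\right)$$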
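These corrections are one-liners in Python. In this sketch the `n_states` parameter (an illustrative generalization, not from the slides) covers both the 4-state JC69 formula and the 20-state amino acid formula; the value D = 0.3 is arbitrary and the function names are hypothetical.

```python
import math


def p_distance(seq1, seq2):
    """Proportion D of differing sites between two aligned, gap-free sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def poisson_distance(d):
    """K = -ln(1 - D)."""
    return -math.log(1 - d)


def jc_like_distance(d, n_states=20):
    """K = -b ln(1 - D/b) with b = (n_states - 1) / n_states.
    n_states = 20 gives -(19/20) ln(1 - 20D/19); n_states = 4 gives JC69.
    Returns None when 1 - D/b <= 0 (distance undefined)."""
    b = (n_states - 1) / n_states
    arg = 1 - d / b
    if arg <= 0:
        return None
    return -b * math.log(arg)


D = 0.3  # illustrative proportion of differing amino acid sites
print(round(poisson_distance(D), 4), round(jc_like_distance(D), 4))  # 0.3567 0.3605
```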

The substitution rate λ varies among sites according to the gamma distribution:
$$g(\lambda) = \frac{b^a}{\Gamma(a)}\,e^{-b\lambda}\lambda^{a-1}, \qquad \Gamma(a) = \int_0^\infty e^{-t}t^{a-1}\,dt$$
a = λ̄²/V (shape parameter), b = λ̄/V (scale parameter), where λ̄ and V are the mean and variance of λ.
[Figure: g(λ) plotted against λ (0 to 0.5) for a = 0.5, 1.0 and 2.0]

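Nonuniform rates
Amino acid distance with gamma-distributed rates:
$$K = \frac{19}{20}\,a\left[\left(1 - \frac{20}{19}D\right)^{-1/a} - 1\right]$$
Poisson distance with gamma-distributed rates:
$$K = a\left[(1 - D)^{-1/a} - 1\right]$$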
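The gamma versions can be compared with the uniform-rate formulas by letting the shape parameter a grow large, where both converge to the uniform-rate results; smaller a (stronger rate variation) gives larger distances. Function names and the value D = 0.3 are illustrative assumptions.

```python
import math


def gamma_poisson_distance(d, a):
    """K = a[(1 - D)^(-1/a) - 1]."""
    return a * ((1 - d) ** (-1 / a) - 1)


def gamma_jc_like_distance(d, a, n_states=20):
    """K = b a[(1 - D/b)^(-1/a) - 1] with b = (n_states - 1) / n_states;
    n_states = 20 gives (19/20) a[(1 - 20D/19)^(-1/a) - 1].
    Returns None when 1 - D/b <= 0."""
    b = (n_states - 1) / n_states
    arg = 1 - d / b
    if arg <= 0:
        return None
    return b * a * (arg ** (-1 / a) - 1)


D = 0.3  # illustrative proportion of differing sites
for a in (0.5, 1.0, 2.0, 1e6):
    print(a, round(gamma_poisson_distance(D, a), 4), round(gamma_jc_like_distance(D, a), 4))
```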

Li (1997) Molecular Evolution


PAM matrix (for point-accepted mutations) Yang (2006) Computational Molecular Evolution


PAM matrix (for point-accepted mutations)
PAM matrices are based on global alignments of closely related proteins.
1. PAM1 is the matrix calculated from comparisons of sequences with no more than 1% divergence.
2. Other PAM matrices are extrapolated from PAM1.
3. Empirical amino acid substitution matrices of this kind are also used in the alignment of multiple protein sequences.
PAM100 corresponds to t = 1 substitution per site.
http://www.ncbi.nlm.nih.gov/education/blastinfo/scoring2.html
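Extrapolation from PAM1 is done by matrix multiplication: PAMn is PAM1 raised to the nth power. The sketch below uses a hypothetical PAM1-like matrix in which every residue changes with probability 0.01, spread evenly over the other 19 amino acids; real PAM1 entries are estimated from observed replacements and are not uniform. Under this toy model the probability that a residue is unchanged after n steps has the closed form 1/20 + (19/20)(1 − (20/19) × 0.01)^n, which the check compares against. With 1% change per step, 100 steps correspond to one expected substitution per site, in line with the statement about PAM100. All names are hypothetical.

```python
N_AA = 20


def uniform_pam1(change=0.01, n=N_AA):
    """Hypothetical PAM1-like matrix: each residue changes with probability
    `change`, spread evenly over the other n - 1 residues."""
    off = change / (n - 1)
    return [[1 - change if i == j else off for j in range(n)] for i in range(n)]


def mat_mult(x, y):
    """Product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(m, power):
    """m raised to a non-negative integer power by repeated squaring."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = m
    while power:
        if power & 1:
            result = mat_mult(result, base)
        base = mat_mult(base, base)
        power >>= 1
    return result


pam100 = mat_power(uniform_pam1(), 100)
print(round(pam100[0][0], 4))  # probability that a residue is unchanged
```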

BLOSUM matrix (blocks substitution matrix)
BLOSUM matrices are based on local alignments.
1. BLOSUM62 is a matrix calculated from comparisons of sequences with no more than 62% similarity.
2. All BLOSUM matrices are based on observed alignments; they are not extrapolated from comparisons of closely related proteins.
3. BLOSUM62 is the default matrix in BLAST 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives may be more sensitive with a different matrix.
http://www.ncbi.nlm.nih.gov/education/blastinfo/scoring2.html

BLOSUM matrix (blocks substitution matrix) Generally speaking... The Blosum matrices are best for detecting local alignments. The Blosum62 matrix is the best for detecting the majority of weak protein similarities. The Blosum45 matrix is the best for detecting long and weak alignments. http://www.ebi.ac.uk/help/matrix.html

Differences between PAM and BLOSUM
PAM matrices are based on an explicit evolutionary model (i.e. replacements are counted on the branches of a phylogenetic tree), whereas the BLOSUM matrices are based on an implicit model of evolution. The PAM matrices are based on mutations observed throughout a global alignment; this includes both highly conserved and highly mutable regions. The BLOSUM matrices are based only on highly conserved regions in series of alignments forbidden to contain gaps.
http://en.wikipedia.org/wiki/substitution_matrix

Differences between PAM and BLOSUM
The method used to count the replacements is different: unlike the PAM matrix, the BLOSUM procedure uses groups of sequences within which not all mutations are counted the same. Higher numbers in the PAM matrix naming scheme denote larger evolutionary distance, while larger numbers in the BLOSUM matrix naming scheme denote higher sequence similarity and therefore smaller evolutionary distance. Example: PAM150 is used for more distant sequences than PAM100; BLOSUM62 is used for closer sequences than BLOSUM50.
http://en.wikipedia.org/wiki/substitution_matrix

Equivalent PAM and Blosum matrices
The following matrices are roughly equivalent:
PAM100 ==> Blosum90
PAM120 ==> Blosum80
PAM160 ==> Blosum60
PAM200 ==> Blosum52
PAM250 ==> Blosum45
http://www.ebi.ac.uk/help/matrix.html
http://www.ncbi.nlm.nih.gov/education/blastinfo/scoring2.html

Homework: Use MEGA to calculate different genetic distances (using different models, including nucleotide, synonymous/nonsynonymous and amino acid models) for your mitochondrial cytochrome b sequence alignment file, and compare these distances.