Mascot introduction and application for protein identification. Speaker: 蔡沛倫, Mass Solutions Technology Co., Ltd. (鎂陞科技股份有限公司). Date: July 21, 2011. Venue: National Taiwan University
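Protein identification by database searching (Mascot)
1. Technical principles of Mascot protein identification
2. Meaning of each search parameter and recommended settings
3. How identification results are presented in the report
4. The new report format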
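What is MASCOT?
MASCOT is software that identifies proteins by matching mass spectra against database sequences.
MASCOT offers three types of search:
1. Peptide Mass Fingerprint
2. Sequence Query
3. MS/MS Ions Search
Databases available in MASCOT by default:
1. MSDB
2. NCBInr
3. SwissProt
4. dbEST (nucleic acid database)
MASCOT also lets users search against their own custom databases, which is convenient for recombinant protein searching.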
Database Example
>gi|386828|gb|AAA59172.1| insulin [Homo sapiens]
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPK
TRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLE
NYCN
>gi|1351907|sp|P02769|ALBU_BOVIN Serum albumin precursor (Allergen Bos d 6) (BSA)
MKWVTFISLLLLFSSAYSRGVFRRDTHKSEIAHRFKDLGEEHFKGLVLIAFSQYL
QQCPFDEHVKLVNELTEFAKTCVADESHAGCEKSLHTLFGDELCKVASLRETYG
DMADCCEKQEPERNECFLSHKDDSPDLPKLKPDPNTLCDEFKADEKKFWGKYL
YEIARRHPYFYAPELLYYANKYNGVFQECCQAEDKGACLLPKIETMREKVLASS
ARQRLRCASIQKFGERALKAWSVARLSQKFPKAEFVEVTKLVTDLTKVHKECCH
GDLLECADDRADLAKYICDNQDTISSKLKECCDKPLLEKSHCIAEVEKDAIPENLP
PLTADFAEDKDVCKNYQEAKDAFLGSFLYEYSRRHPEYAVSVLLRLAKEYEATL
EECCAKDDPHACYSTVFDKLKHLVDEPQNLIKQNCDQFEKLGEYGFQNALIVRY
TRKVPQVSTPTLVEVSRSLGKVGTRCCTKPESERMPCTEDYLSLILNRLCVLHE
KTPVSEKVTKCCTESLVNRRPCFSALTPDETYVPKAFDEKLFTFHADICTLPDTE
KQIKKQTALVELLKHKPKATEEQLKTVMENFVAFVDKCCAADDKEACFAVEGPK
LVVSTQTALA
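To make the database format concrete, here is a minimal sketch of reading FASTA entries like the insulin record in the example. It assumes the usual FASTA layout (a header line starting with `>` followed by sequence lines) and NCBI-style pipe-delimited headers; the function name `parse_fasta` is illustrative and is not part of Mascot.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    entries = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous entry, if there was one.
            if header is not None:
                entries.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        entries.append((header, "".join(chunks)))
    return entries


# The insulin entry from the slide (pipe delimiters assumed in the header).
example = """>gi|386828|gb|AAA59172.1| insulin [Homo sapiens]
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPK
TRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLE
NYCN
"""

entries = parse_fasta(example)
header, sequence = entries[0]
gi_number = header.split("|")[1]  # second pipe-delimited field of the header
print(gi_number, len(sequence))
```

A custom database for recombinant protein searching would follow the same layout: one header line per protein, then its sequence.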
MASCOT free web version: http://www.matrixscience.com/search_form_select.html In-house version: http://localhost/search_form_select.html
Peptide mapping analysis for protein ID: peptide mass fingerprinting of digested proteins
Peptide Mass Fingerprint: no MS/MS information
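A peptide mass fingerprint search compares measured peptide masses with the masses predicted from digesting each database sequence. The sketch below illustrates that prediction step only. It assumes trypsin with the common rule "cleave after K or R, but not before P" and uses standard monoisotopic residue masses; the function names are illustrative, and this is not Mascot's own implementation.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added to the residue sum
PROTON = 1.00728   # mass of a proton, used for m/z values


def tryptic_digest(seq, missed_cleavages=0):
    """Return peptides from cleaving after K/R (not before P)."""
    sites = [i + 1 for i, aa in enumerate(seq[:-1])
             if aa in "KR" and seq[i + 1] != "P"]
    bounds = [0] + sites + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        # Join up to `missed_cleavages` extra neighbouring fragments.
        last = min(i + 2 + missed_cleavages, len(bounds))
        for j in range(i + 1, last):
            peptides.append(seq[bounds[i]:bounds[j]])
    return peptides


def mono_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def mz(peptide, charge=1):
    """m/z of the peptide carrying `charge` protons."""
    return (mono_mass(peptide) + charge * PROTON) / charge


# Hypothetical short sequence, chosen only to show the cleavage rule.
for pep in tryptic_digest("ACKDEFRPGHK", missed_cleavages=1):
    print(pep, round(mz(pep), 4))
```

The list of predicted masses per protein is what gets compared against the measured peak list; the scoring of that comparison is the part Mascot supplies.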
Sequence Query example: 1454.4 ions(b-610,707,804,1086) ions(y-2909) ions(2106,2632,2545)
LC/ESI/MS/MS for Protein ID (workflow figure): digested proteins, Waters CapLC pump, MS/MS data processing and database searching, MASCOT results
LC/MS/MS peak list example (.pkl format). Labelled fields: m/z, intensity, charge, fragment ion m/z
LC/MS/MS peak list example (.mgf format). Labelled fields: fragment ion m/z, intensity
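For readers who have not opened an .mgf file, the sketch below shows the general shape of the format and one way to read it. It assumes the common Mascot Generic Format layout: each spectrum sits between `BEGIN IONS` and `END IONS`, with `KEY=value` parameter lines such as `TITLE`, `PEPMASS` and `CHARGE`, followed by "m/z intensity" pairs. The peak values in the example are made up for illustration, and `parse_mgf` is a hypothetical helper, not a Mascot function.

```python
def parse_mgf(text):
    """Return a list of spectra, each a dict with 'params' and 'peaks'."""
    spectra = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                # Parameter line, e.g. PEPMASS=727.70 1500.0
                key, value = line.split("=", 1)
                current["params"][key.upper()] = value
            else:
                # Peak line: fragment ion m/z, then intensity if present.
                fields = line.split()
                intensity = float(fields[1]) if len(fields) > 1 else 0.0
                current["peaks"].append((float(fields[0]), intensity))
    return spectra


# Illustrative spectrum with invented values.
example_mgf = """BEGIN IONS
TITLE=example spectrum
PEPMASS=727.70 1500.0
CHARGE=2+
175.12 320.0
288.20 150.5
END IONS
"""

spectra = parse_mgf(example_mgf)
print(spectra[0]["params"]["CHARGE"], spectra[0]["peaks"])
```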
MS/MS Ions Search. Search parameters: database, taxonomy, enzyme, missed cleavages, fixed modifications, variable modifications, protein MW, protein pI, estimated mass measurement error
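The "estimated mass measurement error" parameter sets how far an observed mass may deviate from a theoretical one and still count as a match. A minimal sketch of that check follows, assuming the tolerance is given either in Da or in ppm. The function names are illustrative, and the numbers are hypothetical.

```python
def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed, theoretical, tol, unit="Da"):
    """True if the observed mass lies within +/- tol of the theoretical mass."""
    if unit == "ppm":
        allowed = theoretical * tol / 1e6  # convert ppm to an absolute window
    elif unit == "Da":
        allowed = tol
    else:
        raise ValueError("unit must be 'Da' or 'ppm'")
    return abs(observed - theoretical) <= allowed


# Hypothetical masses: the same 0.004 Da deviation passes a 5 ppm window
# at mass 1000 but would fail it at mass 500.
print(within_tolerance(1000.004, 1000.0, 5, "ppm"))
print(within_tolerance(500.004, 500.0, 5, "ppm"))
```

This is why a tolerance that is too tight (an underestimated error) leaves good spectra unmatched, while one that is too loose admits more random candidates.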
Database search
Probability-based search algorithm. Typically, a 95% confidence level (p < 0.05) is accepted as significant
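To connect the 95% confidence level with the scores shown in a report: Mascot reports scores as -10·log10(P), where P is the probability that the observed match is a random event. The sketch below converts between the two. The `significance_threshold` function is a simplified reading that divides the accepted error rate by the number of candidate matches; treat it as an illustration, since the thresholds in a real report come from Mascot itself.

```python
import math


def score_from_probability(p):
    """Score corresponding to a random-match probability p."""
    return -10.0 * math.log10(p)


def probability_from_score(score):
    """Random-match probability corresponding to a score."""
    return 10.0 ** (-score / 10.0)


def significance_threshold(n_candidates, alpha=0.05):
    """Score needed so that the chance of a random match among
    n_candidates trials stays at alpha (simplified assumption)."""
    return -10.0 * math.log10(alpha / n_candidates)


# With 1,500,000 candidates and a 1-in-20 accepted error rate,
# the threshold comes out near 75.
print(round(significance_threshold(1_500_000), 1))
```

The threshold grows with the number of candidates, which is why restricting taxonomy or using a smaller database lowers the score needed for significance.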
Each MS/MS spectrum (each query number) has its own corresponding table
Bold: the first time a particular peptide match appears in the report. Red: the peptide match is the top-ranking match for its query
Protein modifications
Export search results, e.g. to meet publication guidelines
What's the difference between the public web version and the in-house version?
1. Query number: web: 10 MB or 1200 spectra in a single MS/MS search; in-house: no limitation
2. Searching databases: web: adding databases is not available; in-house: users can set up their own databases
3. Enzyme: web: cannot perform a no-enzyme search; in-house: no limitation
4. Modifications: web: 3 variable modifications; in-house: no limitation
5. Quantitation: web: can only use the default quantitation methods and cannot work with MASCOT Distiller; in-house: no limitation
6. Security control: web: n/a; in-house: the administrator can configure each user's rights to use the MASCOT in-house server and to handle search results
Mascot advanced features
Search Result
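When should an error tolerant search be performed?
When an MS/MS Ions Search leaves many MS/MS spectra unmatched to any peptide even though the spectra are of good quality, the possible causes are:
1. The mass error was underestimated (e.g., the instrument's mass error is too large)
2. The charge state of the precursor ion was assigned incorrectly
3. The enzyme used has no specificity or poor specificity
4. Unknown PTMs or chemical modifications
5. The peptide sequences are not present in the database
The MASCOT error tolerant search addresses the last three causes.
PTM: post-translational modification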
Error tolerant search
Automated decoy database search
When searching large data sets, the search is repeated against a randomized database to increase confidence in the results.
Reverse database search or random database search
False positive rate = (no. of identifications in decoy database search) / (no. of identifications in normal database search)
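The decoy idea can be sketched in a few lines. The example below builds a reversed decoy entry for each target sequence and computes the false positive rate as the ratio of decoy identifications to normal identifications. The `REV_` prefix, the function names and the counts are hypothetical; Mascot's automated decoy search generates and searches its decoy sequences internally.

```python
def reverse_decoy(entries, prefix="REV_"):
    """Return decoy entries whose sequences are the reversed targets.

    `entries` is a list of (accession, sequence) tuples; the prefix
    marking decoys is an arbitrary choice for this sketch.
    """
    return [(prefix + accession, seq[::-1]) for accession, seq in entries]


def false_positive_rate(decoy_hits, target_hits):
    """Identifications in the decoy search divided by identifications
    in the normal search."""
    if target_hits == 0:
        raise ValueError("no identifications in the normal database search")
    return decoy_hits / target_hits


# Hypothetical target entry and hypothetical identification counts.
targets = [("P1", "ACDK")]
print(reverse_decoy(targets))
print("{:.2%}".format(false_positive_rate(5, 200)))
```

Reversing keeps the amino acid composition and length of each protein, so the decoy database has similar statistical properties while containing no true matches.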
Automated decoy database search: click to view the results of the decoy database search. In this example, the decoy search gives a false-positive rate of 4.05% for matches scoring above the homology threshold and 0% for matches scoring above the identity threshold
New features in Mascot
- Protein Family Report
- Percolator support
- Search multiple databases
- Export search results as mzIdentML
- Batch automate quantitation with Mascot Daemon
- Support for mzML format peak lists
- 64-bit executables for Windows
- Result report caching makes all large reports faster to load
- Support for Perl 5.10
- Multiplex quantitation now supports isobaric precursors
- Reports now show counts of distinct sequences as well as counts of matched spectra
- ... and lots more
New reports load much more rapidly
Protein Family Report
Protein Family Report (Quantitation)
Protein Family Report (Unassigned)
Protein Family Report AJAX (shorthand for asynchronous JavaScript and XML)
New reports load much more rapidly
Test case: search of 114,943 MS/MS spectra against a database with 1,319,480 sequences
Mascot 2.2: 35 minutes to load the report; no progress reports; 430 MB memory for 1487 proteins
Mascot 2.3: dramatic improvement
Percolator
Percolator is an algorithm that uses semi-supervised machine learning to improve the discrimination between correct and incorrect spectrum identifications.
PSM = Peptide Spectrum Match; FDR = False Discovery Rate; SVM = Support Vector Machine
Quantitation
Search Multiple Databases
Select multiple FASTA files for searching
Why:
- It is often best to add a few of your own sequences to the end of SwissProt or NCBInr
- You may want to search SwissProt and TrEMBL together, or a species database plus contaminants
Concatenating FASTA files is not easy because:
- The files are often huge
- The reference file may also need updating
- Accession formats differ between databases
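Thanks for your attention
Contact information: Mass Solutions Technology Co., Ltd. (鎂陞科技股份有限公司), 5F, No. 79, Sec. 1, Xintai 5th Rd., Xizhi City, Taipei County 221, Taiwan. Tel: 02-26989511; Fax: 02-26989512. info@mass-solutions.com.tw http://mass-solutions.com.tw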