Available Data Sources for Epidemiologic Research in China. Siyan Zhan, M.D., Ph.D., Center of Post-marketing Safety Evaluation, Peking University Health Science Center
Epidemiology and its applications. Epidemiology is defined as the study of the distribution and determinants of health-related states or events in specified populations, and the application of this study to the prevention and control of health problems (John Last, 1988). Applications: describe the health of a population; discover the determinants of disease; prevent/control disease; plan and evaluate health services.
Key data for epidemiology: individual-based approach (numerator): cases; population-based approach (denominator): incidence rate; causality (control): relative risk.
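A small calculation shows how the three kinds of key data combine. This is a minimal sketch with made-up counts; the function names are hypothetical and only illustrate how a numerator, a denominator and a control group yield an incidence rate and a relative risk.

```python
def incidence_rate(new_cases: int, person_time: float) -> float:
    """Incidence rate: new cases (numerator) per unit of person-time at risk (denominator)."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    return new_cases / person_time


def cumulative_incidence(new_cases: int, population_at_risk: int) -> float:
    """Risk: new cases divided by the population initially at risk."""
    if population_at_risk <= 0:
        raise ValueError("population_at_risk must be positive")
    return new_cases / population_at_risk


def relative_risk(cases_exposed: int, n_exposed: int,
                  cases_unexposed: int, n_unexposed: int) -> float:
    """Risk in the exposed group divided by risk in the unexposed (control) group."""
    if n_exposed <= 0 or n_unexposed <= 0:
        raise ValueError("group sizes must be positive")
    if cases_unexposed == 0:
        raise ValueError("relative risk is undefined when the control group has no cases")
    # (a / n1) / (b / n0) rewritten as (a * n0) / (b * n1) to avoid rounding in the intermediate risks
    return (cases_exposed * n_unexposed) / (cases_unexposed * n_exposed)


if __name__ == "__main__":
    # Hypothetical cohort: 30 cases among 1,000 exposed, 10 among 1,000 unexposed
    print(relative_risk(30, 1000, 10, 1000))  # 3.0
    print(incidence_rate(12, 4000.0))         # 12 cases over 4,000 person-years
```

Without the control group only the first two quantities can be computed, which is why causal questions need the comparison data.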
How to obtain key data for epidemiology: specific studies, such as a cross-sectional study (sampling survey) for prevalence and a cohort study (prospective or retrospective) for incidence; surveillance or registries for specific rates; vital statistics for birth rate and mortality; medical records and case series for case fatality; systematic review and meta-analysis.
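The measures named for each data source reduce to simple ratios. The sketch below is illustrative only: the function names are hypothetical, and the per-100,000 scaling for vital statistics is a common convention assumed here rather than taken from the slide.

```python
def prevalence(existing_cases: int, population_surveyed: int) -> float:
    """Prevalence from a cross-sectional sample: existing cases / people surveyed."""
    if population_surveyed <= 0:
        raise ValueError("population_surveyed must be positive")
    return existing_cases / population_surveyed


def case_fatality_rate(deaths: int, cases: int) -> float:
    """Case fatality from a case series: deaths among cases / all cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return deaths / cases


def crude_rate(events: int, population: int, per: int = 100_000) -> float:
    """Crude rate from vital statistics, e.g. deaths per 100,000 population (assumed scaling)."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so that whole-number results stay exact
    return events * per / population
```

For example, `crude_rate(250, 1_000_000)` gives 25.0 deaths per 100,000.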
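Classical prospective cohort: takes a long time and a big budget. Example: an investment of 1.5 million from the Global Fund, 2 years of field follow-up, nearly 1 year of data cleaning and analysis, and 1 year from manuscript writing to publication.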
Big Data Emerging (timeline: 2008, 2011, 2012)
Application of Big Data to Health Care
(Epidemiology 2015,26(3):390-394)
Healthcare Databases in China: Vital Statistics and Surveillance; Medical Insurance Data; Electronic Medical Records; Discharge Summary Reports
Vital Statistics: Annals. General epidemiological data: official statistics
Surveillance: mortality surveillance; infectious disease surveillance; chronic diseases (CVD, cancer); risk factor surveillance; hospital-acquired infection surveillance; ADR surveillance
Mortality Surveillance: the National Disease Surveillance System, cause of death
Infectious Disease Surveillance: notifiable infectious diseases (categories A, B, C), vital statistics
Chronic Disease Surveillance http://www.chinacdc.cn/gwswxx/mbsqc/201109/t20110906_52141.htm
Risk Factors Surveillance
Cardiovascular Disease Surveillance http://www.healthyheart-china.com/website/information/
ADR Surveillance
Healthcare Databases in China: Medical Insurance Data
Medical Insurance Database
Medical insurance databases. Region: China; population size: 8.5 million; diagnosis data and code: free text; procedure data and code: no; laboratory information: no; drug information: drug name/dosage/prescription time/cost; cost data: cost for every prescription; database owner: China Medical Insurance Research Council; accessibility: being negotiated.
Study flow for the prescription sequence symmetry analysis (PSSA): patients prescribed atorvastatin in the database, n=262,655; patients prescribed both atorvastatin and a marker drug, n=11,494; excluded because the prescriptions were not within the time frame, n=18; excluded because atorvastatin and the marker drug were prescribed on the same day, n=72; eligible atorvastatin-marker drug pairs for PSSA, n=11,404; eligible atorvastatin-marker drug pairs for sensitivity analysis, n=3,646. Findings: atorvastatin is associated with an increased risk of liver injury, and age is a potential risk factor. This analysis provided evidence from big data to support an existing hypothesis.
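The core of a PSSA is counting how often the marker drug is first prescribed after versus before the index drug. The two exclusions in the flow above (pairs outside the time frame, same-day pairs) are what separate the 11,494 pairs from the 11,404 analyzed (11,494 - 18 - 72). The sketch below computes only the crude sequence ratio; the 365-day window and the class and function names are assumptions, the marker drug itself is not named in the source, and the adjustment for prescribing trends (the null-effect sequence ratio) is not shown.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FirstPrescriptions:
    patient_id: str
    index_drug: date   # first prescription of the index drug (e.g. atorvastatin)
    marker_drug: date  # first prescription of the marker drug (hypothetical)


def crude_sequence_ratio(pairs: list[FirstPrescriptions],
                         window_days: int = 365) -> tuple[float, dict[str, int]]:
    """Crude sequence ratio = marker-after-index count / marker-before-index count.

    Same-day pairs are excluded because their order is unknown; pairs whose first
    prescriptions lie more than `window_days` apart are excluded as out of window.
    """
    counts = {"after": 0, "before": 0, "same_day": 0, "out_of_window": 0}
    for pair in pairs:
        gap = (pair.marker_drug - pair.index_drug).days
        if gap == 0:
            counts["same_day"] += 1
        elif abs(gap) > window_days:
            counts["out_of_window"] += 1
        elif gap > 0:
            counts["after"] += 1
        else:
            counts["before"] += 1
    if counts["before"] == 0:
        raise ValueError("no marker-before-index pairs; the ratio is undefined")
    return counts["after"] / counts["before"], counts
```

A ratio above 1 means the marker drug tends to follow the index drug, which is the signal a PSSA looks for.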
Healthcare Databases in China: Electronic Medical Records
EMR data structure (NHFPC, 2013). Five main areas: summary of records; outpatient (emergency) records; inpatient records; transfer records; institute information. Records consist of several independent but linkable notes; the main data are based on notes, in 17 different datasets.
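One way to picture "independent but linkable notes" is a flat list of notes that share a patient key and are grouped on demand. This is a hypothetical model: the five areas come from the slide, but the `Note` fields, the dataset labels and the linkage on a single patient ID are assumptions, not the NHFPC specification.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class EmrArea(Enum):
    """The five main areas of the EMR data structure."""
    SUMMARY = "summary of records"
    OUTPATIENT = "outpatient (emergency) records"
    INPATIENT = "inpatient records"
    TRANSFER = "transfer records"
    INSTITUTION = "institute information"


@dataclass
class Note:
    area: EmrArea
    dataset: str                      # hypothetical dataset label
    patient_id: Optional[str] = None  # None for institution-level notes
    content: dict = field(default_factory=dict)


def link_notes(notes: list[Note]) -> dict[str, dict[EmrArea, list[Note]]]:
    """Group independent notes into one record per patient, keyed by area."""
    linked: dict = defaultdict(lambda: defaultdict(list))
    for note in notes:
        if note.patient_id is not None:  # institution notes are not patient-linked
            linked[note.patient_id][note.area].append(note)
    return {pid: dict(areas) for pid, areas in linked.items()}
```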
Inpatient EMR in 20 Hospitals. Region: China; population size: 2 million; diagnosis data and code: ICD-9/10; procedure data and code: no; laboratory information: free text; drug information: drug name/dosage/prescription time/cost; cost data: cost by category; database owner: China Academy of Chinese Medical Sciences (CACMS); accessibility: being negotiated.
Example: Utilization of EMR. Did the AMI guideline change clinical practice? Part two of the study takes AMI as an example for exploratory research. Objectives: 1. Analyze whether current treatments for AMI comply with the guideline. 2. Evaluate the factors contributing to non-compliance. 3. Investigate the impact of following clinical guidelines on patient outcomes.
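Objective 1 amounts to checking, for each admission, whether a guideline-recommended drug was ordered within a time window after admission. A minimal sketch, assuming a 24-hour window and a hypothetical recommended-drug set; "aspirin" in the test is illustrative only and is not the guideline's full list.

```python
from datetime import datetime, timedelta


def compliance_rate(admissions: dict[str, datetime],
                    orders: list[tuple[str, str, datetime]],
                    recommended: set[str],
                    window: timedelta = timedelta(hours=24)) -> float:
    """Share of admitted patients given at least one recommended drug within `window` of admission."""
    if not admissions:
        raise ValueError("no admissions")
    compliant = set()
    for patient_id, drug, ordered_at in orders:
        admitted_at = admissions.get(patient_id)
        if admitted_at is None or drug not in recommended:
            continue  # unknown patient or non-recommended drug
        if admitted_at <= ordered_at <= admitted_at + window:
            compliant.add(patient_id)
    return len(compliant) / len(admissions)
```

Patients counted as non-compliant here are the population for objective 2, the analysis of contributing factors.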
Study population/data included in the analysis. Source data (multiple records per ID): inpatient records n=7,616; diagnoses (Dx) n=73,246; prescriptions (Rx) n=795,036; laboratory results (Lab) n=1,238,364. After excluding 137 duplicate records, 7,479 records remained. Subsequent steps (disease identification from 9,140 diagnosis records; medication and drug identification within 24 h post-admission; transposing to one row per record) yielded 7,050 and then 5,717 records. Age and sex were imputed; records still missing age and sex were excluded (n=890), leaving 4,827 records. Further exclusions: 3 hospitals with <30 patients (n=15), age <18 (n=2), data before 2005 (n=96), leaving 4,714 records.
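A flow like this is easiest to audit when every exclusion step logs the remaining count. The sketch below is hypothetical: records are plain dicts with assumed field names (`record_id`, `hospital`, `age`, `sex`, `year`), the step order is one plausible reading of the flow, and hospital size is counted in records on the assumption of one record per patient after transposing.

```python
from collections import Counter
from typing import Callable

Record = dict
Step = tuple[str, Callable[[list[Record]], list[Record]]]


def run_flow(records: list[Record], steps: list[Step]) -> tuple[list[Record], list[tuple[str, int]]]:
    """Apply exclusion steps in order and log how many records remain after each one."""
    log = [("all records", len(records))]
    for label, step in steps:
        records = step(records)
        log.append((label, len(records)))
    return records, log


def drop_duplicates(records: list[Record]) -> list[Record]:
    seen, kept = set(), []
    for r in records:
        if r["record_id"] not in seen:  # keep the first copy of each record ID
            seen.add(r["record_id"])
            kept.append(r)
    return kept


def drop_missing_age_sex(records: list[Record]) -> list[Record]:
    return [r for r in records if r.get("age") is not None and r.get("sex") is not None]


def drop_small_hospitals(records: list[Record], min_patients: int = 30) -> list[Record]:
    counts = Counter(r["hospital"] for r in records)  # records counted as patients
    return [r for r in records if counts[r["hospital"]] >= min_patients]


STEPS: list[Step] = [
    ("duplicates removed", drop_duplicates),
    ("missing age/sex excluded", drop_missing_age_sex),
    ("hospitals with <30 patients excluded", drop_small_hospitals),
    ("age <18 excluded", lambda rs: [r for r in rs if r["age"] >= 18]),
    ("data before 2005 excluded", lambda rs: [r for r in rs if r["year"] >= 2005]),
]
```

The returned log is exactly the list of numbers a study flow diagram needs.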
Advantages of analysis based on EMR data: reflects current practice using real-world data; obtains large data in a time- and cost-efficient manner. Limitations of analysis based on EMR data: administrative data not collected for research purposes; missing data; no coding.
Healthcare Databases in China: Discharge Summary Reports
Discharge Summary Reports
Discharge Summary Reports. Region: China; population size: 30 million; diagnosis data and code: ICD-10; procedure data and code: ICD-9-CM-3; laboratory information: no; drug information: no; cost data: cost by category; database owner: Office of the Ministry of Health Hospital Accreditation; accessibility: all data, but permission is required.
Systematic review of risk adjustment for medical quality indicators and risk adjustment tool of comorbidity development on the basis of discharge summary report
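The source does not describe the contents of the comorbidity tool, so the sketch below uses Charlson-style weighting, a widely used comorbidity index, as a stand-in applied to the ICD-10 diagnoses available in discharge summaries. Only four categories are included, with their original Charlson weights and ICD-10 prefixes from published adaptations; a real tool would need every category, full code lists and hierarchy rules. This is not the authors' tool.

```python
# Partial, illustrative subset of Charlson categories: (ICD-10 prefixes, weight)
CHARLSON_SUBSET: dict[str, tuple[tuple[str, ...], int]] = {
    "myocardial infarction": (("I21", "I22", "I252"), 1),
    "congestive heart failure": (("I50",), 1),
    "metastatic solid tumor": (("C77", "C78", "C79", "C80"), 6),
    "AIDS/HIV": (("B20", "B21", "B22", "B24"), 6),
}


def comorbidity_score(secondary_diagnoses: list[str]) -> tuple[int, set[str]]:
    """Sum the weights of categories matched by any ICD-10 code; each category counts once."""
    codes = [c.replace(".", "").strip().upper() for c in secondary_diagnoses]
    matched = {
        category
        for category, (prefixes, _) in CHARLSON_SUBSET.items()
        if any(code.startswith(prefixes) for code in codes)  # str.startswith accepts a tuple
    }
    return sum(CHARLSON_SUBSET[c][1] for c in matched), matched
```

Only secondary diagnoses are passed in, so the condition being studied is not counted as its own comorbidity.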
Pharmacoepi
A Systematic Review of Publications Based on Electronic Medical Records or Medicare Systems and Related Databases in China. Yang Zhang, Yinchu Cheng, Yuji Feng, Kui Huang, Xiaofeng Zhou, Siyan Zhan. A number of articles based on EMRs or Medicare systems in China have been published. The lack of standard coding and the use of free text in EMDs and CDs have presented challenges for conducting high-quality studies.
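Challenges and opportunities coexist; we need to move forward together! Room 408, School of Public Health, Peking University, 38 Xueyuan Road, Haidian District, Beijing; postcode: 100191; tel: 010-82801191 ext. 1055; fax: 010-82805162; email: siyan-zhan@bjmu.edu.cn; website: http://cpse.bjmu.edu.cn. Thank you. Center of Post-marketing Safety Evaluation, Peking University Health Science Center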
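Issues in using big data. Prerequisites for database linkage: existing registry databases are quality-assured and accessible; technical methods for linkage, with legal and ethical safeguards. Standardization, normalization and quality control: definitions of diseases, symptoms, signs, etc., as well as laboratory tests and procedures, must be standardized and normalized, with quality control. Benefit sharing and distribution: design institutional arrangements, build a community of shared interest among researchers, and set up incentive mechanisms so that those who contribute also benefit.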
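One common technical approach to linkage that respects the legal and ethical prerequisites is to have each data owner replace direct identifiers with a keyed hash before sharing, and then match deterministically on the hashes. This sketch is not the method used in the source; the shared secret key, the `id` field name and the exact-match rule are assumptions, and key management and probabilistic linkage are out of scope.

```python
import hashlib
import hmac


def pseudonymize(raw_id: str, secret_key: bytes) -> str:
    """Keyed hash (HMAC-SHA256) of a normalized identifier; the raw ID never leaves the data owner."""
    normalized = raw_id.strip().upper()
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def tokenize(records: list[dict], secret_key: bytes, id_field: str = "id") -> list[dict]:
    """Replace the identifier field with its pseudonymous token (run by each data owner)."""
    out = []
    for record in records:
        record = dict(record)  # do not mutate the caller's data
        record["token"] = pseudonymize(record.pop(id_field), secret_key)
        out.append(record)
    return out


def link(tokens_a: list[dict], tokens_b: list[dict]) -> list[dict]:
    """Deterministic linkage on exact token match, merging fields from both sources."""
    index_b = {r["token"]: r for r in tokens_b}
    return [{**a, **index_b[a["token"]]} for a in tokens_a if a["token"] in index_b]
```

Normalizing before hashing matters: a stray space or lower-case letter would otherwise produce a different token and a missed link.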
Break down barriers and link data silos
Beijing big data platform for cardiovascular disease prevention and control
Distributed network
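In a distributed network, each data partner runs the same query on its own data and returns only aggregate counts; a coordinating center combines them. The sketch below illustrates that idea with a crude pooled rate ratio; the row layout, class and function names are hypothetical, and a real analysis would usually stratify by site (for example with a Mantel-Haenszel estimator) rather than simply summing counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteSummary:
    site: str
    exposed_events: int
    exposed_person_years: float
    unexposed_events: int
    unexposed_person_years: float


def run_local_query(site: str, rows: list[tuple[bool, bool, float]]) -> SiteSummary:
    """Runs inside each data partner; rows are (exposed, had_event, person_years) and never leave the site."""
    e1 = sum(1 for exposed, event, _ in rows if exposed and event)
    py1 = sum(py for exposed, _, py in rows if exposed)
    e0 = sum(1 for exposed, event, _ in rows if not exposed and event)
    py0 = sum(py for exposed, _, py in rows if not exposed)
    return SiteSummary(site, e1, py1, e0, py0)


def pooled_rate_ratio(summaries: list[SiteSummary]) -> float:
    """Coordinating center: crude rate ratio computed from summed site-level counts only."""
    e1 = sum(s.exposed_events for s in summaries)
    py1 = sum(s.exposed_person_years for s in summaries)
    e0 = sum(s.unexposed_events for s in summaries)
    py0 = sum(s.unexposed_person_years for s in summaries)
    if e0 == 0 or py1 == 0:
        raise ValueError("rate ratio is undefined")
    return (e1 * py0) / (e0 * py1)
```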
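Data mapping results. Anti-tuberculosis drugs as written in records: 14 drugs in 104 written forms; 83 forms could be mapped, a mappable percentage of 79.8%. Written forms by drug (片 = tablet, 针 = injection, 胶囊 = capsule, 颗粒 = granules, (免费) = free of charge):
Pyrazinamide (吡嗪酰胺): 吡嗪酰胺, Z, Z., PZA, 吡嗪酰胺片, 吡嗪酰胺片(免费)
Ethambutol (乙胺丁醇): 乙胺丁醇, E, E., EMB, E-, EmB, 乙胺丁醇片, 乙胺丁醇片(免费)
Kanamycin (卡那霉素): 卡那霉素, 卡那霉素针(免费), 卡那霉素针, KM, Km, K, Km., km
Amikacin (丁胺卡那霉素): 丁胺卡那霉素, 阿米卡星针(免费), 阿米卡星针, Am, AM, Am.
Capreomycin (卷曲霉素): 卷曲霉素, 卷曲霉素针, 卷曲霉素针(免费), CM, Cm, cm, Cm.
Levofloxacin (左氧氟沙星): 左氧氟沙星, 左氧氟沙星片, 左氧氟沙星片(免费), Lfx, LFX, lfx, L, Lfx., V, V., E-V
Ofloxacin (氧氟沙星): 氧氟沙星, 氧氟沙星片, 氧氟沙星片(免费), OFX, Ofx, Ofx., ofx
Moxifloxacin (莫西沙星): 莫西沙星, 莫西沙星片, 莫西沙星片(免费), Mfx, Mfx., MFX, mfx
Cycloserine (环丝氨酸): 环丝氨酸, 环丝氨酸胶囊, 环丝氨酸胶囊(免费), CS, Cs, Cs.
Protionamide (丙硫异烟胺): 丙硫异烟胺, 丙硫异烟胺片, 丙硫异烟胺片(免费), PTO, Pto, Pto., pto, Th, pto.
Para-aminosalicylic acid (对氨基水杨酸): 对氨基水杨酸, 对氨基水杨酸钠, 对氨基水杨酸颗粒(免费), 对氨基水杨酸颗粒, PAS, pas, Pas, P, P., Pa
Amoxicillin/clavulanate (阿莫西林/克拉维酸): 阿莫西林/克拉维酸, 阿莫西林/克拉维酸(免费), clv, Clv, CLV, Amx/Clv, Amx
Clarithromycin (克拉霉素): 克拉霉素, 克拉霉素(免费), clr, CLR, Clr
Rifabutin (利福布汀): 利福布汀, RB, Rb, rb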
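Mapping written forms to standard drugs usually combines light normalization (whitespace, punctuation, case, dosage-form suffixes, the free-of-charge tag) with a lookup table. The sketch below uses a small, hypothetical lookup built from some of the forms listed above; it is not the mapping used in the study, and its matching rules are assumptions.

```python
import re
from typing import Optional

# Hypothetical, partial lookup table built from some of the written forms above
CANONICAL = {
    "吡嗪酰胺": "pyrazinamide", "Z": "pyrazinamide", "PZA": "pyrazinamide",
    "乙胺丁醇": "ethambutol", "E": "ethambutol", "EMB": "ethambutol",
    "左氧氟沙星": "levofloxacin", "LFX": "levofloxacin",
    "对氨基水杨酸": "para-aminosalicylic acid", "PAS": "para-aminosalicylic acid",
}
DOSAGE_FORM_SUFFIXES = ("片", "针", "胶囊", "颗粒")  # tablet, injection, capsule, granules


def normalize(form: str) -> str:
    s = re.sub(r"\s+", "", form)                # drop all whitespace
    s = s.replace("（", "(").replace("）", ")")  # full-width to ASCII parentheses
    s = s.replace("(免费)", "")                  # drop the "free of charge" tag
    s = s.rstrip(".")                            # "Z." -> "Z"
    for suffix in DOSAGE_FORM_SUFFIXES:
        if s.endswith(suffix):
            s = s[: -len(suffix)]
            break
    return s.upper()                             # "EmB" -> "EMB"; Chinese text is unchanged


def map_form(form: str) -> Optional[str]:
    return CANONICAL.get(normalize(form))


def mappable_percentage(forms: list[str]) -> float:
    distinct = set(forms)
    mapped = sum(1 for f in distinct if map_form(f) is not None)
    return round(100 * mapped / len(distinct), 1)
```

The same percentage formula applied to the study's counts gives 100 × 83 / 104 ≈ 79.8. Forms such as 对氨基水杨酸钠 stay unmapped until someone adds them to the table, which is why the lookup needs curation rather than only string rules.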
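Common Data Model. Zhou Xiaofeng, Liu Qing, Cai Bing. [Overview of global active surveillance systems for post-marketing drugs] (全球上市后药品主动监测系统概况). Chinese Journal of Pharmacoepidemiology (药物流行病学杂志). 2012;21(7):338-42.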
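A common data model lets every site convert its local schema into one shared layout, so that a single analysis program can run unchanged everywhere. The sketch below uses two invented site schemas and an invented `DispensingRecord` table; it does not reproduce the fields of any real CDM, and it assumes person IDs do not collide across sites.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class DispensingRecord:
    """Hypothetical common-data-model table: one row per drug dispensing."""
    person_id: str
    drug: str
    dispensed_on: date


def from_site_a(row: dict) -> DispensingRecord:
    # Hypothetical site A schema: {"pid", "drug_name", "rx_date" as "YYYY-MM-DD"}
    return DispensingRecord(row["pid"], row["drug_name"].strip().lower(),
                            datetime.strptime(row["rx_date"], "%Y-%m-%d").date())


def from_site_b(row: dict) -> DispensingRecord:
    # Hypothetical site B schema: {"patient_no", "med", "date" as "YYYYMMDD"}
    return DispensingRecord(row["patient_no"], row["med"].strip().lower(),
                            datetime.strptime(row["date"], "%Y%m%d").date())


def count_users(records: list[DispensingRecord], drug: str) -> int:
    """An analysis written once against the common model, usable at every site."""
    return len({r.person_id for r in records if r.drug == drug})
```

Only the small `from_site_*` adapters differ between sites; everything downstream of the common model is shared.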