Natural Language Question Answering over Large-scale Linked Data
  Kang Liu
  National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences
  8/30/2014, Kun Ming
Knowledge Graph: Linked Data
  More than 570 million entities; more than 1.8 billion facts (relations)
  Baidu Zhixin (百度知心)
  2,653,873 concepts
  Sogou Zhilifang (搜狗知立方)
More Linked Data
How to access these Linked Data
  Question: Which software has been developed by organizations founded in California, USA?
  Experts or developers write SPARQL:
  SELECT DISTINCT ?uri WHERE { ?uri rdf:type dbo:software . ?uri dbo:developer ?x1 . ?x1 rdf:type dbo:company . ?x1 dbo:foundationplace dbr:california . }
  [Figure: Linked Data graph with nodes Android {Answer}, Google, Oracle, California, Java, Software, Apache_License, 2 (Integer) and edges version, developer, foundationplace, license, type, programmedin]
How to access these Linked Data
  Question: Which software has been developed by organizations founded in California, USA?
  QA system generates SPARQL:
  SELECT DISTINCT ?uri WHERE { ?uri rdf:type dbo:software . ?uri dbo:developer ?x1 . ?x1 rdf:type dbo:company . ?x1 dbo:foundationplace dbr:california . }
  [Figure: same Linked Data graph as on the previous slide]
Pipeline Process
  Question: Which software has been developed by organizations founded in California, USA?
  Phrases: software, developed by, organizations, founded in, California
  Semantic items: dbo:software, dbo:developer, dbo:company, dbo:foundationplace, dbr:california
  Semantic triples: <dbo:software, dbo:developer, dbo:company>, <dbo:company, dbo:foundationplace, dbr:california>
  SPARQL: SELECT DISTINCT ?uri WHERE { ?uri rdf:type dbo:software . ?uri dbo:developer ?x1 . ?x1 rdf:type dbo:company . ?x1 dbo:foundationplace dbr:california . }
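The last pipeline step (semantic triples to SPARQL) can be sketched as follows. This is a minimal illustration, not the system's actual generator: the function name `triples_to_sparql`, the `classes` set, and the variable-naming scheme are assumptions chosen so that the output matches the query on the slide.

```python
def triples_to_sparql(triples, classes, answer_class):
    """Turn (subject, property, object) semantic triples into a SPARQL string.

    Items listed in `classes` become variables constrained by rdf:type;
    every other subject or object is emitted as a constant.
    """
    variables = {}   # class URI -> SPARQL variable
    patterns = []

    def term(item, type_patterns):
        if item not in classes:
            return item  # entity: used as a constant
        if item not in variables:
            # Hypothetical naming scheme: the answer class is ?uri, others ?x1, ?x2, ...
            variables[item] = "?uri" if item == answer_class else f"?x{len(variables)}"
            type_patterns.append(f"{variables[item]} rdf:type {item} .")
        return variables[item]

    for subject, prop, obj in triples:
        before, after = [], []
        s_term = term(subject, before)
        o_term = term(obj, after)
        patterns += before + [f"{s_term} {prop} {o_term} ."] + after

    return "SELECT DISTINCT ?uri WHERE { " + " ".join(patterns) + " }"


query = triples_to_sparql(
    triples=[
        ("dbo:software", "dbo:developer", "dbo:company"),
        ("dbo:company", "dbo:foundationplace", "dbr:california"),
    ],
    classes={"dbo:software", "dbo:company"},
    answer_class="dbo:software",
)
print(query)
```

Which semantic items count as classes is assumed to be known from the mapping step; the sketch only shows how shared classes turn into shared variables.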
Challenges: manually designed patterns
  Phrase detection rules
    NN NNP: Entity
    NN: Class Property
    VB: Property
  Semantic item grouping patterns (syntactic patterns)
    Verb and its arguments
    Adjective and its arguments
    Prepositionally modified tokens and their objects
  Example: (?x, dbo:productor, dbo:film)
Challenges: manually designed patterns
  Can we automatically learn rules or patterns?
Challenges: ambiguities
  Question: Which software has been developed by organizations founded in California, USA?
  Phrase detection: {California} vs. {California, USA}
  Phrase mapping: California → {California_State}, {California_Film}
  Semantic item grouping: {dbo:software, dbo:developer, dbo:company} vs. {dbo:software, dbo:foundationplace, dbo:company}
Challenges: ambiguities
  Can we make joint inference?
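To see why the decisions interact, the sketch below enumerates every consistent combination of phrase detection and phrase mapping for the California example. It is only an illustration: the token spans and the mapping for "California, USA" are made up, and the only constraint applied is that two chosen phrases may not overlap.

```python
from itertools import product

# Hypothetical candidates: the token spans and the "California, USA" mapping are
# invented for illustration; the two "California" mappings come from the slide.
CANDIDATES = {
    "California": {"span": (9, 10), "resources": ["California_State", "California_Film"]},
    "California, USA": {"span": (9, 12), "resources": ["California_State"]},
}


def overlaps(a, b):
    """True if two half-open token spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def joint_assignments(candidates):
    """Enumerate every consistent assignment of phrase -> resource (or None)."""
    phrases = list(candidates)
    options = [[None] + candidates[p]["resources"] for p in phrases]
    consistent = []
    for choice in product(*options):
        chosen = [p for p, r in zip(phrases, choice) if r is not None]
        clash = any(
            overlaps(candidates[a]["span"], candidates[b]["span"])
            for i, a in enumerate(chosen)
            for b in chosen[i + 1:]
        )
        if not clash:
            consistent.append(dict(zip(phrases, choice)))
    return consistent


for assignment in joint_assignments(CANDIDATES):
    print(assignment)
```

A pipeline would fix the phrase first and only then pick a mapping; a joint model scores whole assignments like these, so evidence for a mapping can change which phrase is detected.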
Our Solution
  Pattern learning
    Meta patterns
    Not only a verb and its arguments: all syntactic paths may be possible
  Joint inference
    First-order logic formulas
    Markov Logic Network:
    $$p(y) = \frac{1}{Z} \exp\Big( \sum_{(\phi_i, w_i) \in L} w_i \sum_{c \in C^{n_{\phi_i}}} f_c^{\phi_i}(y) \Big)$$
  [Figure: ground Markov network over the atoms p1(a,b), p1(b,a), p2(a), p2(b), p3(a,a), p3(b,b), p4(a), p4(b)]
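The Markov Logic Network distribution can be made concrete with a brute-force toy: each possible world gets a score equal to the weighted count of satisfied ground formulas, and the scores are exponentiated and normalized by Z. The two ground formulas and their weights below are hypothetical and are not the formulas used in the talk.

```python
import math
from itertools import product

# Two ground atoms of a toy network (names borrowed from the talk's predicates).
ATOMS = ["hasphrase(0)", "hasresource(0,0)"]

# Hypothetical weighted ground formulas: (weight, truth function over a world).
FORMULAS = [
    (1.0, lambda w: w["hasphrase(0)"]),
    (2.0, lambda w: (not w["hasresource(0,0)"]) or w["hasphrase(0)"]),
]


def score(world):
    """Weighted number of satisfied ground formulas in this world."""
    return sum(weight for weight, formula in FORMULAS if formula(world))


def distribution():
    """Return (world, probability) pairs, with p(y) = exp(score(y)) / Z."""
    worlds = [dict(zip(ATOMS, values))
              for values in product([False, True], repeat=len(ATOMS))]
    z = sum(math.exp(score(w)) for w in worlds)
    return [(w, math.exp(score(w)) / z) for w in worlds]


for world, prob in distribution():
    print(world, round(prob, 3))
```

Enumerating all worlds is only feasible for a handful of atoms; it is used here purely to show what the formula computes.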
Predicates: hidden predicates and observed predicates
  Hidden predicates
    hasphrase(i): the i-th candidate phrase has been chosen
    hasresource(i, j): the i-th phrase is mapped to the j-th semantic item
    hasrelation(ri, rj, rr): the semantic items ri and rj can be grouped together with the relation type rr
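One way to picture the three hidden predicates is as sets of ground atoms describing a candidate interpretation of the question. The sketch below is an assumption-laden illustration: the `HiddenState` container and the two consistency checks are hypothetical and are not taken from the talk's formulas.

```python
from dataclasses import dataclass, field


@dataclass
class HiddenState:
    """One candidate interpretation, stored as sets of true ground atoms."""
    has_phrase: set = field(default_factory=set)     # {i}
    has_resource: set = field(default_factory=set)   # {(i, j)}
    has_relation: set = field(default_factory=set)   # {(ri, rj, rr)}


def violations(state):
    """List breaches of two assumed consistency constraints.

    Assumption 1: a phrase can only be mapped to a resource if it was chosen.
    Assumption 2: a relation can only link semantic items that some phrase maps to.
    """
    problems = []
    for i, j in sorted(state.has_resource):
        if i not in state.has_phrase:
            problems.append(f"hasresource({i}, {j}) without hasphrase({i})")
    mapped_items = {j for _, j in state.has_resource}
    for ri, rj, rr in sorted(state.has_relation):
        for item in (ri, rj):
            if item not in mapped_items:
                problems.append(f"hasrelation({ri}, {rj}, {rr}) uses unmapped item {item}")
    return problems


state = HiddenState(has_phrase={0, 1}, has_resource={(0, 3), (1, 5)},
                    has_relation={(3, 5, "1_1")})
print(violations(state))
```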
Hard Formulas
Soft Formulas
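Framework
  Input question: In which movies directed by Garry Marshall was Julia Roberts starring?
  Resources: DBpedia, Wikipedia, Word2vec, Reverb & Patty, statistics
  1. Question preprocessing: question type, focus, removal of uninformative words, etc.
  2. Phrase detection, resource mapping, and feature extraction: yields resource-mapping candidates and structure-matching candidates for the MLN model (predicates and formulas)
  3. MLN joint disambiguation: yields resource-mapping results and structure-matching results
  4. Query graph construction: yields the query graph
  5. Query generation: yields the SPARQL statement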
Framework: question preprocessing
  Input question: In which movies directed by Garry Marshall was Julia Roberts starring?
  Preprocessed question: movies directed by Garry Marshall was Julia Roberts starring in
Framework: phrase detection, resource mapping, and feature extraction
  Observed predicates extracted from the preprocessed question:
  >hascandidateresource
    "movies" "dbo:film"
    "directed by" "dbo:director"
    "by" "dbo:publisher"
    "Garry Marshall" "dbr:garry_marshall"
  >hasheadpos
    "movies" "NNS"
    "directed" "VBN"
    "Garry Marshall" "NNP"
  >hasdeppath
    "movies" "directed by" "nsubj-prep"
    "directed by" "Garry Marshall" "pobj-nn"
  >hasresourcetype
    "dbo:film" "Concept"
    "dbo:director" "Property"
    "dbr:garry_marshall" "concept"
  >istypecompatible
    "dbo:film" "dbo:director" "1_1"
    "dbo:director" "dbr:garry_marshall" "2_1"
Framework: MLN joint disambiguation
  Hidden predicates inferred by the MLN (resource-mapping and structure-matching results):
  >hasresource
    "movies" "dbo:film"
    "directed by" "dbo:director"
    "Garry Marshall" "dbr:garry_marshall"
    "Julia Roberts" "dbr:julia_roberts"
    "starring in" "dbo:starring"
  >hasrelation
    "dbo:film" "dbo:director" "1_1"
    "dbo:film" "dbo:starring" "1_1"
    "dbo:director" "dbr:garry_marshall" "2_1"
    "dbr:julia_roberts" "dbo:starring" "1_2"
Framework: query graph construction and query generation
  The query graph built from the hasresource and hasrelation results is converted into a SPARQL statement:
  SELECT DISTINCT ?uri WHERE { ?uri rdf:type dbo:film . ?uri dbo:starring res:julia_roberts . ?uri dbo:director res:garry_marshall . }
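A sketch of how hasrelation facts could be turned into a query, under one reading of the relation codes: "a_b" is assumed to mean that argument a of the first item and argument b of the second item denote the same node. The function `build_query`, the `kinds` labels, and the union-find bookkeeping are illustrative assumptions, and the output keeps the dbr: prefix rather than res:.

```python
def build_query(kinds, relations, focus):
    """Unify argument slots linked by hasrelation facts and emit a SPARQL string.

    kinds: resource -> "class" | "property" | "entity" (assumed labels)
    relations: (left, right, "a_b") triples
    focus: the class whose instances are the answers
    """
    parent = {}

    def find(slot):
        parent.setdefault(slot, slot)
        while parent[slot] != slot:
            parent[slot] = parent[parent[slot]]  # path halving
            slot = parent[slot]
        return slot

    for left, right, code in relations:
        a, b = code.split("_")
        parent[find((left, int(a)))] = find((right, int(b)))

    # Name each group of unified slots: the focus is ?uri, entities are constants.
    names = {find((focus, 1)): "?uri"}
    for resource, kind in kinds.items():
        if kind == "entity":
            names[find((resource, 1))] = resource

    def term(slot):
        root = find(slot)
        if root not in names:
            names[root] = f"?x{len(names)}"  # fresh variable for an unbound slot
        return names[root]

    patterns = []
    for resource, kind in kinds.items():
        if kind == "class":
            patterns.append(f"{term((resource, 1))} rdf:type {resource} .")
        elif kind == "property":
            patterns.append(f"{term((resource, 1))} {resource} {term((resource, 2))} .")
    return "SELECT DISTINCT ?uri WHERE { " + " ".join(patterns) + " }"


kinds = {
    "dbo:film": "class",
    "dbo:director": "property",
    "dbo:starring": "property",
    "dbr:garry_marshall": "entity",
    "dbr:julia_roberts": "entity",
}
relations = [
    ("dbo:film", "dbo:director", "1_1"),
    ("dbo:film", "dbo:starring", "1_1"),
    ("dbo:director", "dbr:garry_marshall", "2_1"),
    ("dbr:julia_roberts", "dbo:starring", "1_2"),
]
sparql = build_query(kinds, relations, focus="dbo:film")
print(sparql)
```

With the four hasrelation facts from the example, every property ends up with the film variable as subject and the matching person as object, which is the same graph pattern as the query on the slide.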
Experiments
  Questions: three collections of questions from QALD (QALD1, QALD3, QALD4)
  Linked Data: DBpedia, YAGO
  MLN: thebeast toolkit
    Inference algorithm: cutting plane approach [3]
    Weight learning algorithm: MIRA
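For readers unfamiliar with MIRA, the generic single-best update can be sketched as below. This is the textbook form of the update, not thebeast's implementation; the feature names, the loss value, and the clipping constant `c` are illustrative assumptions.

```python
def mira_update(weights, gold_features, predicted_features, loss, c=1.0):
    """One 1-best MIRA step.

    Moves the weights just far enough that the gold structure outscores the
    predicted one by `loss`, with the step size clipped to [0, c].
    """
    keys = set(gold_features) | set(predicted_features)
    diff = {k: gold_features.get(k, 0.0) - predicted_features.get(k, 0.0) for k in keys}
    norm_sq = sum(v * v for v in diff.values())
    if norm_sq == 0.0:
        return dict(weights)  # identical feature vectors: nothing to learn
    margin = sum(weights.get(k, 0.0) * v for k, v in diff.items())
    tau = min(c, max(0.0, (loss - margin) / norm_sq))
    updated = dict(weights)
    for k, v in diff.items():
        updated[k] = updated.get(k, 0.0) + tau * v
    return updated


# Hypothetical feature counts for a gold and a wrongly predicted interpretation.
new_weights = mira_update(
    weights={},
    gold_features={"map:California->California_State": 1.0},
    predicted_features={"map:California->California_Film": 1.0},
    loss=1.0,
)
print(new_weights)
```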
Effect of Pattern Learning
Effect of Joint Inference
Ours vs. state-of-the-art
Conclusion and Future Work
  Pattern learning is needed for parsing a question over large-scale linked data
  Joint inference can be effective for improving the performance of natural language question answering
  Scaling up to multiple interlinked knowledge bases
  Labeled data is insufficient to build a robust model
  More robust solutions are needed to find the implicit properties in questions
Reference: Question Answering over Linked Data Using Markov First-order Logic, to appear in Proceedings of EMNLP 2014, Doha, Qatar, October 25-29
Thanks
  Email: kliu@nlpr.ia.ac.cn
  刘康_自动化所
  Homepage: http://www.nlpr.ia.ac.cn/cip/liukang.htm