 YU Kui,WANG Hao,YAO Hong-liang.A dynamic influence diagram for dynamic decision processes[J].CAAI Transactions on Intelligent Systems,2008,3(02):159-166.

A dynamic influence diagram for dynamic decision processes

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
Vol. 3
Issue:
2008, No. 2
Pages:
159-166
Publication date:
2008-04-25

Article Info

Title:
A dynamic influence diagram for dynamic decision processes
Article ID:
1673-4785(2008)02-0159-08
Author(s):
YU Kui 1,2, WANG Hao 2, YAO Hong-liang 2
1. Department of Computer Science, Institute of Textile and Garment of Changzhou, Changzhou 213164, China;
2. School of Computer and Information, Hefei University of Technology, Hefei 230009, China
Keywords:
dynamic Bayesian networks; influence diagrams; Markov decision process; partially observable Markov decision process; dynamic influence diagram
CLC number:
TP181
Document code:
A
Abstract:
The computational complexity of the partially observable Markov decision process (POMDP) in both its strategy space and its state space makes finding an optimal policy NP-hard. This paper therefore proposes a dynamic influence diagram to model the dynamic decision-making problem of a single agent in an uncertain environment, in which a directed acyclic graph expresses the complex relationships between system variables. First, a dynamic Bayesian network represents the transition and observation models so as to reduce the state space of the system. Second, the utility function is expressed explicitly in terms of utility nodes, which reduces its representational complexity. Finally, the actions of the system are represented with decision nodes to simplify the strategy space. The dynamic influence diagram is compared with the POMDP model on these three aspects. The results indicate that a dynamic influence diagram provides a compact way to express large POMDP problems, and experiments in the RoboCup environment give a preliminary verification of the proposed model.
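As a rough illustration of the structure the abstract describes, the sketch below builds a toy two-time-slice dynamic influence diagram: a tabular transition model standing in for the DBN, an observation model conditioning the belief, a utility node, and a decision node chosen by expected utility. All names, numbers, and the single binary state variable are invented for illustration; this is not the paper's actual model or experiment.

```python
# Illustrative sketch only (not the authors' implementation): one roll-up of a
# two-time-slice dynamic influence diagram over a hypothetical binary state S,
# binary observation O, and two actions. All probabilities are made up.
STATES = [0, 1]
ACTIONS = ["stay", "move"]

# Transition model P(S' | S, A) -- the role the DBN plays in the diagram.
T = {
    ("stay", 0): {0: 0.9, 1: 0.1},
    ("stay", 1): {0: 0.2, 1: 0.8},
    ("move", 0): {0: 0.4, 1: 0.6},
    ("move", 1): {0: 0.7, 1: 0.3},
}
# Observation model P(O | S').
Z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}
# Utility node: immediate utility U(A, S).
U = {("stay", 0): 0.0, ("stay", 1): 1.0, ("move", 0): 0.5, ("move", 1): 0.2}

def belief_update(belief, action, obs):
    """Predict the next belief with T, then condition on the observation via Z."""
    predicted = {s2: sum(belief[s] * T[(action, s)][s2] for s in STATES)
                 for s2 in STATES}
    unnorm = {s2: Z[s2][obs] * predicted[s2] for s2 in STATES}
    total = sum(unnorm.values())
    return {s2: p / total for s2, p in unnorm.items()}

def best_action(belief):
    """Decision node: pick the action maximizing expected immediate utility."""
    return max(ACTIONS, key=lambda a: sum(belief[s] * U[(a, s)] for s in STATES))

belief = {0: 0.5, 1: 0.5}
belief = belief_update(belief, "move", 1)
print(best_action(belief), belief)
```

The point of the representation is that the transition model factors over state variables, the utility is localized in utility nodes, and the policy only has to range over the decision node's parents, rather than over a flat POMDP state space.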

References:

[1] KAELBLING L P, LITTMAN M L, MOORE A W. Reinforcement learning: a survey[J]. Journal of Artificial Intelligence Research, 1996, 4: 237-285.
[2] POUPART P. Exploiting structure to efficiently solve large scale partially observable Markov decision processes[D]. Toronto: University of Toronto, 2005.
[3] KAELBLING L P, LITTMAN M L, CASSANDRA A R. Planning and acting in partially observable stochastic domains[J]. Artificial Intelligence, 1998, 101: 99-134.
[4] KEARNS M, MANSOUR Y, NG A Y. Approximate planning in large POMDPs via reusable trajectories[C]// Advances in Neural Information Processing Systems. Cambridge: MIT Press, 1999: 1001-1007.
[5] ROY N, GORDON G J, THRUN S. Finding approximate POMDP solutions through belief compression[J]. Journal of Artificial Intelligence Research, 2005, 23: 1-40.
[6] PAPADIMITRIOU C H, TSITSIKLIS J N. The complexity of Markov decision processes[J]. Mathematics of Operations Research, 1987, 12(3): 441-450.
[7] LUSENA C, GOLDSMITH J, MUNDHENK M. Nonapproximability results for partially observable Markov decision processes[J]. Journal of Artificial Intelligence Research, 2001, 14: 83-103.
[8] DEAN T, KANAZAWA K. Probabilistic temporal reasoning[C]// National Conference on Artificial Intelligence. Washington: AAAI Press, 1988: 524-528.
[9] HOWARD R A, MATHESON J E. Readings on the principles and applications of decision analysis[M]. [S.l.]: Strategic Decisions Group, 1984.
[10] BOUTILIER C, DEAN T, HANKS S. Decision-theoretic planning: structural assumptions and computational leverage[J]. Journal of Artificial Intelligence Research, 1999, 11: 1-94.
[11] TATMAN J A, SHACHTER R D. Dynamic programming and influence diagrams[J]. IEEE Transactions on Systems, Man, and Cybernetics, 1990, 20(2): 365-379.
[12] NG B, PFEFFER A. Factored particles for scalable monitoring[C]// Uncertainty in Artificial Intelligence. San Francisco: Morgan Kaufmann, 2002: 370-377.
[13] DOSHI P, GMYTRASIEWICZ P. A particle filtering based approach to approximating interactive POMDPs[C]// National Conference on Artificial Intelligence. Menlo Park: AAAI Press, 2005: 969-974.

Memo:
Received: 2007-06-20.
Funding:
Supported by the National Natural Science Foundation of China (60575023, 60705015) and the Natural Science Foundation of Anhui Province (070412064).
About the authors:
YU Kui was born in 1979. He is a master's student whose main research interests are Bayesian network modeling and inference and agent technology. He has published 7 academic papers.
WANG Hao was born in 1962. He is a professor, Ph.D., and vice dean of the School of Computer and Information at Hefei University of Technology. His main research interests are artificial intelligence, data mining, and object-oriented technology. He is a member of the Robot Competition Working Committee of the Chinese Association of Automation and a young and middle-aged backbone teacher of Anhui universities. He has taken part in more than 10 research projects, including projects funded by the National Natural Science Foundation of China and the doctoral program fund of the State Education Commission, and has won 2 third prizes of the Anhui Science and Technology Progress Award. He currently leads several projects funded by the National Natural Science Foundation of China and the Natural Science Foundation of Anhui Province.
YAO Hong-liang was born in 1972. He is an associate professor, Ph.D., whose main research interests are Bayesian networks and agent technology. He has published more than 10 academic papers.
Corresponding author: YU Kui. E-mail: ykui713@hotmail.com.
Last Update: 2009-05-11