
[1] ZHANG Xiaochuan, TANG Yan, LIANG Ningning. A 9×9 Go computer game system using temporal difference[J]. CAAI Transactions on Intelligent Systems, 2012, 7(3): 278-282.


A 9×9 Go computer game system using temporal difference


CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 7 Issue: No. 3, 2012 Pages: 278-282 Column: Academic Papers - Intelligent Systems Publication date: 2012-06-25

Title:: A 9×9 Go computer game system using temporal difference

Article ID:: 1673-4785(2012)03-0278-05

Author(s):: ZHANG Xiaochuan, TANG Yan, LIANG Ningning; College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China

Keywords:: computer game; 9×9 Go; Go computer game; temporal difference

CLC number:: TP31

Document code:: A

Abstract:: Computer Go is an important branch of computer game playing, and its enormous game space poses a great challenge to researchers. Current Go programs mostly rely on static evaluation search or Monte Carlo tree search. This paper instead introduces a temporal difference algorithm into a 9×9 Go computer game system and proposes a system model based on it. The resulting system has some self-learning capability and gradually improves its playing strength through continued play. Actual games against a system that uses α-β search demonstrate the feasibility of the method.
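The abstract does not give the update rule or the board representation used by the system, so the following is only a minimal sketch of the general idea behind temporal difference learning, assuming the simplest tabular TD(0) form with a reward that arrives only at the end of a game. The function names `td0_update` and `train_on_episode`, the state labels, and the parameters `alpha` and `gamma` are hypothetical and are not taken from the paper.

```python
def td0_update(values, state, next_state, reward, alpha=0.1, gamma=1.0):
    """Apply one TD(0) step: V(s) += alpha * (r + gamma * V(s') - V(s)).

    `values` maps a hashable state to its estimated value; states that
    have never been seen are treated as having value 0.0.
    """
    v_s = values.get(state, 0.0)
    v_next = values.get(next_state, 0.0)
    td_error = reward + gamma * v_next - v_s
    values[state] = v_s + alpha * td_error
    return td_error


def train_on_episode(values, states, final_reward, alpha=0.1, gamma=1.0):
    """Update value estimates along one finished game.

    `states` is the sequence of positions visited, ending in a terminal
    position. Only the last transition carries a reward (assumption:
    e.g. 1.0 for a win, 0.0 for a loss); the terminal state itself is
    never updated and therefore keeps the default value 0.0.
    """
    last_transition = len(states) - 2
    for i in range(len(states) - 1):
        reward = final_reward if i == last_transition else 0.0
        td0_update(values, states[i], states[i + 1], reward, alpha, gamma)


if __name__ == "__main__":
    values = {}
    game = ["s0", "s1", "end"]  # hypothetical position labels
    for _ in range(2):
        train_on_episode(values, game, final_reward=1.0, alpha=0.5)
    print(values)
```

Replaying the same game shows how the outcome propagates backwards one position per pass, which is the sense in which such a system can improve through repeated play. A real 9×9 Go system would need a function approximator rather than a table, since the number of board positions is far too large to enumerate.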



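Memo

Received: 2011-10-17. Published online: 2012-05-18.
Funding: Scientific Research Project of the Chongqing Municipal Education Commission (KJ120824); Natural Science Foundation of Chongqing (2007BB2415).
Corresponding author: ZHANG Xiaochuan. E-mail: cqpczxc@163.com.
About the authors:
ZHANG Xiaochuan, male, born in 1965, professor, deputy director of the Computer Games Technical Committee of the Chinese Association for Artificial Intelligence. His main research interests include artificial intelligence, artificial life, and computer software. He has led 6 national and provincial/ministerial-level projects and more than 30 industry-funded projects, has received one Chongqing Natural Science Award, one Science and Technology Progress Award, and one Chongqing Teaching Achievement Award, has served as chief editor of 3 textbooks, and has published more than 50 academic papers.
TANG Yan, female, born in 1987, master's student. Her main research interests are computational intelligence and intelligent software.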

Last Update: 2012-09-05
