
[1] WANG Yajie, QIAO Jilin, LIANG Kai, et al. Research on mahjong game based on prior knowledge and Monte Carlo simulation[J]. CAAI Transactions on Intelligent Systems, 2022, 17(1): 69-78. [doi:10.11992/tis.20210730]


Research on mahjong game based on prior knowledge and Monte Carlo simulation (结合先验知识与蒙特卡罗模拟的麻将博弈研究)


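CAAI Transactions on Intelligent Systems (智能系统学报) [ISSN 1673-4785/CN 23-1538/TP] Volume: 17 Issue: No. 1, 2022 Pages: 69-78 Section: Academic Papers (Machine Learning) Publication date: 2022-01-05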

Title:: Research on mahjong game based on prior knowledge and Monte Carlo simulation

Author(s):: WANG Yajie (王亚杰)¹, QIAO Jilin (乔继林)², LIANG Kai (梁凯)², XIE Yanyan (谢延延)²; 1. Engineering Training Center, Shenyang Aerospace University, Shenyang 110136, China;
2. School of Computer Science, Shenyang Aerospace University, Shenyang 110136, China

Keywords:: mahjong; game; prior knowledge; Monte Carlo; opponent's hand; simulation; deal-in (win by discard); win rate

CLC number:: TP18

DOI:: 10.11992/tis.20210730

Abstract:: Inland mahjong lacks a unified platform and large volumes of game-record data, which makes it difficult to design game-playing algorithms based on supervised learning. To address this, the paper designs a series of algorithms that combine rules, experience, and the Monte Carlo method. First, a discard priority, an effective-tile count for the ready hand, and a chow (eating) priority are proposed for the discard, ready-hand, and chow modules of the mahjong game, respectively; these refine the knowledge system of the mahjong AI and yield the basic algorithm Fanfou_ba and the optimized algorithm Fanfou_op. Second, an enhanced algorithm, Fanfou_mc, is proposed that uses the Monte Carlo method to simulate the hands of opponents who are in a ready (waiting) state, in order to reduce the probability of dealing in, that is, of discarding the tile on which an opponent wins. Finally, the three algorithms are compared experimentally. The results show that Fanfou_op improves the win rate by 9.76% over Fanfou_ba, and that Fanfou_mc improves the win rate by 0.13% over Fanfou_op while reducing the deal-in rate by 0.47%, indicating that the proposed improvements are feasible and effective.
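
The idea of simulating a waiting opponent's hand to estimate deal-in risk can be illustrated with a minimal sketch. This is not the paper's Fanfou_mc implementation. It assumes a simplified tile set of 27 suited tile kinds encoded as integers 0 to 26, a winning pattern of one pair plus triplets or sequences, and uniform sampling of the opponent's concealed tiles from the unseen tiles. The names `winning_tiles` and `estimate_deal_in` are hypothetical.

```python
import random
from collections import Counter

N_KINDS = 27  # assumption: three suits of nine ranks, no honor tiles


def _sets_only(counts):
    """True if the tile counts split completely into triplets and sequences."""
    first = next((i for i, c in enumerate(counts) if c), None)
    if first is None:
        return True
    if counts[first] >= 3:  # try a triplet on the lowest remaining tile
        counts[first] -= 3
        ok = _sets_only(counts)
        counts[first] += 3
        if ok:
            return True
    # try a sequence starting at the lowest tile, staying inside one suit
    if first % 9 <= 6 and counts[first + 1] and counts[first + 2]:
        for j in range(3):
            counts[first + j] -= 1
        ok = _sets_only(counts)
        for j in range(3):
            counts[first + j] += 1
        if ok:
            return True
    return False


def is_win(counts):
    """True if the counts form one pair plus any number of sets."""
    for i in range(N_KINDS):
        if counts[i] >= 2:
            counts[i] -= 2
            ok = _sets_only(counts)
            counts[i] += 2
            if ok:
                return True
    return False


def winning_tiles(hand):
    """Return the set of tile kinds that would complete the given hand."""
    counts = [0] * N_KINDS
    for t in hand:
        counts[t] += 1
    waits = set()
    for t in range(N_KINDS):
        if counts[t] < 4:  # a fifth copy of a tile cannot exist
            counts[t] += 1
            if is_win(counts):
                waits.add(t)
            counts[t] -= 1
    return waits


def estimate_deal_in(unseen, candidates, hand_size, n_sims=2000, seed=0):
    """Estimate, per candidate discard, how often it completes a sampled hand.

    Sampled hands that are not waiting on any tile are rejected, which encodes
    the prior knowledge that the opponent has declared a ready hand.
    """
    rng = random.Random(seed)
    hits = Counter()
    accepted = 0
    for _ in range(n_sims):
        hand = rng.sample(unseen, hand_size)
        waits = winning_tiles(hand)
        if not waits:
            continue  # inconsistent with a waiting opponent
        accepted += 1
        for tile in candidates:
            if tile in waits:
                hits[tile] += 1
    if accepted == 0:
        return {tile: 0.0 for tile in candidates}
    return {tile: hits[tile] / accepted for tile in candidates}
```

Under these assumptions, a player would pass the tiles not visible to them as `unseen`, their own hand as `candidates`, and the opponent's number of concealed tiles as `hand_size`, then prefer the discard with the lowest estimate, for example `min(rates, key=rates.get)`. Rejection sampling is the simplest way to condition on a waiting hand, but it wastes samples when waiting hands are rare, so a practical system would likely need a more targeted sampler.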



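Memo

Received: 2021-07-17.
Funding: Liaoning Revitalization Talents Program (辽宁省兴辽英才计划, XLYC1906003).
About the authors: WANG Yajie, professor, Ph.D., council member of the Chinese Association for Artificial Intelligence (CAAI) and deputy director of the CAAI Technical Committee on Computer Games. Her main research interests are computer games, pattern recognition, and image fusion. She has led or participated in more than 20 research projects and published more than 60 academic papers; QIAO Jilin, master's student, whose main research interest is computer games. He won the championship in the mahjong event of the 2020 computer games competition; LIANG Kai, master's student, whose main research interest is computer games. He participated in the 2020 computer games competition and won a championship.
Corresponding author: WANG Yajie. E-mail: wangyajie@sina.com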

