[1]李霞丽,王昭琦,刘博,等.麻将博弈AI构建方法综述[J].智能系统学报,2023,18(6):1143-1155.[doi:10.11992/tis.202211028]
 LI Xiali,WANG Zhaoqi,LIU Bo,et al.Survey of Mahjong game AI construction methods[J].CAAI Transactions on Intelligent Systems,2023,18(6):1143-1155.[doi:10.11992/tis.202211028]

麻将博弈AI构建方法综述
Survey of Mahjong game AI construction methods

参考文献/References:
[1] 陆升阳, 赵怀林, 刘华平. 场景图谱驱动目标搜索的多智能体强化学习[J]. 智能系统学报, 2023, 18(1): 207–215
LU Shengyang, ZHAO Huailin, LIU Huaping. Multi-agent reinforcement learning for scene graph-driven target search[J]. CAAI transactions on intelligent systems, 2023, 18(1): 207–215
[2] 欧阳勇平, 魏长赟, 蔡帛良. 动态环境下分布式异构多机器人避障方法研究[J]. 智能系统学报, 2022, 17(4): 752–763
OUYANG Yongping, WEI Changyun, CAI Boliang. Collision avoidance approach for distributed heterogeneous multirobot systems in dynamic environments[J]. CAAI transactions on intelligent systems, 2022, 17(4): 752–763
[3] 齐小刚, 陈春绮, 熊伟, 等. 基于博弈论的预警卫星系统抗毁性研究[J]. 智能系统学报, 2021, 16(2): 338–345
QI Xiaogang, CHEN Chunqi, XIONG Wei, et al. Research on the invulnerability of an early warning satellite system based on game theory[J]. CAAI transactions on intelligent systems, 2021, 16(2): 338–345
[4] MIZUKAMI N, NAKAHARI R, URA A, et al. Realizing a four-player computer mahjong program by supervised learning with isolated multi-player aspects[J]. Transactions of information processing society of Japan, 2014, 55(11): 1–11.
[5] LI Junjie, KOYAMADA S, YE Qiwei, et al. Suphx: mastering mahjong with deep reinforcement learning[EB/OL]. (2020-03-30)[2022-11-18].https://arxiv.org/abs/2003.13590.
[6] 乔继林. 麻将机器博弈方法研究[D]. 沈阳: 沈阳航空航天大学, 2022.
QIAO Jilin. Research on the Mahjong machine game method[D]. Shenyang: Shenyang Aerospace University, 2022.
[7] 王亚杰, 乔继林, 梁凯, 等. 结合先验知识与蒙特卡罗模拟的麻将博弈研究[J]. 智能系统学报, 2022, 17(1): 69–78
WANG Yajie, QIAO Jilin, LIANG Kai, et al. Research on Mahjong game based on prior knowledge and Monte Carlo simulation[J]. CAAI transactions on intelligent systems, 2022, 17(1): 69–78
[8] 王松. 基于深度学习的非完备信息博弈对手建模的研究[D]. 南昌: 南昌大学, 2023.
WANG Song. Research on incomplete information game opponent model based on deep learning[D]. Nanchang: Nanchang University, 2023.
[9] 赵海璐. 大众麻将计算机博弈智能搜索算法的应用研究[D]. 重庆: 重庆理工大学, 2023.
ZHAO Hailu. Application research on intelligent search algorithm of popular Mahjong computer game[D]. Chongqing: Chongqing University of Technology, 2023.
[10] 任航. 基于知识与树搜索的非完备信息博弈决策的研究与应用[D]. 南昌: 南昌大学, 2020.
REN Hang. Research and application of imperfect information game decision based on knowledge and game-tree search[D]. Nanchang: Nanchang University, 2020.
[11] 彭丽蓉, 赵海璐, 甘春晏, 等. 一种大众麻将计算机博弈的胡牌方法研究[J]. 重庆理工大学学报(自然科学版), 2021, 35(12): 127–133
PENG Lirong, ZHAO Hailu, GAN Chunyan, et al. Research on the Hu method of a popular Mahjong computer game[J]. Journal of Chongqing University of Technology (natural science edition), 2021, 35(12): 127–133
[12] YAN Xueqing, LI Yongming, LI Sanjiang. A fast algorithm for computing the deficiency number of a mahjong hand[EB/OL]. (2021-08-15)[2022-11-13].https://arxiv.org/abs/2108.06832.
[13] WANG Mingyan, REN Hang, HUANG Wei, et al. An efficient AI-based method to play the Mahjong game with the knowledge and game-tree searching strategy[J]. ICGA journal, 2021, 43(1): 2–25.
[14] XU D. Mahjong AI/analyzer[D]. Los Angeles: California State University Northridge, 2015.
[15] IHARA K, KATO S. Neuro-evolutionary approach to multi-objective optimization in one-player mahjong[C]//International Conference on Network-Based Information Systems. Cham: Springer, 2018: 492-503.
[16] LI Sanjiang, YAN Xueqing. Let’s play Mahjong![EB/OL]. (2019-03-08)[2022-11-13].https://arxiv.org/abs/1903.03294.
[17] SCHRUM J, MIIKKULAINEN R. Evolving multimodal behavior with modular neural networks in Ms. Pac-Man[C]//Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation. New York: ACM, 2014: 325-332.
[18] GAO Shiqi, OKUYA Fuminori, KAWAHARA Yoshihiro, et al. Supervised learning of imperfect information data in the game of mahjong via deep convolutional neural networks[J]. Information processing society of Japan, 2018: 43–50.
[19] ZHENG Y, YOKOYAMA S, YAMASHITA T, et al. Study on evaluation function design of Mahjong using supervised learning[J]. SIG-SAI, 2019, 34(5): 1–9.
[20] GAO Shijing, LI Shuqin. Bloody Mahjong playing strategy based on the integration of deep learning and XGBoost[J]. CAAI transactions on intelligence technology, 2022, 7(1): 95–106.
[21] WANG Mingyan, YAN Tianwei, LUO Mingyuan, et al. A novel deep residual network-based incomplete information competition strategy for four-players Mahjong games[J]. Multimedia tools and applications, 2019, 78(16): 23443–23467.
[22] SATO H, SHIRAKAWA T, HAGIHARA A, et al. An analysis of play style of advanced mahjong players toward the implementation of strong AI player[J]. International journal of parallel, emergent and distributed systems, 2017, 32(2): 195–205.
[23] 孙一铃. 基于Expectimax搜索的非完备信息博弈算法的研究[D]. 北京: 北京交通大学, 2021.
SUN Yiling. Research on incomplete information game algorithm based on Expectimax search[D]. Beijing: Beijing Jiaotong University, 2021.
[24] 雷捷维, 王嘉旸, 任航, 等. 基于Expectimax搜索与Double DQN的非完备信息博弈算法[J]. 计算机工程, 2021, 47(3): 304–310,320
LEI Jiewei, WANG Jiayang, REN Hang, et al. Incomplete information game algorithm based on Expectimax search and Double DQN[J]. Computer engineering, 2021, 47(3): 304–310,320
[25] ZHAO Cong, XIAO Bing, ZHA Lin. Incomplete information competition strategy based on improved asynchronous advantage actor critical model[C]//Proceedings of the 2020 4th International Conference on Deep Learning Technologies. New York: ACM, 2020: 32-37.
[26] HAN D, KOZUNO T, LUO X, et al. Variational oracle guiding for reinforcement learning[C]//International Conference on Learning Representations. Vienna: ICLR, 2021: 1-22.
[27] LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436–444.
[28] BOWLING M, BURCH N, JOHANSON M, et al. Heads-up limit hold’em poker is solved[J]. Science, 2015, 347(6218): 145–149.
[29] ZHA Daochen, XIE Jingru, MA Wenye, et al. DouZero: mastering DouDizhu with self-play deep reinforcement learning[EB/OL]. (2021-06-11)[2022-11-13].https://arxiv.org/abs/2106.06135.
[30] BROWN N, SANDHOLM T. Superhuman AI for multiplayer poker[J]. Science, 2019, 365(6456): 885–890.
[31] OH I, RHO S, MOON S, et al. Creating pro-level AI for a real-time fighting game using deep reinforcement learning[J]. IEEE transactions on games, 2022, 14(2): 212–220.
[32] UENO M, HAYAKAWA D, ISAHARA H. Estimating the purpose of discard in mahjong to support learning for beginners[C]//International Symposium on Distributed Computing and Artificial Intelligence. Cham: Springer, 2019: 155-163.
[33] LONG H, KANEKO T. Improving Mahjong agent by predicting types of Yaku[C]//Proceedings of the Game Programming Workshop. IPSJ, 2019: 206-212.
[34] 龚慧雯, 王桐, 陈立伟, 等. 基于深度强化学习的多智能体对抗策略算法[J]. 应用科技, 2022, 49(5): 1–7
GONG Huiwen, WANG Tong, CHEN Liwei, et al. A multi-agent adversarial strategy algorithm based on deep reinforcement learning[J]. Applied science and technology, 2022, 49(5): 1–7
[35] KURITA M, HOKI K. Method for constructing artificial intelligence player with abstractions to Markov decision processes in multiplayer game of mahjong[J]. IEEE transactions on games, 2021, 13(1): 99–110.
[36] Dwango Media Village. NAGA: deep learning Mahjong AI[EB/OL]. (2022-07-29)[2022-11-13].https://dmv.nico/ja/articles/mahjong_ai_naga/.
[37] TRUONG T D. A supervised attention-based multiclass classifier for tile discarding in Japanese Mahjong[D]. Grimstad: University of Agder, 2021.
[38] LIN J. Phoenix: an open-source, reproducible and interpretable Mahjong agent[EB/OL]. (2021-05-05)[2022-11-13].https://csci527-phoenix.github.io/documents.html.
[39] LONG H, KANEKO T. Training Japanese Mahjong agent with two dimension feature representation[C]//Proceedings of the Game Programming Workshop. Online: IPSJ, 2020: 125-130.
[40] ZHA Daochen, LAI K H, HUANG Songyi, et al. RLCard: a platform for reinforcement learning in card games[C]//Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2020: 5264-5266.
[41] LOCKHART E, LANCTOT M, PÉROLAT J, et al. Computing approximate equilibria in sequential adversarial games by exploitability descent[C]//Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. Macao: International Joint Conferences on Artificial Intelligence Organization, 2019: 464-470.
[42] WANG Zhikun, BOULARIAS A, MÜLLING K, et al. Balancing safety and exploitability in opponent modeling[J]. Proceedings of the AAAI conference on artificial intelligence, 2011, 25(1): 1515–1520.
[43] 董胤蓬, 苏航, 朱军. 面向对抗样本的深度神经网络可解释性分析[J]. 自动化学报, 2022, 48(1): 75–86
DONG Yinpeng, SU Hang, ZHU Jun. Interpretability analysis of deep neural networks with adversarial examples[J]. Acta automatica sinica, 2022, 48(1): 75–86
[44] 刘佳, 陈增强, 刘忠信. 多智能体系统及其协同控制研究进展[J]. 智能系统学报, 2010, 5(1): 1–9
LIU Jia, CHEN Zengqiang, LIU Zhongxin. Advances in multi-agent systems and cooperative control[J]. CAAI transactions on intelligent systems, 2010, 5(1): 1–9
[45] 殷昌盛, 杨若鹏, 朱巍, 等. 多智能体分层强化学习综述[J]. 智能系统学报, 2020, 15(4): 646–655
YIN Changsheng, YANG Ruopeng, ZHU Wei, et al. A survey on multi-agent hierarchical reinforcement learning[J]. CAAI transactions on intelligent systems, 2020, 15(4): 646–655
[46] LIN Longji. Self-improving reactive agents based on reinforcement learning, planning and teaching[J]. Machine learning, 1992, 8(3): 293–321.
[47] ANDRYCHOWICZ M, WOLSKI F, RAY A, et al. Hindsight experience replay[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 5055-5065.
[48] PATERIA S, SUBAGDJA B, TAN A H, et al. Hierarchical reinforcement learning: a comprehensive survey[J]. ACM computing surveys, 2021, 54(5): 109.
[49] DAYAN P, HINTON G E. Feudal reinforcement learning[J]. Advances in neural information processing systems, 1992, 5: 271–278.
[50] VEZHNEVETS A S, OSINDERO S, SCHAUL T, et al. FeUdal networks for hierarchical reinforcement learning[EB/OL]. (2017-03-03)[2022-11-13].https://arxiv.org/abs/1703.01161.
[51] CAMERON J, PIERCE W D. Reinforcement, reward, and intrinsic motivation: a meta-analysis[J]. Review of educational research, 1994, 64(3): 363–423.
[52] OSTROVSKI G, BELLEMARE M G, VAN DEN OORD A, et al. Count-based exploration with neural density models[EB/OL]. (2017-03-03)[2022-11-13].https://arxiv.org/abs/1703.01310.
[53] PATHAK D, AGRAWAL P, EFROS A A, et al. Curiosity-driven exploration by self-supervised prediction[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu: IEEE, 2017: 488-489.
[54] 陈浩, 李嘉祥, 黄健, 等. 融合认知行为模型的深度强化学习框架及算法[J/OL]. 控制与决策: 1-9. [2023-10-06]. https://doi.org/10.13195/j.kzyjc.2022.0281.
CHEN Hao, LI Jiaxiang, HUANG Jian, et al. Deep reinforcement learning framework and algorithm integrating cognitive behaviour model[J/OL]. Control and decision: 1-9. [2023-10-06]. https://doi.org/10.13195/j.kzyjc.2022.0281.
[55] FINN C, ABBEEL P, LEVINE S. Model-agnostic meta-learning for fast adaptation of deep networks[C]//Proceedings of the 34th International Conference on Machine Learning-Volume 70. New York: ACM, 2017: 1126-1135.
[56] 谭晓阳, 张哲. 元强化学习综述[J]. 南京航空航天大学学报, 2021, 53(5): 653–663
TAN Xiaoyang, ZHANG Zhe. Review on meta reinforcement learning[J]. Journal of Nanjing University of Aeronautics & Astronautics, 2021, 53(5): 653–663
[57] 王方伟, 柴国芳, 李青茹, 等. 基于参数优化元学习和困难样本挖掘的小样本恶意软件分类方法[J]. 武汉大学学报(理学版), 2022, 68(1): 17–25
WANG Fangwei, CHAI Guofang, LI Qingru, et al. Classification of few-sample malware based on parameter-optimized meta-learning and hard example mining[J]. Journal of Wuhan University (natural science edition), 2022, 68(1): 17–25
[58] 宋佳蓉, 杨忠, 张天翼, 等. 基于卷积神经网络和多类SVM的交通标志识别[J]. 应用科技, 2018, 45(5): 71–75, 81
SONG Jiarong, YANG Zhong, ZHANG Tianyi, et al. Traffic sign identification based on convolutional neural network and multiclass SVM[J]. Applied science and technology, 2018, 45(5): 71–75, 81
[59] WEISS K, KHOSHGOFTAAR T M, WANG Dingding. A survey of transfer learning[J]. Journal of big data, 2016, 3(1): 1–40.
[60] RUDER S, PETERS M E, SWAYAMDIPTA S, et al. Transfer learning in natural language processing[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorials. Minneapolis: Association for Computational Linguistics, 2019: 15-18.
[61] SHAO Kun, ZHU Yuanheng, ZHAO Dongbin. StarCraft micromanagement with reinforcement learning and curriculum transfer learning[J]. IEEE transactions on emerging topics in computational intelligence, 2019, 3(1): 73–84.
[62] SUN Qianru, LIU Yaoyao, CHUA T S, et al. Meta-transfer learning for few-shot learning[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2020: 403-412.
[63] OLIVAS E S, GUERRERO J D M, MARTINEZ-SOBER M, et al. Handbook of research on machine learning applications and trends: algorithms, methods, and techniques[M]. Hershey: IGI Global, 2009.
[64] 张蒙, 李凯, 吴哲, 等. 一种针对德州扑克AI的对手建模与策略集成框架[J]. 自动化学报, 2022, 48(4): 1004–1017
ZHANG Meng, LI Kai, WU Zhe, et al. An opponent modeling and strategy integration framework for Texas hold’em[J]. Acta automatica sinica, 2022, 48(4): 1004–1017
[65] GANZFRIED S, SANDHOLM T. Safe opponent exploitation[J]. ACM transactions on economics and computation, 2015, 3(2): 1–28.
[66] LONG Qian, ZHOU Zihan, GUPTA A, et al. Evolutionary population curriculum for scaling multi-agent reinforcement learning[EB/OL]. (2020-03-23)[2022-11-13].https://arxiv.org/abs/2003.10423.
[67] YANG Y, LUO J, WEN Y, et al. Diverse auto-curriculum is critical for successful real-world multiagent learning systems[C]//Proceedings of the 20th International Conference on Autonomous Agents and Multiagent Systems. Richland: ACM, 2021: 51-56.
[68] WU Zhe, LI Kai, XU Hang, et al. L2E: learning to exploit your opponent[C]//2022 International Joint Conference on Neural Networks. Padua: IEEE, 2022: 1-8.
[69] SHEN Macheng, HOW J P. Robust opponent modeling via adversarial ensemble reinforcement learning[J]. Proceedings of the international conference on automated planning and scheduling, 2021, 31: 578–587.
[70] 苏子美, 董红斌. 面向无人机路径规划的多目标粒子群优化算法[J]. 应用科技, 2021, 48(3): 12–20, 26
SU Zimei, DONG Hongbin. Multi-objective particle swarm optimization algorithm for UAV path planning[J]. Applied science and technology, 2021, 48(3): 12–20, 26
[71] JADERBERG M, DALIBARD V, OSINDERO S, et al. Population based training of neural networks[EB/OL]. (2017-11-27)[2022-11-13].https://arxiv.org/abs/1711.09846.
[72] LI Wenxin, ZHOU Haoyu, WANG C, et al. Teaching AI algorithms with games including Mahjong and fight the landlord on the botzone online platform[C]//Proceedings of the ACM Conference on Global Computing Education. New York: ACM, 2019: 129-135.
相似文献/Similar References:
[1]徐长明,南晓斐,王 骄,等.中国象棋机器博弈的时间自适应分配策略研究[J].智能系统学报,2006,1(2):39.
 XU Chang-ming,NAN Xiao-fei,WANG Jiao,et al.Adaptive time allocation strategy in computer game of Chinese Chess[J].CAAI Transactions on Intelligent Systems,2006,1(2):39.
[2]徐心和,邓志立,王骄,等.机器博弈研究面临的各种挑战[J].智能系统学报,2008,3(4):287.
 XU Xin-he,DENG Zhi-li,WANG Jiao,et al.Challenging issues facing computer game research[J].CAAI Transactions on Intelligent Systems,2008,3(4):287.
[3]张小川,唐艳,梁宁宁.采用时间差分算法的九路围棋机器博弈系统[J].智能系统学报,2012,7(3):278.
 ZHANG Xiaochuan,TANG Yan,LIANG Ningning.A 9×9 Go computer game system using temporal difference[J].CAAI Transactions on Intelligent Systems,2012,7(3):278.
[4]李学俊,王小龙,吴蕾,等.六子棋中基于局部“路”扫描方式的博弈树生成算法[J].智能系统学报,2015,10(2):267.[doi:10.3969/j.issn.1673-4785.201401022]
 LI Xuejun,WANG Xiaolong,WU Lei,et al.Game tree generation algorithm based on local-road scanning method for connect 6[J].CAAI Transactions on Intelligent Systems,2015,10(2):267.[doi:10.3969/j.issn.1673-4785.201401022]
[5]张小川,王宛宛,彭丽蓉.一种军棋机器博弈的多棋子协同博弈方法[J].智能系统学报,2020,15(2):399.[doi:10.11992/tis.201812012]
 ZHANG Xiaochuan,WANG Wanwan,PENG Lirong.A multi-chess collaborative game method for military chess game machine[J].CAAI Transactions on Intelligent Systems,2020,15(2):399.[doi:10.11992/tis.201812012]
[6]吴立成,吴启飞,钟宏鸣,等.基于卷积神经网络的“拱猪”博弈算法[J].智能系统学报,2023,18(4):775.[doi:10.11992/tis.202203030]
 WU Licheng,WU Qifei,ZHONG Hongming,et al.Algorithm for “Hearts” game based on convolutional neural network[J].CAAI Transactions on Intelligent Systems,2023,18(4):775.[doi:10.11992/tis.202203030]

备注/Memo

收稿日期/Received: 2022-11-18.
基金项目/Funding: National Natural Science Foundation of China (61873291, 62276285).
作者简介/Author profiles: LI Xiali, professor, whose main research interest is computer games; WANG Zhaoqi, master's student, whose main research interest is computer games; WU Licheng, professor and deputy director of the Machine Game Committee of the Chinese Association for Artificial Intelligence, whose main research interests are intelligent systems and robotics, and computer games. He has led more than 10 projects, including grants from the National Natural Science Foundation of China, and has published more than 80 academic papers.
通讯作者/Corresponding author: WU Licheng. E-mail: wulicheng@tsinghua.edu.cn
