[1] XIONG Tieniu, QIU Jifang, HU Jian. Collection and sorting method of ancient Yi character images based on deep learning technology[J]. CAAI Transactions on Intelligent Systems, 2025, 20(4): 928-935. [doi:10.11992/tis.202406036]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
20
Issue:
No. 4, 2025
Pages:
928-935
Column:
Academic Papers: Machine Learning
Publication date:
2025-08-05
- Title:
-
Collection and sorting method of ancient Yi character images based on deep learning technology
- Author(s):
-
XIONG Tieniu (熊铁妞)1,2, QIU Jifang (邱吉芳)3, HU Jian (胡建)1,2
-
1. The Key Laboratory for Computer Systems of State Ethnic Affairs Commission, Southwest Minzu University, Chengdu 610225, China;
2. College of Computer Science and Artificial Intelligence, Southwest Minzu University, Chengdu 610225, China;
3. School of Chinese Language and Literature, Southwest Minzu University, Chengdu 610225, China
- Keywords:
-
deep learning; ancient Yi characters; ancient literatures; image processing; similarity matching; feature extraction; object detection; digitalization
- CLC number:
-
TP391.4; TP391.1
- DOI:
-
10.11992/tis.202406036
- Document code:
-
2025-2-21
- Abstract:
-
Ancient Yi script is an important carrier of Chinese culture, but collecting and organizing large numbers of ancient Yi characters by hand is time-consuming and labor-intensive. People who can read ancient Yi script are already very scarce and their numbers keep falling, which makes the work even harder. To address this, this paper proposes a new approach, based on deep learning, to collecting and organizing images of ancient Yi characters. For collection, an object detection model locates each ancient Yi character in images of ancient Yi books, and the character images are cropped from the page images at those locations. For organization, because standardized Yi characters are derived from ancient Yi characters, font files of standardized Yi script are used to generate Yi character images automatically and build a dataset, which is then used to train a feature extraction algorithm for ancient Yi character images. This sidesteps the current absence of any ancient Yi character image dataset, an absence caused by the very large number of characters, the many variant forms, and the still-unfinished cataloging. The features of each collected character image are then matched for similarity against the features of the ancient Yi character images already cataloged, to determine whether the collected image has already been recorded and so to pick out the uncataloged character images. Experiments with several typical feature extraction algorithms and similarity measures verify the effectiveness of the method.
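The two mechanical steps described in the abstract, cropping detected characters out of a page image and checking collected characters against a catalog by feature similarity, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the function names `crop_boxes`, `cosine_similarity`, and `find_uncataloged`, the `(left, top, right, bottom)` box format, the use of cosine similarity as the similarity measure, and the `0.9` threshold. The detector and the feature extractor are assumed to exist already and are represented only by their outputs (boxes and feature vectors).

```python
import math


def crop_boxes(page, boxes):
    """Cut each box out of a page image.

    `page` is a list of pixel rows; each box is (left, top, right, bottom)
    with exclusive right/bottom edges (an assumed format, standing in for
    whatever a real detection model would return).
    """
    crops = []
    for left, top, right, bottom in boxes:
        crops.append([row[left:right] for row in page[top:bottom]])
    return crops


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_uncataloged(collected, catalog, threshold=0.9):
    """Return indices of collected features with no close match in the catalog.

    A collected character counts as already cataloged when its best
    similarity to any catalog feature reaches `threshold` (hypothetical value).
    """
    uncataloged = []
    for index, feature in enumerate(collected):
        best = max((cosine_similarity(feature, c) for c in catalog), default=0.0)
        if best < threshold:
            uncataloged.append(index)
    return uncataloged


# Toy data: a 3x4 "page" and two collected feature vectors.
page = [[r * 10 + c for c in range(4)] for r in range(3)]
crops = crop_boxes(page, [(1, 0, 3, 2)])
new_indices = find_uncataloged([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.1]])
```

In this toy run the first collected vector closely matches the single catalog entry, so only the second one is reported as uncataloged. In practice the threshold would need to be tuned on labeled pairs, and a different similarity measure could be substituted without changing the surrounding logic.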
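Memo
Received: 2024-06-21.
Funding: Major Project of the National Social Science Fund of China (19ZDA284); Team Project of the Institute for the Community of the Chinese Nation, Southwest Minzu University (2024GTT-TD17); Fundamental Research Funds for the Central Universities, Southwest Minzu University (ZYN2023009).
About the authors: XIONG Tieniu, master's student; main research interests: deep learning, image processing, and digitization of ancient Yi characters. E-mail: xiongtieniu@stu.swun.edu.cn; QIU Jifang, undergraduate student; main areas of study: Yi linguistics and Yi dialectology. E-mail: 18384496920@163.com; HU Jian, professor, Ph.D.; main research interests: computer vision, swarm intelligence, and document digitization. E-mail: hujian@swun.edu.cn.
Corresponding author: HU Jian. E-mail: hujian@swun.edu.cn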