[1] LI Jiamin, LIU Xingbo, NIE Xiushan, et al. Triplet deep Hashing learning for judicial case similarity matching method[J]. CAAI Transactions on Intelligent Systems, 2020, 15(6): 1147-1153. [doi:10.11992/tis.202006049]

Triplet deep Hashing learning for judicial case similarity matching method

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
Vol. 15
Issue:
No. 6, 2020
Pages:
1147-1153
Column:
Academic Papers: Machine Perception and Pattern Recognition
Publication date:
2020-11-05

Article Info

Title:
Triplet deep Hashing learning for judicial case similarity matching method
Author(s):
LI Jiamin (1), LIU Xingbo (1), NIE Xiushan (2), GUO Jie (1), YIN Yilong (1)
1. School of Software, Shandong University, Ji’nan 250101, China;
2. School of Computer Science and Technology, Shandong Jianzhu University, Ji’nan 250101, China
Keywords:
judicial cases; case matching; similarity retrieval; hashing learning; deep learning; neural network; BERT model; triplets
CLC number:
TP391
DOI:
10.11992/tis.202006049
Abstract:
Matching similar cases within a large collection of judicial case documents can effectively improve the working efficiency of judicial departments. However, judicial case texts are not only long but also structurally complex, so matching them is more difficult than traditional natural language processing tasks. To address this problem, this paper proposes a judicial case similarity matching method based on a triplet deep hashing learning model. First, a pre-trained Chinese BERT model is used to extract the features of each document in groups. Next, the triplet similarity relationships among documents are used to train a deep neural network that generates a hash code representation for each document. Finally, the Hamming distance between the hash codes of two documents is used to determine whether they are similar cases. Experimental results show that the hashing learning method greatly reduces the storage cost of the documents' feature representations and improves the speed of similar case matching.
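
The last two steps of the abstract (learning hash codes from triplets, then comparing codes by Hamming distance) can be illustrated with a minimal plain-Python sketch. It is not the authors' implementation: the sign-based binarization, the hinge loss on squared Euclidean distances, the `margin` and `threshold` parameters, and all function names are assumptions made for illustration, and the BERT feature extraction step is omitted. The feature vectors in the usage part are made-up toy values.

```python
from typing import Sequence


def binarize(features: Sequence[float]) -> int:
    """Pack the signs of a real-valued vector into an integer hash code.

    Bit i of the result is 1 when features[i] > 0 (assumed binarization rule).
    """
    code = 0
    for i, value in enumerate(features):
        if value > 0:
            code |= 1 << i
    return code


def hamming_distance(code_a: int, code_b: int) -> int:
    """Count the bit positions in which two hash codes differ."""
    return bin(code_a ^ code_b).count("1")


def triplet_hinge_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Hinge loss on relaxed (real-valued) codes.

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin` (in squared Euclidean distance).
    """
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, margin + d_pos - d_neg)


def is_similar(code_a: int, code_b: int, threshold: int) -> bool:
    """Declare two documents similar when their codes differ in at most
    `threshold` bits (the threshold is a hypothetical tuning parameter)."""
    return hamming_distance(code_a, code_b) <= threshold


# Toy 4-dimensional network outputs for an anchor case, a similar case,
# and a dissimilar case (illustrative values only).
anchor_out = [0.9, -0.2, 0.4, -0.7]
similar_out = [0.8, -0.1, 0.3, 0.5]
dissimilar_out = [-0.6, 0.7, -0.2, 0.1]

anchor_code = binarize(anchor_out)
similar_code = binarize(similar_out)
dissimilar_code = binarize(dissimilar_out)

print(hamming_distance(anchor_code, similar_code))
print(hamming_distance(anchor_code, dissimilar_code))
print(triplet_hinge_loss(anchor_out, similar_out, dissimilar_out))
```

Storing each document as a short bit string and comparing codes with an XOR plus a bit count is what makes the storage and speed gains described in the abstract plausible: the comparison cost depends only on the code length, not on the length of the original document.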

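Memo:
Received: 2020-06-29.
Funding: National Key Research and Development Program of China (2018YFC0830100, 2018YFC0830102)
About the authors: LI Jiamin, master's student; main research interest: intelligent media processing. LIU Xingbo, Ph.D. candidate; main research interests: intelligent media processing and computer vision. YIN Yilong, professor and doctoral supervisor; main research interests: artificial intelligence theory and methods, machine learning, and data mining. YIN Yilong has led one Key Program project of the National Natural Science Foundation of China, one National Key R&D Program sub-project, three General Program projects, one Young Scientists Fund project, and 11 provincial- and ministerial-level research projects, and has published more than 300 academic papers.
Corresponding author: YIN Yilong. E-mail: ylyin@sdu.edu.cn
Last Update: 2020-12-25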