[1] LIU Peihua, LU Huaxiang, GONG Guoliang, et al. Design of an FPGA-based double-precision floating-point matrix multiplier with pipeline architecture[J]. CAAI Transactions on Intelligent Systems, 2012, 7(4): 302-306.
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
7
Issue:
No. 4, 2012
Pages:
302-306
Section:
Academic Papers: Intelligent Systems
Publication date:
2012-08-25
- Title:
-
Design of an FPGA-based double-precision floating-point matrix multiplier with pipeline architecture
- Article ID:
-
1673-4785(2012)04-0302-05
- Author(s):
-
LIU Peihua (刘沛华), LU Huaxiang (鲁华祥), GONG Guoliang (龚国良), LIU Wenpeng (刘文鹏)
-
Lab of Artificial Neural Networks, Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China
-
- Keywords:
-
matrix multiplication; field-programmable gate array (FPGA); loop pipeline; C-slow retiming; multiplier design
- Classification number (CLC):
-
TP332.2
- Document code:
-
A
- Abstract:
-
Application areas such as digital communication and image processing rely heavily on matrix multiplication, and the computational performance of this operation is a key factor in overall system performance. To improve that performance, a parallel, fully pipelined double-precision floating-point matrix multiplier was designed and implemented on a Xilinx Virtex-5 LX155 field-programmable gate array (FPGA). The processing elements (PEs) of the multiplier are arranged as an array, and up to 10 PEs can be integrated on a single FPGA device to compute in parallel. To raise the operating frequency, each PE uses a pipelined architecture, and C-slow retiming is applied to resolve the data-dependency conflicts that arise on the loop pipeline. Post-route simulation results show that the peak performance of the matrix multiplier reaches 5 000 MFLOPS. In addition, experiments on matrix multiplications of different dimensions confirm that the design achieves high computational performance.
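The abstract does not spell out how C-slow retiming is applied inside a PE, so the following is only a minimal software sketch of the general idea, under stated assumptions: the accumulating adder is assumed to have a pipeline latency of C cycles, and C independent dot products are assumed to be interleaved round-robin so that each partial sum leaves the pipeline exactly when the next operand of the same dot product arrives. The function name `c_slow_accumulate` and its parameters are hypothetical and are not taken from the paper.

```python
from collections import deque


def c_slow_accumulate(streams, latency):
    """Model one pipelined adder shared by `latency` interleaved streams.

    Assumption (not from the paper): the adder takes `latency` cycles, so a
    partial sum issued at cycle t is only available again at cycle t + latency.
    Interleaving exactly `latency` independent streams hides that delay.
    """
    if len(streams) != latency:
        raise ValueError("C-slow interleaving needs exactly `latency` streams")
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("all streams must have the same length")

    # One slot per pipeline stage; each slot holds a partial sum in flight.
    pipeline = deque([0.0] * latency)

    for k in range(length):              # k-th operand of every stream
        for s in range(latency):         # one clock cycle per stream
            # The value leaving the pipeline this cycle belongs to stream s,
            # because stream s issued its previous add `latency` cycles ago.
            partial = pipeline.popleft()
            pipeline.append(partial + streams[s][k])

    # After the last round the slots hold the final sums in stream order.
    return list(pipeline)


if __name__ == "__main__":
    # Three independent streams of products sharing a 3-stage adder.
    products = [
        [1.0, 2.0, 3.0],
        [10.0, 20.0, 30.0],
        [0.5, 0.5, 0.5],
    ]
    print(c_slow_accumulate(products, latency=3))
```

In this model the adder accepts a new operand every cycle and never stalls on its own feedback path, which is the effect that C-slow retiming is generally used to obtain on a loop pipeline. With a single stream and a multi-cycle adder, each addition would instead have to wait for the previous result.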
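Memo
Received: 2012-02-06.
Published online: 2012-07-12.
Funding: National Natural Science Foundation of China (61076014); Natural Science Foundation of the Jiangsu Higher Education Institutions (10KJA510042); Pilot Project (XDA06020700).
Corresponding author: LIU Peihua.
E-mail: pclph123@163.com.
About the authors:
LIU Peihua, female, born in 1985, master's student. Her main research interests are circuits and systems and neural networks.
LU Huaxiang, male, born in 1965, research professor. His main research interests are intelligent information processing and neural network technology and its applications. In recent years, as project leader or key researcher, he has completed 3 national key science and technology projects, 3 National "863" Program projects, and 3 key projects of the National Natural Science Foundation of China, and has published more than 50 academic papers.
GONG Guoliang, male, born in 1982, Ph.D. candidate. His main research interests include optimization algorithms, neural networks, and pattern recognition.
Last Update: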
2012-09-26