File name: A Framework for Generating High Throughput CNN Implementations on FPGAs.pdf
  Category: Deep learning
  Development tool:
  File size: 2 MB
  Downloads: 0
  Upload date: 2019-07-20
  Uploader: shiya******
 Detailed description: An FPGA hardware acceleration scheme for deep learning that enables high-throughput CNN implementations.

Session 3: Deep Learning, FPGA'18, February 25-27, Monterey, CA, USA

[...] maps. Let b, n and m index into the Batch, f_in and f_out dimensions. Equation 4 specifies the operations of a convolution layer [...]

Table 1: Variation of model parameters (column headings cover CNN, Conv Layers, and the max and min of the layer parameters such as f_in and kern; the values are not recoverable from this preview)

3.2 CaP for Reducing Wasted Computations

OaA requires the shape of each partition to be N × N. The analysis on equation [...] ignores the useless computation on the zero paddings of P. Such an approximation is not always valid, as can be seen in Figure 2a when N = 32 or 64. Two examples are shown in Figure 3. Scenario 1 is for deep layers when l_img is small, and scenario 2 happens when l_img is larger. One possible solution is to select an appropriate N which fits well the l_img of most layers. The first problem is, this technique significantly [...]

(2) Duality of the OaA and CaP operations: OaA partitions images, and CaP combines images. OaA processes a set of matrices by overlapping pixels (Step 4 in Figure 1), and CaP processes a set of matrices by padding pixels (Step CaP in Figure 4). Since CaP is a dual of OaA, we can extend the [...] operator (Section 2.1). If the superscript x is negative, then we use [...] to compute step b in Figure 1. If x is positive, we use [...] to compute I = [...](D) in CaP.

In summary, we CaP the layer array so that the input of Batch × f_in × l_img is reshaped to [...] × f_in × l'_img, where l'_img = d · l_img + (d - 1)(kern - 1). We then apply OaA to I. Abbreviate such operations as CaP-OaA. It is worth noticing that the various frequency domain convolution algorithms discussed so far are closely related to each other. CaP-OaA reduces to OaA when d = 1. OaA further reduces to native frequency domain convolution when N > l_img + kern - 1. Therefore, CaP-OaA is the most general version among these frequency domain convolution algorithms. CaP-OaA also achieves the highest hardware efficiency.

Figure 5: Comparison of computation complexity (series: OaA N=16, OaA N=32, Native, CaP-OaA N=16, CaP-OaA N=32)

We further quantitatively analyze the computation complexity of CaP-OaA. CaP introduces a new variable d whose value can be set to approximate the ceiling function in Equation 5. It can be shown that by setting d [...] N - (kern - 1) [...] (where gcd means Greatest Common Divisor), the complexity of CaP-OaA is [...]

3.3 Frequency Domain Loop Tiling

The CaP-OaA technique manipulates the data dimensions l_img and kern. To block data of convolution layers into identical shapes, we still need optimization on the f_in and f_out dimensions. We revisit Algorithm 1. Tiling of the loop dimensions in lines 5 and 6 performs partitioning of f_in and f_out. At runtime, the kernel filters and image data are partitioned into fixed shapes, and the tiles are loaded onto the FPGA. Tiling on top of CaP-OaA makes the data flow of diverse CNNs on a target device identical to each other. The tiling factor f is the same for various convolution layers. After CaP-OaA transforms the kernel filters and images to a uniform N × N shape, the value of f becomes independent of the CNN model parameters and is solely bound by the on-chip memory size.

The motivation for loop tiling is to reduce the communication volume to external memory by increased reuse of on-chip data [4]. For frequency domain convolution, a tradeoff exists between N and f to balance computation complexity and data reuse. Analysis on the algorithm-architecture co-design is made in Section 5.

Although loop optimization for CNNs on FPGAs has been extensively studied, previous work [4, 8, 12] focused on convolution in space domain. Existing techniques cannot be directly applied to frequency domain CNNs, since the data flow of sliding window operations is different from that of Hadamard product operations. On the other hand, our three techniques proposed in Sections 3.1, 3.2 and 3.3 can all be understood as loop optimizations in frequency domain. OaA is analogous to loop tiling of l_img, and CaP is analogous to loop tiling and unrolling of the Batch dimension.

With the optimizations in Sections 3.1, 3.2 and 3.3, we derive [...]
(Generated automatically by the system; you can preview the content before downloading.)

Download file list

Related notes

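  • Resources on this site are uploaded by members for sharing, exchange, and learning. If any of them infringes your rights, please contact us to have it removed.
  • This site is a download-exchange platform that provides a channel for sharing; the downloadable content comes from the Internet. For anything other than download problems, please search Baidu yourself.
  • This site has hotlink protection enabled. Do not use multi-threaded download tools such as Xunlei or QQ Xuanfeng to download resources. After downloading, extract the files with the latest version of WinRAR.
  • If content cannot be downloaded, please try again later, or find the download record in your purchase history and report it to us.
  • If the downloaded content does not match the description, find the download record in your purchase history and report it to us; points are refunded after verification.
  • If you have questions before downloading, click the uploader's name to view their contact details and ask them directly.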