Fast convolutional neural networks on FPGAs with hls4ml

Aarrestad, Thea and Loncar, Vladimir and Ghielmetti, Nicolò and Pierini, Maurizio and Summers, Sioni and Ngadiuba, Jennifer and Petersson, Christoffer and Linander, Hampus and Iiyama, Yutaro and Di Guglielmo, Giuseppe and Duarte, Javier and Harris, Philip and Rankin, Dylan and Jindariani, Sergo and Pedro, Kevin and Tran, Nhan and Liu, Mia and Kreinar, Edward and Wu, Zhenbin and Hoang, Duc (2021) Fast convolutional neural networks on FPGAs with hls4ml. Machine Learning: Science and Technology, 2 (4). 045015. ISSN 2632-2153


Abstract

We introduce an automated tool for deploying ultra-low-latency, low-power deep neural networks with convolutional layers on field-programmable gate arrays (FPGAs). By extending the hls4ml library, we demonstrate an inference latency of 5 µs using convolutional architectures, targeting microsecond-latency applications like those at the CERN Large Hadron Collider. Considering benchmark models trained on the Street View House Numbers Dataset, we demonstrate various methods for model compression to fit the computational constraints of a typical FPGA device used in the trigger and data acquisition systems of particle detectors. In particular, we discuss pruning and quantization-aware training, and demonstrate how resource utilization can be significantly reduced with little to no loss in model accuracy. We show that the FPGA critical resource consumption can be reduced by 97% with zero loss in model accuracy, and by 99% when tolerating a 6% accuracy degradation.
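The two compression techniques named in the abstract, magnitude-based pruning and fixed-point quantization, can be sketched in a few lines. The NumPy mock-up below is purely illustrative and is not the hls4ml or QKeras API; the helper names `prune_by_magnitude` and `quantize_fixed_point`, and the choice of an 8-bit `ap_fixed`-style format, are assumptions for the example.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def quantize_fixed_point(weights, total_bits=8, int_bits=1):
    """Round weights onto a signed fixed-point grid, roughly analogous to
    an HLS ap_fixed<total_bits, int_bits> type (int_bits includes the sign)."""
    frac_bits = total_bits - int_bits
    scale = 2.0 ** frac_bits
    max_val = 2.0 ** (int_bits - 1) - 1.0 / scale
    min_val = -2.0 ** (int_bits - 1)
    return np.clip(np.round(weights * scale) / scale, min_val, max_val)

# Example: prune half the weights, then quantize the survivors to 8 bits.
w = np.array([[0.5, -0.03, 0.2], [0.01, -0.7, 0.06]])
w_compressed = quantize_fixed_point(prune_by_magnitude(w, 0.5))
```

In a real workflow both steps are applied during training (pruning schedules and quantization-aware training) rather than post hoc, so the network can recover the accuracy lost to compression.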

Item Type: Article
Subjects: Afro Asian Archive > Multidisciplinary
Depositing User: Unnamed user with email support@afroasianarchive.com
Date Deposited: 05 Jul 2023 04:33
Last Modified: 24 Jun 2024 05:20
URI: http://info.stmdigitallibrary.com/id/eprint/1175
