Speaker: Prof. Wei Gao, University of Pittsburgh
Venue: Tencent Meeting, ID 813-805-126
Time: 9:30 AM on July 22, Beijing time (9:30 PM on July 21, US Eastern time)
Title: Real-Time Neural Network Inference on Extremely Weak Devices: Agile Offloading with Explainable AI
Host: Associate Professor Zhang Deyu (张德宇)
Speaker bio:
Wei Gao is currently an Associate Professor in the Department of Electrical and Computer Engineering at the University of Pittsburgh. His research interests include mobile and embedded computing, edge computing, on-device AI, mobile health, and the Internet of Things. He is a recipient of the NSF CAREER Award, the Best Paper Award of ACM CoNEXT, and the Spotlight Paper Award of IEEE Transactions on Mobile Computing. He currently serves on the Editorial Board of IEEE Transactions on Mobile Computing, and on the organizing committees and technical program committees of many other premier conference venues, including MobiCom, SenSys, and INFOCOM.
Abstract:
With the wide adoption of AI applications, there is a pressing need to enable real-time neural network (NN) inference on small embedded devices at the edge, but deploying NNs on these devices is challenging because of their extremely weak capabilities. Although NN partitioning and offloading can help with such deployment, they cannot minimize the local costs at embedded devices. In this talk, I will introduce our recent research progress in enabling NN inference on weak edge devices (e.g., microcontrollers) via agile NN offloading, which migrates the required NN computations from online inference to offline learning. More specifically, our approach leverages explainable AI techniques to explicitly enforce feature sparsity when training the NN model offline. This sparsity then allows online inference to offload the majority of NN features for remote computation with much higher compressibility. Our experimental results show that our agile offloading framework can reduce the inference latency on weak MCUs to under 20 ms, ensuring that sensory data on embedded devices can be consumed in a timely manner. It also reduces local resource consumption by more than 8x without impairing inference accuracy.
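
The link between feature sparsity and compressibility mentioned in the abstract can be illustrated with a small, self-contained sketch. This is not the speaker's framework: the 8-bit quantization, the use of zlib as the compressor, the synthetic random features, and the 10% density are all assumptions chosen only for illustration. The sketch compares the compressed size of a dense feature vector with that of a mostly-zero one of the same length.

```python
import random
import zlib


def quantize(features):
    """Map floats in [0, 1] to one byte each (hypothetical 8-bit quantization)."""
    return bytes(min(255, max(0, round(f * 255))) for f in features)


def compressed_size(features):
    """Number of bytes after quantizing and compressing with zlib (assumed codec)."""
    return len(zlib.compress(quantize(features), 9))


rng = random.Random(0)  # fixed seed so the comparison is reproducible
n = 4096                # hypothetical number of feature values

# Dense features: every entry carries a random value.
dense = [rng.random() for _ in range(n)]

# Sparse features: roughly 10% of entries are non-zero (assumed density).
sparse = [rng.random() if rng.random() < 0.1 else 0.0 for _ in range(n)]

dense_size = compressed_size(dense)
sparse_size = compressed_size(sparse)

print(f"dense:  {dense_size} bytes after compression")
print(f"sparse: {sparse_size} bytes after compression")
```

Under these assumptions the sparse vector compresses to far fewer bytes than the dense one, which is the property that would make transmitting the offloaded features cheaper for a weak device.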