In this video, Svetlana Minakova, Erqian Tang and Todor Stefanov, from the Leiden Institute of Advanced Computer Science, present the paper entitled "Combining task- and data-level parallelism for high-throughput CNN inference on embedded CPUs-GPUs MPSoCs", which was accepted at the SAMOS XX International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation.
Details of the publication:
S. Minakova, E. Tang, T. Stefanov, «Combining task- and data-level parallelism for high-throughput CNN inference on embedded CPUs-GPUs MPSoCs», in the Proceedings of the SAMOS XX International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, Pythagoreio, Samos Island, Greece, virtual event, July 4-6, 2020.
Abstract:
Nowadays Convolutional Neural Networks (CNNs) are widely used to perform various tasks in areas such as computer vision or natural language processing. Some of the CNN applications require high-throughput execution of the CNN inference, on embedded devices, and many modern embedded devices are based on CPUs-GPUs multi-processor systems-on-chip (MPSoCs). Ensuring high-throughput execution of the CNN inference on embedded CPUs-GPUs MPSoCs is a complex task, which requires efficient utilization of both task-level (pipeline) and data-level parallelism, available in a CNN. However, the existing Deep Learning frameworks utilize only task-level (pipeline) or only data-level parallelism, available in a CNN, and do not take full advantage of all embedded MPSoC computational resources. Therefore, in this paper, we propose a novel methodology for efficient execution of the CNN inference on embedded CPUs-GPUs MPSoCs. In our methodology, we ensure efficient utilization of both task-level (pipeline) and data-level parallelism, available in a CNN, to achieve high-throughput execution of the CNN inference on embedded CPUs-GPUs MPSoCs.
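To make the two kinds of parallelism named in the abstract more concrete, here is a minimal Python sketch. It is not the authors' methodology or code: the names `data_parallel`, `stage` and `run_pipeline` are hypothetical, the "layers" are plain Python functions standing in for CNN layers, and threads stand in for the CPU cores and GPU of an MPSoC. Each pipeline stage runs in its own thread (task-level parallelism), and inside a stage the input frame is split into chunks that are processed concurrently (data-level parallelism). Because of Python's global interpreter lock, this only illustrates the structure and is not expected to show a real speed-up.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def data_parallel(layer, frame, pool, parts=2):
    """Data-level parallelism: split one frame into chunks and
    apply the same layer to every chunk concurrently."""
    size = max(1, (len(frame) + parts - 1) // parts)
    chunks = [frame[i:i + size] for i in range(0, len(frame), size)]
    results = pool.map(lambda chunk: [layer(x) for x in chunk], chunks)
    # pool.map preserves chunk order, so the frame is reassembled in order.
    return [y for part in results for y in part]


def stage(layer, inbox, outbox, pool):
    """Task-level (pipeline) parallelism: one stage per thread.
    A value of None marks the end of the input stream."""
    while True:
        frame = inbox.get()
        if frame is None:
            outbox.put(None)  # forward the end-of-stream marker
            return
        outbox.put(data_parallel(layer, frame, pool))


def run_pipeline(layers, frames):
    """Run frames through the layers; while stage k works on frame n,
    stage k-1 can already work on frame n+1."""
    queues = [queue.Queue() for _ in range(len(layers) + 1)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        threads = [
            threading.Thread(target=stage,
                             args=(layer, queues[i], queues[i + 1], pool))
            for i, layer in enumerate(layers)
        ]
        for t in threads:
            t.start()
        for frame in frames:
            queues[0].put(frame)
        queues[0].put(None)

        outputs = []
        while (out := queues[-1].get()) is not None:
            outputs.append(out)
        for t in threads:
            t.join()
    return outputs
```

For example, `run_pipeline([lambda x: x + 1, lambda x: x * 2], [[1, 2, 3], [4, 5, 6]])` sends two toy "frames" through a two-stage pipeline. In a real deployment the stages would be groups of CNN layers mapped to different processors, which is the mapping problem the paper addresses.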
Follow us on LinkedIn and Twitter!