Accelerated Execution of Bayesian Neural Networks using a Single Probabilistic Forward Pass and Code Generation

arXiv: 2511.23440v1
Authors

Bernhard Klein, Falk Selker, Hendrik Borras, Sophie Steger, Franz Pernkopf, Holger Fröning

Abstract

Machine learning models perform well across domains such as diagnostics, weather forecasting, NLP, and autonomous driving, but their limited uncertainty handling restricts use in safety-critical settings. Traditional neural networks often fail to detect out-of-domain (OOD) data and may output confident yet incorrect predictions. Bayesian neural networks (BNNs) address this by providing probabilistic estimates, but incur high computational cost because predictions require sampling weight distributions and multiple forward passes. The Probabilistic Forward Pass (PFP) offers a highly efficient approximation to Stochastic Variational Inference (SVI) by assuming Gaussian-distributed weights and activations, enabling fully analytic uncertainty propagation and replacing sampling with a single deterministic forward pass. We present an end-to-end pipeline for training, compiling, optimizing, and deploying PFP-based BNNs on embedded ARM CPUs. Using the TVM deep learning compiler, we implement a dedicated library of Gaussian-propagating operators for multilayer perceptrons and convolutional neural networks, combined with manual and automated tuning strategies. Ablation studies show that PFP consistently outperforms SVI in computational efficiency, achieving speedups of up to 4200x for small mini-batches. PFP-BNNs match SVI-BNNs on Dirty-MNIST in accuracy, uncertainty estimation, and OOD detection while greatly reducing compute cost. These results highlight the potential of combining Bayesian approximations with code generation to enable efficient BNN deployment on resource-constrained systems.
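
The "fully analytic uncertainty propagation" mentioned in the abstract reduces, per layer, to propagating means and variances in closed form. Below is a minimal NumPy sketch of this moment propagation through a single fully connected layer, assuming the mean-field setting the abstract describes (independent Gaussian weights, biases, and inputs); the function name and shapes are illustrative and not the paper's API:

```python
import numpy as np

def pfp_linear(mu_x, var_x, mu_w, var_w, mu_b, var_b):
    """Propagate Gaussian moments through y = W x + b analytically.

    For independent Gaussian factors the exact moments are:
      E[y]   = mu_W @ mu_x + mu_b
      Var[y] = var_W @ (var_x + mu_x**2) + mu_W**2 @ var_x + var_b
    so one deterministic pass replaces Monte-Carlo weight sampling.
    Shapes: mu_w/var_w are (out, in); mu_x/var_x are (in,).
    """
    mu_y = mu_w @ mu_x + mu_b
    var_y = var_w @ (var_x + mu_x**2) + (mu_w**2) @ var_x + var_b
    return mu_y, var_y
```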

Paper Summary

Problem
Bayesian neural networks (BNNs) extend standard deep learning models with principled uncertainty estimates, but at a significant computational cost: each prediction requires sampling weight distributions and running multiple forward passes. This makes BNNs challenging to deploy on resource-constrained embedded systems such as smartphones, self-driving cars, and IoT devices. The problem addressed by this research is accelerating BNN execution on such devices without sacrificing predictive quality.
Key Innovation
This paper presents a novel approach to accelerating BNNs with the Probabilistic Forward Pass (PFP). Instead of sampling weights and running many forward passes, PFP assumes Gaussian-distributed weights and activations and propagates each layer's mean and variance analytically, so a single deterministic forward pass yields the parameters of the output distribution directly. The researchers also develop a custom library of Gaussian-propagating operators for multilayer perceptrons and convolutional neural networks, and optimize its execution on embedded ARM CPUs using the TVM deep learning compiler, combining manual and automated tuning strategies.
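
To give a flavor of what a Gaussian-propagating operator could look like in TVM's tensor-expression language, here is a hypothetical sketch of the mean and variance paths of a PFP dense layer; the paper's actual operator library, data layouts, and schedules may well differ:

```python
import tvm
from tvm import te

def pfp_dense(batch, in_dim, out_dim):
    # Hypothetical PFP dense operator: propagates input and weight means
    # and variances analytically. Names and structure are illustrative only.
    mu_x = te.placeholder((batch, in_dim), name="mu_x")
    var_x = te.placeholder((batch, in_dim), name="var_x")
    mu_w = te.placeholder((out_dim, in_dim), name="mu_w")
    var_w = te.placeholder((out_dim, in_dim), name="var_w")

    k = te.reduce_axis((0, in_dim), name="k")
    # Mean path: an ordinary matmul over the weight means.
    mu_y = te.compute(
        (batch, out_dim),
        lambda i, j: te.sum(mu_x[i, k] * mu_w[j, k], axis=k),
        name="mu_y")

    k2 = te.reduce_axis((0, in_dim), name="k2")
    # Variance path: var_w * (var_x + mu_x^2) + mu_w^2 * var_x, reduced
    # over the same input dimension.
    var_y = te.compute(
        (batch, out_dim),
        lambda i, j: te.sum(
            var_w[j, k2] * (var_x[i, k2] + mu_x[i, k2] * mu_x[i, k2])
            + mu_w[j, k2] * mu_w[j, k2] * var_x[i, k2],
            axis=k2),
        name="var_y")
    return [mu_x, var_x, mu_w, var_w, mu_y, var_y]
```

Both outputs share a matmul-like loop structure, which is what makes such operators amenable to the usual tiling and vectorization strategies TVM applies on ARM CPUs.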
Practical Impact
By accelerating BNN execution, this work enables deployment on resource-constrained devices, opening up applications such as:
* Real-time object detection and tracking in self-driving cars
* Uncertainty-aware decision-making in healthcare and finance
* Efficient edge AI for IoT devices
* Improved image and speech recognition on smartphones
The researchers demonstrate the effectiveness of their approach with speedups of up to 4200× for small mini-batch sizes and performance comparable to Stochastic Variational Inference (SVI) on the Dirty-MNIST dataset.
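
The source of these speedups can be illustrated with a toy comparison: SVI inference pays for one forward pass per weight sample, while PFP pays for a single analytic pass. A minimal NumPy sketch for one linear layer (illustrative only, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def svi_predict(x, mu_w, var_w, n_samples=100):
    """SVI-style Monte-Carlo inference: one forward pass per weight sample."""
    outs = [rng.normal(mu_w, np.sqrt(var_w)) @ x for _ in range(n_samples)]
    outs = np.stack(outs)
    return outs.mean(axis=0), outs.var(axis=0)  # cost: O(n_samples) passes

def pfp_predict(mu_x, var_x, mu_w, var_w):
    """PFP-style inference: one analytic pass, independent of sample count.

    For a deterministic input x, pass mu_x=x and var_x=np.zeros_like(x).
    """
    mu_y = mu_w @ mu_x
    var_y = var_w @ (var_x + mu_x**2) + (mu_w**2) @ var_x
    return mu_y, var_y  # cost: one pass
```

As n_samples grows, the Monte-Carlo moments converge to the analytic ones while the PFP path does a fixed amount of work; this gap is what compounds into the large small-batch speedups reported above.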
Analogy / Intuitive Explanation
Imagine you're trying to predict the weather. A traditional neural network gives you a single prediction, while a Bayesian neural network gives you a distribution of possible outcomes along with their probabilities. The Probabilistic Forward Pass is like a shortcut that hands you the mean and spread of this distribution directly, without having to run the network many times. This shortcut is what makes BNNs fast enough for resource-constrained devices.
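
The shortcut also has to carry the mean and spread through nonlinearities, and for a ReLU with a Gaussian input this is possible in closed form. A small sketch using standard Gaussian moment identities (the paper's operator library may treat activations differently):

```python
import numpy as np
from scipy.stats import norm

def relu_moments(mu, var):
    """Closed-form mean/variance of relu(x) for x ~ N(mu, var), var > 0.

    Standard identities with z = mu / sigma:
      E[relu(x)]   = mu * Phi(z) + sigma * phi(z)
      E[relu(x)^2] = (mu^2 + var) * Phi(z) + mu * sigma * phi(z)
    """
    sigma = np.sqrt(var)
    z = mu / sigma
    mean = mu * norm.cdf(z) + sigma * norm.pdf(z)
    second = (mu**2 + var) * norm.cdf(z) + mu * sigma * norm.pdf(z)
    return mean, second - mean**2
```

Chaining such layer-wise moment maps end to end is what turns many sampled forward passes into a single deterministic one.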
Paper Information
Categories:
cs.LG cs.AR cs.DC stat.ML
Published Date:
November 2025
arXiv ID:
2511.23440v1