Why Should You Use a Bayesian Neural Network?

A Bayesian Neural Network quantifies model uncertainty by providing distributions over the weights and outputs.


Goal

This article is written for readers with no prior experience of Bayesian Neural Networks and serves the following purposes:

  1. Illustrate the key differences between a Standard Neural Network (SNN) and a Bayesian Neural Network (BNN)
  2. Explain the different types of uncertainty
  3. Discuss the advantages and limitations of a Bayesian Neural Network

What is Bayesian Neural Network?

A Bayesian neural network (BNN) combines a neural network with Bayesian inference. Simply put, in a BNN we treat the weights and outputs as random variables and look for the marginal distributions that best fit the data. The ultimate goal of a BNN is to quantify the uncertainty introduced by the model, in terms of both outputs and weights, so as to explain how trustworthy a prediction is.
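To make the idea concrete, here is a minimal sketch of how "weights as distributions" changes prediction. All numbers are made up for illustration: we pretend a BNN has already given us Gaussian posteriors for the weight and bias of a one-dimensional linear model, and we approximate the predictive distribution by Monte Carlo sampling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posteriors for the weight and bias of a 1-D linear model:
# instead of point estimates, the BNN gives us a mean and a standard deviation.
w_mu, w_sigma = 2.0, 0.3
b_mu, b_sigma = 0.5, 0.1

def predict(x, n_samples=10_000):
    """Monte Carlo approximation of the predictive distribution at input x."""
    w = rng.normal(w_mu, w_sigma, n_samples)   # sample weights from the posterior
    b = rng.normal(b_mu, b_sigma, n_samples)   # sample biases from the posterior
    return w * x + b                           # one prediction per weight sample

samples = predict(1.0)
print("predictive mean:", samples.mean())      # close to w_mu * 1.0 + b_mu = 2.5
print("predictive std :", samples.std())
```

A standard network would return the single number 2.5 here; the BNN returns a whole distribution whose spread reflects how unsure the model is.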


Difference between Standard NN & Bayesian NN

To me, a Bayesian neural network is an extension of a standard neural network. Below I have consolidated three key differences between the two.

  1. GOAL — An SNN focuses on optimization while a BNN focuses on marginalization. Optimization finds a single optimal point for each weight, while marginalization treats each weight as a random variable and finds its distribution.
  2. ESTIMATE — The parameter estimates of an SNN are maximum likelihood estimates (MLE), while for a BNN the estimate is instead the maximum a posteriori (MAP) estimate or the full predictive distribution.
  3. METHOD — An SNN typically relies on differentiation-based methods such as gradient descent to find the optimal values. In a BNN, since the integrals involved are usually intractable, researchers rely on techniques such as Markov chain Monte Carlo (MCMC), variational inference, and normalizing flows.
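As a taste of the METHOD point, here is a toy random-walk Metropolis sampler (the simplest MCMC algorithm) drawing samples from the posterior of a single weight in a one-parameter linear model. The data, prior, and proposal width are all made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data generated from y = 2x + noise; we infer the posterior of w.
x = rng.uniform(-1, 1, 50)
y = 2.0 * x + rng.normal(0, 0.5, 50)

def log_posterior(w):
    log_prior = -0.5 * w**2                          # N(0, 1) prior on w
    log_lik = -0.5 * np.sum((y - w * x)**2) / 0.25   # Gaussian likelihood, sigma = 0.5
    return log_prior + log_lik

# Random-walk Metropolis: propose w' ~ N(w, 0.2), accept with prob min(1, p'/p).
w, chain = 0.0, []
for _ in range(5000):
    w_new = w + rng.normal(0, 0.2)
    if np.log(rng.uniform()) < log_posterior(w_new) - log_posterior(w):
        w = w_new
    chain.append(w)

posterior = np.array(chain[1000:])               # drop burn-in samples
print("posterior mean of w:", posterior.mean())  # close to the true value 2.0
```

Note how no derivative of the posterior is needed, only the ability to evaluate it up to a constant; that is why sampling methods can sidestep the intractable integrals.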

Aleatoric Uncertainty & Epistemic Uncertainty

Now you should be able to distinguish an SNN from a BNN. As mentioned, a BNN is used to measure the uncertainties of the model. In fact, there are two types of uncertainty.

Aleatoric Uncertainty

Aleatoric uncertainty is also known as statistical uncertainty. In statistics, it represents the unknowns that differ each time we run the same experiment (train the model). In deep learning, aleatoric uncertainty refers to the uncertainty of the model weights. As the chart below shows, every time we train the model, the weights may vary slightly. This variation is the aleatoric uncertainty.


Epistemic Uncertainty

Epistemic uncertainty is also known as systematic uncertainty. In deep learning, it refers to the uncertainty of the model outputs. In the chart below, the black line is the prediction and the orange area is the epistemic uncertainty. You can regard it as the confidence level of the prediction; in other words, it tells you how confident the prediction is. If the interval is small, the actual value is likely to lie close to the predicted value. On the contrary, if the interval is large, the actual value may deviate substantially from the predicted value.
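Such an interval is easy to compute once you have samples from the predictive distribution. Here the samples are faked with a normal distribution purely for illustration; in practice they would come from repeatedly sampling the BNN's weights and predicting:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical predictive samples at one input, e.g. drawn from a BNN by
# sampling weights; here we fake them with a normal distribution.
predictive_samples = rng.normal(loc=2.5, scale=0.4, size=10_000)

# A 95% interval: the actual value should fall inside it ~95% of the time.
low, high = np.percentile(predictive_samples, [2.5, 97.5])
print("prediction: %.2f, 95%% interval: [%.2f, %.2f]"
      % (predictive_samples.mean(), low, high))
```

A narrow [low, high] band corresponds to the small-interval case described above; a wide band signals that the prediction should not be trusted too closely.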


Advantages of Bayesian NN

By using a Bayesian NN, you can benefit from:

1. Training a Robust Model

Instead of committing to a single set of weights, a BNN learns distributions over the weights. Averaging predictions over these distributions has a built-in regularizing effect, which helps to avoid overfitting.
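The regularization link can be made concrete with a classic result: for a linear model with a Gaussian prior on the weights, the MAP estimate coincides exactly with ridge (L2-regularized) regression. A sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(0, 0.3, 40)

lam = 0.1  # equals sigma_noise^2 / sigma_prior^2 under the Bayesian reading

# MAP with a Gaussian prior = ridge closed form: (X^T X + lam*I)^-1 X^T y
w_map = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# Plain MLE (ordinary least squares) for comparison:
w_mle = np.linalg.solve(X.T @ X, X.T @ y)

print("MAP (shrunk):", np.round(w_map, 3))
print("MLE:         ", np.round(w_mle, 3))
print("MAP norm smaller:", np.linalg.norm(w_map) < np.linalg.norm(w_mle))
```

The prior pulls the MAP weights toward zero, which is precisely the shrinkage that L2 regularization applies in a standard network.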

2. Getting a Prediction Interval

A BNN provides the whole predictive distribution, which lets you calculate the uncertainty associated with each prediction when dealing with unseen targets.


Limitation of Bayesian NN

Despite the fact that a BNN is useful for addressing uncertainty, several limitations should be noted.

1. Demanding Maths & Stats Knowledge

It is not an easy task to understand all the theory and formulae behind a BNN. You need a solid background in statistical distributions in order to choose appropriate prior and posterior functions.

2. Longer Training (More Epochs) to Converge

Since the model is much more sophisticated (an SNN trains a single point per weight while a BNN trains the parameters of a distribution), it requires a much longer time to converge during training.


Conclusion

I hope this article has given you a basic understanding of BNNs, a first feel for how they work, and a sense of why they are useful. In the next article, I will explain the details and concepts of BNNs in depth. Stay tuned, and I hope you enjoyed reading. =)
