Variational Bayesian methods, also called ensemble learning, are a family of techniques for approximating intractable integrals arising in Bayesian statistics and machine learning. They can be used to lower bound the marginal likelihood (i.e. "evidence") of several models with a view to performing model selection, and often provide an analytical approximation to the parameter posterior which is useful for prediction.
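Mathematical derivation

In variational inference, the posterior distribution over a set of latent variables $Z$ given some data $X$ is approximated by a variational distribution $Q(Z)$:

$$P(Z \mid X) \approx Q(Z).$$

The variational distribution $Q(Z)$ is restricted to belong to a family of distributions of simpler form than $P(Z \mid X)$. This family is selected with the intention that $Q$ can be made very similar to the true posterior. The difference between $Q$ and this true posterior is measured by a dissimilarity function $d(Q; P)$, and inference is performed by selecting the distribution $Q$ that minimises $d(Q; P)$. One choice of dissimilarity function for which this minimisation is tractable is the Kullback-Leibler divergence (KL divergence), defined for discrete $Z$ (the sum becomes an integral for continuous $Z$) as

$$\mathrm{KL}(Q \parallel P) = \sum_{Z} Q(Z) \log \frac{Q(Z)}{P(Z \mid X)}.$$

Substituting $P(Z \mid X) = P(Z, X) / P(X)$ and using $\sum_{Z} Q(Z) = 1$, we can write the log evidence as

$$\log P(X) = \mathrm{KL}(Q \parallel P) - \sum_{Z} Q(Z) \log \frac{Q(Z)}{P(Z, X)} = \mathrm{KL}(Q \parallel P) + \mathcal{L}(Q),$$

where $\mathcal{L}(Q) = \sum_{Z} Q(Z) \log \frac{P(Z, X)}{Q(Z)}$. As the log evidence is fixed with respect to $Q$, maximising the final term $\mathcal{L}(Q)$ will minimise the KL divergence between $Q$ and $P$. Because the KL divergence is non-negative, $\mathcal{L}(Q) \le \log P(X)$. By appropriate choice of $Q$, we can make $\mathcal{L}(Q)$ tractable to compute and to maximise. Hence we have both a lower bound $\mathcal{L}(Q)$ on the log evidence and an analytical approximation $Q(Z)$ to the posterior $P(Z \mid X)$.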
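The decomposition of the log evidence into a KL divergence plus a lower bound can be checked numerically on a small discrete model. The sketch below is illustrative only: it writes Q for the variational distribution, P for the true posterior and L(Q) for the lower bound, and it assumes a single latent variable with three states. The joint probabilities, the choice of Q, and the function names `kl_divergence` and `lower_bound` are all made up for the example.

```python
import math

# Hypothetical toy model: one latent variable Z with 3 states and a single
# fixed observation X = x. joint[z] = P(Z = z, X = x); values are invented.
joint = [0.10, 0.25, 0.05]

evidence = sum(joint)                       # P(X = x), the marginal likelihood
posterior = [p / evidence for p in joint]   # true posterior P(Z | X = x)


def kl_divergence(q, p):
    """KL(Q || P) = sum_z Q(z) log(Q(z) / P(z)); zero-mass terms contribute 0."""
    return sum(qz * math.log(qz / pz) for qz, pz in zip(q, p) if qz > 0)


def lower_bound(q, joint):
    """L(Q) = sum_z Q(z) log(P(z, x) / Q(z)); zero-mass terms contribute 0."""
    return sum(qz * math.log(jz / qz) for qz, jz in zip(q, joint) if qz > 0)


q = [0.5, 0.3, 0.2]  # an arbitrary (assumed) variational distribution

log_evidence = math.log(evidence)
kl = kl_divergence(q, posterior)
bound = lower_bound(q, joint)

print(f"log evidence         = {log_evidence:.6f}")
print(f"KL(Q||P) + L(Q)      = {kl + bound:.6f}")
print(f"L(Q) for arbitrary Q = {bound:.6f}")
print(f"L(Q) at Q = posterior = {lower_bound(posterior, joint):.6f}")
```

For any valid Q the first two printed values agree up to floating-point error, and the bound is tight only when Q equals the true posterior. In realistic models the posterior is not available in closed form, which is why L(Q) is maximised in place of minimising the KL divergence directly.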