This blog explains the main content of Chapter 2 of The Principles of Diffusion Models, presenting diffusion models from the variational perspective. We first introduce the original autoencoder (AE) architecture and then trace its development into the variational autoencoder (VAE). The limitations of the VAE motivate the diffusion model, i.e., the denoising diffusion probabilistic model (DDPM). Finally, from the discrete data modeling perspective, we see how the discrete variational autoencoder (discrete VAE) and the discrete denoising diffusion probabilistic model arise.
The autoencoder (AE) model was originally developed by Hinton and Salakhutdinov (Hinton & Salakhutdinov, 2006). In that paper, an AE is implemented as a deep neural network that maps high-dimensional data points $\mathbf{x}$ to low-dimensional codes $\mathbf{z}$. These latent codes are good representations of the original data points and are used for tasks such as classification, regression, document retrieval, and visualization. The model contains an encoder $g_{\mathbf{\phi}}$, which produces the code $\mathbf{z} = g_{\mathbf{\phi}}(\mathbf{x})$, and a decoder $f_{\mathbf{\theta}}$, which reconstructs the data point as $\hat{\mathbf{x}} = f_{\mathbf{\theta}}(\mathbf{z})$.
We assume the data points are drawn from a continuous probability distribution $p_{\text{data}}(\mathbf{x})$. Given a set of data points $\{\mathbf{x}^{(i)}: i=1,2,\ldots,N\}$ sampled i.i.d. from $p_{\text{data}}$, the training procedure of an AE model minimizes the following objective function: \(\mathcal{L}_{\text{Vanilla-AE}}(\mathbf{\theta},\mathbf{\phi}) := \frac{1}{N} \sum_{i=1}^N \left\| \mathbf{x}^{(i)} - f_{\mathbf{\theta}}\big( g_{\mathbf{\phi}} ( \mathbf{x}^{(i)} ) \big)\right\|_2^2\)
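To make the objective concrete, here is a minimal sketch of a vanilla AE trained with this reconstruction loss. It assumes PyTorch; the class name `VanillaAE`, the layer widths, and the synthetic batch are illustrative choices on our part, not part of the original paper.

```python
import torch
import torch.nn as nn

class VanillaAE(nn.Module):
    """A minimal vanilla autoencoder: x -> z -> x_hat."""
    def __init__(self, data_dim: int = 784, latent_dim: int = 32):
        super().__init__()
        # Encoder g_phi: high-dimensional data x -> low-dimensional code z.
        self.encoder = nn.Sequential(
            nn.Linear(data_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Decoder f_theta: code z -> reconstruction x_hat.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, data_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.encoder(x)      # z = g_phi(x)
        return self.decoder(z)   # x_hat = f_theta(g_phi(x))

model = VanillaAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One gradient step on a synthetic batch standing in for real data:
# the squared L2 reconstruction error, averaged over the batch,
# matches the L_Vanilla-AE objective above.
x = torch.randn(64, 784)
optimizer.zero_grad()
loss = ((x - model(x)) ** 2).sum(dim=1).mean()
loss.backward()
optimizer.step()
```

In practice this step is repeated over minibatches of the $N$ data points, so the batch average plays the role of the $\frac{1}{N}\sum_{i=1}^N$ term in the objective.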
There are several well-known variants of the vanilla autoencoder, such as the denoising autoencoder and the sparse autoencoder.