Self-conditioning in diffusion models

Introduction

Self-conditioning is a mechanism proposed in the Analog Bits paper (Chen et al., 2022), and its goal is to improve sample quality without much additional cost.

How does the self-conditioning mechanism work?

It is helpful to compare the self-conditioning mechanism with the standard diffusion process.

Standard diffusion process

The standard prediction step in a diffusion model without self-conditioning can be represented as:

\[\begin{equation} x_{t-1}=\frac{1}{\sqrt{\alpha_t}}\left(x_t-\frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}} \epsilon_\theta\left(x_t, t\right)\right)+\sigma_t z \end{equation}\]

where \(\alpha_t = 1 - \beta_t\) and \(\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s\) come from the variance schedule \(\beta_t\), \(\epsilon_\theta\) is the noise-prediction network, \(\sigma_t\) is the noise scale at step \(t\), and \(z \sim \mathcal{N}(0, I)\).
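For concreteness, here is a minimal PyTorch sketch of this update. The schedule tensors (`alphas`, `alphas_cumprod`, `betas`, `sigmas`) and the `model(x_t, t)` signature are assumptions for illustration, not from the original post.

```python
import torch

def ddpm_step(model, x_t, t, alphas, alphas_cumprod, betas, sigmas):
    """One reverse step of standard DDPM sampling, following the equation above."""
    # Predict the noise component in the current noisy sample.
    eps = model(x_t, t)
    # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t).
    mean = (x_t - betas[t] / torch.sqrt(1.0 - alphas_cumprod[t]) * eps) / torch.sqrt(alphas[t])
    # Inject fresh noise, except at the final step (t == 0).
    z = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    return mean + sigmas[t] * z
```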

Self-conditioning in diffusion models

In self-conditioning, the model leverages its own previous predictions. Suppose \(\hat{x}_0\) is the model's estimate of the clean sample produced at the previous sampling step. The prediction step can be adjusted to incorporate this estimate:

\[\begin{equation} x_{t-1}=\frac{1}{\sqrt{\alpha_t}}\left(x_t-\frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}} \epsilon_\theta\left(x_t, \hat{x}_0, t\right)\right)+\sigma_t z \end{equation}\]

Here, \(\epsilon_\theta\left(x_t, \hat{x}_0, t\right)\) represents the neural network’s noise prediction conditioned on both the current noisy input \(x_t\) and the previous estimate \(\hat{x}_0\); on the first sampling step, where no previous estimate exists, \(\hat{x}_0\) is set to zeros.
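During sampling, the estimate can itself be recovered from the noise prediction via \(\hat{x}_0 = \left(x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta\right)/\sqrt{\bar{\alpha}_t}\), so each step both consumes and produces an estimate. Here is a sketch of one such step, reusing the assumed schedule tensors from the sketch above and an assumed `model(x_t, x0_prev, t)` signature:

```python
import torch

def self_cond_step(model, x_t, x0_prev, t, alphas, alphas_cumprod, betas, sigmas):
    """One reverse step where the network also sees its previous x0 estimate."""
    # The network receives the previous clean-sample estimate as an extra input
    # (a tensor of zeros on the first step, when no estimate exists yet).
    eps = model(x_t, x0_prev, t)
    # Recover the new clean-sample estimate from the predicted noise; this is
    # what gets passed to the next iteration of the sampling loop.
    x0_hat = (x_t - torch.sqrt(1.0 - alphas_cumprod[t]) * eps) / torch.sqrt(alphas_cumprod[t])
    # The update itself is unchanged from the standard step.
    mean = (x_t - betas[t] / torch.sqrt(1.0 - alphas_cumprod[t]) * eps) / torch.sqrt(alphas[t])
    z = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    return mean + sigmas[t] * z, x0_hat
```

In the sampling loop, `x0_prev` starts as a tensor of zeros and is threaded from one call to the next.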

Code implementation

Though other implementations are possible, the most common way of including the previous estimate \(\hat{x}_0\) is concatenation: the estimate is stacked with the noisy input along the channel dimension before being fed to the denoising network.
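A minimal sketch of this idea follows; the wrapper class and its assumption that the underlying denoiser takes twice the data channels are illustrative, not a specific library’s API.

```python
import torch
import torch.nn as nn

class SelfCondWrapper(nn.Module):
    """Feeds the previous x0 estimate to a denoiser via channel concatenation."""

    def __init__(self, denoiser):
        super().__init__()
        # `denoiser` is assumed to accept twice the data channels as input,
        # e.g. a U-Net built with in_channels = 2 * C for C-channel images.
        self.denoiser = denoiser

    def forward(self, x_t, t, x0_prev=None):
        if x0_prev is None:
            # No previous estimate yet: condition on zeros instead.
            x0_prev = torch.zeros_like(x_t)
        # Stack the noisy input and the previous estimate along the channel axis.
        return self.denoiser(torch.cat([x_t, x0_prev], dim=1), t)
```

At training time, the Analog Bits paper conditions on zeros for roughly half the examples and otherwise on a detached (stop-gradient) first-pass estimate, so the same network learns to work both with and without the extra signal at sampling time.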