Part I: Classical divergences

Classical divergences provide an asymmetric measure of the distance between two probability distributions.

[Portrait of Ludwig Boltzmann]

In 1877, Ludwig Boltzmann defined entropy in order to quantify the statistical disorder of a physical system.

[Portrait of Claude Shannon]

In 1948, Claude Shannon, while working at Bell Labs, published A mathematical theory of communication, where he introduced the entropy of a discrete random variable via the formula

$$ H(X)= E_{P_X} \left( \log \frac{1}{P_X}\right) $$

where $P_X$ denotes the probability mass function of the discrete random variable $X$. The entropy $H(X)$ is non-negative and measures the uncertainty in predicting the values of the random variable $X$.
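
For intuition, here is a small worked example (an added illustration; natural logarithms are used): a fair coin has maximal entropy, while a heavily biased coin is easy to predict and has low entropy,

$$ H(\text{fair coin}) = \tfrac{1}{2}\log 2 + \tfrac{1}{2}\log 2 = \log 2 \approx 0.693, \qquad H(\text{coin with } P(\text{heads})=0.9) = -0.9\log 0.9 - 0.1 \log 0.1 \approx 0.325. $$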

[Portraits of Solomon Kullback and Richard Leibler]

In 1951, Solomon Kullback and Richard Leibler defined the Kullback-Leibler divergence between two probability distributions in their paper On information and sufficiency. If $P$ and $Q$ are two probability distributions, then the Kullback-Leibler divergence from $P$ to $Q$ is defined as

$$ D(P||Q)= \left\{ \begin{array}{ll} E_P \left(\log \frac{dP}{dQ}\right)= \sum\limits_i P(i) \log\frac{P(i)}{Q(i)} & \text{if } P \ll Q\\ \infty & \text{otherwise}, \end{array}\right. $$

In the last formula (and in the rest of this presentation) we focus on discrete probability distributions, and we use the convention $0 \log \frac{0}{0}=0$. The Kullback-Leibler divergence is an asymmetric distance between the probability distributions $P$ and $Q$: it measures our surprise when the values of a random variable $X$ are revealed to us, where $X \sim P$ but we erroneously believe that $X \sim Q$.
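
To make the asymmetry concrete, here is a small worked example (an added illustration): take $P = \mathrm{Bernoulli}(1/2)$ and $Q = \mathrm{Bernoulli}(1/4)$. Then

$$ D(P||Q) = \tfrac{1}{2}\log\frac{1/2}{1/4} + \tfrac{1}{2}\log\frac{1/2}{3/4} = \tfrac{1}{2}\log\frac{4}{3} \approx 0.144, \qquad D(Q||P) = \tfrac{1}{4}\log\frac{1/4}{1/2} + \tfrac{3}{4}\log\frac{3/4}{1/2} \approx 0.131, $$

so in general $D(P||Q) \neq D(Q||P)$.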

[Portrait of Alfréd Rényi]

In 1961, Alfréd Rényi published On measures of entropy and information, where he introduced a one-parameter family of divergences $D_\alpha (P||Q)$ for $\alpha \in (0,1) \cup (1, \infty)$, defined by the formula

$$ D_{\alpha}(P||Q) = \left\{\begin{array}{ll} \frac{1}{\alpha -1} \log\left( \sum_i P(i)^{\alpha}Q(i)^{1-\alpha}\right), & \text{if } \alpha<1, \text{ or } (\alpha>1 \text{ and } P\ll Q) \\ \infty, & \text{otherwise.} \end{array} \right. $$
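
Continuing the Bernoulli example from above (an added illustration): for $\alpha = 2$ we get

$$ D_2(P||Q) = \log \sum_i \frac{P(i)^2}{Q(i)} = \log\left( \frac{(1/2)^2}{1/4} + \frac{(1/2)^2}{3/4} \right) = \log\frac{4}{3} \approx 0.288, $$

which is larger than the Kullback-Leibler value $\approx 0.144$ computed earlier.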

It can be shown that

$$ \lim_{\alpha \to 1} D_{\alpha}(P||Q) = D(P||Q). $$
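
Here is a quick sketch of why this holds (assuming $P \ll Q$ and a finite alphabet, so that we may differentiate under the sum): set $F(\alpha) = \log \sum_i P(i)^{\alpha} Q(i)^{1-\alpha}$, so that $D_{\alpha}(P||Q) = \frac{F(\alpha)}{\alpha - 1}$. Since $F(1) = \log \sum_i P(i) = 0$, L'Hôpital's rule gives

$$ \lim_{\alpha \to 1} \frac{F(\alpha)}{\alpha - 1} = F'(1) = \left. \frac{\sum_i P(i)^{\alpha} Q(i)^{1-\alpha} \log\frac{P(i)}{Q(i)}}{\sum_i P(i)^{\alpha} Q(i)^{1-\alpha}} \right|_{\alpha = 1} = \sum_i P(i) \log\frac{P(i)}{Q(i)} = D(P||Q). $$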

In the same paper, Rényi introduced a family of divergences called $f$-divergences, which are defined as follows:

$$ D_f (P||Q)= \left\{ \begin{array}{ll} E_Q f \left( \frac{dP}{dQ} \right) & \text{if } P \ll Q \\ \infty & \text{otherwise}, \end{array} \right. $$

under the convention $0 f(\frac{0}{0}) =0$. Here $f:(0,\infty) \to \mathbb{R}$ is a convex or concave function.
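
As a sanity check (an added example): the choice $f(t) = t \log t$ recovers the Kullback-Leibler divergence, since

$$ D_f(P||Q) = \sum_i Q(i)\, f\!\left( \frac{P(i)}{Q(i)} \right) = \sum_i Q(i)\, \frac{P(i)}{Q(i)} \log\frac{P(i)}{Q(i)} = \sum_i P(i) \log\frac{P(i)}{Q(i)} = D(P||Q), $$

while $f(t) = \frac{1}{2}|t-1|$ gives the total variation distance. It can be proved that: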