~ read.

Stick-Breaking Construction of Dirichlet Process

In the field of Bayesian nonparametrics, there exist several different methods to generate samples from a Dirichlet process. One can think of the Dirichlet process as a generalized Dirichlet distribution, when there is an infinite number of classes. Recall that the Dirichlet process is equipped with two parameters, a concentration parameter $ \alpha_0 $ and a base distribution $ G_0 $. The concentration parameter effectively governs how many classes are active in a draw from the Dirichlet process, i.e. how evenly distributed the classes are: The higher $ \alpha_0 $, the more classes will be active and less extreme will the probabilities differ.

The most common method to draw samples from a Dirichlet process (besides the Chinese restaurant process) is the so-called stick-breaking construction: Imagine a stick of unit length. You then break of a single piece from this stick: The length of that piece gives you the probability for an observation to belong to the first class, which we could denote by $( \pi_1 $). You then break of a second piece from the remaining stick, which has length $( 1 - \pi_1 $). Again, you write down the length of the broken off piece as the probability to belonging to class two, let's say $( \pi_2 $). In principle, you would need to repeat this an infinite number of times (although the stick will get smaller and smaller, we can always continue to cut the remainder in two), in practice we would stop after a certain amount of class probabilities have been drawn and choose the latest one to be $( 1 - \sum_j \pi_j $) to ensure that the probabilities sum to one. Also, the Dirichlet processes endowes each class $ i $ with a class parameter $ \theta_i $ drawn from the base distribution. Since we have an infinite number of classes (even though only a small subset will be active), we must draw $ \theta_i \sim G_0 $ for all $ i = 1, 2, \ldots $.

Although the above paragraph explains most of how the stick-breaking construction works, the most crucial part has been left out: Where do we break the stick off? It is obvious that this must be somehow influenced by $( \alpha_0 $), as say a low concentration parameter will make it likely that a large chunk of the stick is broken off at the first iteration. Formally, the length of the piece which is broken off is determined by a draw from a $( \operatorname{Beta}(\alpha_0, 1) $) distribution. Since the Beta distribution has support on the $( [0,1] $) interval, we have to multiply the draw with the remaining length of the stick to determine the length of the stick which is broken off at any given iteration. Hence, if we let $( p_i \sim \operatorname{Beta}(\alpha_0, 1) $), we calculate the class probabilities as $ \pi_i = p_i \prod_{j<i} (1 - p_j) $.

In practice, one is not able to draw an infinite number of class probabilities and parameters, which would also be a useless task as the respective probabilities will be almost equal to zero anyway. So we pick a cut-off of point for the number of classes, where the last class is just calculated as one minus the sum of the previous class probabilities to ensure that they sum to one. I created a little Shiny web application to visualize samples of the $ \pi_i $'s using the stick-breaking construction under various configurations. See below: