A Generalized Multinomial Distribution from Dependent Categorical Random Variables

For the full paper, which includes all proofs, download the PDF here.

Generating a Sequence of Correlated Categorical Random Variables

For brevity, we will use the acronym DCRV for a Dependent Categorical Random Variable. A DCRV sequence \epsilon = (\epsilon_{1},...,\epsilon_{n}) is itself a random variable, and thus has a probability distribution. In order to provide an algorithm to generate such a sequence, we first derive this probability distribution.

Probability Distribution of a DCRV Sequence

The probability distribution of the DCRV sequence \epsilon of length n is given formally in the following theorem. The proof follows in a straightforward fashion from the construction of dependent categorical random variables and is therefore omitted.

Theorem 5: Distribution of a DCRV Sequence.  Let (\Sigma,\mathcal{F}, \mathbb{P}) = ([0,1], \mathcal{B}, \mu). Let \epsilon_{i} : [0,1] \to \{1,...,K\},i = 1,...,n,n \in \mathbb{N} be DCRVs as defined in the construction. Let \epsilon = (\epsilon_{1},...,\epsilon_{n}) denote the DCRV sequence with observed values e = (e_{1},...,e_{n}). Then \mu has the density
f(x) = \sum_{i=1}^{K^{n}}K^{n}m^{i}\mathbb{1}_{\left(\frac{i-1}{K^{n}}, \frac{i}{K^{n}}\right]}(x)
and
P(\epsilon = e) = \int\limits_{\left(\frac{i-1}{K^{n}}, \frac{i}{K^{n}}\right]}f(x)dx = m^{i}

where m^{i} is the mass allocated to the interval \left(\frac{i-1}{K^{n}}, \frac{i}{K^{n}}\right] by the probability mass flow, and i is the lexicographic order of e in the sample space \{1,...,K\}^{n}, given by the relation \frac{i-1}{K^{n}} = \sum_{j=1}^{n}\frac{e_{j}-1}{K^{j}}.
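As a quick illustration of this lexicographic ordering, here is a minimal Python sketch (the function name lex_index is ours, not the paper's) that computes i from an observed sequence e by reading the digits e_{j} - 1 in base K:

```python
def lex_index(e, K):
    """Lexicographic order i of e in {1,...,K}^n, so that
    (i-1)/K^n = sum_{j=1}^n (e_j - 1)/K^j."""
    i = 0
    for e_j in e:
        i = i * K + (e_j - 1)  # accumulate base-K digits e_j - 1
    return i + 1               # shift to a 1-based index

# e = (3, 2) with K = 3 occupies the 8th of the 9 K-ary subintervals,
# i.e. the interval (7/9, 8/9]:
assert lex_index((3, 2), 3) == 8
```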

Algorithm

We use the standard technique of generating the desired random variable \epsilon from a uniform random variable via inverse transform sampling. For \epsilon with distribution function F(x) = \int_{0}^{x}f(y)\,dy, with f(x) as in the equation above, F is invertible with inverse F^{-1}, and thus F^{-1}(U) for U \sim \text{Uniform}[0,1] has the same distribution as \epsilon. Therefore, sampling u from U and setting e = F^{-1}(u) is equivalent to sampling e from \epsilon.
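As a sketch of this inverse-CDF idea for a generic discrete distribution (the names and structure here are ours; the paper's pseudocode is in the PDF), one can accumulate the masses into partition points and locate u with a binary search:

```python
from bisect import bisect_right
import random

def inverse_cdf_sample(masses):
    """Sample a 1-based index i with probability masses[i-1] by finding
    the interval [s_{i-1}, s_i) of cumulative masses containing u."""
    s, acc = [], 0.0
    for m in masses:
        acc += m
        s.append(acc)                  # s_i = m^1 + ... + m^i
    u = random.random()                # u ~ Uniform[0, 1)
    return min(bisect_right(s, u), len(masses) - 1) + 1  # guard round-off

# A fair three-sided categorical draw:
print(inverse_cdf_sample([1/3, 1/3, 1/3]))
```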

In the construction of the DCRVs, we associated \epsilon_{N} with the intervals

\begin{aligned}\epsilon_{N} = i &\text{ on } \left(\tfrac{l-1}{K^{N}}, \tfrac{l}{K^{N}}\right], & l \equiv i \pmod{K}, &\quad i = 1,\ldots,K-1, \\ \epsilon_{N} = K &\text{ on } \left(\tfrac{l-1}{K^{N}}, \tfrac{l}{K^{N}}\right], & l \equiv 0 \pmod{K}.\end{aligned}
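In code, this association amounts to finding the index l of the K-ary interval containing x and reducing it mod K. A small sketch (the name epsilon_N is ours):

```python
import math

def epsilon_N(x, N, K):
    """Value of epsilon_N at x in (0, 1]: x lies in ((l-1)/K^N, l/K^N],
    and epsilon_N is the residue of l mod K, with 0 mapped to K."""
    l = math.ceil(x * K**N)
    r = l % K
    return r if r != 0 else K

# With K = 3, N = 2: x = 0.8 lies in (7/9, 8/9], so l = 8 and epsilon_2 = 2.
assert epsilon_N(0.8, 2, 3) == 2
```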

From the construction, each sequence e is in one-to-one correspondence with the interval \left(\frac{i-1}{K^{n}}, \frac{i}{K^{n}}\right] for a unique i = 1,\ldots,K^{n}. The probability of such a sequence can be found using Theorem 5:

P(\epsilon = e) = F\left(\left(\frac{i-1}{K^{n}}, \frac{i}{K^{n}}\right]\right) = m^{i} = l\left([s_{i-1}, s_{i})\right)

where l is the length of the above interval and s_{i} = \sum_{j=1}^{i}m^{j}. We have therefore partitioned the interval [0,1) according to the distribution of \epsilon, in bijection with the K-ary partition of [0,1) corresponding to the particular sequence. Thus, sampling u \in [0,1) from a uniform distribution and finding the interval [s_{i-1},s_{i}) that contains it, along with the corresponding index i, yields the unique DCRV sequence.
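Before turning to the efficient tree descent, note that this correspondence already gives a brute-force sampler: lay out all K^{n} masses in lexicographic order, find the interval [s_{i-1}, s_{i}) containing u, and decode e from the base-K digits of i - 1. A sketch (function and variable names are ours), demonstrated with the nine masses p_{k}p_{i}^{\pm} implied by the K = 3, \delta = 1/2 example below:

```python
from bisect import bisect_right

def sequence_from_u(u, masses, K, n):
    """Brute-force inverse CDF: masses[i-1] = m^i in lexicographic order.
    Find i with u in [s_{i-1}, s_i), then decode e from i - 1 in base K."""
    s, acc = [], 0.0
    for m in masses:
        acc += m                      # s_i = m^1 + ... + m^i
        s.append(acc)
    idx = bisect_right(s, u)          # idx = i - 1, since s_{i-1} <= u < s_i
    e = []
    for _ in range(n):
        e.append(idx % K + 1)         # base-K digits of i - 1 are e_j - 1
        idx //= K
    return e[::-1]

# The nine lexicographically ordered masses from the example below:
m = [2/9, 1/18, 1/18, 1/18, 2/9, 1/18, 1/18, 1/18, 2/9]
print(sequence_from_u(0.75, m, K=3, n=2))  # [3, 2]
```

This costs O(K^{n}) space, which is exactly what the tree-descent strategy below avoids.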

Algorithm Strategy: Given u \in [0,1) and n \in \mathbb{N}, find the unique interval [s_{i-1}, s_{i}), i = 1,\ldots,K^{n}, that contains u by “moving down the tree” and narrowing the search interval until level n is reached.

We provide an explicit example prior to the pseudocode to illustrate the strategy.

Example DCRV Sequence Generation

Figure: Probabilistic partitions of [0,1) for a DCRV sequence of length 2 with K = 3.

Suppose K = 3, p_{1} = p_{2} = p_{3} = 1/3, and \delta = 1/2, and suppose we wish to generate a DCRV sequence of length 2. The figure above gives the probability flow and the corresponding probability intervals [s_{i-1}, s_{i}) that partition [0, 1) according to Theorem 4. Now, suppose we sample a uniform random variable u, with observed value u = \frac{3}{4}. We now illustrate the process of moving down the tree to generate the sequence.

  1. The first level of the probability tree partitions the interval [0, 1) into three intervals given in the figure above. u = \frac{3}{4} lies in the third interval, which corresponds to \epsilon_{1} = 3. Thus, the first entry of e is given by e_{1} = 3.
  2. Next, the search interval is reduced to the subinterval found in the first step, [2/3, 1). We then generate the partitions of [2/3, 1) by cumulatively adding the conditional masses p_{3}p_{1}^{-}, p_{3}p_{2}^{-}, and p_{3}p_{3}^{+} to the left endpoint 2/3. Thus, the next partition points are
    • 2/3 + (1/3)(1/6) = 13/18
    • 2/3 + (1/3)(1/6) + (1/3)(1/6) = 7/9, and
    • 2/3 + (1/3)(1/6) + (1/3)(1/6) + (1/3)(2/3) = 1

Yielding the subintervals of [2/3, 1):

  • [2/3, 13/18)
  • [13/18, 7/9)
  • [7/9, 1)

We now find that the interval above containing u = \frac{3}{4} is the second: [13/18, 7/9). Thus, \epsilon_{2} = 2. Since we only sought a sequence of length 2, the algorithm is finished, and we have generated the sequence e = (3, 2). If a longer sequence is desired, we repeat step 2 until we reach level n. For the pseudocode of the algorithm, see the PDF linked at the top of the page.
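To complement that pseudocode, here is a minimal Python sketch of the full tree descent. It assumes first-kind dependence as in the construction, where every variable after the first depends only on \epsilon_{1} through the tilted probabilities p_{i}^{+} = p_{i} + \delta(1 - p_{i}) and p_{i}^{-} = (1-\delta)p_{i}; the function name and signature are ours:

```python
from bisect import bisect_right
import random

def generate_dcrv_sequence(n, p, delta, u=None):
    """Generate a DCRV sequence of length n by descending the probability
    tree that partitions [0, 1), narrowing the search interval each level."""
    K = len(p)
    if u is None:
        u = random.random()              # sample u ~ Uniform[0, 1)
    lo, hi = 0.0, 1.0                    # current search interval [lo, hi)
    e = []
    for level in range(1, n + 1):
        if level == 1:
            q = list(p)                  # level 1 uses the base probabilities
        else:
            # First-kind dependence: mass is tilted toward the first outcome,
            #   q_i = p_i^+ = p_i + delta*(1 - p_i)  if i == e[0],
            #   q_i = p_i^- = (1 - delta)*p_i        otherwise.
            q = [p[i] + delta * (1 - p[i]) if i == e[0] - 1
                 else (1 - delta) * p[i] for i in range(K)]
        # Partition [lo, hi) into K subintervals of widths (hi - lo) * q_i
        # and find the one containing u.
        cuts, acc = [], lo
        for q_i in q:
            acc += (hi - lo) * q_i
            cuts.append(acc)
        i = min(bisect_right(cuts, u), K - 1)  # guard float round-off
        e.append(i + 1)                        # categories are 1-based
        lo, hi = (cuts[i - 1] if i > 0 else lo), cuts[i]
    return e

# Reproduce the worked example: K = 3, p_i = 1/3, delta = 1/2, u = 3/4
# yields e = (3, 2).
print(generate_dcrv_sequence(2, [1/3, 1/3, 1/3], 0.5, u=0.75))  # [3, 2]
```

Each level touches only K subintervals, so generating a length-n sequence costs O(nK) work rather than the O(K^{n}) of the brute-force lookup above.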