A Generalized Multinomial Distribution from Dependent Categorical Random Variables

A Generalized Multinomial Distribution from Dependent Categorical Random Variables

Generating a Sequence of Correlated Categorical Random Variables

For brevity, we will take the acronym DCRV for a Dependent Categorical Random Variable. A DCRV sequence $\epsilon = (\epsilon_{1},...,\epsilon_{n})$ is in itself a random variable, and thus has a probability distribution. In order to provide an algorithm to generate such a sequence, we first derive this probability distribution.

Probability Distribution of a DCRV Sequence

The probability distribution of the DCRV sequence $\epsilon$ of length n is given formally in the following theorem. The proof follows in a straightforward fashion from the construction of dependent categorical random variables and is therefore omitted.

Theorem 5: Distribution of a DCRV Sequence.  Let $(\Sigma,\mathcal{F}, \mathbb{P}) = ([0,1], \mathcal{B}, \mu)$. Let $\epsilon_{i} : [0,1] \to \{1,...,K\},i = 1,...,n,n \in \mathbb{N}$ be DCRVs as defined in the construction. Let $\epsilon = (\epsilon_{1},...,\epsilon_{n})$ denote the DCRV sequence with observed values $e = (e_{1},...,e_{n})$. Then $\mu$ has the density
$$f(x) = \sum_{i=1}^{K^{n}}K^{n}m^{i}\mathbb{1}_{\tiny((i-1)/K^{n}, i/K^{n}]}(x)$$
and
$$P(\epsilon = e) = \int\limits_{\left(\frac{i-1}{K^{n}}, \frac{i}{K^{n}}\right]}f(x)dx = m^{i}$$

where $m^{i}$ is the mass allocated to the interval $\left(\frac{i-1}{K^{n}}, \frac{i}{K^{n}}\right]$ by the probability mass flow and i as the lexicographic order of e in the sample space $\{1,...,K\}^{n}$ given by the relation $\frac{i}{K^{n}} = \sum_{j=1}^{K^{n}}\frac{\epsilon_{j}-1}{K^{j}}$

Algorithm

We take a common notion of using a uniform random variable in order to generate the desired random variable $\epsilon$. For $\epsilon$ with distribution $F(x) = \int_{0}^{x}f(y)dy, f(x)$ as in the above equation, it is clear that F is invertible with inverse $F^{-1}$. Thus, $F^{-1}(U)$ for $U \sim \text{Uniform}[0,1]$ has the same distribution as $\epsilon$. Therefore, sampling u from U is equivalent to the sample $e = F^{-1}(u)$ from $\epsilon$.

In the construction of the DCRVs, we associated $\epsilon_{n}$ to the intervals

\begin{aligned}\epsilon_{N} =i &\text{ on } \left(\tfrac{l-1}{K^{N}}, \tfrac{l}{K^{N}}\right], & i \equiv l \mod K , & \quad i = 1,\ldots,K-1 \\\epsilon_{N} = K &\text{ on } \left(\tfrac{l-1}{K^{N}}, \tfrac{l}{K^{N}}\right], & 0 \equiv l \mod K\end{aligned}

From the construction, each sequence has a 1-1 correspondence with the interval $\left[\frac{i-1}{K^{n}}, \frac{i}{K^{n}}\right)$ for a unique $i = 1,\ldots,K^{n}$. The probability of such a sequence can be found using Theorem 5:

$$P(\epsilon = e) = F\left(\left[\frac{i-1}{K^{n}}, \frac{i}{K^{n}}\right)\right) = m^{i} = l\left([s_{i-1}, s_{i})\right)$$

where l is the length of the above interval, and $s_{i} = \sum_{j=1}^{i}m^{j}$. Therefore, we have now partitioned the interval [0,1) according to the distribution of $\epsilon$ bijectively to the K-nary partition of [0,1) corresponding to the particular sequence. Thus, sampling $u \in [0,1)$ from a uniform distribution and finding the interval $[s_{i-1},s_{i})$ and corresponding i will yield the unique DCRV sequence.

Algorithm Strategy: Given $u \in [0,1)$ and $n \in \mathbb{N}$, find the unique interval $[s_{i-1}, s_{i}), i= 1,\ldots,K^{n}$ that contains u by “moving down the tree” and narrowing down the “search interval” until level n is reached.

We provide an explicit example prior to the pseudocode to illustrate the strategy.

Example DCRV Sequence Generation

Suppose $K = 3, p_{1} = p_{2} = p_{3} = 1/3,$  and  $\delta = 1/2$, and suppose we wish to generate a DCRV sequence of length 2. The figure above gives the probability flow and the corresponding probability intervals $[s_{i-1}, s_{i})$ that partition [0, 1) according to Theorem 4. Now, suppose we sample a uniform random variable u, with observed value $u = \frac{3}{4}$. We now illustrate the process of moving down the above tree to generate the sequence.

1. The first level of the probability tree partitions the interval [0, 1) into three intervals given in the figure above. $u = \frac{3}{4}$ lies in the third interval, which corresponds to $\epsilon_{1} = 3$. Thus, the first entry of e is given by $e_{1} = 3$.
2. Next, the search interval is reduced to the interval from the first step [2/3, 1). We then generate the partitions of [2/3, 1) by cumulatively adding $p_{3}p_{i}^{-}, i = 1,2,3$ to the left endpoint 2/3. Thus, the next partition points are
• 2/3 + (1/3)(1/6) = 13/18
• 2/3 + (1/3)(1/6) + (1/3)(1/6) = 7/9, and
• 2/3 + (1/3)(1/6) + (1/3)(1/6) + (1/3)(2/3) = 1

Yielding the subintervals of [2/3, 1):

• [2/3, 13/18)
• [13/18, 7/9)
• [7/9, 1)

We now find the interval from above that contains u is the second: [13/18,  7/9). Thus, $\epsilon_{2} = 2$. Since we only sought a sequence of length 2, the algorithm is finished, and we have generated the sequence e = (3, 2). If a longer sequence is desired, we repeat step 2 until we are at level n. For the pseudocode of the algorithm, see the pdf, linked at the top of the page.