A Generalized Multinomial Distribution from Dependent Categorical Random Variables


For the full paper, which includes all proofs, download the PDF here.

Construction of Dependent Categorical Variables

Suppose each categorical variable has K possible categories, so the sample space is S = \{1,\ldots,K\}. (These integers should not be taken as ordered or sequential, but rather as labels of the categories.) The construction of the correlated categorical variables is based on a probability mass distribution over K-adic partitions of [0,1]. We will follow graph terminology in our construction, as this lends a visual representation of the construction. We begin with a parent node and build a K-nary tree, where the end nodes are labeled by the repeating sequence (1,\ldots,K). Thus, after N steps, the graph has K^{N} end nodes, with each node labeled 1,\ldots,K repeating in the natural order and assigned injectively to the intervals
\left(0,\frac{1}{K^{N}}\right],\left(\frac{1}{K^{N}},\frac{2}{K^{N}}\right],\ldots,\left(\frac{K^{N}-1}{K^{N}},1\right].

Define

\begin{aligned}&\epsilon_{N} = i \qquad \text{ on } \left(\tfrac{Kj+i-1}{K^{N}}, \tfrac{Kj+i}{K^{N}}\right] \\&\epsilon_{N} = K \qquad \text{on } \left(\tfrac{K(j+1)-1}{K^{N}}, \tfrac{K(j+1)}{K^{N}}\right] \end{aligned}

where i = 1,\ldots,K-1, and j = 0,1,\ldots,K^{N-1}-1. An alternate expression for the above equation is

\begin{aligned}\epsilon_{N} =i &\text{ on } \left(\tfrac{l-1}{K^{N}}, \tfrac{l}{K^{N}}\right], & i \equiv l \mod K , & \quad i = 1, \ldots ,K-1\\ \epsilon_{N} = K &\text{ on } \left(\tfrac{l-1}{K^{N}}, \tfrac{l}{K^{N}}\right], & 0 \equiv l \mod K\end{aligned}
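As a concrete illustration of this labeling (a minimal sketch, not from the paper; the function name epsilon_label and the use of Python are our own), the alternate expression says that the label of a point x \in (0,1] is determined by the residue mod K of the index l of the sub-interval containing x:

import math

def epsilon_label(x, N, K):
    # Find the index l of the half-open sub-interval ((l-1)/K^N, l/K^N]
    # containing x, then reduce it mod K as in the alternate expression.
    l = math.ceil(x * K**N)
    return ((l - 1) % K) + 1    # value in {1, ..., K}; equals K when l = 0 mod K

# K = 3, N = 2: the 9 sub-intervals are labeled 1,2,3,1,2,3,1,2,3
print([epsilon_label((l - 0.5) / 9, N=2, K=3) for l in range(1, 10)])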

To each of the branches of the tree, and by transitivity to each of the K^{N} partitions of [0,1], we assign a probability mass such that the total mass at each level of the tree is 1, in a manner similar to [4].

Let 0 < p_{i} < 1, i=1,\ldots,K such that \sum_{i=1}^{K}p_{i}=1, and let 0 \leq \delta \leq 1 be the dependency coefficient. Define

p_{i}^{+} := p_{i} + \delta\sum_{l \neq i}p_{l}, \quad p_{i}^{-} := p_{i}-\delta p_{i}

for i = 1,\ldots,K. These probabilities satisfy two important criteria, detailed in the following lemma:


Lemma 1.

\begin{aligned}\sum_{i=1}^{K}p_{i} = p_{i}^{+} + \sum_{l \neq i}p_{l}^{-} = 1\\ p_{i}p_{i}^{+} + p_{i}^{-}\sum_{l \neq i}p_{l} = p_{i}\end{aligned}
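Both statements are easy to check numerically. A quick sketch (the helper name p_plus_minus and the particular values of p and \delta are illustrative assumptions, not part of the paper):

def p_plus_minus(p, delta):
    # p_i^+ = p_i + delta * sum_{l != i} p_l,  p_i^- = p_i - delta * p_i
    total = sum(p)
    return ([pi + delta * (total - pi) for pi in p],
            [pi * (1 - delta) for pi in p])

p, delta = [0.1, 0.2, 0.3, 0.4], 0.35
p_plus, p_minus = p_plus_minus(p, delta)
K = len(p)
for i in range(K):
    first = p_plus[i] + sum(p_minus[l] for l in range(K) if l != i)
    second = p[i] * p_plus[i] + p_minus[i] * sum(p[l] for l in range(K) if l != i)
    assert abs(first - 1) < 1e-12 and abs(second - p[i]) < 1e-12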


We now give the construction in steps down each level of the K-nary tree.

LEVEL 1:

The parent node has mass 1, with mass split 1\cdot\prod_{i=1}^{K}p_{i}^{[\epsilon_{1} = i]}, where [\cdot] is the Iverson bracket. This level corresponds to a sequence \epsilon of dependent categorical variables of length 1.

\begin{array}{cccc}\epsilon_{1}(\text{\textbf{Branch}}) & \textbf{Path} & \textbf{Mass at Node} & \textbf{Interval}\\1 &\textit{parent}\to 1 & p_{1}& (0,1/K] \\2 &\textit{parent} \to 2 & p_{2}& (1/K,2/K]\\ \vdots & \vdots & \vdots &\vdots \\i &\textit{parent} \to i & p_{i}& ((i-1)/K,i/K]\\K &\textit{parent} \to K & p_{K}& ((K-1)/K,1]\end{array}
LEVEL 2:

Level 2 has K nodes, with K branches stemming from each node. This corresponds to a sequence of length 2: \epsilon = (\epsilon_{1}, \epsilon_{2}). Denote by i.1 node i from level 1. For i = 1,\ldots,K,

Node i.1 has mass p_{i}, with mass split p_{i}\left(p_{i}^{+}\right)^{[\epsilon_{2}=i]}\prod\limits_{j=1,j\neq i}^{K}\left(p_{j}^{-}\right)^{[\epsilon_{2} = j]}.

\begin{array}{cccc}\epsilon_{2}(\textbf{Branch})& \textbf{Path} & \textbf{Mass at Node} & \textbf{Interval}\\1 &i.1 \to 1 & p_{i}p_{1}^{-} & \left(\frac{(i-1)K}{K^{2}},\frac{(i-1)K + 1}{K^{2}}\right]\\2 &i.1 \to 2 & p_{i}p_{2}^{-} & \left(\frac{(i-1)K + 1}{K^{2}},\frac{(i-1)K+ 2}{K^{2}}\right]\\\vdots & \vdots & \vdots & \vdots\\i &i.1 \to i & p_{i}p_{i}^{+}& \left(\frac{(i-1)K + (i-1)}{K^{2}},\frac{(i-1)K+ i}{K^{2}}\right]\\\vdots & \vdots & \vdots & \vdots \\K &i.1 \to K & p_{i}p_{K}^{-}& \left(\frac{iK-1}{K^{2}},\frac{iK}{K^{2}}\right]\end{array}
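As a sanity check on this split (a sketch under the same assumptions as above; level2_masses and the chosen values of p and \delta are our own), the K^{2} level-2 masses can be tabulated directly; row i should sum back to p_{i}, and the whole table to 1:

def level2_masses(p, delta):
    # Mass at the level-2 node reached by eps_1 = i, eps_2 = j:
    # p_i * p_i^+ when j = i, and p_i * p_j^- when j != i.
    # Assumes sum(p) == 1, so p_i^+ = p_i + delta * (1 - p_i).
    K = len(p)
    p_plus = [p[i] + delta * (1 - p[i]) for i in range(K)]
    p_minus = [p[j] * (1 - delta) for j in range(K)]
    return [[p[i] * (p_plus[i] if j == i else p_minus[j]) for j in range(K)]
            for i in range(K)]

masses = level2_masses([0.2, 0.5, 0.3], delta=0.25)
print([round(sum(row), 12) for row in masses])   # [0.2, 0.5, 0.3]: row i keeps its level-1 mass p_i
print(round(sum(map(sum, masses)), 12))          # 1.0: total mass at level 2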

In this fashion, we distribute the total mass of 1 across level 2. In general, at any level r there are K streams of probability flow. For \epsilon_{1} = i, the probability flow is given by
p_{i}\prod_{j = 2}^{r}\left[\left(p_{i}^{+}\right)^{[\epsilon_{j} = i]}\prod_{l\neq i }\left(p_{l}^{-}\right)^{[\epsilon_{j} = l]}\right],\qquad i = 1,\ldots,K\qquad\qquad(*)
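One way to read (*) is that, conditional on \epsilon_{1} = i, each subsequent \epsilon_{j} is drawn independently, taking the value i with probability p_{i}^{+} and any other value l with probability p_{l}^{-}. A minimal sampling sketch under that reading (the function name sample_eps and the use of Python's random.choices are our own; this is an illustration, not code from the paper):

import random

def sample_eps(p, delta, N):
    # Draw (eps_1, ..., eps_N): eps_1 from (p_1, ..., p_K); then, given
    # eps_1 = i, each later eps_j takes value i with probability p_i^+
    # and value l != i with probability p_l^-, matching the flow (*).
    # Assumes sum(p) == 1.
    K = len(p)
    eps1 = random.choices(range(1, K + 1), weights=p)[0]
    i = eps1 - 1
    cond = [p[i] + delta * (1 - p[i]) if l == i else p[l] * (1 - delta)
            for l in range(K)]
    return [eps1] + random.choices(range(1, K + 1), weights=cond, k=N - 1)

print(sample_eps([0.2, 0.5, 0.3], delta=0.4, N=6))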

We use this flow to assign mass to the K^{r} intervals of [0,1] at level r in the same way as above. We may also verify via algebraic manipulation that

p_{i} = p_{i}\left(p_{i}^{+} + \sum_{l\neq i}p_{l}^{-}\right)^{r-1} = p_{i}\sum_{\epsilon_{2},\ldots,\epsilon_{r}}\prod_{j=2}^{r}\left[\left(p_{i}^{+}\right)^{[\epsilon_{j} = i]}\prod_{l \neq i}\left(p_{l}^{-}\right)^{[\epsilon_{j} = l]}\right]

where the first equality is due to the first statement of Lemma 1, and the second is due to the probability flow at level r, given in (*) above.
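This identity can also be checked by brute force for small K and r: summing the flow (*) over all (\epsilon_{2},\ldots,\epsilon_{r}) with \epsilon_{1} = i fixed should return p_{i}. A sketch (flow_mass and the chosen values are illustrative assumptions):

import itertools

def flow_mass(eps, p, delta):
    # Mass (*) of the path (eps_1, ..., eps_r): p_i times a factor of
    # p_i^+ whenever eps_j repeats eps_1 = i, and p_l^- otherwise.
    # Assumes sum(p) == 1.
    i = eps[0] - 1
    p_plus_i = p[i] + delta * (1 - p[i])
    mass = p[i]
    for e in eps[1:]:
        mass *= p_plus_i if e - 1 == i else p[e - 1] * (1 - delta)
    return mass

p, delta, r = [0.2, 0.5, 0.3], 0.4, 4
K = len(p)
for i in range(1, K + 1):
    total = sum(flow_mass((i,) + tail, p, delta)
                for tail in itertools.product(range(1, K + 1), repeat=r - 1))
    print(i, round(total, 12))   # prints p_1, p_2, p_3: each stream conserves its mass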