A Generalized Multinomial Distribution from Dependent Categorical Random Variables

A Generalized Multinomial Distribution from Dependent Categorical Random Variables

For the full paper, which includes all proofs, download the pdf  here.


FK Dependence Tree showing the probability mass flow

Korzeniowski defined the notion of dependence in a way we will refer to here as dependence of the first kind (FK dependence). Suppose (\epsilon_{1},...,\epsilon_{N}) is a sequence of Bernoulli random variables, and P(\epsilon_{1} = 1) = p. Then, for \epsilon_{i}, i \geq 2, we weight the probability of each binary outcome toward the outcome of \epsilon_{1}, adjusting the probabilities of the remaining outcomes accordingly.

Formally, let 0 \leq \delta \leq 1, and q = 1-p. Then define the following quantities
\begin{aligned}p^{+} := P(\epsilon_{i} = 1 | \epsilon_{1} = 1) = p + \delta q, &\qquad p^{-} := P(\epsilon_{i} = 0 | \epsilon_{1} = 0) = p-\delta p \\ q^{+} := P(\epsilon_{i} = 1 | \epsilon_{1} = 0) = q+\delta p , &\qquad q^{-} := P(\epsilon_{i} = 0 | \epsilon_{1} = 0) = q -\delta q\end{aligned}

Given the outcome i of \epsilon_{1}, the probability of outcome i occurring in the subsequent Bernoulli variables \epsilon_{2}, \epsilon_{3},...,\epsilon_{n} is p^{+}, i = 1 or q^{+}, i=0. The probability of the opposite outcome is then decreased to q^{-} and p^{-}, respectively.

The above illustrates the possible outcomes of a sequence of such dependent Bernoulli variables. Korzeniowski showed that, despite this conditional dependency, P(\epsilon_{i} = 1) = p for all i. That is, the sequence of Bernoulli variables is identically distributed, with correlation shown to be
\text{Cor}(\epsilon_{i}, \epsilon_{j}) = \left\{\begin{array}{lr} \delta, & i=1 \\ \delta^{2}, & i \neq j;\,i,j \geq 2\end{array}\right.

These identically distributed but correlated Bernoulli random variables yield a Generalized Binomial distribution with a similar form to the standard binomial distribution. In our generalization, we use the same form of FK dependence, but for categorical random variables. We will construct a sequence of identically distributed but dependent categorical variables from which we will build a generalized multinomial distribution. When the number of categories K = 2, the distribution reverts back to the generalized binomial distribution of Korzeniowski [4]. When the sequence is fully independent, the distribution reverts back to the independent categorical model and the standard multinomial distribution, and when the sequence is independent and K=2, we recover the standard binomial distribution. Thus, this new distribution represents a much larger generalization than prior models.