# A Generalized Multinomial Distribution from Dependent Categorical Random Variables

## Background

Korzeniowski defined a notion of dependence that we refer to here as dependence of the first kind (FK dependence). Suppose $(\epsilon_{1},...,\epsilon_{N})$ is a sequence of Bernoulli random variables with $P(\epsilon_{1} = 1) = p$. Then, for each $\epsilon_{i}$ with $i \geq 2$, we weight the probability of each binary outcome toward the outcome of $\epsilon_{1}$, adjusting the probability of the other outcome accordingly.

Formally, let $0 \leq \delta \leq 1$ and $q = 1-p$. Then, for $i \geq 2$, define the following quantities:
$$\begin{aligned}p^{+} := P(\epsilon_{i} = 1 | \epsilon_{1} = 1) = p + \delta q, &\qquad p^{-} := P(\epsilon_{i} = 1 | \epsilon_{1} = 0) = p-\delta p \\ q^{+} := P(\epsilon_{i} = 0 | \epsilon_{1} = 0) = q+\delta p , &\qquad q^{-} := P(\epsilon_{i} = 0 | \epsilon_{1} = 1) = q -\delta q\end{aligned}$$

Given the outcome of $\epsilon_{1}$, the probability of that same outcome occurring in each subsequent Bernoulli variable $\epsilon_{2}, \epsilon_{3},...,\epsilon_{N}$ is raised to $p^{+}$ if $\epsilon_{1} = 1$, or to $q^{+}$ if $\epsilon_{1} = 0$. The probability of the opposite outcome is correspondingly decreased to $q^{-}$ and $p^{-}$, respectively.
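The construction described here can be sketched in a few lines of Python. This is an illustrative sketch rather than code from the original work: the function names `fk_conditionals`, `sample_fk_bernoulli`, and `marginal_prob_one` are hypothetical, and the sampler assumes that each $\epsilon_{i}$, $i \geq 2$, is drawn independently once $\epsilon_{1}$ is known. The conditional probabilities follow the verbal description: $p + \delta q$ for a 1 after $\epsilon_{1} = 1$, and $q + \delta p$ for a 0 after $\epsilon_{1} = 0$.

```python
import random
from fractions import Fraction


def fk_conditionals(p, delta):
    """Return (p_plus, p_minus, q_plus, q_minus) under FK dependence."""
    q = 1 - p
    p_plus = p + delta * q    # P(eps_i = 1 | eps_1 = 1)
    p_minus = p - delta * p   # P(eps_i = 1 | eps_1 = 0)
    q_plus = q + delta * p    # P(eps_i = 0 | eps_1 = 0)
    q_minus = q - delta * q   # P(eps_i = 0 | eps_1 = 1)
    return p_plus, p_minus, q_plus, q_minus


def sample_fk_bernoulli(n, p, delta, rng):
    """Draw one sequence (eps_1, ..., eps_n), n >= 1, as a list of 0s and 1s."""
    p_plus, p_minus, _, _ = fk_conditionals(p, delta)
    first = 1 if rng.random() < p else 0
    # Assumption: later variables depend only on eps_1, not on each other.
    p_one = p_plus if first == 1 else p_minus
    rest = [1 if rng.random() < p_one else 0 for _ in range(n - 1)]
    return [first] + rest


def marginal_prob_one(p, delta):
    """P(eps_i = 1) for i >= 2, by the law of total probability over eps_1."""
    p_plus, p_minus, _, _ = fk_conditionals(p, delta)
    return p * p_plus + (1 - p) * p_minus


if __name__ == "__main__":
    # Exact arithmetic avoids floating-point noise in the check.
    p, delta = Fraction(3, 10), Fraction(1, 4)
    print(marginal_prob_one(p, delta))  # 3/10
    print(sample_fk_bernoulli(8, 0.3, 0.5, random.Random(0)))
```

Using `Fraction` inputs makes it easy to confirm that each pair of conditional probabilities sums to one and that the marginal probability of a 1 stays equal to $p$ for any admissible $\delta$.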

The above illustrates the possible outcomes of a sequence of such dependent Bernoulli variables. Korzeniowski showed that, despite this conditional dependency, $P(\epsilon_{i} = 1) = p$ for all $i$. That is, the sequence of Bernoulli variables is identically distributed, with correlation shown to be
$$\text{Cor}(\epsilon_{i}, \epsilon_{j}) = \left\{\begin{array}{lr} \delta, & i=1,\ j \geq 2 \\ \delta^{2}, & i \neq j;\,i,j \geq 2\end{array}\right.$$
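Both facts can be checked with a short calculation. The sketch below is not taken from Korzeniowski's proof. It uses only the conditional probabilities of a 1, namely $p + \delta q$ when $\epsilon_{1} = 1$ and $p - \delta p$ when $\epsilon_{1} = 0$, and it assumes that $\epsilon_{i}$ and $\epsilon_{j}$ ($i \neq j$; $i, j \geq 2$) are conditionally independent given $\epsilon_{1}$.

$$\begin{aligned}
P(\epsilon_{i} = 1) &= p(p + \delta q) + q(p - \delta p) = p(p + q) = p, \\
\text{Cov}(\epsilon_{1}, \epsilon_{j}) &= p(p + \delta q) - p^{2} = \delta p q, \\
\text{Cov}(\epsilon_{i}, \epsilon_{j}) &= p(p + \delta q)^{2} + q(p - \delta p)^{2} - p^{2} = \delta^{2} p q, \qquad i \neq j;\, i,j \geq 2.
\end{aligned}$$

Since every $\epsilon_{i}$ has variance $pq$, dividing each covariance by $pq$ gives the correlations $\delta$ and $\delta^{2}$.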

These identically distributed but correlated Bernoulli random variables yield a generalized binomial distribution with a form similar to that of the standard binomial distribution. In our generalization, we use the same form of FK dependence, but for categorical random variables. We construct a sequence of identically distributed but dependent categorical variables, from which we build a generalized multinomial distribution. When the number of categories is $K = 2$, the distribution reduces to the generalized binomial distribution of Korzeniowski [4]. When the sequence is fully independent, the distribution reduces to the independent categorical model and the standard multinomial distribution, and when the sequence is independent and $K=2$, we recover the standard binomial distribution. Thus, the new distribution contains the standard binomial, the standard multinomial, and the generalized binomial distributions as special cases.