A Generalized Multinomial Distribution from Dependent Categorical Random Variables

For the full paper, which includes all proofs, download the PDF here.

Example Construction

Probability Distribution for K = 3 at N = 3

For an illustration, refer to Figure 2 above. In this example, we construct a sequence of length N = 3 of categorical variables with K = 3 categories. At each level r, there are 3^r nodes corresponding to 3^r partitions of the interval [0,1]. Note that each time a node splits into 3 children, the split probabilities sum to 1. Regardless of the outcome of the previous random variable, the next one always has three possible outcomes. The sample space of categorical variable sequences of length 3 therefore has 3^3 = 27 elements.
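The count of 27 sequences and the sum-to-one property can be checked by direct enumeration. The sketch below rests on an assumption not restated in this section: that each variable after the first repeats the first variable's category i with probability p_i + \delta(1 - p_i) and takes any other category j with probability p_j(1 - \delta). The function name `sequence_probability` and the values of `p` and `delta` are illustrative only.

```python
from itertools import product

def sequence_probability(seq, p, delta):
    """Probability of one category sequence under the ASSUMED construction:
    every variable after the first depends only on the first outcome."""
    first = seq[0]
    prob = p[first]
    for cat in seq[1:]:
        if cat == first:
            # Assumed boosted probability of repeating the first category.
            prob *= p[cat] + delta * (1 - p[cat])
        else:
            # Assumed reduced probability of any other category.
            prob *= p[cat] * (1 - delta)
    return prob

# Illustrative values, not taken from the text.
p = [0.5, 0.3, 0.2]
delta = 0.4
K, N = 3, 3

# All K^N category sequences (categories indexed from 0 here).
sequences = list(product(range(K), repeat=N))
total = sum(sequence_probability(s, p, delta) for s in sequences)

print(len(sequences))     # size of the sample space
print(round(total, 12))   # total probability over the sample space
```

Under that assumption, the children of any node carry weights p_i + \delta(1 - p_i) and p_j(1 - \delta) for j \neq i, which add to 1, so the total over all leaves is 1 as well.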


Identically Distributed but Dependent

We now show the most important property of this class of sequences: they remain identically distributed despite losing independence.

Lemma 2:

P(\epsilon_{r} = i) = p_{i}\qquad i = 1,\ldots,K; \:\: r \in \mathbb{N}
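A small exact computation can make Lemma 2 concrete. This sketch again assumes the dependence structure in which a later variable repeats the first outcome i with probability p_i + \delta(1 - p_i) and otherwise takes category j with probability p_j(1 - \delta); the helper `marginal` and the numeric values are hypothetical. It conditions on the first variable and sums, using rational arithmetic so the comparison is exact.

```python
from fractions import Fraction

def marginal(p, delta, i):
    """P(eps_r = i) for r >= 2, by conditioning on the first variable
    under the ASSUMED dependence structure."""
    # First variable landed in category i: boosted probability of repeating i.
    same = p[i] * (p[i] + delta * (1 - p[i]))
    # First variable landed elsewhere: reduced probability of category i.
    other = sum(p[l] for l in range(len(p)) if l != i) * p[i] * (1 - delta)
    return same + other

# Illustrative values, not taken from the text.
p = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
delta = Fraction(2, 5)

marginals = [marginal(p, delta, i) for i in range(len(p))]
print(marginals == p)  # compares the later-variable marginals with p exactly
```

The algebra behind the check is short: p_i[p_i + \delta(1 - p_i)] + (1 - p_i)p_i(1 - \delta) = p_i, so \delta cancels from the marginal.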

Pairwise Cross-Covariance Matrix

We now give the pairwise cross-covariance matrix for dependent categorical random variables.

Theorem 1: (Cross-Covariance of Dependent Categorical Random Variables).  Denote by \Lambda^{\iota,\tau} the K × K cross-covariance matrix of \epsilon_{\iota} and \epsilon_{\tau}, \iota,\tau = 1,\ldots,n, defined as \Lambda^{\iota,\tau} = E[(\epsilon_{\iota} - E[\epsilon_{\iota}])(\epsilon_{\tau} - E[\epsilon_{\tau}])]. Then the entries of the matrix are given by

\Lambda^{1,\tau}_{ij} = \left\{\begin{array}{cc}\delta p_{i}(1-p_{i}), & i = j \\-\delta p_{i}p_{j}, & i \neq j\end{array}\right.\qquad \tau \geq 2,

\Lambda^{\iota,\tau}_{ij} = \left\{\begin{array}{cc}\delta^{2}p_{i}(1-p_{i}), & i = j \\-\delta^{2}p_{i}p_{j}, & i \neq j\end{array}\right. \qquad\tau > \iota,\:\iota\neq 1.
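The entries in Theorem 1 can be checked numerically by enumerating all sequences. The sketch below makes two assumptions that are not spelled out in this section: entry (i, j) is read as the covariance of the indicators of \epsilon_{\iota} = i and \epsilon_{\tau} = j, and every variable after the first depends on the first through probabilities p_i + \delta(1 - p_i) (same category) and p_j(1 - \delta) (different category). The function names and numeric values are hypothetical.

```python
from fractions import Fraction
from itertools import product

def joint(seq, p, delta):
    """Probability of a full sequence under the ASSUMED construction."""
    first = seq[0]
    prob = p[first]
    for cat in seq[1:]:
        if cat == first:
            prob *= p[cat] + delta * (1 - p[cat])
        else:
            prob *= p[cat] * (1 - delta)
    return prob

def cross_covariance(p, delta, iota, tau, n):
    """Matrix of Cov(1[eps_iota = i], 1[eps_tau = j]) by enumeration.
    Positions iota and tau are 1-based; categories are 0-based."""
    K = len(p)
    pair = [[Fraction(0)] * K for _ in range(K)]
    for seq in product(range(K), repeat=n):
        pair[seq[iota - 1]][seq[tau - 1]] += joint(seq, p, delta)
    # Subtract the product of marginals, which are p_i and p_j by Lemma 2.
    return [[pair[i][j] - p[i] * p[j] for j in range(K)] for i in range(K)]

def theorem_matrix(p, delta, iota):
    """Closed form from Theorem 1: scale delta if iota = 1, else delta^2."""
    scale = delta if iota == 1 else delta ** 2
    K = len(p)
    return [[scale * p[i] * (1 - p[i]) if i == j else -scale * p[i] * p[j]
             for j in range(K)] for i in range(K)]

# Illustrative values, not taken from the text.
p = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
delta = Fraction(2, 5)

print(cross_covariance(p, delta, 1, 3, 3) == theorem_matrix(p, delta, 1))
print(cross_covariance(p, delta, 2, 3, 3) == theorem_matrix(p, delta, 2))
```

Rational arithmetic keeps both comparisons exact. Under these assumptions, pairs that include the first variable scale with \delta, while pairs of later variables scale with \delta^{2} because they are linked only through the first variable.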

In the next section, we exploit the desirable identical distribution of the categorical sequence in order to provide a generalized multinomial distribution for the counts in each category.