Vertical Dependency in Sequences of Categorical Random Variables

For the full text, including proofs, download the pdf here.

Figure 1. Probability Mass Flow for FK Dependent Bernoulli Random Variables

We repeat a section from A Generalized Multinomial Distribution from Dependent Categorical Random Variables to review the original first-kind (FK) dependency introduced by Korzeniowski. For the full details, see that previous work.

Korzeniowski defined the notion of dependence in a way we will refer to here as dependence of the first kind (FK dependence). Suppose (\epsilon_{1},...,\epsilon_{N}) is a sequence of Bernoulli random variables, and P(\epsilon_{1} = 1) = p. Then, for \epsilon_{i}, i \geq 2, we weight the probability of each binary outcome toward the outcome of \epsilon_{1}, adjusting the probabilities of the remaining outcomes accordingly.

Formally, let 0 \leq \delta \leq 1 and q = 1-p. Then define the following quantities:
\begin{aligned}p^{+} := P(\epsilon_{i} = 1 | \epsilon_{1} = 1) = p + \delta q &\qquad p^{-} :=P(\epsilon_{i} = 0 | \epsilon_{1} = 1) = q -\delta q\\q^{+} := P(\epsilon_{i} = 1 | \epsilon_{1} = 0) = p-\delta p&\qquad q^{-} := P(\epsilon_{i} = 0 | \epsilon_{1} = 0) = q + \delta p\end{aligned}

Given the outcome i of \epsilon_{1}, the probability of outcome i occurring in the subsequent Bernoulli variables \epsilon_{2}, \epsilon_{3},\ldots, \epsilon_{N} is increased to p^{+} when i = 1, or to q^{-} when i = 0. The probability of the opposite outcome is correspondingly decreased to p^{-} or q^{+}, respectively.
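The definitions above can be turned into a small sampler. The following is a minimal sketch, not code from the original work: the function names `fk_conditionals` and `sample_fk_bernoulli` are hypothetical, and the sequence length `n` is assumed to be at least 1.

```python
import random

def fk_conditionals(p, delta):
    """Return the four FK conditional probabilities as defined in the text."""
    q = 1.0 - p
    return {
        "p_plus": p + delta * q,   # P(eps_i = 1 | eps_1 = 1)
        "p_minus": q - delta * q,  # P(eps_i = 0 | eps_1 = 1)
        "q_plus": p - delta * p,   # P(eps_i = 1 | eps_1 = 0)
        "q_minus": q + delta * p,  # P(eps_i = 0 | eps_1 = 0)
    }

def sample_fk_bernoulli(p, delta, n, rng=None):
    """Draw one FK dependent Bernoulli sequence of length n (n >= 1)."""
    rng = rng or random.Random()
    c = fk_conditionals(p, delta)
    first = 1 if rng.random() < p else 0
    # Every later variable depends only on the outcome of the first one.
    p_one = c["p_plus"] if first == 1 else c["q_plus"]
    rest = [1 if rng.random() < p_one else 0 for _ in range(n - 1)]
    return [first] + rest
```

With \delta = 0 the sampler reduces to independent Bernoulli draws, and with \delta = 1 every variable copies the first outcome.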

The figure above illustrates the possible outcomes of a sequence of such dependent Bernoulli variables. Korzeniowski showed that, despite this conditional dependency, P(\epsilon_{i} = 1) = p \quad\forall i. That is, the sequence of Bernoulli variables is identically distributed, with correlation shown to be
\text{Cor}(\epsilon_{i}, \epsilon_{j}) = \left\{\begin{array}{lr}\delta, & i=1, \quad j \geq 2 \\\delta^{2}, &i \neq j, \quad i,j \geq 2\end{array}\right.
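
As a brief check of the identical-distribution claim and the first correlation case, the following sketch uses only the definitions of p^{+} and q^{+} and the law of total probability, and assumes 0 < p < 1 so that the variance pq is nonzero. The full proofs are in the cited work. For i \geq 2,
\begin{aligned}P(\epsilon_{i} = 1) &= p\,p^{+} + q\,q^{+} = p(p + \delta q) + q(p - \delta p) = p(p + q) = p,\\\text{Cov}(\epsilon_{1}, \epsilon_{i}) &= P(\epsilon_{1} = 1, \epsilon_{i} = 1) - p^{2} = p(p + \delta q) - p^{2} = \delta pq.\end{aligned}

Since each variable has variance pq, dividing the covariance by pq gives \text{Cor}(\epsilon_{1}, \epsilon_{i}) = \delta.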

These identically distributed but correlated Bernoulli random variables yield a Generalized Binomial distribution with a similar form to the standard binomial distribution.

In the previous work, the concept of Bernoulli FK dependence was extended to categorical random variables. That is, given a sequence of categorical random variables with K categories and P(\epsilon_{1} = i) = p_{i}, i = 1,\ldots,K, the conditional probabilities for j \geq 2 are
\begin{aligned}P(\epsilon_{j} = i | \epsilon_{1} = i) &= p_{i}^{+} = p_{i} + \delta(1-p_{i});\\P(\epsilon_{j} = k | \epsilon_{1} = i) &= p_{k}^{-} = p_{k}-\delta p_{k}, \quad i \neq k, \:\:k = 1,\ldots,K.\end{aligned}
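
The categorical conditional probabilities can be checked numerically. The sketch below is illustrative only: the helper names `fk_categorical_matrix` and `marginal_of_later_variable` are hypothetical, and categories are indexed from 0 rather than 1. It builds the K \times K table of conditional probabilities and applies the law of total probability to recover the marginal distribution of a later variable.

```python
def fk_categorical_matrix(p, delta):
    """Row i holds P(eps_j = k | eps_1 = i) for k = 0..K-1 (0-based categories)."""
    K = len(p)
    return [
        [
            # Matching category is boosted to p_k^+; all others shrink to p_k^-.
            p[k] + delta * (1.0 - p[k]) if k == i else p[k] - delta * p[k]
            for k in range(K)
        ]
        for i in range(K)
    ]

def marginal_of_later_variable(p, delta):
    """Compute sum_i P(eps_1 = i) * P(eps_j = k | eps_1 = i) for each k."""
    M = fk_categorical_matrix(p, delta)
    K = len(p)
    return [sum(p[i] * M[i][k] for i in range(K)) for k in range(K)]
```

Each row of the table should sum to 1, and the recovered marginal should equal the original vector of probabilities up to floating-point error, which is the identical-distribution property in the categorical case.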

Traylor proved that FK dependent categorical random variables remain identically distributed, and showed that the cross-covariance matrix of categorical random variables has the same structure as the correlation between FK dependent Bernoulli random variables. In addition, the concept of a generalized binomial distribution was extended to a generalized multinomial distribution.

In the next section, we will explore a different type of dependency structure, sequential dependency.