﻿
# Vertical Dependency in Sequences of Categorical Random Variables

We repeat a section from *A Generalized Multinomial Distribution from Dependent Categorical Random Variables* to review the original first-kind (FK) dependency introduced by Korzeniowski. For the full details, see the previous work on the topic.

Korzeniowski defined the notion of dependence in a way we will refer to here as dependence of the first kind (FK dependence). Suppose $(\epsilon_{1},...,\epsilon_{N})$ is a sequence of Bernoulli random variables, and $P(\epsilon_{1} = 1) = p$. Then, for $\epsilon_{i}, i \geq 2$, we weight the probability of each binary outcome toward the outcome of $\epsilon_{1}$, adjusting the probabilities of the remaining outcomes accordingly.

Formally, let $0 \leq \delta \leq 1$ and $q = 1-p$. Then, for $i \geq 2$, define the following quantities:
$$\begin{aligned}p^{+} := P(\epsilon_{i} = 1 | \epsilon_{1} = 1) = p + \delta q &\qquad q^{-} := P(\epsilon_{i} = 0 | \epsilon_{1} = 1) = q -\delta q\\p^{-} := P(\epsilon_{i} = 1 | \epsilon_{1} = 0) = p-\delta p&\qquad q^{+} := P(\epsilon_{i} = 0 | \epsilon_{1} = 0) = q + \delta p\end{aligned}$$

Given the outcome $i$ of $\epsilon_{1}$, the probability that outcome $i$ recurs in each subsequent Bernoulli variable $\epsilon_{2}, \epsilon_{3},\ldots, \epsilon_{N}$ is $p^{+}$ if $i = 1$, or $q^{+}$ if $i=0$. The probability of the opposite outcome decreases to $q^{-}$ or $p^{-}$, respectively.
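The weighting scheme above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the original work: the function names `fk_conditionals` and `sample_fk_bernoulli` are hypothetical, and the sampler assumes that each $\epsilon_{i}$, $i \geq 2$, is drawn independently once the outcome of $\epsilon_{1}$ is known.

```python
import random

def fk_conditionals(p, delta):
    """Return P(eps_i = 1 | eps_1 = 1) and P(eps_i = 1 | eps_1 = 0) for i >= 2."""
    q = 1.0 - p
    prob_one_given_one = p + delta * q   # success weighted up when eps_1 = 1
    prob_one_given_zero = p - delta * p  # success weighted down when eps_1 = 0
    return prob_one_given_one, prob_one_given_zero

def sample_fk_bernoulli(n, p, delta, rng=None):
    """Draw one FK-dependent Bernoulli sequence of length n (hypothetical helper)."""
    rng = rng or random.Random()
    prob_one_given_one, prob_one_given_zero = fk_conditionals(p, delta)
    # eps_1 is an ordinary Bernoulli(p) draw.
    first = 1 if rng.random() < p else 0
    # Every later variable uses the success probability selected by eps_1.
    success_prob = prob_one_given_one if first == 1 else prob_one_given_zero
    rest = [1 if rng.random() < success_prob else 0 for _ in range(n - 1)]
    return [first] + rest
```

Setting `delta = 0` recovers independent Bernoulli draws, while `delta = 1` forces every variable to copy $\epsilon_{1}$.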

The figure above illustrates the possible outcomes of a sequence of such dependent Bernoulli variables. Korzeniowski showed that, despite this conditional dependency, $P(\epsilon_{i} = 1) = p \quad\forall i$. That is, the sequence of Bernoulli variables is identically distributed, with correlation shown to be
$$\text{Cor}(\epsilon_{i}, \epsilon_{j}) = \left\{\begin{array}{lr}\delta, & i=1, \quad j \geq 2 \\\delta^{2}, &i \neq j, \quad i,j \geq 2\end{array}\right.$$
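
Both claims can be checked directly from the conditional probabilities. The following is a brief verification sketch rather than Korzeniowski's original proof, and the second correlation assumes that $\epsilon_{i}$ and $\epsilon_{j}$ are conditionally independent given $\epsilon_{1}$. For $i \geq 2$, the law of total probability gives
$$P(\epsilon_{i} = 1) = p(p + \delta q) + q(p - \delta p) = p^{2} + \delta pq + pq - \delta pq = p(p+q) = p.$$
Since each variable has variance $pq$, the correlations follow from the joint moments, with $j \geq 2$ in the first line and $i \neq j$, $i,j \geq 2$ in the second:
$$\begin{aligned}E[\epsilon_{1}\epsilon_{j}] &= p(p + \delta q) = p^{2} + \delta pq &&\Rightarrow\quad \text{Cor}(\epsilon_{1}, \epsilon_{j}) = \frac{\delta pq}{pq} = \delta,\\E[\epsilon_{i}\epsilon_{j}] &= p(p + \delta q)^{2} + q(p - \delta p)^{2} = p^{2} + \delta^{2}pq &&\Rightarrow\quad \text{Cor}(\epsilon_{i}, \epsilon_{j}) = \frac{\delta^{2}pq}{pq} = \delta^{2}.\end{aligned}$$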

These identically distributed but correlated Bernoulli random variables yield a Generalized Binomial distribution with a similar form to the standard binomial distribution.

In the previous work, the concept of Bernoulli FK dependence was extended to categorical random variables. That is, given a sequence of categorical random variables with $K$ categories and $P(\epsilon_{1} = i) = p_{i}$, $i = 1,\ldots,K$, the conditional probabilities for $j \geq 2$ are
$$\begin{aligned}P(\epsilon_{j} = i | \epsilon_{1} = i) &= p_{i}^{+} = p_{i} + \delta(1-p_{i});\\P(\epsilon_{j} = k | \epsilon_{1} = i) &= p_{k}^{-} = p_{k}-\delta p_{k}, \quad i \neq k, \:\:k = 1,\ldots,K.\end{aligned}$$
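
As a quick numerical illustration of the categorical case, the sketch below builds the table of conditional probabilities and recovers the marginal distribution of a later variable by the law of total probability. The function names are hypothetical, and categories are indexed from 0 rather than 1 to follow Python conventions.

```python
def fk_categorical_conditionals(probs, delta):
    """Row i holds P(eps_j = k | eps_1 = i) for k = 0..K-1 and j >= 2."""
    rows = []
    for i in range(len(probs)):
        # Every category is first shrunk by the factor (1 - delta) ...
        row = [p_k - delta * p_k for p_k in probs]
        # ... then the category observed in eps_1 is boosted instead.
        row[i] = probs[i] + delta * (1.0 - probs[i])
        rows.append(row)
    return rows

def marginal_of_later_variable(probs, delta):
    """P(eps_j = k) for j >= 2, summing over the outcomes of eps_1."""
    cond = fk_categorical_conditionals(probs, delta)
    K = len(probs)
    return [sum(probs[i] * cond[i][k] for i in range(K)) for k in range(K)]
```

For any valid `probs` and `delta`, each row of the table sums to 1 and `marginal_of_later_variable` returns `probs` again up to floating-point error, which is the identical-distribution property in numerical form.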

Traylor proved that FK-dependent categorical random variables remain identically distributed, and showed that the cross-covariance matrix of these categorical random variables has the same structure as the correlation between FK-dependent Bernoulli random variables. In addition, the concept of a generalized binomial distribution was extended to a generalized multinomial distribution.

In the next section, we will explore a different type of dependency structure, sequential dependency.