# On Permuted First-Kind Dependence of Categorical Random Variables

This paper discusses the notion of *horizontal dependency* in sequences of first-kind dependent categorical random variables. We examine the necessary and sufficient conditions for a sequence of first-kind dependent categorical random variables to be identically distributed when the conditional probability distribution of subsequent variables after the first are permuted from the identity permutation used in previous works.

##### To read the full paper with all proofs, download here

## Introduction

*Editor’s note: You can view the video explaining first-kind dependence in more detail here.*

In [2], Traylor extended the notion of first-kind dependence first detailed by Korzeniowski in [1] to categorical random variables. Suppose we take a sequence of categorical random variables \epsilon = (\epsilon_{1},\epsilon_{2},\ldots, \epsilon_{n},\ldots) with K categories \{1,2,...,K\} with P(\epsilon_{1} = j) = p_{j}, j = 1,...,K, with \sum_{j=1}^{K}p_{j} = 1. Define the *dependency coefficient* \delta \in [0,1]. Korzeniowski [1] and Traylor [2] defined the following quantities:

\begin{aligned}p_{j}^{+} &:= p_{j} + \delta(\sum_{i\neq j}p_{i}) = p_{j} + \delta(1-p_{j})\\p_{j}^{-} &:= p_{j}-\delta p_{j} \end{aligned}

for j = 1,2,\ldots, K where we have the following identities.

p_{j}^{+} + \sum_{i \neq j}p_{i}^{-} = 1 \text{ for } j = 1,2,\ldots, K

Under first-kind dependence, Under first-kind dependence,

\begin{aligned}P(\epsilon_{i} = j | \epsilon_{1} = j) &= p_{j}^{+} \\P(\epsilon_{i} = j | \epsilon_{1} \neq j)&= p_{j}^{-}\end{aligned}for j = 1,2,\ldots K and i \geq 2. Figure 1 gives an illustration of standard first-kind dependence for a sequence of categorical random variables with 3 categories. The binary tree shows the allocation of probability mass at each new variable in the sequence given the previous outcomes. Note that standard first-kind dependence weights the probability of subsequent outcomes *in favor* of the outcome of the first random variable, and *away* from the others. As an explicit example, the probability that \epsilon_{3} = 2 is p_{2}^{+} if \epsilon_{1} = 2, but p_{2}^{-} if \epsilon_{1} = 1 or 3.

Traylor [2] proved that P(\epsilon_{i} = j) = p_{j} for all i = 1,2,\ldots and for all j = 1,2,\ldots, K. That is, despite the dependence of subsequent categorical variables in the sequence on the first, all variables remain identically distributed. Traylor and Hathcock [3] also extended this notion to other types of *vertical dependence* by creating a class of *vertical dependency generators* \mathscr{C} that generate other dependency structures in addition to first-kind dependence, such as sequential dependence. Any dependency structure generated by a function in \mathscr{C}, defined by \mathscr{C}_{\delta} = \{\alpha : \mathbb{N}_{\geq 2} \to \mathbb{N}\} such that

- \alpha(n) < n
- \forall\: n\: \exists \: j \in \{1,...,n-1\} : \alpha(n) = j

The generating function for first-kind dependence is \alpha(n) \equiv 1, showing that every categorical random variable, starting with the second, assigns conditional probabilities based on the outcome of the first.

Regardless of vertical dependency structure, we have thus far assumed that we weight in favor of the outcome of \alpha(n). That is, if \epsilon_{\alpha(n)} = j, then P(\epsilon_{n} = j|\epsilon_{\alpha(n)} = j) = p_{j}^{+}, and P(\epsilon_{n} = j|\epsilon_{\alpha(n)} \neq j) = p_{j}^{-}. Under these conditions, any dependent categorical sequence generated by a function \alpha \in \mathscr{C} is identically distributed but dependent, as proven in [3].

This paper examines permuted weights and derives the conditions under which first-kind dependent categorical random variables remain identically distributed under permuted weights. We define permuted weighting in the next section.

## Permuted Weighting

Suppose we take a permutation \sigma of \{1,2,...,K\}, so that \sigma : i \mapsto \sigma_{i}, i = 1,...,K. Now suppose that we take a first-kind dependent sequence of categorical random variables. Now, we define the permuted weights by modifying the conditional probabilities. For i \geq 2 \begin{aligned}P(\epsilon_{i} = j | \epsilon_{1} = \sigma^{-1}_{j}) &= p_{j}^{+} \\P(\epsilon_{i} = j | \epsilon_{1} \neq \sigma^{-1}_{j}) &= p_{j}^{-}\end{aligned}

for j = 1,2,\ldots, K. The *vertical dependency* – first kind dependency in this case- is unchanged, in that \alpha(n) \equiv 1, but *how* those specific probability masses are assigned has been permuted.

* Example:* For example, suppose K=3, and let \sigma = (132). Then for i\geq 2,

The tree for first-kind dependence with weights according to \sigma = (132) is given in Figure 2.

The following lemma gives the unconditional probability distribution of any categorical random variable \epsilon_{i}, i \geq 2 in a permuted first-kind dependent sequence. Note that P(\epsilon_{1} = j) = p_{j} by definition.

* Lemma:* Let \epsilon = (\epsilon_{1},\epsilon_{2},\ldots, \epsilon_{n}) be a first-kind dependent sequence of categorical random variables with permuted weights given by \sigma. Then

This lemma gives the unconditional probability distribution for the subsequent categorical random variables after \epsilon_{1} under permuted first-kind dependence. If \sigma is the identity permutation, then \sigma^{-1}(i) = i, and This lemma gives the unconditional probability distribution for the subsequent categorical random variables after \epsilon_{1} under permuted first-kind dependence. If \sigma is the identity permutation, then \sigma^{-1}(i) = i, and

P(\epsilon_{n}=i)=p_{i}p_{i}^{+}+p_{i}^{-}\sum_{j \neq i}p_{j}=p_{i}and we recover standard first kind dependence from [2]. Now the entire sequence is identically distributed. In general, this may not always be true. The following theorem gives necessary and sufficient conditions on the initial probability distribution to guarantee an identically distributed permuted sequence of first-kind dependent categorical random variables.

* Theorem 1:* Suppose \epsilon = (\epsilon_{1},\epsilon_{2},\ldots \epsilon_{n}) is a sequence of first-kind dependent categorical random variables with K categories. Let \sigma be a permutation of \{1,2,\ldots, K\} that permutes the conditional probabilities of \epsilon_{j}, j \geq 2 according to the outcome of \epsilon_{1}. That is, suppose

Then all variables \epsilon_{i} are identically distributed with P(\epsilon_{i} = k) = p_{k}, k= 1,2,\ldots, K if and only if p_{i} = p_{\sigma^{-1}}(i).

Returning to the example above, where K=3 and \sigma = (132), we have by Theorem 1 that the entire sequence is identically distributed if and only if p_{1} = p_{\sigma^{-1}(1)} = p_{2}, p_{2} = p_{3}, and p_{3} = p_{1}. In this case, we may conclude that only of p_{1} = p_{2} = p_{3} = p will the sequence remain identically distributed under first-kind dependence. Returning to the example above, where K=3 and \sigma = (132), we have by Theorem 1 that the entire sequence is identically distributed if and only if p_{1} = p_{\sigma^{-1}(1)} = p_{2}, p_{2} = p_{3}, and p_{3} = p_{1}. In this case, we may conclude that only of p_{1} = p_{2} = p_{3} = p will the sequence remain identically distributed under first-kind dependence.

* Example:* Suppose now that K=4, and \sigma = (12)(34). Then by Theorem 1, a sequence \epsilon of these categorical random variables under first-kind dependence will be identically distributed if and only if p_{1} = p_{2} and p_{3} = p_{4}. Thus, in this case, p_{1} = p_{2} = p, and p_{3} = p_{4} = p', but p and p' need not necessarily be equal to ensure the sequence is identically distributed.

This example illustrates a corollary to Theorem 1. A permuted FK-dependent sequence of categorical random variables is identically distributed if and only if the probabilities within each disjoint cycle are equal.

* Corollary:* Let \sigma = \tau_{1}\tau_{2}\cdots \tau_{m} be a permutation decomposed into a product of disjoint cycles \tau_{j}. Suppose \epsilon is a first-kind dependent sequence of categorical random variables with weights permuted according to \sigma. Then the sequence is identically distributed if and only if the probabilities within each disjoint cycle are equal.

## Remarks and Conclusion

The notion of permuted weights extends the possibilities for dependence among a sequence of categorical random variables beyond even the class of vertical dependency structures. This paper introduced the notion of *horizontal dependence* via permuting the modification of the subsequent outcome probabilities, giving the possibility of two different sequences with the same vertical dependency structure, but different horizontal dependency. Permuting the conditional probabilities away from the identity permutation requires the addition of restrictions on the initial probability distribution to retain the desired identically distributed nature of the sequence. Theorem 1 and Corollary 2 show that the necessary and sufficient condition to guarantee an identically distributed sequence of first-kind dependent categorical random variables is for the probabilities within each cycle to be equal. The notion of permuted weights extends the possibilities for dependence among a sequence of categorical random variables beyond even the class of vertical dependency structures.

This paper introduced the notion of horizontal dependence via permuting the modification of the subsequent outcome probabilities, giving the possibility of two different sequences with the same vertical dependency structure, but different horizontal dependency. Permuting the conditional probabilities away from the identity permutation requires the addition of restrictions on the initial probability distribution to retain the desired identically distributed nature of the sequence. Theorem 1 and Corollary 1 show that the necessary and sufficient condition to guarantee an identically distributed sequence of first-kind dependent categorical random variables is for the probabilities within each cycle to be equal.

The identity permutation

when decomposed into a product of disjoint cycles. Thus, we can see that this is a special case of Theorem 1 and Corollary 1. First-kind dependent sequences of categorical random variables under the identity permutation are always identically distributed, as shown in [2]. In fact, vertical dependency structures generated by \alpha \in \mathscr{C} from [3] are also always identically distributed under the identity permutation. At the other extreme, suppose \epsilon is a first-kind dependent sequence with permuted weights according to \sigma which is a 1-cycle. Then the only way this sequence can be identically distributed is if p_{1} = \cdots = p_{K}. Future research will examine necessary and sufficient conditions on horizontal dependence to retain the identically distributed property of sequences of categorical random variables in other dependency structures.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

## References

- Andrzej Korzeniowski. On correlated random graphs. Journal of Probability and Statistical Science, pages 43–58, 2013.
- Rachel Traylor. A generalized multinomial distribution from dependent categorical random variables. Academic Advances of the CTO, 2017.
- Rachel Traylor and Jason Hathcock. Vertical dependency in sequences of categorical random variables. AACTO, 2017.