A Generalized Geometric Distribution from Vertically Dependent Bernoulli Random Variables

 For full proofs and derivations, read here.

Background

The standard geometric distribution is built from a sequence of independent and identically distributed (i.i.d.) Bernoulli random variables with probability of success p and probability of failure q=1-p. There are two “versions” of the geometric distribution:

  • A random variable X has a geometric distribution if it counts the number of Bernoulli trials needed to observe the first success.
  •  A random variable Y = X-1 has a geometric distribution if it counts the number of failures in a sequence of Bernoulli trials before the first observed success.

 

In the first case, X has support \{1,2,3,...\}, because we are looking for the first success, which can occur on trial 1, 2, 3, and so forth. In the latter case, Y has support \{0,1,2,...\}, because we are counting the number of failures before the first success occurs. That is, if the first success occurs on trial 1, then there were 0 failures preceding the first success, and thus Y=0. If the first success occurs on trial 2, then one failure occurred prior to the first success, and thus Y=1. Essentially, Version 2 is a shifted Version 1, because our perspective changes: we do not include the success in the count in Version 2.

For Version 1, known as the standard geometric distribution, the pdf is given by

f_{X}(k) = q^{k-1}p, \quad k = 1, 2, 3, \ldots

For Version 2 (the shifted geometric distribution), the pdf is given by

f_{Y}(k) = q^{k}p, \quad k = 0, 1, 2, \ldots
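The two standard pdfs above can be sketched in a few lines of Python. This is only an illustration; the function names `geometric_pmf_trials` and `geometric_pmf_failures` and the value p = 0.3 are chosen here and do not come from the original derivation. The check at the end confirms the shift relation f_{Y}(k) = f_{X}(k+1).

```python
def geometric_pmf_trials(k: int, p: float) -> float:
    """Version 1: P(X = k), first success on trial k, for k = 1, 2, 3, ..."""
    if k < 1:
        return 0.0
    return (1.0 - p) ** (k - 1) * p


def geometric_pmf_failures(k: int, p: float) -> float:
    """Version 2: P(Y = k), k failures before the first success, k = 0, 1, 2, ..."""
    if k < 0:
        return 0.0
    return (1.0 - p) ** k * p


p = 0.3  # illustrative success probability (an assumption, not from the text)

# Since Y = X - 1, P(Y = k) should equal P(X = k + 1) for every k >= 0.
shift_holds = all(
    abs(geometric_pmf_failures(k, p) - geometric_pmf_trials(k + 1, p)) < 1e-12
    for k in range(50)
)
print(shift_holds)
```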

The next section derives the generalized geometric distribution for FK-dependent random variables, and then shows that the pdf is the same regardless of dependency structure.

Generalized Geometric Distribution

Derivation from FK-Dependent Bernoulli Random Variables

Suppose we have a sequence of FK-dependent Bernoulli random variables. Recall from [2] and [4] that FK-dependent random variables are weighted toward the outcome of the first variable \epsilon_{1}. That is, in the Bernoulli case, P(\epsilon_{1} = 1) = p and P(\epsilon_{1} = 0) = q = 1-p. For subsequent variables in the sequence,

\begin{aligned}P(\epsilon_{n} = 1 | \epsilon_{1}= 1) = p^{+} &\qquad P(\epsilon_{n} = 1 | \epsilon_{1} = 0) = p^{-} \\P(\epsilon_{n} = 0 | \epsilon_{1} = 1) = q^{-} &\qquad P(\epsilon_{n} = 0 | \epsilon_{1} = 0) = q^{+}\end{aligned}

for n\geq 2, where q = 1-p, p^{+} = p + \delta q, p^{-} = p-\delta p, q^{-}= q-\delta q, q^{+} = q + \delta p, and 0 \leq \delta \leq 1 is the dependency coefficient.
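As a quick numerical illustration of these definitions, the following sketch computes the four conditional probabilities from p and \delta and checks that each conditional distribution sums to 1. The names `FKProbabilities` and `fk_conditional_probabilities` are hypothetical, introduced only for this example.

```python
from typing import NamedTuple


class FKProbabilities(NamedTuple):
    p_plus: float   # P(eps_n = 1 | eps_1 = 1)
    p_minus: float  # P(eps_n = 1 | eps_1 = 0)
    q_minus: float  # P(eps_n = 0 | eps_1 = 1)
    q_plus: float   # P(eps_n = 0 | eps_1 = 0)


def fk_conditional_probabilities(p: float, delta: float) -> FKProbabilities:
    """Conditional probabilities for n >= 2, using the formulas in the text."""
    if not (0.0 <= p <= 1.0 and 0.0 <= delta <= 1.0):
        raise ValueError("p and delta must both lie in [0, 1]")
    q = 1.0 - p
    return FKProbabilities(
        p_plus=p + delta * q,
        p_minus=p - delta * p,
        q_minus=q - delta * q,
        q_plus=q + delta * p,
    )


probs = fk_conditional_probabilities(p=0.4, delta=0.25)
print(probs)
# Each conditional distribution must sum to 1.
print(abs(probs.p_plus + probs.q_minus - 1.0) < 1e-12)
print(abs(probs.p_minus + probs.q_plus - 1.0) < 1e-12)
```

Setting \delta = 0 in this sketch returns p^{+} = p^{-} = p, which recovers the independent case.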

We will first give the generalized “Version 1” of the geometric distribution for FK-dependent random variables.

 


Proposition 1.
Suppose \epsilon = (\epsilon_{1},\epsilon_{2},\ldots, \epsilon_{n},\ldots) is an FK-dependent sequence of Bernoulli random variables. Let X be the number of such Bernoulli variables needed to observe the first success. Then X has a generalized geometric distribution with pdf
f_{X}(k) = \left\{\begin{array}{lr}p, & k = 1 \\q\left(q^{+}\right)^{k-2}p^{-}, & k \geq 2\end{array}\right.
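The pdf of X can be evaluated directly from p and \delta. The sketch below is a minimal illustration, with the function name `generalized_geometric_pmf` and the parameter values chosen here rather than taken from the original. It also checks two consequences of the formula: with \delta = 0 it reduces to the standard q^{k-1}p, and for p > 0 and \delta < 1 the probabilities sum to 1.

```python
def generalized_geometric_pmf(k: int, p: float, delta: float) -> float:
    """P(X = k) for the generalized geometric distribution (Version 1)."""
    if k < 1:
        return 0.0
    if k == 1:
        return p
    q = 1.0 - p
    q_plus = q + delta * p    # P(eps_n = 0 | eps_1 = 0)
    p_minus = p - delta * p   # P(eps_n = 1 | eps_1 = 0)
    return q * q_plus ** (k - 2) * p_minus


p, delta = 0.5, 0.5  # illustrative values (assumptions)

# First few probabilities.
first_three = [generalized_geometric_pmf(k, p, delta) for k in (1, 2, 3)]
print(first_three)

# Partial sum over a long range; the remaining tail is negligible here.
total = sum(generalized_geometric_pmf(k, p, delta) for k in range(1, 500))
print(abs(total - 1.0) < 1e-9)

# With delta = 0 the formula collapses to the standard geometric pdf.
standard_ok = all(
    abs(generalized_geometric_pmf(k, p, 0.0) - (1 - p) ** (k - 1) * p) < 1e-12
    for k in range(1, 30)
)
print(standard_ok)
```

At the other extreme, \delta = 1 gives p^{-} = 0, so the formula assigns zero probability to every k \geq 2 and the total mass is only p: a failure on the first trial is then never followed by a success.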


 

In a similar fashion, suppose we prefer to count the number of failures before the first success occurs. For this generalized “Version 2”, we have the following proposition.

 


Proposition 2.
Suppose \epsilon = (\epsilon_{1},\epsilon_{2},\ldots,\epsilon_{n},\ldots) is an FK-dependent sequence of Bernoulli random variables. Let Y = X-1 be the count of failures prior to the first success. Then Y has a shifted generalized geometric distribution with pdf

f_{Y}(k) = \left\{\begin{array}{lr}p, & k = 0 \\q\left(q^{+}\right)^{k-1}p^{-}, & k \geq 1\end{array}\right.
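A small Monte Carlo experiment offers a sanity check on the pdf of Y. The sketch below simulates FK-dependent trials under the assumption that, given \epsilon_{1}, the later variables are conditionally independent, which is what the product form of the pdf suggests. The function names, seed, sample size, and parameter values are all choices made for this illustration.

```python
import random


def shifted_generalized_geometric_pmf(k: int, p: float, delta: float) -> float:
    """P(Y = k), the number of failures before the first success."""
    if k < 0:
        return 0.0
    if k == 0:
        return p
    q = 1.0 - p
    return q * (q + delta * p) ** (k - 1) * (p - delta * p)


def sample_failures(p: float, delta: float, rng: random.Random,
                    max_trials: int = 10_000) -> int:
    """Draw one value of Y by simulating the sequence trial by trial."""
    if rng.random() < p:
        return 0  # first trial succeeds, so no failures precede it
    # The first trial failed, so every later trial succeeds with p^-.
    p_minus = p - delta * p
    failures = 1
    while failures < max_trials:
        if rng.random() < p_minus:
            return failures
        failures += 1
    return failures  # safety cap; practically unreachable for these values


rng = random.Random(0)  # fixed seed for reproducibility
p, delta, n = 0.4, 0.3, 100_000
draws = [sample_failures(p, delta, rng) for _ in range(n)]

# Compare empirical frequencies with the formula for small k.
for k in range(4):
    empirical = draws.count(k) / n
    exact = shifted_generalized_geometric_pmf(k, p, delta)
    print(k, round(empirical, 4), round(exact, 4))
```

With a sample of this size, the empirical frequencies should agree with the formula to roughly two decimal places.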

 

Generalized Geometric Distribution for any Vertical Dependency Structure

Propositions 1 and 2 were derived for FK-dependent random variables, but in fact X and Y follow the generalized geometric distribution and the shifted generalized geometric distribution, respectively, regardless of the vertical dependency structure specified, as long as the dependency structure is generated from a function in \mathscr{C}_{\delta}.

 


Theorem.
Let \epsilon = (\epsilon_{1}, \epsilon_{2},\ldots,\epsilon_{n},\ldots) be a vertically dependent sequence of Bernoulli random variables, where the dependency is generated by \alpha \in \mathscr{C}_{\delta}. Let X be the count of such Bernoulli trials needed until the first success, and let Y = X-1 be the count of failures of such Bernoulli trials prior to the first success. Then the pdfs of X and Y are identical to those given in Propositions 1 and 2, respectively.


 

This result is quite powerful, and it does not hold for all generalized distributions constructed from dependent random variables. Given any vertical dependency structure generated from the broad class \mathscr{C}_{\delta}, the count of trials until the first success and the count of failures before the first success have the same probability distributions as in the FK-dependent case. Thus, no information about the dependency structure is necessary other than the membership of its generating function in \mathscr{C}_{\delta}. The only quantities needed to calculate the generalized geometric probabilities for dependent Bernoulli trials are p and \delta.

The next section gives some basic properties of the Generalized Geometric Distribution, such as the moment generating function and selected moments.