# A Generalized Geometric Distribution from Vertically Dependent Bernoulli Random Variables

##### For full proofs and derivations, read here.

## Properties of the Generalized Geometric Distribution

### Moment Generating Function

**Fact.**

The moment generating function of the generalized geometric distribution is

M_{X}(t) = pe^{t} + \frac{qp^{-}e^{2t}}{1-q^{+}e^{t}}
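As a sanity check, the mean stated in the next section can be recovered by differentiating this MGF at t = 0. The sketch below does this with SymPy, assuming the shorthand p^{-} = p(1-\delta) and q^{+} = q + \delta p from the underlying dependent-Bernoulli construction (these symbols are used but not defined in this excerpt).

```python
import sympy as sp

# Symbols: p = success probability, d = dependency coefficient delta
p, d, t = sp.symbols('p d t', positive=True)
q = 1 - p
p_minus = p * (1 - d)   # assumed shorthand: p^- = p(1 - delta)
q_plus = q + d * p      # assumed shorthand: q^+ = q + delta*p

# MGF of the generalized geometric distribution, as stated above
M = p * sp.exp(t) + q * p_minus * sp.exp(2 * t) / (1 - q_plus * sp.exp(t))

# E[X] = M'(0); this should agree with (1 - delta*p) / (p*(1 - delta))
mean = sp.simplify(sp.diff(M, t).subs(t, 0))
claimed = (1 - d * p) / (p * (1 - d))
print(sp.simplify(mean - claimed))  # 0 if the two expressions agree
```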

### Mean

**Fact. **

The mean of the generalized geometric distribution is

E[X] = \mu = \frac{1-\delta p}{p(1-\delta)}

The effect of dependence can be seen in the plot of E[X] in Figure 1. For fixed p, E[X] \to \infty as \delta \to 1, though the rate of divergence depends on p.

To explore further, suppose p = 1/2, so that the Bernoulli trials are balanced between success and failure. Figure 2 shows the effect of \delta for this fixed p. Notice that the effect of \delta on the expected value becomes more pronounced as \delta \to 1. In particular, for \delta = 1/2 and p = 1/2, E[X] = 3, but past this point an increase of only 1/6, to \delta = 2/3, raises the expected number of trials before a success to 4. Doubling the expected number of trials again, to E[X] = 8, requires an increase in \delta of only 4/21, to \delta = 6/7.
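This arithmetic is easy to reproduce directly from the mean formula. A minimal check using exact rational arithmetic (the helper name is mine, not from the text):

```python
from fractions import Fraction

def gg_mean(p, delta):
    """Mean of the generalized geometric distribution:
    E[X] = (1 - delta*p) / (p*(1 - delta))."""
    return (1 - delta * p) / (p * (1 - delta))

half = Fraction(1, 2)
print(gg_mean(half, Fraction(1, 2)))  # 3
print(gg_mean(half, Fraction(2, 3)))  # 4
print(gg_mean(half, Fraction(6, 7)))  # 8
print(gg_mean(half, 0))               # 2, i.e. 1/p: independent trials
```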

A smaller probability of success p yields an expected value \mu that is much more susceptible to the effects of the dependency \delta, while a larger p yields an expected value more resistant to high dependency. Since the geometric distribution counts the number of trials needed to obtain the first success, a higher p increases the probability that the first success occurs on the first trial, while a lower p decreases it. The dependency \delta therefore has a stronger effect for lower p: a longer (dependent) sequence is expected before the first success, which inflates the expected number of trials faster than if the Bernoulli trials were independent.

*Remark. Notice that when \delta = 0, the Bernoulli trials are independent. The mean of the generalized geometric distribution when \delta = 0 is E[X] = \frac{1}{p}, the mean of the standard geometric distribution.*

### Variance

**Fact.**

The variance of the generalized geometric distribution is

\text{Var}(X) = \sigma^{2} = \frac{1-p + \delta p(1-p)}{p^2(1-\delta)^2}

Figure 3 shows the effect of \delta on the variance for different values of p. As with the mean, a smaller p induces a stronger effect of \delta on the variance of the number of dependent Bernoulli trials before the first success. The shape of all four cases is similar, but the scales are vastly different: as p increases, the scale of the variance decreases dramatically.
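The scale difference can be made concrete by evaluating the variance formula at a fixed \delta for several values of p. A small sketch (function name is illustrative):

```python
def gg_var(p, delta):
    """Variance of the generalized geometric distribution:
    (1 - p + delta*p*(1 - p)) / (p^2 * (1 - delta)^2)."""
    q = 1 - p
    return (q + delta * p * q) / (p * (1 - delta)) ** 2

# Same delta, increasing p: the variance drops dramatically in scale
for p in (0.1, 0.25, 0.5, 0.75):
    print(f"p = {p:.2f}: Var = {gg_var(p, 0.5):.2f}")

# delta = 0 recovers the standard geometric variance (1 - p) / p^2
assert abs(gg_var(0.3, 0.0) - (1 - 0.3) / 0.3**2) < 1e-12
```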

*Remark. Again, note that when \delta = 0, the variance of the generalized geometric distribution reduces to that of the standard geometric distribution.*

### Skew

**Fact.**

The skew of the generalized geometric distribution is given by

\text{Skew}[X] = \frac{2-3p + p^2 + \delta p[q+\delta p q + p(2\delta-1-p)]}{\left(q + \delta pq\right)^{3/2}}

The behavior of the skew of the generalized geometric distribution is more complicated as a function of both p and \delta. Figure 4 shows the skew as a function of p for the two extreme cases: complete independence (\delta = 0) and complete dependence (\delta = 1). From p = 0 to p \approx 0.658, the skew for the independent geometric distribution is greater than in the completely dependent case; for p \gtrapprox 0.658, the skew is greater under complete dependence.
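The crossover can be checked numerically by evaluating the skew formula on either side of p \approx 0.658 (the helper name is mine; note that at \delta = 0 the formula reduces to the standard geometric skew (2-p)/\sqrt{1-p}):

```python
def gg_skew(p, delta):
    """Skew of the generalized geometric distribution, per the formula above."""
    q = 1 - p
    num = 2 - 3 * p + p**2 + delta * p * (q + delta * p * q + p * (2 * delta - 1 - p))
    return num / (q + delta * p * q) ** 1.5

# Below the crossover (~0.658), the independent case has larger skew ...
print(gg_skew(0.6, 0.0), gg_skew(0.6, 1.0))
# ... above it, the completely dependent case does
print(gg_skew(0.7, 0.0), gg_skew(0.7, 1.0))
```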

### Entropy

The *entropy* of a random variable measures the average information contained in the random variable. It can also be viewed as a measure of how unpredictable or “truly random” the variable is [1]. The notion of entropy, denoted H(X), was introduced by Claude Shannon [3] in 1948.

**Definition. (Entropy)**

H(X) := -\sum_{i}P(x_{i})\log_{2}(P(x_{i}))

For the standard geometric distribution, the entropy is given by

H_{sg}(X) = \frac{-(1-p)\log_{2}(1-p)-p\log_{2}(p)}{p}

**Fact.**

The entropy for the generalized geometric distribution is

H_{gg}(X) = \frac{-\left[pp^{-}\log_{2}(p) + qp^{-}\log_{2}(qp^{-}) + qq^{+}\log_{2}(q^{+})\right]}{p^{-}}

Figure 5a shows H_{gg}(X) as a function of p for fixed values of \delta. Notice that while the entropy decreases to 0 for all curves as p \to 1, the entropy curve is shifted upward for larger \delta. Figure 5b fixes p and looks at entropy as a function of \delta. Notice that for smaller p, the entropy is much higher, which aligns with intuition.
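As with the mean and variance, the generalized entropy should collapse to the standard geometric entropy at \delta = 0, and it should increase with \delta for fixed p. A quick numerical check; the helper names and the shorthand p^{-} = p(1-\delta), q^{+} = q + \delta p are my assumptions, not notation defined in this excerpt:

```python
from math import log2

def h_sg(p):
    """Entropy of the standard geometric distribution."""
    q = 1 - p
    return (-q * log2(q) - p * log2(p)) / p

def h_gg(p, delta):
    """Entropy of the generalized geometric distribution,
    assuming p^- = p(1 - delta) and q^+ = q + delta*p."""
    q = 1 - p
    p_minus = p * (1 - delta)
    q_plus = q + delta * p
    return -(p * p_minus * log2(p)
             + q * p_minus * log2(q * p_minus)
             + q * q_plus * log2(q_plus)) / p_minus

for p in (0.1, 0.5, 0.9):
    assert abs(h_gg(p, 0.0) - h_sg(p)) < 1e-12  # delta = 0 recovers the standard case
    print(f"p = {p}: H = {h_sg(p):.4f}")
```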