
Extensions of the Single Server Efficiency Model

## Abstract

Editor’s note: this paper comprises the third chapter of the PhD dissertation by Rachel Traylor. Visit here and here to see chapters one and two, respectively. Herein we further generalize the single server model presented in [3]. In particular, we consider a multichannel server under the cases of both singular and clustered tasks. In the instance of singular tasks, we present a load balancing allocation scheme and obtain a stochastic breakdown rate process, as well as derive the conditional survival function as a result. The model of a multichannel server taking in clustered tasks gives rise to two possibilities, namely, independent and correlated channels. We derive survival functions for both of these scenarios.

## Load Balancing Allocation for a Multichannel Server

### Model Description

Previously, we had assumed that a web server functions as a single queue that attempts to process jobs as soon as they arrive. These jobs originally brought a constant stress $\eta$ to the server, with the system stress reducing by $\eta$ at the completion of each job.

Now, suppose we have a server partitioned into K channels. Denote each channel as $Q_k,$ $k = 1,\ldots,K$. Jobs arrive via a nonhomogenous Poisson process with rate $\lambda(t)$. Upon arrival, each job falls (or is routed) to the channel with the shortest queue length. If all queue lengths are equal or multiple channels have the shortest length, the job will enter one of the appropriate queues with equal probability.

We retain the previous notation for the baseline breakdown rate, or hazard function: $r_0(t)$, the hazard function of an idle system. We also retain the assumption that the arrival times $\mathfrak{T}$ are independent. In addition, the service times $\mathfrak{W}$ are i.i.d. with distribution $G_W(w)$. We assume that all channels are serving jobs at the same time, i.e., a job can be completed from any queue at any time. We do not require load balancing for service; in other words, any queue can empty while others are still backlogged. We also retain the FIFO service policy for each queue.

Since we have now “balanced,” or distributed, the load of jobs in the server, not all jobs will cause additional stress to the system. Suppose all jobs bring the same constant stress $\eta$ upon arrival. Under load balancing, we will define the additional stress to the system as $\eta\max_k|Q_k|$. Figure 1 shows an example server with current stress of $4\eta$.
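As a concrete illustration of the routing rule, here is a minimal Python sketch (the helper names `route` and `workload_stress` are ours, not part of the model) that sends each arrival to the shortest queue, breaking ties uniformly at random, and computes the workload stress $\eta\max_k|Q_k|$:

```python
import random

def route(queues, rng=random):
    """Return the index of a shortest queue; ties are broken uniformly."""
    shortest = min(len(q) for q in queues)
    candidates = [i for i, q in enumerate(queues) if len(q) == shortest]
    return rng.choice(candidates)

def workload_stress(queues, eta):
    """Additional stress under load balancing: eta * max_k |Q_k|."""
    return eta * max(len(q) for q in queues)

# Two-channel illustration: three jobs arrive and none finish, as in Example 1
# below. Whatever the tie-breaks, the max queue length is 2, so stress is 2*eta.
queues = [[], []]
for job in (1, 2, 3):
    queues[route(queues)].append(job)

assert workload_stress(queues, eta=1.0) == 2.0
```

Note that the stress depends only on the longest queue, not on the total number of jobs in the system.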

### Examples

Due to the dynamic nature of arrival times, allocation to queues, and service times, we have many possible configurations of jobs at any point in time. Therefore, the allocation scheme adds an additional layer of variation to the service times and order of service. The placement of jobs in the various queues (and thus the order of service and service times) is wholly dependent on all arrival times and service times of the prior arrivals. The following examples illustrate the effect on the workload stress added to the system in various scenarios.

###### Figure 2(b). Breakdown Rate Process Trajectory

Example 1. Suppose for simplicity we have 2 channels. Suppose at the time of observation of the system, 3 jobs have arrived and none have finished. WLOG, suppose Job 3 fell into $Q_{1}$. See Figure 2(a). The stress to the system at $t=t_{\text{obs}}$ is $r_{0}(t_{\text{obs}}) + 2\eta$, as shown in Figure 2(b).

Note in Example 1 that Job 2 does not add any additional stress to the system. Job 1 sees an empty queue upon arrival, and $\max_{k}|Q_{k}| = 1$ when it falls into either queue. Job 2 arrives while Job 1 is still being processed, and thus the placement of Job 1 forces Job 2 into the empty channel. Since $\max_{k}|Q_{k}|$ is still 1, the stress to the system does not change. Job 3 arrives while Jobs 1 and 2 are in service, and thus its choice of queue is irrelevant given the configuration of the two queues at $T_{3}$. Regardless of which queue Job 3 falls into, $\max_{k}|Q_{k}| = 2$. Thus the arrival of Job 3 increases the breakdown rate by $\eta$ again.

The next example shows the change in system stress from Example 1 when one job has finished processing before $T_{3}$.

###### Figure 3(b). Example 2 Breakdown Rate Process Trajectory

Example 2. Consider the same two-channel system from Example 1, but now suppose WLOG that $T_{1}+W_{1} < T_{3}$. In other words, service for Job 1 was completed before Job 3 arrived. Hence Job 3 will fall into the opposite queue from Job 2. The stress to the system at the time of observation would be $r_{0}(t) + \eta$. See Figures 3(a) and 3(b).

In this scenario, the workload due to Job 3 does not contribute any additional stress to the server. Also observe that upon completion of Job 1, the workload stress to the server does not decrease, as Job 2 still resides in the system and is being served.

Contrast this behavior with the breakdown rate process given in [3]. In the single-channel, single-server model described in both [1] and [3], each job adds stress to the server upon arrival. Under the load balancing allocation scheme, the additional stress to the server depends on the arrival and service times of all prior jobs. From a stochastic perspective, this breakdown rate process has full memory.

The examples above illustrate that $\max_{k}|Q_{k}|$ depends on the intersections of the intervals $I_{j} = [T_{j}, T_{j} + W_{j}]$, $j = 1,\ldots,N(t)$. The next section details the methodology to obtain the configuration of jobs in the server at time $t$ by decomposition of $\bigcup_{j=1}^{N(t)}I_{j}$ into disjoint atoms and derives the stochastic breakdown rate process under the load balancing allocation scheme.

### Breakdown Rate Process and Conditional Survival Function

Let $\epsilon = (\epsilon_{1},\ldots,\epsilon_{N(t)})$ be an $N(t)$-tuple whose components $\epsilon_{j}\in\{\emptyset, c\}$, where $\emptyset$ denotes the empty set and $c$ denotes the complement of the set; that is, $I_{j}^{\emptyset} = I_{j}$ and $I_{j}^{c}$ is the complement of $I_{j}$. Let $E = \{\epsilon \mathrel{:} \epsilon_{j}\in\{\emptyset, c\}\}$ denote the set of all possible $\epsilon$, excepting $\epsilon = (c,\ldots,c)$. Then by Lemma 1 (see Appendix),

$$\bigcup_{j=1}^{N(t)}I_{j} = \bigcup_{\epsilon \in E}\bigcap_{j=1}^{N(t)}I_{j}^{\epsilon_{j}}$$

Remark. $\bigcap_{j=1}^{N(t)}I_{j}^{\epsilon_{j}}$ indicates which jobs are still in the server at time $t$. The union is disjoint; thus only one $\epsilon$ will describe the server configuration at any given time $t$. For example, if 3 jobs have arrived at the server by time $t_{\text{obs}}$, $|E| = 2^{3} - 1 = 7$. These may be enumerated:

• $I_{1} \cap I_{2} \cap I_{3}$

• $I_{1} \cap I_{2} \cap I_{3}^{c}$

• $I_{1} \cap I_{2}^{c} \cap I_{3}$

• $I_{1} \cap I_{2}^{c} \cap I_{3}^{c}$

• $I_{1}^{c} \cap I_{2} \cap I_{3}$

• $I_{1}^{c} \cap I_{2} \cap I_{3}^{c}$

• $I_{1}^{c} \cap I_{2}^{c} \cap I_{3}$

As an illustration, refer to Example 1. All three jobs are in the system at $t = t_{\text{obs}}$ (that is, none have completed service), and thus $t_{\text{obs}} \in I_{1} \cap I_{2} \cap I_{3}$. Expanding, $t_{\text{obs}} \in [T_{1}, T_{1}+W_{1}]$, $[T_{2}, T_{2}+W_{2}]$, and $[T_{3}, T_{3}+W_{3}]$.

Compare the case with that of Example 2. In this case, three jobs have arrived at $t = t_{\text{obs}}$, but Job 1 has finished by $t_{\text{obs}}$. Thus $t_{\text{obs}} \not\in I_{1}$, but since Jobs 2 and 3 are still in the system, $t_{\text{obs}} \in I_{2} \cap I_{3}$. Thus $t_{\text{obs}} \in I_{1}^{c} \cap I_{2} \cap I_{3}$.

Now, since the additional workload stress to the server is $\eta\max_{k}|Q_{k}|$, it remains to derive the appropriate multiplier that counts the number of jobs contributing additional stress to the system.

Let $n = \sum_{j=1}^{N(t)}\mathbf{1}(\epsilon_{j} = \emptyset)$ for a particular $\epsilon$, and let $\alpha_{\epsilon}$ be the multiplier giving the number of jobs that contribute stress $\eta$ to the system. Under [1] and the generalization in [3], every uncompleted job in the system contributes stress; thus $\alpha_{\epsilon} = n$.

Under the load balancing scheme, $\alpha_{\epsilon} = \lfloor\frac{n+1}{K}\rfloor$, where $K$ is the number of channels in the server. This is due to the allocation scheme's attempt to distribute jobs evenly across channels. Thus, for Example 1, $n=3$ and $K=2$, so $\alpha_{\epsilon} = \lfloor\frac{3+1}{2}\rfloor = 2$, as illustrated in Figure 2(b); for Example 2, $n=2$, so $\alpha_{\epsilon} = \lfloor\frac{2+1}{2}\rfloor = 1$, as in Figure 3(b).
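A small sketch may help check the multiplier. The helpers below are hypothetical (the string `'in'` plays the role of the exponent $\epsilon_{j} = \emptyset$, i.e. a job still in the server); `configuration` recovers $\epsilon$ from arrival and service times, and `alpha` computes $\lfloor (n+1)/K \rfloor$:

```python
from math import floor

def configuration(T, W, t):
    """Epsilon at time t: 'in' if job j is still in the server,
    i.e. t in [T_j, T_j + W_j]; 'c' otherwise."""
    return tuple('in' if T[j] <= t <= T[j] + W[j] else 'c' for j in range(len(T)))

def alpha(eps, K):
    """Multiplier from the text: floor((n + 1) / K), n = number of active jobs."""
    n = sum(1 for e in eps if e == 'in')
    return floor((n + 1) / K)

# Example 1: three jobs, none finished at t_obs -> alpha = floor(4/2) = 2.
assert alpha(('in', 'in', 'in'), K=2) == 2
# Example 2: Job 1 finished before T_3 -> n = 2, alpha = floor(3/2) = 1.
assert alpha(('c', 'in', 'in'), K=2) == 1
```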

Then, the stochastic breakdown rate process under the load balancing allocation scheme is given by
$$\mathcal{B}(t) = r_{0}(t) + \eta\sum_{\epsilon \in E}\alpha_{\epsilon}\mathbf{1}_{I_{1}^{\epsilon_{1}}\cap I_{2}^{\epsilon_{2}} \cap \cdots \cap I_{N(t)}^{\epsilon_{N(t)}}}(t)$$

Under this expression, only one indicator function will be nonzero at any given point in time, since all atoms are disjoint. Now, $I_{1}^{\epsilon_{1}} \cap I_{2}^{\epsilon_{2}} \cap \ldots \cap I_{N(t)}^{\epsilon_{N(t)}}$ may be expressed as one interval $[L_{\epsilon},R_{\epsilon}]$, where

\begin{aligned}L_{\epsilon}&=\max\left(\{T_{j}\mathrel{:}\epsilon_{j} = \emptyset\}_{j=1}^{N(t)}\right);\\R_{\epsilon}&=\min\left(\{T_{j} + W_{j}\mathrel{:}\epsilon_{j} = \emptyset \}_{j=1}^{N(t)},\{T_{j}\mathrel{:}\epsilon_{j} = c\}_{j=1}^{N(t)}\right)\end{aligned}

Thus, for a server with $K$ channels under a load balancing routing scheme with all jobs bringing constant stress $\eta$, the breakdown rate process $\mathcal{B}(t)$ may be expressed as

$$\mathcal{B}(t) = r_{0}(t) + \eta\sum_{\epsilon \in E}\alpha_{\epsilon}\mathbf{1}_{[L_{\epsilon}, R_{\epsilon}]}(t)\qquad(*)$$

Thus, the conditional survival function under the load balancing scheme is given by
\begin{aligned}S_{Y|\mathfrak{T},\mathfrak{W}, N(t)}(t|\mathfrak{t},\mathfrak{w}, n) &= e^{-\int_{0}^{t}\mathcal{B}(s)ds}\\&=\bar{F}_{0}(t)\exp\left(-\eta\int_{0}^{t}\sum_{\epsilon \in E}\alpha_{\epsilon}\mathbf{1}_{[L_{\epsilon}, R_{\epsilon}]}(s)ds\right)\\&=\bar{F}_{0}(t)\exp\left(-\eta\sum_{\epsilon\in E}\alpha_{\epsilon}\left[\min(t,R_{\epsilon})-L_{\epsilon}\right]^{+}\right) \end{aligned}
where $[x]^{+} = \max(x,0)$.
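Since the atoms $[L_{\epsilon}, R_{\epsilon}]$ are disjoint, the time integral of each indicator over $[0,t]$ is simply the length of $[L_{\epsilon}, R_{\epsilon}] \cap [0,t]$. A quick numerical check, using hypothetical endpoints and multipliers:

```python
import numpy as np

# Hypothetical disjoint atoms (L_eps, R_eps, alpha_eps) and an observation time.
atoms = [(0.5, 1.2, 1), (1.2, 2.0, 2), (2.0, 3.1, 1)]
t = 2.5

# Closed form: each indicator integrates to the overlap length with [0, t].
closed_form = sum(a * max(0.0, min(t, R) - L) for (L, R, a) in atoms)

# Midpoint Riemann sum of the sum of indicators over [0, t].
dt = 1e-5
s = np.arange(0.0, t, dt) + dt / 2
indicator_sum = sum(a * ((s >= L) & (s <= R)) for (L, R, a) in atoms)
numeric = float(indicator_sum.sum() * dt)

assert abs(closed_form - numeric) < 1e-3   # both equal 0.7 + 1.6 + 0.5 = 2.8
```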

### Remarks

Finding the survival function of the single-channel environment relied on the independence of the set of arrival times and service times. From $(*)$, the independence is clearly lost. As noted before, the random breakdown process has full memory, and thus is completely dependent upon the entire trajectory up to $t = t_{\text{obs}}$.

## Clustered Tasks in a Multichannel Server

###### Figure 4. Clustered Tasks in a Multichannel Server

The previous multichannel server model in Section 1 implicitly assumed each job comes with one task, and all channels are identical in their ability to serve any task brought by a job. A classic illustration is a block of registers at a retail establishment. Each customer will survey the length of the various queues at each register before choosing the shortest queue. Viewing each of these separate registers as a channel in a single server under these conditions gave rise to the load balancing allocation model detailed in the previous section. This section presents a different interpretation of a multichannel, single-server model.

Suppose a server has multiple channels $Q_{1},\ldots,Q_{K}$, but each channel serves a different type of task. A customer arrives to the server and may select any number from $0$ to $K$ tasks for the server to perform. Said customer will select each possible task $j$ with probability $p_{j}$. Figure 4 illustrates an example of such a situation in which three customers visit the server and each customer picks a different number and set of tasks at random. A customer is considered fully serviced (i.e. the job is complete) upon completion of the last task belonging to that particular customer.

### Model Assumptions

The following mathematical assumptions are made for the multichannel server with clustered tasks:

1. Customers arrive to the server with $K$ channels via a nonhomogenous Poisson process (NHPP) with intensity $\lambda(t)$.
2. The breakdown rate of the idle server is given by $r_{0}(t)$.
3. Each channel corresponds to a different task the server can perform.
4. The selection of each task is a Bernoulli random variable with probability $p_{k}$. Thus the number of tasks selected by each customer is a binomial random variable.
5. The workload stress to the server is a constant multiple $\eta$ of the number of tasks requested by the customer, i.e. the additional stress is given by $\eta N$, where $N$ is the number of tasks requested.
6. The PDF of each channel’s service time is given by $g_{i}(w)$, $i = 1,\ldots ,K$. Since the customer’s service is not complete until all requested tasks have finished, the service life distribution for the customers is given by $\max_{i}G_{i}(w)$.

Under these assumptions, this model is a special interpretation of the random stress environment developed in [3]. In this case, the random workload stress is $\eta N$, where $N$ is a binomial random variable, and the service life distribution is $G_{W}(w) = \max_{i}G_{i}(w)$, which may be easily obtained through the mathematical properties of order statistics. Two variations are considered in the next section: independent channels and correlated channels.
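The selection mechanism and the resulting service time are easy to simulate. In the sketch below (a hypothetical helper; channel service times are assumed Exp(1) purely for illustration, not taken from the model), each task is selected independently with probability $p$, so the number selected is Binomial$(K,p)$, and the customer finishes at the maximum of the selected tasks' service times:

```python
import random

def customer_service_time(K, p, rng):
    """One customer: select each of the K tasks independently with probability p
    (so N ~ Binomial(K, p)); the customer completes at the max of the selected
    tasks' service times. Channel times assumed Exp(1) for illustration only."""
    selected = [k for k in range(K) if rng.random() < p]
    if not selected:
        return 0.0                          # no tasks requested
    return max(rng.expovariate(1.0) for _ in selected)

rng = random.Random(7)
times = [customer_service_time(3, 0.5, rng) for _ in range(10000)]
mean_w = sum(times) / len(times)
# For K = 3, p = 1/2, Exp(1) channels, the mean works out to 7/6 ~ 1.167.
assert 1.0 < mean_w < 1.35
```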

### Independent Channels in a Clustered Task Server

Suppose the selection probabilities for each task in the server are identical, that is, $p_{1}=p_{2}=\ldots=p_{K}=p$. Then $N\sim\text{Bin}(K,p)$. Using Theorem 3 in [3], the survival function of the multichannel server is given in the following theorem.

Theorem 1 (Survival Function of Multichannel Server with Clustered Tasks and Independent Channels). Suppose model assumptions (1)-(6) above are satisfied. In addition, assume $p_{1}=p_{2}=\ldots=p_{K}=p$. Then the survival function of the server is given by
\begin{aligned}S_{Y}(t)&=\bar{F}_{0}(t)\exp\left(-K\eta\left[e^{-\eta t}\left(1-p+pe^{-\eta t}\right)^{K-1}-p(1-p)^{K-1}\right]\right.\\&\qquad\left.\times\int_{0}^{t}m(t-w)\bar{G}_{W}(w)dw\right)\end{aligned}

where $m(x) = \int_{0}^{x}\lambda(s)ds$, $\bar{F}_{0}(t) = e^{-\int_{0}^{t}r_{0}(s)ds}$, $\bar{G}_{W}(w) = 1-G_{W}(w)$, and $G_{W}(w) = \max_{i}G_{i}(w)$.

### Correlated Channels in a Cluster Server

Now suppose the server tasks are correlated, in that the selection of one particular task may affect the selection of any or all of the others. Thus the channels form a sequence of dependent Bernoulli random variables. The construction of dependent Bernoulli random variables is given in [2]; a summary follows.

#### Dependent Bernoulli Random Variables and the Generalized Binomial Distribution

Korzeniowski [2] constructs a sequence of dependent Bernoulli random variables using a binary tree that distributes probability mass over dyadic partitions of $[0,1]$. Let $0\leq\delta\leq1$, $0<p<1$, and $q=1-p$. Then define the following quantities:
\begin{aligned}q^{+}\mathrel{:=}q+\delta p&\qquad p^{+}\mathrel{:=}p+\delta q\\q^{-}\mathrel{:=}q(1-\delta)&\qquad p^{-}\mathrel{:=}p(1-\delta)\end{aligned}

The quantities above satisfy the following conditions:
\begin{aligned}q^{+}+p^{-}&=q^{-}+p^{+}=q+p=1\\qq^{+}+pq^{-}&=q,\qquad qp^{-}+pp^{+}=p\end{aligned}
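The four quantities satisfy $q^{+}+p^{-}=q^{-}+p^{+}=1$ together with the mixture identities $qq^{+}+pq^{-}=q$ and $qp^{-}+pp^{+}=p$, which a throwaway sketch confirms (the values $p=\delta=\frac{1}{4}$ are arbitrary):

```python
# Verify the identities satisfied by q+, p+, q-, p- for arbitrary p, delta.
p, delta = 0.25, 0.25
q = 1 - p
q_plus, p_plus = q + delta * p, p + delta * q
q_minus, p_minus = q * (1 - delta), p * (1 - delta)

assert abs(q_plus + p_minus - 1) < 1e-12
assert abs(q_minus + p_plus - 1) < 1e-12
assert abs(q * q_plus + p * q_minus - q) < 1e-12   # total probability of failure
assert abs(q * p_minus + p * p_plus - p) < 1e-12   # total probability of success
```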

###### Figure 5. Construction of Dependent Bernoulli Random Variables

Figure 5 shows the construction and the resulting dependencies. The following examples using coin flips illustrate the effect of the dependency coefficient $\delta$:

Example 1 $(\delta=1)$. For $\delta = 1$: $q^{+} = q+p=1$, $q^{-} = 0$, $p^{+} = p+q = 1$, and $p^{-} = 0$. Suppose the first coin flip is $\epsilon_{1} = 1$. Then every successive $\epsilon_{i}$ will also be $1$; similarly if $\epsilon_{1} = 0$. Thus the result of the first coin flip completely determines the outcomes of all the rest.

Example 2 $(\delta = 0)$. For $\delta = 0$, $q^{+} = q^{-} = q$ and $p^{+} = p^{-} = p$. Thus the first coin flip (and each subsequent one) has no effect on those that follow.

Example 3 $(\delta = \frac{1}{4})$. Suppose $p=q=\frac{1}{2}$. Then $p^{+} = q^{+} = \frac{5}{8}$ and $p^{-} = q^{-} = \frac{3}{8}$, so the subsequent outcomes $\epsilon_{i}$, $i \geq 2$, are more likely to match the outcome of $\epsilon_{1}$ than not.

Now suppose $p = \frac{1}{4}$, $q = \frac{3}{4}$. Then $p^{+} = \frac{7}{16}$, $p^{-} = \frac{3}{16}$, $q^{+} = \frac{13}{16}$, and $q^{-} = \frac{9}{16}$. In this example of an unfair coin, the dependency coefficient $\delta$ still skews the results following the first coin flip in favor of the outcome of $\epsilon_{1}$. However, the dependency heightens the effect of $\epsilon_{1} = 0$ on subsequent flips, and it cannot overcome the discrepancy between the probabilities of success and failure to skew $\epsilon_{i}$, $i \geq 2$, in favor of a $1$ following the outcome $\epsilon_{1} = 1$.
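A short simulation of the construction (assuming, per the first-kind dependence summarized here, that trials after the first succeed with probability $p^{+}$ when $\epsilon_{1}=1$ and $p^{-}$ when $\epsilon_{1}=0$) reproduces the $\delta=1$ behavior of Example 1:

```python
import random

def dependent_flips(n, p, delta, rng):
    """First-kind dependent Bernoulli sequence: trials 2..n depend only on the
    outcome of trial 1, via p+ = p + delta*q and p- = p*(1 - delta)."""
    q = 1 - p
    p_plus, p_minus = p + delta * q, p * (1 - delta)
    eps1 = 1 if rng.random() < p else 0
    later = p_plus if eps1 == 1 else p_minus
    return [eps1] + [1 if rng.random() < later else 0 for _ in range(n - 1)]

rng = random.Random(1)
# delta = 1: the first flip completely determines all later flips (Example 1).
for _ in range(100):
    seq = dependent_flips(10, 0.5, 1.0, rng)
    assert all(e == seq[0] for e in seq)
```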

Using these dependent Bernoulli random variables, [2] presents a Generalized Binomial Distribution for identically distributed but dependent Bernoulli random variables.

Generalized Binomial Distribution

Let $X = \sum_{i=1}^{n}\epsilon_{i}$, where $\epsilon_{i}$, $i = 1,\ldots,n$, are identically distributed Bernoulli random variables with probability of success $p$ and dependency coefficient $\delta$. Then

$$P(X=k) = q\tbinom{n-1}{k}(p^{-})^{k}(q^{+})^{n-1-k} + p\tbinom{n-1}{k-1}(p^{+})^{k-1}(q^{-})^{n-1-(k-1)}$$
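The PMF above is straightforward to implement; the sketch below also verifies that it sums to one and that $\delta = 0$ recovers the ordinary binomial distribution:

```python
from math import comb

def gen_binomial_pmf(k, n, p, delta):
    """Generalized binomial PMF for first-kind dependent Bernoulli trials [2]."""
    q = 1 - p
    q_plus, p_plus = q + delta * p, p + delta * q
    q_minus, p_minus = q * (1 - delta), p * (1 - delta)
    first = q * comb(n - 1, k) * p_minus**k * q_plus**(n - 1 - k) if k <= n - 1 else 0.0
    second = p * comb(n - 1, k - 1) * p_plus**(k - 1) * q_minus**(n - k) if k >= 1 else 0.0
    return first + second

n, p, delta = 6, 0.3, 0.4
total = sum(gen_binomial_pmf(k, n, p, delta) for k in range(n + 1))
assert abs(total - 1) < 1e-12          # q(p- + q+)^(n-1) + p(p+ + q-)^(n-1) = 1

# With delta = 0 the PMF reduces to the ordinary Binomial(n, p) by Pascal's rule.
for k in range(n + 1):
    std = comb(n, k) * p**k * (1 - p)**(n - k)
    assert abs(gen_binomial_pmf(k, n, p, 0.0) - std) < 1e-12
```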

#### Survival Function of Correlated Channels in a Cluster Server

Suppose the selection of tasks may be modeled by the dependent Bernoulli random variables given in the previous section. That is, suppose the customer selects Tasks 1-$K$ in sequence, and the selection or rejection of Task 1 affects all subsequent tasks by a dependency coefficient $\delta$. From [2], the correlation between task selections $\epsilon_{i}$ and $\epsilon_{j}$ is given by
$$\rho = \text{Cor}(\epsilon_{i},\epsilon_{j})=\begin{cases}\delta,&i=1;j=2,\ldots,K\\\delta^{2},&i\neq j;i,j\geq 2\end{cases}$$

This illustrates the dependency of Tasks 2-$K$ on the outcome of Task 1, and notes that while Tasks 2-$K$ are still correlated with each other, the dependency is much lower. In a similar fashion to the independent channel server, the survival function is derived.

Theorem 2 (Survival Function of Multichannel Server with Clustered Tasks and Dependent Channels). Suppose model assumptions 1-6 are satisfied. In addition, suppose the selection of channels 1-$K$ are determined by identically distributed Bernoulli random variables with dependency coefficient $\delta$ as defined in [2]. Then the survival function of the server is given by
$$S_{Y}(t)=\bar{F}_{0}(t)\exp\left(-\eta\int_{0}^{t}m(t-w)\bar{G}_{W}(w)\mathscr{S}(w)dw\right)$$

where $m(x) = \int_{0}^{x}\lambda(s)ds$, and

\begin{aligned}\mathscr{S}(w)&=\sum_{n=0}^{K}e^{-\eta n w}\sum_{j=0}^{K-n-1}\tbinom{K-1}{n-1, j,K-1-n-j}p^{K-1-j}(1-p)^{j+1}\delta^{K-1-n-j}(1-\delta)^{n}\\&\quad+\sum_{n=0}^{K}ne^{-\eta nw}\sum_{i=0}^{n-1}\tbinom{K-1}{K-1-n,i,n-1-i}p^{i+1}(1-p)^{K-n}\delta^{n-1-i}(1-\delta)^{K-n-i}\end{aligned}

## Conclusion

The generalized model of a server under random workload proposed in [3] admits further expansion by way of relaxing the assumption that incoming tasks have exactly one queue to enter on arrival. In considering a server partitioned into several channels, a cost is incurred, namely that additional stress to the server is dependent upon arrival and service times of all previous jobs. However, even under these circumstances, we may obtain a breakdown rate process and satisfactory conditional survival function for the server, and the door is opened to further discussion. By examining the multichannel server, we consider the interrelations of the channels themselves, and derive survival functions to meet the case when the channels are independent as well as when they are correlated.

### References

[1] Ki Hwan Cha and Eui Yong Lee. A stochastic breakdown model for an unreliable web server system and an optimal admission control policy. Journal of Applied Probability, 48(2):453–466, 2011.

[2] Andrzej Korzeniowski. On correlated random graphs. Journal of Probability and Statistical Science, 11:43–58, 2013.

[3] R. Traylor. Stochastic reliability of a server under random workload. Academic Advances of the CTO, 1, 2017.


## Generalizing the Negative Binomial Distribution via First-Kind Dependence

This paper generalizes the negative binomial random variable by generating it from a sequence of first-kind dependent Bernoulli trials under the identity permutation. The PMF, MGF, and various moments are provided, and it is proven that the distribution is indeed an extension of the standard negative binomial random variable. We examine the effect of complete dependence of the Bernoulli trials on the generalized negative binomial random variable. We also show that the generalized geometric random variable is a special case of the generalized negative binomial random variable, but the generalized negative binomial random variable cannot be generated from a sum of i.i.d. generalized geometric random variables.


## On Permuted First-Kind Dependence of Categorical Random Variables

This paper discusses the notion of horizontal dependency in sequences of first-kind dependent categorical random variables. We examine the necessary and sufficient conditions for a sequence of first-kind dependent categorical random variables to be identically distributed when the conditional probability distribution of subsequent variables after the first are permuted from the identity permutation used in previous works.

On Server Efficiency

## Abstract

Editor’s note: This paper comprises the second chapter of the PhD dissertation by Rachel Traylor. Cha and Lee defined a mathematical notion of server performance: the efficiency $\psi$, the long-run average number of jobs completed per unit time. The service time distribution heavily influences the shape of the server efficiency as a function of a constant arrival rate $\lambda$. Various classes of distributions are studied in order to find sufficient conditions for the existence of a single maximum. The existence of a maximum allows for simple binary control policies to handle traffic and optimize performance.

## Introduction, Motivation, and Background

Cha and Lee [1] studied the reliability of a single server under a constant stress workload and defined a notion of server efficiency $\psi$ for a given intensity $\lambda(t)$ as the long-run average number of jobs completed per unit time, a way to measure server performance. With $M(t)$ the number of jobs completed by time $t$, the efficiency is defined as

$$\psi := \lim\limits_{t \to \infty}\frac{E[M(t)]}{t}$$

Upon breakdown and rebooting, the server is assumed to be ‘as good as new’, in that performance of the server does not degrade over subsequent reboots. In addition, the model assumes the arrival process after reboot, denoted $\{N^{*}(t), t \geq 0\}$, is a nonhomogenous Poisson process with the same intensity function $\lambda(t)$ as before, and that $\{N^{*}(t), t \geq 0\}$ is independent of the arrival process before reboot. In a practical setting, this model assumes no ‘bottlenecking’ of arrivals occurs in the queue during server downtime that would cause an initial flood to the rebooted server. In addition, the reboot time is assumed to follow a continuous distribution $H(t)$ with expected value $\nu$. This process is a renewal reward process with rewards $\{R_{n}\} = \{M_{n}\}$, the numbers of jobs completed per cycle. The length of a renewal cycle is $Y_{n} + H_{n}$, where $Y_{n}$ is the length of time the server was operational and $H_{n}$ is the time to reboot after a server crash. Then, by [2],

$$\psi = \frac{E[M]}{E[Y]+ \nu}$$

where $M$ is the number of jobs completed in a particular renewal cycle, $\nu$ is the mean time to reboot of the server, and $Y$ is the operational time in a particular renewal cycle. Then, using the definition of $\psi$, the following closed form of the efficiency of a server under all the assumptions of Cha and Lee’s model is derived.
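The renewal-reward identity $\psi = E[M]/(E[Y]+\nu)$ can be illustrated by simulation. The distributions below are assumptions chosen purely for the illustration (exponential uptimes and reboot times, completions from a rate-2 Poisson stream), not the server model's actual laws:

```python
import random

# Monte Carlo illustration of psi = E[M] / (E[Y] + nu) for a renewal reward
# process: uptime Y ~ Exp(mean 5), jobs completed during uptime from a rate-2
# Poisson stream, reboot time ~ Exp(mean nu = 1). All distributions assumed.
rng = random.Random(0)
nu, rate, EY = 1.0, 2.0, 5.0
jobs = elapsed = 0.0
for _ in range(50000):
    Y = rng.expovariate(1 / EY)          # operational time this cycle
    t = rng.expovariate(rate)            # job completion epochs within the cycle
    while t < Y:
        jobs += 1
        t += rng.expovariate(rate)
    elapsed += Y + rng.expovariate(1 / nu)

psi_hat = jobs / elapsed                 # long-run jobs completed per unit time
psi_theory = rate * EY / (EY + nu)       # E[M]/(E[Y]+nu) = 10/6
assert abs(psi_hat - psi_theory) < 0.05
```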

Theorem 1 (Server Efficiency under Cha/Lee)
Suppose $\{N(t), t \geq 0\}$ is a nonhomogenous Poisson process with intensity $\lambda(t)\geq 0$. Then the efficiency is given by

\begin{aligned}\psi&=\frac{1}{\int_{0}^{\infty}S_{Y}(t)dt + \nu}\int_{0}^{\infty}\left[\exp\left(-\int_{0}^{t}r_{0}(x)dx-\int_{0}^{t}\lambda(x)dx + a(t) + b(t)\right)\right.\\&\qquad\qquad\left.\times\left(r_{0}(t)a(t)+\eta a(t)b(t) \right)\right]dt\end{aligned} where $a(t) = \int_{0}^{t}e^{-\eta v}g_{W}(v)m(t-v)dv$, $b(t) = \int_{0}^{t}e^{-\eta(t-r)}\bar{G}_{W}(t-r)\lambda(r)dr$, $\bar{G}_{W}(x) = 1-\int_{0}^{x}g_{W}(s)ds$, and $m(x) = \int_{0}^{x}\lambda(s)ds$.

### Numerical Example and Control Policies

As an illustrative example, Cha and Lee considered the case when $\lambda(t) \equiv \lambda$,
$r_{0}(t) \equiv r_{0} = 0.2$, $\eta = 0.01$, $\nu = 1$, and $g_{W}(w) = we^{-w^{2}/2}$ (the PDF of the Rayleigh distribution). As shown in Figure 1, there exists a $\lambda^{*}$ such that $\psi(\lambda)$ is maximized. Thus one may implement the obvious optimal control policy for server control to avoid server overload:

(1) If the real time arrival rate $\lambda < \lambda^{*}$, do not interfere with arrivals.

(2) If $\lambda \geq \lambda^{*}$, facilitate some appropriate measure of interference.

Examples of interference for a web server in particular include rejection of incoming requests or possible re-routing. Cha and Lee give an interference policy of rejection with probability $1-\frac{\lambda^{*}}{\lambda}$.
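Cha and Lee's interference policy is a standard Poisson thinning: rejecting each arrival independently with probability $1-\frac{\lambda^{*}}{\lambda}$ leaves an accepted stream of rate $\lambda^{*}$. A quick sketch (the `admit` helper is hypothetical):

```python
import random

def admit(lam, lam_star, rng=random):
    """Interference policy: below lam*, accept everything; otherwise reject
    each arrival with probability 1 - lam*/lam."""
    if lam < lam_star:
        return True
    return rng.random() < lam_star / lam

rng = random.Random(42)
lam, lam_star, horizon = 5.0, 2.0, 10000.0
accepted, t = 0, rng.expovariate(lam)
while t < horizon:
    accepted += admit(lam, lam_star, rng)
    t += rng.expovariate(lam)

# The accepted stream has rate approximately lam* = 2.
assert abs(accepted / horizon - lam_star) < 0.1
```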

The Rayleigh distribution used in Figure 1 has applications in physics, typically when the magnitude of a vector is related to its directional components. It is a special case of the Weibull distribution which is widely used in survival analysis, failure analysis, weather forecasting, and communications. These distributions are not typically used to model service times. The exponential distribution is the most common due to its memoryless properties, followed by the Erlang and uniform distributions.

The efficiency, $\psi$, under the Rayleigh distribution example in Figure 1 shows the existence of a $0 < \lambda^{*} < \infty$ such that $\psi(\lambda)$ is maximized at $\lambda^{*}$. This useful feature of $\psi(\lambda)$ in this case allows for the implementation of a simple control policy for arrivals to the server to prevent overload, given above.

Numerical simulations under a variety of possible distribution classes, including convex, concave, exponential, uniform, and Erlang, suggest that the mathematical properties of $\psi$ are heavily influenced by the choice and characteristics of the service time distribution $g_{W}(w)$. In particular, it is of interest to seek sufficient conditions on $g_{W}(w)$ that guarantee the existence of a $\lambda^{*}$ that maximizes $\psi$. This is done for the uniform, compact support, and Erlang classes. Furthermore, it is shown that under certain conditions, not only does the server efficiency lack a maximum, but $\psi$ increases without bound. This is not representative of real server behavior, and is thus of mathematical and practical interest to note for further modeling.

## Efficiency of a Server under Uniform Service Life Distribution

Suppose $\lambda(x) \equiv \lambda$, and suppose $r_{0}(x) \equiv r_{0} = \max_{x \in (0,\infty)}r_{0}(x)$. The efficiency $\psi$ for constant $\eta$ and $\lambda$ is given by
$\psi(\lambda) = \frac{1}{\int_{0}^{\infty}S_{Y}(t)dt + \nu}\left[\int_{0}^{\infty}\exp\left(-r_{0}t-\lambda t + a(t)+b(t)\right)(r_{0}+b(t))a(t)dt\right]$

where $S_{Y}(t)$ is the survival function of the node, $a(t) = \lambda\int_{0}^{t}e^{-\eta v}g(v)(t-v)dv$, $b(t) = \lambda\int_{0}^{t}e^{-\eta(t-r)}\bar{G}(t-r)dr$, $g(v)$ is the PDF of the service time distribution, and $\bar{G}(x) = 1-\int_{0}^{x}g(s)ds$.

The following theorem gives sufficient conditions for the uniform distribution and $\eta$ that guarantee the existence of a finite maximum efficiency.

Theorem 2 (Efficiency under Uniform Service Distribution)
Suppose the service life distribution is given by Uniform($c,d$) for some $0 < c < d$. If
$$\sigma > \frac{ce^{-c\eta}}{\sqrt{12}\,\phi(-\eta)(1+\eta(c+d))+c\eta - 1}$$
where $\sigma$ is the standard deviation of the service life $W$ and $\phi(-\eta)$ is the moment generating function of the uniform distribution evaluated at $-\eta$, then $\psi(\lambda)$ has a maximum on $(0,\infty)$.

Numerical simulations suggest that $\psi$ increases without bound for $c=0, d>1$. The following lemma proves this fact.

Lemma
Suppose the service life distribution is given by Uniform(0,d), with $d>1$. Then $\psi$ increases without bound.

This is worth discussing. It makes no sense for the efficiency of a server to increase forever as the arrival rate increases, so what is happening here? Notice that if the uniform distribution includes 0 as an endpoint, then service times arbitrarily close to 0 occur with positive probability, which is impossible in reality. A small service time is possible, but it is still strictly greater than 0. What we see here is that using distributions with positive mass near 0 causes issues in the efficiency function; it is not the fault of the definition of efficiency, but rather a consequence of using distributions that cannot mirror reality. The next section explores this further with a broader class of distributions.

## Extension of the Uniform Distribution: Compact Support

The ideas and techniques presented in the previous section yield a powerful extension to any service life distribution with compact support away from 0. Supposing $g_{W}(w)$ has compact support $[c,d]$ with $c > 0$, it may be bounded above by a positively scaled uniform distribution. In practice, service times are finite and nonzero, so this extension allows very simple control policies to be implemented for a much larger class of distributions.

Theorem 3 (Efficiency under Compact Support)
Let $g_{W}(w)$ be the PDF of the service times, having compact support $[c,d]$ with $0 < c < d$. Let $m = \max_{w}g_{W}(w) < \infty$, and let $R = d-c$ be the length of the support. Then $\psi(\lambda)$ has a maximum if $m < \frac{c}{R\eta +e^{-d\eta}-e^{-c\eta}+\eta e^{-c\eta}}$.

Example 1

Let $g_{W}(w) = \frac{2}{5}w\mathbb{1}_{[2,3]}(w)$, and let $r_{0} = \nu = \eta = 1$. Then $m = \frac{6}{5}$. By Theorem 3, a maximum is guaranteed if $m < \frac{2}{e^{-2}+e^{-3}+1}\approx 1.688$, so the existence of a maximum of $\psi$ is guaranteed. Figure 2 gives the numerical result with a step size of 0.1; the maximum occurs around $\lambda^{*} = 0.5$.
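The arithmetic in Example 1 can be checked directly, evaluating the bound exactly as stated in the example:

```python
from math import exp

# Example 1 check: g_W(w) = (2/5) w on [2, 3] attains its maximum at the right
# endpoint w = 3, so m = 6/5; the stated bound is 2 / (e^-2 + e^-3 + 1).
m = 0.4 * 3                          # max of g_W over [2, 3]
bound = 2 / (exp(-2) + exp(-3) + 1)  # ~ 1.688
assert abs(m - 6 / 5) < 1e-12
assert m < bound                     # condition satisfied: a maximum of psi exists
```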

Example 2: Sufficient conditions aren’t always necessary conditions
The condition given in Theorem 3 is rather weak; it is only sufficient. As an illustration, consider the same increasing density shifted to a support of $[1,2]$: $g_{W}(w) = \frac{2}{3}w\mathbb{1}_{[1,2]}(w)$. Retain all other assumptions from Example 1. Then
$$m = \frac{4}{3} > \frac{1}{e^{-2}+1} \approx 0.88$$
which violates the condition. However, consider Figure 3 above. Clearly $\psi$ has a maximum, at approximately $\lambda^{*} = 0.8$.

The condition in Theorem 3 relies on overestimating $g_{W}(w)$ by a constant. If the variance $\sigma_{W}^{2}$ of $g_{W}(w)$ is large, the maximum of the PDF decreases and is more comparable to the rest of the distribution; in these cases, bounding $g_{W}(w)$ by its maximum $m$ over the support $[c,d]$ gives a reasonable approximation. However, for a small support with high enough skew, much of the mass is concentrated at the right end of the support and $m$ will be larger. In such cases, bounding $g_{W}(w)$ by $m$ results in substantial overestimation, and the condition may fail even though $\psi$ still has a maximum. The numerical example wherein $g_{W}(w) = \frac{2}{3}w\mathbb{1}_{[1,2]}(w)$ illustrates the conservative nature of this type of estimation.

## Efficiency under Erlang Service Distribution

Now suppose $g(v)$ is of the Erlang class but shifted $\delta > 0$ to the right. For motivation, consider that service times can never be 0 in a practical setting. Then the PDF and the complement of the CDF are given by
\begin{aligned}
g(v;k,\gamma,\delta) &= \left\{\begin{array}{lr}0, & 0 \leq v < \delta \\ \frac{\gamma^{k}(v-\delta)^{k-1}e^{-\gamma(v-\delta)}}{(k-1)!}, & v\geq \delta\end{array}\right.\\
\bar{G}(v;k,\gamma,\delta) &= \left\{\begin{array}{lr}1, & 0 \leq v < \delta \\ e^{-\gamma(v-\delta)}\sum_{j=0}^{k-1}\frac{\gamma^{j}(v-\delta)^{j}}{j!}, & v\geq \delta\end{array}\right.
\end{aligned}
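The pair of formulas can be sanity-checked numerically: the survival function should agree with one minus the integral of the pdf. A sketch in Python (function names are my own, and the trapezoidal integration is only for verification):

```python
import math

def erlang_shifted_pdf(v, k, gamma, delta):
    # delta-shifted Erlang pdf: zero before delta, Erlang(k, gamma) after
    if v < delta:
        return 0.0
    u = v - delta
    return gamma**k * u**(k - 1) * math.exp(-gamma * u) / math.factorial(k - 1)

def erlang_shifted_sf(v, k, gamma, delta):
    # complement of the CDF (survival function)
    if v < delta:
        return 1.0
    u = v - delta
    return math.exp(-gamma * u) * sum((gamma * u)**j / math.factorial(j) for j in range(k))

# consistency check: survival function should equal 1 - integral of the pdf
k, gamma, delta = 3, 2.0, 0.5
v, n = 4.0, 40000
h = v / n
cdf = sum(0.5 * (erlang_shifted_pdf(i*h, k, gamma, delta)
                 + erlang_shifted_pdf((i+1)*h, k, gamma, delta)) * h for i in range(n))
print(abs((1 - cdf) - erlang_shifted_sf(v, k, gamma, delta)))  # difference should be tiny
```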

Theorem 4 (Server Efficiency under Erlang Distribution)
Let $\delta > 0, \eta > 0$. Let $\alpha(\delta) = \left(\frac{\gamma}{\gamma + \eta}\right)^{k}e^{-\eta\delta} + \frac{\gamma^{k}e^{\gamma\delta}(k-1)}{(\eta + \gamma)^{k-1}}$, and $0 < \beta(\delta,\eta) < 1$. If the service life distribution is of the $\delta-$shifted Erlang class, then $\psi(\lambda)$ has a maximum in $\lambda$ on $(0,\infty)$ for $\delta,\eta$ such that $\alpha(\delta) + \beta(\delta,\eta) < 1$.

Thus, we see that the Erlang service distribution produces a maximum as long as the support is shifted away from 0.

## Efficiency under Exponential Distribution

The exponential distribution is the most common service distribution assumed for a queue. The M/M/1 queue assumes Poisson arrivals and exponentially distributed service times. However, the exponential distribution has most of its mass concentrated at or near 0. As we have seen with other distributions, this becomes a problem, both in considering what makes sense in practical applications and in its effect on the efficiency.
Suppose $g(v)$ is of the exponential class. That is, $g(v) = \gamma e^{-\gamma v}, \gamma > 0$. It will be proven that under certain conditions on $\eta$ and $\gamma$, an exponential $g(v)$ causes $\psi$ to increase without bound.

Theorem 5 (Efficiency under Exponential Distribution)
Suppose $g(v) = \gamma e^{-\gamma v}$. Then if $\frac{2\gamma}{\gamma + \eta} > 1 + \frac{2}{\frac{2}{\gamma} + W\left(-\frac{\gamma}{\gamma +\eta}e^{-2-\frac{2\eta}{\gamma}}\right)}$, $\psi(\lambda) \to \infty$,  as $\lambda \to \infty$ where $W(\cdot)$ is the Lambert W-function.

## Extension to Random Stress and Nonconstant Intensity

All of the above theorems regarding the server efficiency assumed constant stress $\eta$ and constant intensity $\lambda$. This section generalizes the preceding analyses to nonconstant intensity and random stress.

The stochastic reliability models in both [1] and the previous chapter assumed a time-dependent intensity $\lambda(t)$. By setting $\lambda \equiv \max_{t}\lambda(t)$, the established efficiency theorems provide a conservative set of conditions under which a maximum efficiency may be obtained. In this case, $\psi$ is a function of $\lambda_{\max}$.

If the job stresses are random, as in Theorem 2 of Traylor (2016), the preceding sections may still be utilized. Assume the sample space of $\mathcal{H}$ is compact and nonnegative; the efficiency is then given by Theorem 2 of Traylor (2016).
WLOG, suppose the sample space of $\mathcal{H}$ is discrete, given by $\{\eta_{1},\ldots,\eta_{m}\}$ with respective probabilities $p_{i}$, $i = 1,\ldots,m$. Now suppose all mass is concentrated at $\eta_{[m]}$. Let $a_{m}(t) = \lambda\int_{0}^{t}e^{-\eta_{[m]}v}g(v)(t-v)dv$ and $b_{m}(t) = \lambda\int_{0}^{t}e^{-\eta_{[m]}(t-r)}\bar{G}_{W}(t-r)dr$. Then the following are true:
1. $E_{\mathcal{H}}[a(t)+b(t)] \leq a_{m}(t) + b_{m}(t)$
2. $E_{\mathcal{H}}[a(t)] \leq a_{m}(t)$
3. $E_{\mathcal{H}}[\mathcal{H} a(t)b(t)] \leq \eta_{[m]}a_{m}(t)b_{m}(t)$

Thus, by replacing the expectations in (1)-(3) with their respective upper bounds in the efficiency theorems, analyses of the efficiency for the uniform, compact support, and Erlang classes may proceed as previously detailed. These estimates are conservative but sufficient.

For an exponential service life distribution and random stress, a lower bound for the expectations in (1)-(3) is created by concentrating all mass at $\eta_{[1]}$. Then the conditions from Theorem 5 guarantee an explosion for the exponential distribution.

## Implications and Numerical Illustrations

### Uniform Service Distribution

For the uniform service distribution, both the variance of $g_{W}(w)$ and the location of the support affect the efficiency. The gallery above shows $\psi(\lambda)$ for various uniform distributions as an illustration. In all four cases, $r_{0} = \nu =\eta =1$. The variance of a $\text{Uniform}[c,d]$ distribution is given by $\sigma^{2} = \frac{(d-c)^{2}}{12}$. With the exception of the fourth gallery image, which illustrates the explosion of $\psi$ when the distribution has positive mass at 0, Table 1 below compares possible values of $\psi$ for different uniform distributions. Notice that while the variance $\sigma^{2}$ does affect the range of $\psi$ by several orders of magnitude, the location of the support has a much more powerful effect. Thus, if all service times are equally likely, a server is less efficient if it is consistently but mildly slow (Uniform[10,11]) than if it is inconsistent (Uniform[1,500]).

$$\begin{array}{l|rrrr}g_{W}(w) & \sigma^{2}&\mu& \text{Approximate Range of } \psi & \text{Approximate } \lambda^{*}\\\text{Uniform}[1,2] & 1/12 &1.5 & (0,0.012) & 1.3\\ \text{Uniform}[10,11] & 1/12 &10.5 & (0, 2\times10^{-11}) & 0.1 \\ \text{Uniform}[1,500] & 499^{2}/12 & 250.5 & (0,3\times 10^{-5}) & 1.6\end{array}$$
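As a sanity check on the table, the uniform moments come from the standard closed forms $\mu = (c+d)/2$ and $\sigma^{2} = (d-c)^{2}/12$; a short sketch (helper name my own):

```python
def uniform_moments(c, d):
    # mean and variance of the Uniform[c, d] distribution
    return (c + d) / 2, (d - c)**2 / 12

for c, d in [(1, 2), (10, 11), (1, 500)]:
    mu, var = uniform_moments(c, d)
    print(f"Uniform[{c},{d}]: mean={mu}, variance={var:.4f}")
```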

### Increasing but Compact Service Distribution


$$\begin{array}{l|rrrr}g_{W}(w) & \sigma^{2}&\mu& \text{Approx. Range of } \psi & \text{Approx. } \lambda^{*}\\\small{\frac{2w}{3}}\mathbb{1}_{[1,2]}(w)& \approx 0.0802&\approx 1.56 & (0,7\times10^{-3}) & 0.75\\\small{\frac{2w}{5}}\mathbb{1}_{[2,3]}(w)&\approx 0.0822 &\approx 2.53 & (0, 7\times10^{-4}) & 0.5 \\\small{\frac{2w}{500^{2}-1}}\mathbb{1}_{[1,500]}(w)& \approx 13888& 333.35 & (0,1.3\times 10^{-7})& 1.4\end{array}$$
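The table's moments follow from direct integration of $g_{W}(w) = cw$ on $[a,b]$, with $c = 2/(b^{2}-a^{2})$ so the density integrates to 1: $\mu = c(b^{3}-a^{3})/3$ and $E[W^{2}] = c(b^{4}-a^{4})/4$. A quick sketch (helper name my own):

```python
def linear_density_moments(a, b):
    # mean and variance of g(w) = c*w on [a, b], with c = 2/(b^2 - a^2)
    c = 2.0 / (b*b - a*a)
    mean = c * (b**3 - a**3) / 3
    second = c * (b**4 - a**4) / 4
    return mean, second - mean**2

for a, b in [(1, 2), (2, 3), (1, 500)]:
    mu, var = linear_density_moments(a, b)
    print(f"[{a},{b}]: mean={mu:.4f}, variance={var:.4f}")
```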

As an illustration of a distribution on compact support, consider the class of increasing densities $g_{W}(w) = cw\mathbb{1}_{[a,b]}(w)$. Several examples are given in the table above. For both compact supports of length 1, the variance is approximately the same, but the mean changes, producing an order of magnitude decrease in efficiency. Compared to the compact support of length 499, with a much larger mean, the efficiency decreases by 3 orders of magnitude. Notice, however, that the decline after the maximum is much less sharp.

### Erlang Service Distribution


$$\begin{array}{l|rrrr}g_{W}(w) & \sigma^{2}&\mu& \text{Approx. Range of } \psi & \text{Approx. } \lambda^{*}\\\text{Erlang}(2,1)& 2&2 & (0,0.7)& 9\\\text{Erlang}(9,1)&9 &9 & (0, 4\times10^{-6}) & 0.5 \\\text{Rayleigh}(1)& (4-\pi)/2& \sqrt{\pi/2} & (0,0.9)& 8\end{array}$$

The gallery above gives two examples of $\psi$ under an Erlang distribution. Notice the change in the efficiency as the mean increases. Here, since $\gamma = 1$, $\sigma^{2} = \mu$ for the Erlang distributions, so the mean likely has the largest effect on $\psi$.

### Comparing all the examples

Comparing all examples, the Rayleigh(1) service distribution imposes the highest maximum efficiency, followed closely by the Erlang(2,1) service distribution, with the Uniform[1,2] service distribution following. $\lambda^{*}$ under the Erlang(2,1) service distribution is larger than for the Rayleigh(1) service distribution, indicating that a server whose service times follow the former distribution can handle a larger arrival intensity before its efficiency begins to decline than the latter.

The means for the Rayleigh(1), Erlang(2,1), and Uniform[1,2] distributions are similar, as shown in the tables, but the Uniform[1,2] distribution has equal probability of any service time in its support, with large negative excess kurtosis. It is postulated that kurtosis, skew, and variance play large roles in the behavior and range of $\psi$. Compare the efficiency under the Erlang(2,1) service distribution with the efficiency under the Erlang(9,1) service distribution. Not only is the mean much lower for the Erlang(2,1) distribution, but the distribution is more strongly positive-skewed than the Erlang(9,1). Thus, more mass is concentrated at the left side of the distribution, indicating that the service times are more often shorter.

Finally, to note the effect of the typical stress level $\eta$ on the range of $\psi$, we compare the original Cha/Lee figure with our Rayleigh figure above. The service distribution and all other quantities remain the same, but Cha and Lee’s numerical example set $\eta = 0.01$, whereas the third image in the Erlang gallery shows $\psi$ under $\eta = 1$. The range of $\psi$ decreases by two orders of magnitude with a 100-fold increase in $\eta$, with the shape remaining similar. In addition, the location of the maximum $\lambda^{*}$ varies inversely by roughly the same magnitude.

Studying the efficiency under various service distributions aids not only in deciding when to implement a server intervention, but also aids in evaluating the performance of various servers given their service times.

## Conclusion

Many questions have been opened up about the various effects of the service distribution and stress on the server efficiency. The conditions given above are merely sufficient conditions, and have not been proven necessary. Studying the efficiency as a function of the arrival rate under various service distributions and stress provides a way to understand the performance of a server and the various factors that can affect it mathematically.

## References

1.  CHA, K. H., AND LEE, E. Y. A stochastic breakdown model for an unreliable web server system and an optimal admission control policy. Journal of Applied Probability 48, 2 (2011), 453–466.
2.  ROSS, S. Stochastic Processes, 2nd edition.
3.  TRAYLOR, R. Stochastic reliability of a server under random workload. Ph.D. thesis (2016).
A Generalized Geometric Distribution from Vertically Dependent Bernoulli Random Variables

## Abstract

This paper generalizes the notion of the geometric distribution to allow for dependent Bernoulli trials generated from dependency generators as defined in Traylor and Hathcock’s previous work. The generalized geometric distribution describes a random variable $X$ that counts the number of dependent Bernoulli trials until the first success. The main result of the paper is that $X$ can count dependent Bernoulli trials from any dependency structure and retain the same distribution. That is, if $X$ counts Bernoulli trials with dependency generated by $\alpha_{1} \in \mathscr{C}_{\delta}$, and $Y$ counts Bernoulli trials with dependency generated by $\alpha_{2} \in \mathscr{C}_{\delta}$, then the distributions of $X$ and $Y$ are the same, namely the generalized geometric distribution. Other characterizations and properties of the generalized geometric distribution are given, including the MGF, mean, variance, skew, and entropy.

## Introduction

The standard geometric distribution counts one of two phenomena:

1.  The count of i.i.d. Bernoulli trials until the first success
2.  The count of i.i.d. Bernoulli trials that resulted in a failure prior to the first success

The latter case is simply a shifted version of the former. However, this distribution, in both forms, has limitations because it requires a sequence of independent and identically distributed Bernoulli trials. Korzeniowski [2] originally defined what is now known as first-kind (FK) dependent Bernoulli random variables, and gave a generalized binomial distribution that allowed for dependence among the Bernoulli trials. Traylor [4] extended the work of Korzeniowski into FK-dependent categorical random variables and derived a generalized multinomial distribution in a similar fashion. Traylor and Hathcock [5] extended the notion of dependence among categorical random variables to include other kinds of dependency besides FK dependence, such as sequential dependence. Their work created a class of vertical dependency structures generated by a set of functions

$$\mathscr{C}_{\delta} = \{\alpha: \mathbb{N}_{\geq 2} \to \mathbb{N} : \alpha(n) < n \text{ and } \forall n \exists j \in \{1,\ldots,n-1\} : \alpha(n) = j\},$$

where the latter property is known as dependency continuity. In this paper, we derive a generalized geometric distribution from identically distributed but dependent Bernoulli random variables. The main result is that the pdf for the generalized geometric distribution is the same regardless of the dependency structure. That is, for any $\alpha \in \mathscr{C}_{\delta}$ that generates a sequence of identically distributed but dependent Bernoulli trials, the generalized geometric distribution remains unchanged.

## Background

The standard geometric distribution is built from a sequence of independent and identically distributed (i.i.d.) Bernoulli random variables with probability of success $p$ and probability of failure $q=1-p$. There are two “versions” of the geometric distribution:

• A random variable $X$ has a geometric distribution if it counts the number of Bernoulli trials needed to observe the first success.
•  A random variable $Y = X-1$ has a geometric distribution if it counts the number of failures in a sequence of Bernoulli trials before the first observed success.

In the first case, $X$ has support $\{1,2,3,\ldots\}$, because we are looking for the first success, which can occur on trial 1, 2, 3, and so forth. In the latter case, $Y$ has support $\{0,1,2,\ldots\}$, because we are counting the number of failures before the first success occurs. That is, if the first success occurs on trial 1, then there were 0 failures preceding the first success. If the first success occurs on trial 2, then one failure occurred prior to the first success, and thus $Y=1$. Essentially, Version 2 is a shifted Version 1, because our perspective changes: we do not include the success in the count in Version 2.

For Version 1, known as the standard geometric distribution, the pdf is given by

$$f_{X}(k) = q^{k-1}p, \quad k = 1, 2, 3, \ldots$$

For Version 2 (the shifted geometric distribution), the pdf is given by

$$f_{Y}(k) = q^{k}p, \quad k = 0, 1, 2, \ldots$$
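The shift relationship between the two versions is easy to verify numerically; a minimal sketch (function names are my own):

```python
def f_X(k, p):
    # Version 1: probability the first success occurs on trial k
    return (1 - p)**(k - 1) * p

def f_Y(k, p):
    # Version 2: probability of k failures before the first success
    return (1 - p)**k * p

p = 0.3
# Y = X - 1, so f_Y(k) should equal f_X(k + 1) for every k
print(all(abs(f_Y(k, p) - f_X(k + 1, p)) < 1e-15 for k in range(50)))
```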

The next section derives the generalized geometric distribution for FK-dependent random variables, and then shows that the pdf is the same regardless of dependency structure.

## Generalized Geometric Distribution

### Derivation from FK-Dependent Bernoulli Random Variables

Suppose we have a sequence of FK-dependent Bernoulli random variables. Recall from [2] and [4] that FK-dependent random variables are weighted toward the outcome of the first variable $\epsilon_{1}$. That is, in the Bernoulli case, $P(\epsilon_{1} = 1) = p$ and $P(\epsilon_{1} = 0) = q = 1-p$. For subsequent variables in the sequence,

\begin{aligned}P(\epsilon_{n} = 1 | \epsilon_{1}= 1) = p^{+} &\qquad P(\epsilon_{n} = 1 | \epsilon_{1} = 0) = p^{-} \\P(\epsilon_{n} = 0 | \epsilon_{1} = 1) = q^{-} &\qquad P(\epsilon_{n} = 0 | \epsilon_{1} = 0) = q^{+}\end{aligned}

for $n\geq 2$, where $q = 1-p$, $p^{+} = p + \delta q$, $p^{-} = p-\delta p$, $q^{-}= q-\delta q$, $q^{+} = q + \delta p$, and $0 \leq \delta \leq 1$ is the dependency coefficient.

We will first give the generalized “Version 1” of the geometric distribution for FK-dependent random variables.

Proposition.
Suppose $\epsilon = (\epsilon_{1},\epsilon_{2},\ldots, \epsilon_{n},\ldots)$ is a FK-dependent sequence of Bernoulli random variables. Let $X$ be the count of such Bernoulli variables needed until the first success. Then $X$ has a generalized geometric distribution with pdf
$$f_{X}(k) = \left\{\begin{array}{lr}p, & k = 1 \\q\left(q^{+}\right)^{k-2}p^{-}, & k \geq 2\end{array}\right.$$

In a similar fashion, suppose we prefer to count the number of failures before the first success occurs. For this generalized “Version 2”, we have the following proposition.

Proposition.
Suppose $\epsilon = (\epsilon_{1},\epsilon_{2},\ldots,\epsilon_{n},\ldots)$ is a FK-dependent sequence of Bernoulli random variables. Let $Y = X-1$ be the count of failures prior to the first success. Then $Y$ has a shifted generalized geometric distribution with pdf

$$f_{Y}(k) = \left\{\begin{array}{lr}p, & k = 0 \\q\left(q^{+}\right)^{k-1}p^{-}, & k \geq 1\end{array}\right.$$
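Both propositions can be checked by simulation. The sketch below assumes, per the FK construction, that trials after the first are conditionally independent given $\epsilon_{1}$; function names are my own:

```python
import random

def sample_generalized_geometric(p, delta, rng):
    # FK dependence: epsilon_1 ~ Bernoulli(p); given epsilon_1 = 0, each later
    # trial succeeds independently with probability p^- = p - delta*p.
    if rng.random() < p:
        return 1                      # success on the first trial
    p_minus = p - delta * p
    k = 2
    while rng.random() >= p_minus:    # keep failing, each time with probability q^+
        k += 1
    return k

rng = random.Random(42)
p, delta = 0.5, 0.5
samples = [sample_generalized_geometric(p, delta, rng) for _ in range(100_000)]
emp_mean = sum(samples) / len(samples)
print(emp_mean)  # should land near the closed-form mean of 3 for p = delta = 1/2
```

With $p = \delta = 1/2$, the empirical mean should be close to the closed-form expectation given in the Properties section below.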

### Generalized Geometric Distribution for any Vertical Dependency Structure

Propositions 1 and 2 were derived for FK-dependent random variables, but in fact $X$ and $Y$ remain distributed according to the generalized geometric distribution and the shifted generalized geometric distribution regardless of the vertical dependency structure specified, as long as that structure is generated by a function in $\mathscr{C}_{\delta}$.

Theorem.
Let $\epsilon = (\epsilon_{1}, \epsilon_{2},\ldots,\epsilon_{n},\ldots)$ be a vertically dependent sequence of Bernoulli random variables, where the dependency is generated by $\alpha \in \mathscr{C}_{\delta}$. Let $X$ be the count of such Bernoulli trials needed until the first success, and let $Y = X-1$ be the count of failures of such Bernoulli trials prior to the first success. Then the pdf of $X$ and $Y$ is identical to those given in Propositions 1 and 2.

This result is quite powerful, and not one that holds for all generalized distributions constructed from dependent random variables. Given any vertical dependency structure generated from the broad class $\mathscr{C}_{\delta}$, the count of trials before a success and the count of failures before a success have the same probability distribution. Thus, if this information is desired, no information about the dependency structure other than the membership of its generating function in $\mathscr{C}_{\delta}$ is necessary. The only information needed to calculate the generalized geometric probabilities for dependent Bernoulli trials is $p$ and $\delta$.

The next section gives some basic properties of the Generalized Geometric Distribution, such as the moment generating function and selected moments.

## Properties  of the Generalized Geometric Distribution

### Moment Generating Function

Fact.

The moment generating function of the generalized geometric distribution is
$$M_{X}(t) = pe^{t} + \frac{qp^{-}e^{2t}}{1-q^{+}e^{t}}$$

### Mean

Fact.

The mean of the generalized geometric distribution is
$$E[X] = \mu = \frac{1-\delta p}{p(1-\delta)}$$

The effect of dependence can be seen in the plot of $E[X]$ in Figure 1 below. For fixed $p$, as $\delta \to 1$, $E[X] \to \infty$, though the rate changes with $p$.

To explore further, suppose $p = 1/2$, so the Bernoulli trials are balanced between success and failure. Figure 2 shows the effect of $\delta$ for this fixed $p$. Notice that the effect of $\delta$ on the expected value becomes more pronounced as $\delta \to 1$. In particular, for $\delta = 1/2$ and $p=1/2$, $E[X] = 3$; after this point, an increase of only $1/6$ in $\delta$, to $\delta = 2/3$, increases the expected value to 4 trials before a success. To double the expected number of trials again, to $E[X] = 8$, requires a further increase in $\delta$ of only $4/21$, to $\delta = 6/7$.

A smaller probability of success $p$ will yield an expected value $\mu$ that is much more susceptible to effects of dependency $\delta$, and a larger $p$ will yield an expected value more resistant to high dependency $\delta$. Since the geometric distribution is a count of the number of trials needed to obtain the first success, a higher $p$ increases the probability that the first success occurs on the first trial, while a lower $p$ decreases that probability. Therefore, the dependency $\delta$ would have a higher effect for lower $p$, because a longer (dependent) sequence is expected to be generated prior to the first success, which increases the expected number of trials faster than if the Bernoulli trials were independent.

Remark. Notice that when $\delta = 0$, the Bernoulli trials are independent. The mean of the generalized geometric distribution when $\delta = 0$ is $E[X] = \frac{1}{p}$, the mean of the standard geometric distribution.

### Variance

Fact.

The variance of the generalized geometric distribution is

$$\text{Var}(X) = \sigma^{2} = \frac{1-p + \delta p(1-p)}{p^2(1-\delta)^2}$$
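The mean and variance formulas can be checked against a truncated series computed directly from the pmf in Proposition 1; a short sketch:

```python
def pmf(k, p, delta):
    # generalized geometric pmf from Proposition 1
    q = 1 - p
    q_plus, p_minus = q + delta * p, p - delta * p
    return p if k == 1 else q * q_plus**(k - 2) * p_minus

p, delta = 0.5, 0.5
mean = sum(k * pmf(k, p, delta) for k in range(1, 2000))
second = sum(k * k * pmf(k, p, delta) for k in range(1, 2000))
print(mean, second - mean**2)  # compare with the closed forms above
```

For $p = \delta = 1/2$, the closed forms give $E[X] = 3$ and $\text{Var}(X) = 10$, and the truncated sums agree to within floating-point error.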

Figure 3 shows the effect of $\delta$ on the variance for different values of $p$. As with the mean, a smaller $p$ induces a higher effect of $\delta$ on the variance of the number of dependent Bernoulli trials before the first success is observed. The shape of all 4 cases is similar, but the scales are vastly different. As $p$ increases, the scale of the variance decreases dramatically.

Remark.  Again, note that when $\delta = 0$, the variance of the generalized geometric distribution reduces to that of the standard geometric distribution.

### Skew

Fact.

The skew of the generalized geometric distribution is given by
$$\text{Skew}[X] = \frac{2-3p + p^2 + \delta p[q+\delta p q + p(2\delta-1-p)]}{\left(q + \delta pq\right)^{3/2}}$$

The skew of the generalized geometric distribution has more complicated behavior as a function of both $p$ and $\delta$. Figure 4 shows the skew as a function of $p$ for the two extreme cases: complete independence ($\delta = 0$) and complete dependence ($\delta = 1$). From $p=0$ to $p \approx 0.658$, the skew for the independent geometric distribution is greater than in the completely dependent case. For $p \gtrapprox 0.658$, the skew is greater under complete dependence.

### Entropy

The entropy of a random variable measures the average information contained in the random variable. It can also be viewed as a measure of how unpredictable or “truly random” the variable is [1]. The definition of entropy, denoted $H(X)$, was coined by Claude Shannon [3] in 1948.

Definition. (Entropy)
$$H(X) := -\sum_{i}P(x_{i})\log_{2}(P(x_{i}))$$

For the standard geometric distribution, the entropy is given by

$$H_{sg}(X) = \frac{-(1-p)\log_{2}(1-p)-p\log_{2}(p)}{p}$$

Fact.
The entropy for the generalized geometric distribution is
$$H_{gg}(X) = \frac{-\left[pp^{-}\log_{2}(p) + qp^{-}\log_{2}(qp^{-}) + qq^{+}\log_{2}(q^{+})\right]}{p^{-}}$$
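The closed form can be checked against a direct (truncated) evaluation of Shannon's sum over the pmf; a sketch, with function names my own:

```python
import math

def entropy_closed(p, delta):
    # closed form H_gg(X) from the Fact above
    q, q_plus, p_minus = 1 - p, (1 - p) + delta * p, p - delta * p
    return -(p * p_minus * math.log2(p)
             + q * p_minus * math.log2(q * p_minus)
             + q * q_plus * math.log2(q_plus)) / p_minus

def entropy_series(p, delta, kmax=1000):
    # direct evaluation of -sum P(k) log2 P(k) over the generalized geometric pmf
    q, q_plus, p_minus = 1 - p, (1 - p) + delta * p, p - delta * p
    total = -p * math.log2(p)
    for k in range(2, kmax):
        P = q * q_plus**(k - 2) * p_minus
        if P == 0.0:          # stop once terms underflow
            break
        total -= P * math.log2(P)
    return total

print(entropy_closed(0.5, 0.5), entropy_series(0.5, 0.5))
```

At $\delta = 0$ the closed form reduces to the standard geometric entropy $H_{sg}(X)$, as expected.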

Figure 5a shows $H_{gg}(X)$ as a function of $p$ for fixed values of $\delta$. Notice that while the entropy decreases to 0 for all curves as $p \to 1$, the entropy curve is shifted upward for larger $\delta$. Figure 5b fixes $p$ and looks at entropy as a function of $\delta$. Notice that for smaller $p$, the entropy is much higher, which aligns with intuition.

## Conclusion

The standard geometric distribution counts the number of independent Bernoulli trials until the first success. This paper uses the works of Korzeniowski, and Traylor and Hathcock [2, 4, 5] on dependent Bernoulli and categorical random variables to develop a generalized geometric distribution built from dependent Bernoulli random variables. The main result of the paper is that the pdf for the generalized geometric distribution is independent of the dependency structure of the Bernoulli random variables that comprise it. That is, regardless of dependency structure, the pdf for the generalized geometric distribution is given by Proposition 1. Various properties and characterizations were given, including the moment generating function, mean, variance, skew, and entropy. The effect of dependency on each property was studied.

## References

1. Skewness. https://en.wikipedia.org/wiki/skewness.
2. Andrzej Korzeniowski. On correlated random graphs. Journal of Probability and Statistical Science, pages 43–58, 2013.
3. Claude Shannon. A mathematical theory of communication. The Bell System Technical Journal, 1948.
4.  Rachel Traylor. A generalized multinomial distribution from dependent categorical random variables. Academic Advances of the CTO, 2017.
5. Rachel Traylor and Jason Hathcock. Vertical dependency in sequences of categorical random variables. Academic Advances of the CTO, 2017.
Stochastic Reliability of a Server under a Random Workload

## Abstract

Editor’s note: This paper is the first chapter from a PhD thesis published in 2016 by Rachel Traylor. We generalize a 2011 model from Cha and Lee that gave a closed form for the survival function of a server under a random workload, and with random service times. In this work, the constant stress assumption of Cha and Lee is relaxed and allowed to be random; that is, every request to the server brings a random stress or workload to the server for the duration of its stay until completion. The efficiency measure of a random stress server is defined as the long run average number of jobs completed per unit time, and provides a way to measure server performance.

## Introduction

There are many types of systems which can be dubbed servers, such as a retail checkout counter, a shipping company, a web server, or a customer service hotline. All of these systems have common general behavior. Requests or customers arrive via a stochastic process, the service times vary randomly, and each request stresses the server if only temporarily. A general stochastic model that describes the reliability of such a server can provide the necessary information for optimal resource allocation and efficient task scheduling, leading to significant cost savings for businesses and improved performance metrics [6]. Such topics have been studied in literature for several decades [1, 2, 3, 20].

Much attention was devoted to reliability principles that model software failures and bug fixes, starting with Jelinski and Moranda in 1972 [11]. The hazard function under this model gives the time between the $i$-th failure and the $(i+1)$-st failure. Littlewood (1980) [16] extended this initial software reliability model by assuming differences in error size [12].

These models have been extended into software testing applications [4, 5, 19] and optimal software release times [7, 8, 17, 22]. The explosion of e-commerce and the resulting increase in internet traffic have led to the development of reliability models for Web applications. Heavy traffic can overload and crash a server; thus, various control policies for refreshing content and admission of page requests were created [10, 13, 15, 18, 23].

In particular, Cha and Lee (2011) [9] proposed a stochastic breakdown model for an unreliable web server whose requests arrive at random times according to a nonhomogenous Poisson process and bring a constant stress factor to the system that dissipates upon service completion. The authors provide a fairly general survival function under any service distribution $g_{W}(w)$, define server efficiency to measure performance, and illustrate a possible admission control policy due to an observed property of the server efficiency under a specific numerical example.

Thus far, no extensions of [9] have been proposed. This work generalizes the model put forth by Cha and Lee in a variety of ways. First, the assumption of constant job stress is relaxed and replaced by a random variable, and a new survival function and efficiency equation are derived. This work, while suitable for IT applications, is general enough for use in almost any industry, including logistics, retail, manufacturing, and engineering systems.

## Background (Cha and Lee’s Model)

### System Description and Survival Function

Cha and Lee considered a web server wherein each request arrives via a nonhomogenous Poisson process $\{N(t): t \geq 0\}$ with intensity function $\lambda(t)$. Each request adds a constant stress $\eta$, increasing the breakdown rate for the duration of service. Suppose $r_{0}(t)$ is the breakdown rate of the idle server. Then the breakdown rate process $\mathscr{B}(t)$ is defined as

$$\mathscr{B}(t) := r_{0}(t) + \eta\sum_{j=1}^{N(t)}\mathbb{1}(T_{j} \leq t \leq T_{j} + W_{j})$$

where $N(t), \{T_{j}\}_{j=1}^{N(t)}, \{W_{j}\}_{j=1}^{N(t)}$ are the random variables that describe the number of arrivals, arrival times, and service times, respectively. It is assumed that the $\{T_{j}\}_{j=1}^{N(t)}$ are independent of each other. Furthermore, the $\{W_{j}\}_{j=1}^{N(t)} \sim g_{W}(w)$ are i.i.d. and mutually independent of all $T_{j}$'s.

Under these conditions, Cha and Lee proved the following theorem that gives the explicit form of the survival function under this model:

Theorem.

Suppose that $\{N(t), t \geq 0\}$ is a nonhomogenous Poisson process with intensity function $\lambda(t)$, and let $m(t) \equiv \int_{0}^{t}\lambda(x)dx$. Assuming the conditional survival function is given by

$$P\left(Y > t \left| N(t), \{T_{j}\}_{j=1}^{N(t)}, \{W_{j}\}_{j=1}^{N(t)}\right.\right) = \bar{F}_{0}(t)\exp\left(-\eta\sum_{j=1}^{N(t)}\min(W_{j}, t-T_{j})\right)$$

and $m(t)$ has an inverse, the survival function of $Y$ is given by

$$S_{Y}(t) = \bar{F}_{0}(t)\exp\left(-\eta\int_{0}^{t}\exp(-\eta w)\bar{G}_{W}(w)m(t-w)dw\right)$$

and the hazard function of $Y$, denoted $r(t)$, is given by

$$r(t) = r_{0}(t) + \eta\int_{0}^{t}e^{-\eta w}\bar{G}_{W}(w)\lambda(t-w)dw$$
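Under simplifying assumptions (constant intensity $\lambda$, $W \sim \text{Exp}(\gamma)$, and with the common baseline factor $\bar{F}_{0}(t)$ dropped from both sides), the exponential factor in the survival function can be verified by Monte Carlo, using the fact that given $N(t) = n$, the arrival times of a homogeneous Poisson process are i.i.d. Uniform$(0,t)$. A sketch, with all names my own:

```python
import math, random

def poisson_sample(mu, rng):
    # Knuth's multiplication method for a Poisson(mu) draw
    L, k, prod = math.exp(-mu), 0, rng.random()
    while prod > L:
        k += 1
        prod *= rng.random()
    return k

def survival_factor_mc(t, eta, lam, gamma, n_sims, rng):
    # Monte Carlo estimate of E[exp(-eta * sum_j min(W_j, t - T_j))] for a
    # homogeneous Poisson(lam) arrival process and Exp(gamma) service times
    total = 0.0
    for _ in range(n_sims):
        n = poisson_sample(lam * t, rng)
        s = sum(min(rng.expovariate(gamma), t - rng.uniform(0, t)) for _ in range(n))
        total += math.exp(-eta * s)
    return total / n_sims

t, eta, lam, gamma = 2.0, 1.0, 2.0, 1.0
c = eta + gamma
# closed-form exponent: eta * int_0^t e^{-eta w} Gbar_W(w) m(t-w) dw with
# Gbar_W(w) = e^{-gamma w} and m(x) = lam * x, integrated analytically
exponent = eta * lam * (t / c - (1 - math.exp(-c * t)) / c**2)
mc = survival_factor_mc(t, eta, lam, gamma, 50_000, random.Random(7))
print(mc, math.exp(-exponent))  # the two values should be close
```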

### Efficiency of the Server

It is natural to develop some measure of server performance. Cha and Lee measure such performance by defining the efficiency, $\psi$, of the web server as the long-run expected number of jobs completed per unit time. That is, with the number of jobs completed as $M$, the efficiency is defined

$$\psi := \lim\limits_{t \to \infty}\frac{E[M(t)]}{t}$$

Upon breakdown and rebooting, the server is assumed to be ‘as good as new’, in that performance of the server does not degrade during subsequent reboots. In addition, the model assumes the arrival process after reboot, denoted $\{N^{*}(t), t \geq 0\}$, is a nonhomogenous Poisson process with the same intensity function $\lambda(t)$ as before, and that $\{N^{*}(t), t \geq 0\}$ is independent of the arrival process before reboot. In a practical setting, this model assumes no ‘bottlenecking’ of arrivals occurs in the queue during server downtime that would cause an initial flood to the rebooted server. In addition, the reboot time is assumed to follow a continuous distribution $H(t)$ with expected value $\nu$. This process is a renewal reward process, with the renewal $\{R_{n}\} = \{M_{n}\}$, the number of jobs completed. The length of a renewal cycle is $Y_{n} + H_{n}$, where $Y_{n}$ is the length of time the server was operational, and $H_{n}$ is the time to reboot after a server crash. Then, by [21],

$$\psi = \frac{E[M]}{E[Y]+ \nu}$$

where $M$ is the number of jobs completed in a particular renewal cycle, $\nu$ is the mean time to reboot of the server, and $Y$ is the operational lifetime of the server within that cycle. Then, using the definition of efficiency, the following closed form of the efficiency of a server under all assumptions of Cha and Lee’s model is derived.

Theorem. (Efficiency of the Constant Stress Server)
Suppose $\{N(t), t \geq 0\}$ is a nonhomogenous Poisson process with intensity $\lambda(t)\geq 0$. Then the efficiency is given by

\begin{aligned}\psi &= \frac{1}{\int_{0}^{\infty}S_{Y}(t)dt + \nu}\int_{0}^{\infty}\left[\exp\left(-\int_{0}^{t}r_{0}(x)dx-\int_{0}^{t}\lambda(x)dx + a(t) + b(t)\right)\right.\\&\kern{10em}\biggl.\times\left(r_{0}(t)a(t) + \eta a(t)b(t)\right)\biggr]dt\end{aligned}

where $a(t) = \int_{0}^{t}e^{-\eta v}g_{W}(v)m(t-v)dv$, $b(t) = \int_{0}^{t}e^{-\eta(t-r)}\bar{G}_{W}(t-r)\lambda(r)dr$, $\bar{G}_{W}(x) = 1-\int_{0}^{x}g_{W}(s)ds$, and $m(x) = \int_{0}^{x}\lambda(s)ds$.

### Numerical Examples and Control Policies

As an illustrative example, Cha and Lee considered the case when $\lambda(t) \equiv \lambda$,
$r_{0}(t) \equiv r_{0} = 0.2$, $\eta = 0.01$, $\nu = 1$, and $g_{W}(w) = we^{-w^{2}/2}$ (the PDF of the Rayleigh distribution). As shown in the figure above, there exists a $\lambda^{*}$ such that $\psi(\lambda)$ is maximized. Thus one may implement the obvious optimal control policy for server control to avoid server overload:

1. If the real time arrival rate $\lambda < \lambda^{*}$, do not interfere with arrivals.
2. If $\lambda \geq \lambda^{*}$, facilitate some appropriate measure of interference.

Examples of interference for a web server in particular include rejection of incoming requests or possible re-routing. Cha and Lee give an interference policy of rejection with probability $1-\frac{\lambda^{*}}{\lambda}$. The next section presents a generalization of the above model by relaxing the assumption that the job stress $\eta$ is constant. This yields a far more general model that can encompass random stress with any distribution.
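As a minimal sketch, this rejection policy can be written as an admission gate (the function and argument names here are ours, not from the paper):

```python
import random

def admit(observed_rate: float, optimal_rate: float, rng=random.random) -> bool:
    """Admission decision under the rejection policy above: below the
    efficiency-maximizing rate lambda*, always admit; above it, reject
    an incoming request with probability 1 - lambda*/lambda."""
    if observed_rate < optimal_rate:
        return True
    p_reject = 1.0 - optimal_rate / observed_rate
    return rng() >= p_reject
```

Thinning a Poisson stream with acceptance probability $\lambda^{*}/\lambda$ leaves a Poisson stream of rate $\lambda^{*}$, which is why this choice of rejection probability restores the optimal arrival rate.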

## Random Stress Reliability Model

The basic model here is very similar to Cha and Lee’s. The main difference is that we relax the constant stress assumption and allow it to be random.

Assume that each job $j$ coming into the server adds a random stress $\mathcal{H}_{j}$ to the server for the duration of its time in the system. Suppose the $\{\mathcal{H}_{j}\}_{j=1}^{N(t)}$ are i.i.d. copies of a random variable $\mathcal{H}$, where WLOG $\mathcal{H}$ is a discrete random variable with a finite sample space $S = \{\eta_{i}: \eta_{i} \in \mathbb{R}^{+}, i = 1,\ldots,m \text{ for } m \in \mathbb{N}\}$ and probability distribution given by

$$P(\mathcal{H} = \eta_{i}) = p_{i}, \quad i = 1,\ldots,m$$

The following assumptions from Cha and Lee are retained:

1. (CL1) Requests arrive via a nonhomogenous Poisson Process $\{N(t), t \geq 0\}$ with intensity $\lambda(t)$.
2. (CL2) Arrival times $\{T_{j}\}_{j=1}^{N(t)}$ are independent.
3. (CL3) Service times $\{W_{j}\}_{j=1}^{N(t)}$ are i.i.d. with pdf $g_{W}(w)$ and mutually independent of all arrival times.

Then, the random stress breakdown rate (RSBR) process $\mathscr{B}(t)$ is given by

$$\mathscr{B}(t) = r_{0}(t) + \sum_{j=1}^{N(t)}\mathcal{H}_{j}\mathbb{1}(T_{j} < t \leq T_{j} + W_{j}), \quad t \geq 0$$

Compare the sample trajectory shown in the figure above to the sample trajectory of breakdown rate under the original model (see page 2). The random stress brought by each request still disappears upon job completion, but the effect on the server is no longer deterministic. Thus, for the same set of arrival times and respective completion times, the realization of the breakdown rate process under the RSBR model has one more element of variation.
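Given realized arrival times, service times, and stresses, $\mathscr{B}(t)$ can be evaluated pointwise. A small sketch (the function and argument names are ours):

```python
def breakdown_rate(t, arrivals, services, stresses, r0):
    """Evaluate the RSBR process B(t) = r0(t) + sum_j H_j * 1(T_j < t <= T_j + W_j):
    the baseline rate plus the stresses of all jobs in service at time t."""
    active = sum(h for T, W, h in zip(arrivals, services, stresses)
                 if T < t <= T + W)
    return r0(t) + active
```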

Let $Y$ be the random time to breakdown of the web server given the workload from client requests. Let $\mathfrak{T} = \{T_{j}\}_{j=1}^{N(t)}$, $\mathfrak{W} = \{W_{j}\}_{j=1}^{N(t)}$, and $\mathfrak{H} = \{\mathcal{H}_{j}\}_{j=1}^{N(t)}$, with observed values $\mathfrak{t} = \{t_{j}\}_{j=1}^{N(t)}$, $\mathfrak{w} = \{w_{j}\}_{j=1}^{N(t)}$, and $\mathfrak{h} = \{\eta_{i_{j}}\}_{j=1}^{N(t)}$. Then the conditional survival function of the server, given the arrival process of client requests ($N(t)$), job stresses ($\mathfrak{H}$), service times ($\mathfrak{W}$), and arrival times ($\mathfrak{T}$) is

\begin{aligned}S_{\scriptstyle{Y|N(t), \mathfrak{T}, \mathfrak{W}, \mathfrak{H} }}(t|n, \mathfrak{t}, \mathfrak{w}, \mathfrak{h}) &=e^{ -\int_{0}^{t}\mathscr{B}(x)dx}\\&= \bar{F}_{0}(t)e^{-\sum_{j=1}^{N(t)} \mathcal{H}_{j}\min(W_{j}, t-T_{j})}\end{aligned}

where $\bar{F}_{0}(t) = \exp\left(-\int_{0}^{t}r_{0}(x)dx\right)$.

### Survival Function for the RSBR Server

Under the RSBR generalization, the survival function of the server is given in the following theorem.

Theorem 1. (Survival Function for a Random Stress Server)
Suppose that jobs arrive to a server according to a nonhomogeneous Poisson process
$\{N(t), t \geq 0\}$ with intensity function $\lambda(t) \geq 0$ and $m(t) \equiv E[N(t)] = \int_{0}^{t}\lambda(x)dx$. Let the arrival times $\{T_{j}\}_{j=1}^{N(t)}$ be independent, and let the service times $\{W_{j}\}_{j=1}^{N(t)}$ be i.i.d. with pdf $g_{W}(w)$ and mutually independent of all arrival times. Assume the random job stresses $\mathcal{H}_{j} \sim \mathcal{H}$ are i.i.d. Then

$$S_{Y}(t) = \bar{F}_{0}(t)\exp\left(- E_{\mathcal{H}}\left[\mathcal{H}\int_{0}^{t}e^{-\mathcal{H} w}m(t-w)\bar{G}_{W}(w)dw\right]\right)$$

where $\bar{F}_{0}(t) = \exp\left(-\int_{0}^{t}r_{0}(s)ds\right)$.

The compound failure rate function (or hazard function) $r(t)$ is given by

$$r(t) = -\frac{d}{dt}\ln(S_{Y}(t)) = r_{0}(t) + E_{\mathcal{H}}\left[\mathcal{H}\int_{0}^{t}e^{-\mathcal{H} w}\lambda(t-w)\bar{G}_{W}(w)dw\right]$$

See [14] or [21].
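The expectation and inner integral in Theorem 1 are straightforward to evaluate numerically. A rough sketch using the trapezoid rule, assuming a constant arrival rate $\lambda$ (so $m(x) = \lambda x$) and a constant baseline rate $r_{0}$ (all function names are ours):

```python
import math

def survival_rsbr(t, r0, lam, gbar, stresses, probs, n=400):
    """Trapezoid-rule evaluation of Theorem 1's S_Y(t) for a discrete
    stress distribution: stresses/probs give P(H = eta_i) = p_i,
    gbar is the service-time survival function Gbar_W."""
    h = t / n
    expec = 0.0  # E_H[ H * int_0^t e^{-H w} m(t-w) Gbar_W(w) dw ]
    for eta, p in zip(stresses, probs):
        total = 0.0
        for i in range(n + 1):
            w = i * h
            val = math.exp(-eta * w) * lam * (t - w) * gbar(w)
            total += (0.5 if i in (0, n) else 1.0) * val
        expec += p * eta * h * total
    return math.exp(-r0 * t) * math.exp(-expec)
```

With zero stress the survival reduces to the baseline $e^{-r_{0}t}$, and any positive stress can only lower it, which gives a quick sanity check on the implementation.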

## Efficiency Measure of a Server under RSBR

Upon server crash, the server must be rebooted. This section gives the server efficiency as defined in [9], but for the RSBR model. The efficiency in the theorem below bears many similarities to that of [9], except that, as in Theorem 1, the expectation must be taken over the random stress variable. The following assumptions from the original model are retained:

1. (E1) The arrival process after rebooting, $\{N^{rb}(t), t \geq 0\}$, remains a nonhomogeneous Poisson process with the same intensity function $\lambda(t), t \geq 0$ as before.
2. (E2) $\{N^{rb}(t), t \geq 0\}$ is independent of the arrival process of client requests before rebooting. Hence, $\{N^{rb}(t), t \geq 0\}$ is equal in distribution to $\{N(t), t \geq 0\}$, since it retains all the same characteristics as before.
3. (E3) The time to reboot the server follows a continuous distribution $H(t)$ with mean $\nu$.

Recall that $M(t)$ is defined as the total number of jobs completed by the server during the time $(0,t]$. Also, recall the definition of server efficiency from [9]:

$$\psi \equiv \lim\limits_{t \to \infty}\frac{E[M(t)]}{t}$$

The efficiency of the server under a random stress environment is given in the following theorem.

Theorem (Server Efficiency under Random Stress Environment)
Suppose that $\{N(t): t \geq 0\}$ is a nonhomogeneous Poisson process with intensity function $\lambda(t), t \geq 0$. Suppose also the conditions of Theorem 1 and the conditions (E1)-(E3) are met. Then the efficiency of the server is given by

\begin{aligned}\psi &= \frac{1}{\int_{0}^{\infty}S_{Y}(t)dt + \nu}\\&\kern{2em}\times\left\{\int_{0}^{\infty}e^{-\int_{0}^{t}r_{0}(x)dx-\int_{0}^{t}\lambda(x)dx+E_{\mathcal{H}}[a(t)+b(t)]}(r_{0}(t)E_{\mathcal{H}}[a(t)] + E_{\mathcal{H}}[\mathcal{H} a(t)b(t)])dt \right\}\end{aligned}

where

$$a(t) = \int_{0}^{t}e^{-\mathcal{H} v}g_{W}(v)m(t-v)dv$$

and

$$b(t) = \int_{0}^{t}e^{-\mathcal{H}(t-r)}\bar{G}_{W}(t-r)\lambda(r)dr.$$

## Remarks and Implications

Note that $\mathcal{H}$ was assumed discrete, but the proofs of Theorems 1 and 2 are unaffected if $\mathcal{H}$ is continuous. Thus this model is significantly more general than that of [9]. In practical considerations, these integrals will need to be numerically evaluated.

For certain distributions of $\mathcal{H}$, $S_{Y}(t)$ has a fairly compact form. Subsequent installments will examine the case where $\mathcal{H}$ has a binomial distribution, formed from both independent and dependent Bernoulli trials. We will also explore the effects of various service life distributions on $\psi$.

## Conclusion

This paper has given a simple but large generalization of [9]. The random stress model allows for the reality that we likely will not know the stress a request will bring to the server, but may know its distribution. Subsequent installments will further study the properties of the RSBR model and extend it to other single-server models and networks.

Vertical Dependency in Sequences of Categorical Random Variables

## Abstract

This paper develops a more general theory of sequences of dependent categorical random variables, extending the works of Korzeniowski (2013) and Traylor (2017) that studied first-kind dependency in sequences of Bernoulli and categorical random variables, respectively. A more natural form of dependency, sequential dependency, is defined and shown to retain the property of identically distributed but dependent elements in the sequence. The cross-covariance of sequentially dependent categorical random variables is proven to decrease exponentially in the dependency coefficient $\delta$ as the distance between the variables in the sequence increases. We then generalize the notion of vertical dependency to describe the relationship between a categorical random variable in a sequence and its predecessors, and define a class of generating functions for such dependency structures. The main result of the paper is that any sequence of dependent categorical random variables generated from a function in the class $\mathscr{C}_{\delta}$ that is dependency continuous yields identically distributed but dependent random variables. Finally, a graphical interpretation is given and several examples from the generalized vertical dependency class are illustrated.

## Introduction

Many statistical tools and distributions rely on the independence of a sequence or set of random variables. The sum of independent Bernoulli random variables yields a binomial random variable. More generally, the sum of independent categorical random variables yields a multinomial random variable. Independent Bernoulli trials also form the basis of the geometric and negative binomial distributions, though the focus there is on the number of failures before the first (or $r$th) success [2]. In data science, linear regression relies on independent and identically distributed (i.i.d.) error terms, to name just a few examples.

The necessity of independence filters throughout statistics and data science, although real data is rarely independent. Transformations of data to reduce multicollinearity (such as principal component analysis) are commonly used before applying predictive models that require assumptions of independence. This paper aims to continue building a formal foundation of dependency among sequences of random variables, extending these into generalized distributions that do not rely on mutual independence, in order to better model the complex nature of real data. We build on the works of Korzeniowski [1] and Traylor [3], who studied first-kind (FK) dependence for Bernoulli and categorical random variables, respectively, in order to define a general class of functions that generate dependent sequences of categorical random variables.

The next section gives a brief review of the original work by Korzeniowski [1] and Traylor [3]. In Section 2, a new dependency structure, sequential dependency, is introduced, and the cross-covariance matrix for two sequentially dependent categorical random variables is derived. Sequentially dependent categorical random variables are identically distributed but dependent. Section 3.1 generalizes the notion of vertical dependency structures into a class that encapsulates the first-kind (FK) dependence of Korzeniowski [1] and Traylor [3], and shows that all such sequences of dependent random variables are identically distributed. We also provide a graphical interpretation and illustrations of several examples of vertical dependency structures.

We repeat a section from *A Generalized Multinomial Distribution from Dependent Categorical Random Variables* in order to give a review of the original first-kind (FK) dependency created by Korzeniowski. For the full details, see the previous work on the topic.

Korzeniowski defined the notion of dependence in a way we will refer to here as dependence of the first kind (FK dependence). Suppose $(\epsilon_{1},...,\epsilon_{N})$ is a sequence of Bernoulli random variables, and $P(\epsilon_{1} = 1) = p$. Then, for $\epsilon_{i}, i \geq 2$, we weight the probability of each binary outcome toward the outcome of $\epsilon_{1}$, adjusting the probabilities of the remaining outcomes accordingly.

Formally, let $0 \leq \delta \leq 1$, and $q = 1-p$. Then define the following quantities
\begin{aligned}p^{+} := P(\epsilon_{i} = 1 | \epsilon_{1} = 1) = p + \delta q &\qquad p^{-} :=P(\epsilon_{i} = 0 | \epsilon_{1} = 1) = q -\delta q\\q^{+} := P(\epsilon_{i} = 1 | \epsilon_{1} = 0) = p-\delta p&\qquad q^{-} := P(\epsilon_{i} = 0 | \epsilon_{1} = 0) = q + \delta p\end{aligned}

Given the outcome $i$ of $\epsilon_{1}$, the probability of outcome $i$ occurring in the subsequent Bernoulli variables $\epsilon_{2}, \epsilon_{3},..., \epsilon_{n}$ is $p^{+}$ for $i = 1$, or $q^{-}$ for $i=0$. The probability of the opposite outcome is then decreased to $p^{-}$ and $q^{+}$, respectively.

The figure above illustrates the possible outcomes of a sequence of such dependent Bernoulli variables. Korzeniowski showed that, despite this conditional dependency, $P(\epsilon_{i} = 1) = p \quad\forall i$. That is, the sequence of Bernoulli variables is identically distributed, with correlation shown to be
$$\text{Cor}(\epsilon_{i}, \epsilon_{j}) = \left\{\begin{array}{lr}\delta, & i=1 \\\delta^{2}, &i \neq j, \quad i,j \geq 2\end{array}\right.$$

These identically distributed but correlated Bernoulli random variables yield a Generalized Binomial distribution with a similar form to the standard binomial distribution.
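Korzeniowski's identical-distribution result can be checked directly by conditioning on $\epsilon_{1}$. A small sketch (function names are ours):

```python
def fk_marginal(p, delta, i):
    """Exact marginal P(eps_i = 1) under FK dependence, computed by
    conditioning on eps_1; the weighting cancels, leaving p."""
    if i == 1:
        return p
    q = 1 - p
    p_plus = p + delta * q   # P(eps_i = 1 | eps_1 = 1)
    q_plus = p - delta * p   # P(eps_i = 1 | eps_1 = 0)
    return p * p_plus + q * q_plus
```

Expanding $p\,p^{+} + q\,q^{+} = p(p+\delta q) + q(p - \delta p) = p$, exactly as the function returns for any $\delta$.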

In the previous work, the concept of Bernoulli FK dependence was extended to categorical random variables. That is, given a sequence of categorical random variables with $K$ categories, $P(\epsilon_{1} = i) = p_{i}$, $i = 1,\ldots,K,$ \begin{aligned}P(\epsilon_{j} = i | \epsilon_{1} = i) &= p_{i}^{+} = p_{i} + \delta(1-p_{i});\\P(\epsilon_{j} = k | \epsilon_{1} = i) &= p_{k}^{-} = p_{k}-\delta p_{k}, i \neq k, \:\:k = 1,\ldots,K.\end{aligned}

Traylor proved that FK dependent categorical random variables remained identically distributed, and showed that the cross-covariance matrix of categorical random variables has the same structure as the correlation between FK dependent Bernoulli random variables. In addition, the concept of a generalized binomial distribution was extended to a generalized multinomial distribution.

In the next section, we will explore a different type of dependency structure, sequential dependency.

## Sequentially Dependent Random Variables

While FK dependence yielded some interesting results, a more realistic type of dependence is sequential dependence, where the outcome of a categorical random variable depends with coefficient δ on the outcome of the variable immediately preceding it in the sequence. Put formally, if we let $\mathcal{F}_{n} = \{\epsilon_{1},\ldots,\epsilon_{n-1}\}$, then $P(\epsilon_{n}|\epsilon_{1},\ldots,\epsilon_{n-1}) = P(\epsilon_{n}|\epsilon_{n-1}) \neq P(\epsilon_{n})$. That is, $\epsilon_{n}$ only has direct dependence on the previous variable $\epsilon_{n-1}$. We keep the same weighting as for FK-dependence. That is,

$$\begin{array}{lr}P(\epsilon_{n} = j | \epsilon_{n-1} = j) = p_{j}^{+} = p_{j} + \delta(1-p_{j}),&\\ P(\epsilon_{n} = j | \epsilon_{n-1} = i) = p_{j}^{-} = p_{j}-\delta p_{j}, &\:\: j = 1,\ldots,K;\:\:i \neq j\end{array}$$ As a comparison, for FK dependence, $P(\epsilon_{n}|\epsilon_{1},\ldots,\epsilon_{n-1}) = P(\epsilon_{n}|\epsilon_{1}) \neq P(\epsilon_{n})$. That is, $\epsilon_{n}$ only has direct dependence on $\epsilon_{1}$, and

$$\begin{array}{lr}P(\epsilon_{n} = j | \epsilon_{1} = j) = p_{j}^{+} = p_{j} + \delta(1-p_{j}),&\\ P(\epsilon_{n} = j | \epsilon_{1} = i) = p_{j}^{-} = p_{j}-\delta p_{j},&\:\:j = 1,\ldots,K;\:\: i \neq j\end{array}$$ Let $\epsilon = (\epsilon_{1},\ldots,\epsilon_{n})$ be a sequence of categorical random variables of length n (either independent or dependent) where the number of categories for all $\epsilon_{i}$ is K. Denote $\Omega_{n}^{K}$ as the sample space of this random sequence. For example,

$$\Omega_{3}^{3} = \{(1,1,1), (1,1,2), (1,1,3), (1,2,1), (1,2,2),\ldots,(3,3,1),(3,3,2),(3,3,3)\}$$

Dependency structures like FK-dependence and sequential dependence change the probability of a sequence $\epsilon$ of length n taking a particular $\omega = (\omega_{1},\ldots,\omega_{n}) \in \Omega_{n}^{K}$. The probability of a particular $\omega \in \Omega_{n}^{K}$ is given by the dependency structure. For example, if the variables are independent, $P((1,2,1)) = p_{1}^{2}p_{2}$. Under FK-dependence, $P((1,2,1)) = p_{1}p_{2}^{-}p_{1}^{+}$, and under sequential dependence, $P((1,2,1)) = p_{1}p_{2}^{-}p_{1}^{-}$. See Figures 2 and 3 for a comparison of the probability mass flows of sequential dependence and FK dependence. Sequentially dependent sequences of categorical random variables remain identically distributed but dependent, just like FK-dependent sequences. That is,

Lemma 1

Let $\epsilon = (\epsilon_{1},\ldots,\epsilon_{n})$ be a sequentially dependent categorical sequence of length n with K categories. Then

$$P(\epsilon_{j} = i) = p_{i}; \qquad i = 1,\ldots,K;\:\:j = 1,\ldots,n;\:\: n \in \mathbb{N}$$
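The sequence probabilities above, and Lemma 1 itself, can be checked by brute-force enumeration of $\Omega_{n}^{K}$. A sketch (function names are ours):

```python
from itertools import product

def seq_prob(omega, p, delta, structure="sequential"):
    """P(omega) for a categorical sequence (1-based categories) with
    category probabilities p, under FK or sequential dependence."""
    prob = p[omega[0] - 1]
    for j in range(1, len(omega)):
        anchor = omega[0] if structure == "fk" else omega[j - 1]
        pj = p[omega[j] - 1]
        prob *= pj + delta * (1 - pj) if omega[j] == anchor else pj * (1 - delta)
    return prob

def marginal(j, i, n, p, delta, structure="sequential"):
    """P(eps_j = i) by summing seq_prob over the whole sample space."""
    K = len(p)
    return sum(seq_prob(w, p, delta, structure)
               for w in product(range(1, K + 1), repeat=n) if w[j - 1] == i)
```

For instance, `seq_prob((1, 2, 1), p, delta)` returns $p_{1}p_{2}^{-}p_{1}^{-}$ sequentially and $p_{1}p_{2}^{-}p_{1}^{+}$ under FK dependence, matching the worked comparison above, while `marginal` recovers $p_{i}$ for every position.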

## Cross-Covariance Matrix

The $K \times K$ cross-covariance matrix $\Lambda^{m,n}$ of $\epsilon_{m}$ and $\epsilon_{n}$ in a sequentially dependent categorical sequence has entries $\Lambda_{i,j}^{m,n}$ that give the cross-covariance $\text{Cov}([\epsilon_{m} = i]$, $[\epsilon_{n} = j])$, where $[\cdot]$ denotes an Iverson bracket. In the FK-dependent case, the entries of $\Lambda^{m,n}$ are given by

$$\Lambda^{1,n}_{ij}=\left\{\begin{array}{lc}\delta p_{i}(1-p_{i}), & i = j \\-\delta p_{i}p_{j}, & i \neq j,\end{array}\right.\quad\,n \geq 2,$$

and

$$\Lambda^{m,n}_{ij}=\left\{\begin{array}{lc}\delta^{2}p_{i}(1-p_{i}), & i = j \\-\delta^{2}p_{i}p_{j}, & i \neq j,\end{array}\right. \quad\,n > m; m\neq 1$$

Thus, the cross covariance between any two $\epsilon_{m}$ and $\epsilon_{n}$ in the FK-dependent case is never smaller than $\delta^{2}$ times the independent cross-covariance. In the sequentially dependent case, the cross-covariances of $\epsilon_{m}$ and $\epsilon_{n}$ decrease in powers of $\delta$ as the distance between the two variables in the sequence increases.

Theorem 1 (Cross-Covariance of Dependent Categorical Random Variables).

Denote $\Lambda^{m,n}$ as the $K \times K$ cross-covariance matrix of $\epsilon_{m}$ and $\epsilon_{n}$ in a sequentially dependent sequence of categorical random variables of length N, $m \leq n$, and $n \leq N$, defined as $\Lambda^{m,n} = E[(\epsilon_{m} - E[\epsilon_{m}])(\epsilon_{n} - E[\epsilon_{n}])]$. Then the entries of the matrix are given by

$$\Lambda^{m,n}_{ij} = \left\{\begin{array}{lc}\delta^{n-m}p_{i}(1-p_{i}), & i = j \\-\delta^{n-m} p_{i}p_{j}, & i \neq j.\end{array}\right.$$

The pairwise covariance between two Bernoulli variables in a sequentially dependent sequence is given in the following corollary.

Corollary 1

Denote $P(\epsilon_{i} = 1) = p$; $i = 1,\ldots,n$, and let $q=1-p$. Under sequential dependence,
$$\text{Cov}(\epsilon_{m}, \epsilon_{n}) = pq\delta^{n-m}.$$

We give some examples to illustrate.

Example 1 (Bernoulli Random Variables).

To find the covariance between $\epsilon_{2}$ and $\epsilon_{3}$, note that the set $S = \{\omega \in \Omega_{3}^{2}\,:\,\omega_{2} = 1, \omega_{3} = 1\}$ is given by $S = \{(1,1,1),(0,1,1)\}$, so $P(\epsilon_{2} = 1, \epsilon_{3} = 1) = P(S)$. Thus,

$$\begin{array}{rl}\text{Cov}(\epsilon_{2},\epsilon_{3})&=P(\epsilon_{2} = 1, \epsilon_{3} = 1)-P(\epsilon_{2} = 1)P(\epsilon_{3} = 1)\\&=pp^{+}p^{+} + qp^{-}p^{+}-p^{2}\\&= p^{+}(pp^{+} + qp^{-})-p^{2}\\&= p(p^{+}-p)\\&= pq\delta\end{array}$$

Similarly,
$$\begin{array}{rl}\text{Cov}(\epsilon_{1}, \epsilon_{3}) &= P(\epsilon_{1} = 1, \epsilon_{3} = 1)-P(\epsilon_{1} = 1)P(\epsilon_{3} = 1)\\&= (pp^{+}p^{+} + pp^{-}q^{+})-p^{2} \\&= p((p+\delta q)^{2} + pq(1-\delta)^{2})-p^{2}\\&= p(p^{2} + pq+\delta^{2}q^{2} + \delta^{2}pq)-p^{2}\\&= p(p(p+q) + \delta^{2}q(q+p))- p^{2} \\&= p(p + \delta^{2}q)-p^{2}\\&= pq\delta^{2}\end{array}$$
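Both covariances can be confirmed by exhaustively enumerating $\{0,1\}^{n}$ under sequential dependence. A sketch (function names are ours):

```python
from itertools import product

def p_seq_bern(omega, p, delta):
    """P(omega) for a binary sequence under sequential dependence:
    repeating the previous outcome is weighted up by delta."""
    prob = p if omega[0] == 1 else 1 - p
    for prev, cur in zip(omega, omega[1:]):
        pc = p if cur == 1 else 1 - p
        prob *= pc + delta * (1 - pc) if cur == prev else pc * (1 - delta)
    return prob

def cov(i, j, n, p, delta):
    """Cov(eps_i, eps_j) by exhaustive enumeration of {0,1}^n."""
    joint = sum(p_seq_bern(w, p, delta) for w in product((0, 1), repeat=n)
                if w[i - 1] == 1 and w[j - 1] == 1)
    return joint - p * p
```

The enumeration reproduces $pq\delta$ for adjacent variables and $pq\delta^{2}$ for $\epsilon_{1}$ and $\epsilon_{3}$, as derived above.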

Example 2 (Categorical Random Variables).

Suppose we have a sequence of categorical random variables, where K = 3. Then
$[\epsilon_{m} = i]$ is the Bernoulli random variable with the binary outcome of 1 if $\epsilon_{m} = i$ and 0 if not. Thus,

$$\text{Cov}([\epsilon_{m} = i], [\epsilon_{n} = j]) = P(\epsilon_{m} = i, \epsilon_{n}= j)-P(\epsilon_{m} = i)P(\epsilon_{n} = j).$$

We have shown that every $\epsilon_{n}$ in the sequence is identically distributed, so $P(\epsilon_{m} = i) = p_{i}$ and $P(\epsilon_{n} = j) = p_{j}$.

$$\begin{array}{rl}\text{Cov}([\epsilon_{2} = 1], [\epsilon_{3} = 1]) &=(p_{1}p_{1}^{+}p_{1}^{+} + p_{2}p_{1}^{-}p_{1}^{+} + p_{3}p_{1}^{-}p_{1}^{+})-p_{1}^{2} \\&= p_{1}^{+}(p_{1}p_{1}^{+} + p_{2}p_{1}^{-} + p_{3}p_{1}^{-})-p_{1}^{2} \\&= p_{1}^{+}p_{1}-p_{1}^{2} \\&=\delta p_{1}(1-p_{1})\end{array}$$ $$\begin{array}{rl}\text{Cov}([\epsilon_{2} = 1], [\epsilon_{3} = 2]) &=(p_{1}p_{1}^{+}p_{2}^{-} + p_{2}p_{1}^{-}p_{2}^{-}+p_{3}p_{1}^{-}p_{2}^{-})-p_{1}p_{2} \\&= p_{2}^{-}(p_{1}p_{1}^{+} + p_{2}p_{1}^{-} + p_{3}p_{1}^{-})-p_{1}p_{2}\\&= p_{1}p_{2}^{-}-p_{1}p_{2} \\&=-\delta p_{1}p_{2}\end{array}$$

We may obtain the other entries of the matrix in a similar fashion. So, the cross-covariance matrix for $\epsilon_{2}$ and $\epsilon_{3}$ with K = 3 categories is given by

$$\Lambda^{2,3}=\delta\begin{pmatrix}p_{1}(1-p_{1})&-p_{1}p_{2}&-p_{1}p_{3}\\-p_{1}p_{2} & p_{2}(1-p_{2}) &-p_{2}p_{3}\\-p_{1}p_{3} &-p_{2}p_{3} & p_{3}(1-p_{3})\end{pmatrix}$$

Note that if $\epsilon_{2}$ and $\epsilon_{3}$ are independent, then the cross-covariance matrix is all zeros, because δ = 0.
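Theorem 1's matrix is easy to tabulate for any $K$, $m$, and $n$. A sketch (function names are ours):

```python
def cross_cov(p, delta, m, n):
    """K x K cross-covariance matrix of Theorem 1 for a sequentially
    dependent categorical sequence: delta^(n-m) times the familiar
    multinomial-style covariance pattern."""
    d = delta ** (n - m)
    K = len(p)
    return [[d * p[i] * (1 - p[i]) if i == j else -d * p[i] * p[j]
             for j in range(K)] for i in range(K)]
```

As with any categorical covariance matrix, each row sums to zero, which makes a convenient sanity check.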

## Dependency Generators

Both FK-dependency and sequential dependency structures are particular examples of a class of vertical dependency structures. We denote the dependency of subsequent categorical random variables on a previous variable in the sequence as a vertical dependency. In this section, we define a class of functions that generate vertical dependency structures with the property that all the variables in the sequence are identically distributed but dependent.

### Vertical Dependency Generators Produce Identically Distributed Sequences

Define the set of functions

$$\mathscr{C}_{\delta} = \{\alpha: \mathbb{N}_{\geq 2} \to \mathbb{N} : \alpha(n) \geq 1 \wedge \alpha(n) < n\:\: \forall\:n\}.$$

The mapping defines a direct dependency of n on $\alpha(n)$, denoted $n \stackrel{\delta}{\rightsquigarrow} \alpha(n)$. We have already defined the notion of direct dependency, but we formalize it here.

Definition. (Direct Dependency)

Let $\epsilon_{m}$, $\epsilon_{n}$ be two categorical random variables in a sequence, where $m < n$. We say $\epsilon_{n}$ has a direct dependency on $\epsilon_{m}$, denoted $\epsilon_{n} \rightsquigarrow_{\delta} \epsilon_{m}$, if $P(\epsilon_{n}|\epsilon_{n-1},\ldots,\epsilon_{1}) = P(\epsilon_{n}|\epsilon_{m})$.

Example 3.
For FK-dependence, $\alpha(n) \equiv 1$. That is, for any $n$, $\epsilon_{n} \rightsquigarrow_{\delta} \epsilon_{1}$.

Example 4.
The function $\alpha(n) = n-1$ generates the sequential dependency structure.

We now define the notion of dependency continuity.

Definition. (Dependency Continuity)
A function $\alpha: \mathbb{N}_{\geq 2} \to \mathbb{N}$ is dependency continuous if $\forall n \exists j \in \{1,...,n-1\}$ such that $\alpha(n) = j$.

We require that the functions in $\mathscr{C}_{\delta}$ be dependency continuous. Thus, the formal definition of the class of dependency generators $\mathscr{C}_{\delta}$ is

Definition. (Dependency Generator)
We define the set of functions $\mathscr{C}_{\delta} = \{\alpha : \mathbb{N}_{\geq 2} \to \mathbb{N}\}$ such that

•  $\alpha(n) < n$
• $\forall n \exists j \in \{1,...,n-1\} : \alpha(n) = j$

and refer to this class as dependency generators.

Each function in $\mathscr{C}_{\delta}$ generates a unique dependency structure for sequences of dependent categorical random variables where the individual elements of the sequence remain identically distributed.

Theorem 2. (Identical Distribution of Sequences Generated by Dependency Generators)
Let $\alpha \in \mathscr{C}_{\delta}$. Then for any $n\in \mathbb{N}, n \geq 2$, the dependent categorical random sequence generated by $\alpha$ has identically distributed elements.
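Sampling a sequence from an arbitrary generator $\alpha \in \mathscr{C}_{\delta}$ requires only the weighting rule above. A sketch (function names are ours):

```python
import random

def sample_chain(alpha, p, delta, n, rng):
    """Draw a length-n categorical sequence (1-based categories) whose
    vertical dependency structure is generated by alpha in C_delta:
    eps_k is weighted toward the realized value of eps_alpha(k)."""
    K = len(p)
    seq = [rng.choices(range(1, K + 1), weights=p)[0]]
    for k in range(2, n + 1):
        anchor = seq[alpha(k) - 1]          # requires 1 <= alpha(k) < k
        weights = [pj + delta * (1 - pj) if cat == anchor else pj * (1 - delta)
                   for cat, pj in enumerate(p, start=1)]
        seq.append(rng.choices(range(1, K + 1), weights=weights)[0])
    return seq
```

At $\delta = 1$ every variable copies its anchor, so any $\alpha$ produces a constant sequence — a degenerate but useful check that the dependency wiring is correct.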

### Graphical Interpretation and Illustrations

We may visualize the dependency structures generated by $\alpha \in \mathscr{C}_{\delta}$ via directed dependency graphs. Each $\epsilon_{n}$ represents a vertex in the graph, and a directed edge connects $n$ to $\alpha(n)$ to represent the direct dependency generated by $\alpha$. This section illustrates some examples and gives a graphical interpretation of the result in Theorem 2.

First-Kind Dependency

For FK-dependency, $\alpha(n) \equiv 1$ generates the graph above. Each $\epsilon_{i}$ depends directly on $\epsilon_{1}$, and thus we see no connections between any other vertices $i, j$ where $j \neq 1$. There are $n-1$ separate subsequences of length 2 in this graph.

Sequential Dependency

$\alpha(n) = n-1$ generates the sequential dependency structure this work has studied in detail. We can see that a path exists from any $n$ back to 1. This is a visual way to see the result of Theorem 2: if a path exists from any $n$ back to 1, then the variables in that path must be identically distributed. Here, there is only one sequence and no subsequences.

A Monotonic Example

Here, $\alpha(n) = \lfloor \sqrt{n}\rfloor$ gives an example of a monotonic function in $\mathscr{C}_{\delta}$ and the dependency structure it generates. Again, note that any $n$ has a path to 1, and the number of subsequences is between 1 and $n-1$.

A Nonmonotonic Example

Here, $\alpha(n)=\left\lfloor\frac{\sqrt{n}}{2}\left(\sin(n)\right)+\frac{n}{2}\right\rfloor$ illustrates a more contrived example where the function is nonmonotonic. It is neither increasing nor decreasing. The previous examples have all been nondecreasing.

A Prime Example

Let $\alpha \in \mathscr{C}_{\delta}$ be defined in the following way. Let $p_{m}$ be the $m$th prime number, and let $\{kp_{m}\}$ be the set of all positive integer multiples of $p_{m}$. Then the set $\mathscr{P}_{m} = \{kp_{m}\} \setminus \cup_{i=1}^{m-1}\{kp_{i}\}$ gives a disjoint partitioning of $\mathbb{N}_{\geq 2}$. That is, $\mathbb{N}_{\geq 2} = \sqcup_{m}\mathscr{P}_{m}$, and thus any $n \geq 2$ lies in $\mathscr{P}_{m}$ for exactly one $m$. Now let $\alpha(n) = m$. Thus, the function is well-defined. We may write $\alpha: \mathbb{N}_{\geq 2} \to \mathbb{N}$ as $\alpha[\mathscr{P}_{m}] = \{m\}$. As an illustration,

\begin{aligned}\alpha[\{2k\}] &= \{1\} \\\alpha[\{3k\}\setminus \{2k\}] &= \{2\} \\\alpha[\{5k\}\setminus (\{2k\}\cup \{3k\})] &= \{3\} \\&\qquad\vdots\\\alpha[\{kp_{m}\}\setminus (\cup_{i=1}^{m-1}\{kp_{i}\})] &= \{m\}\end{aligned}
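Since $\mathscr{P}_{m}$ consists exactly of the integers whose smallest prime factor is $p_{m}$, this $\alpha$ can be computed directly. A sketch (function names are ours):

```python
def alpha_prime(n):
    """The prime-partition generator: n >= 2 lies in P_m exactly when
    its smallest prime factor is the m-th prime, so return that index."""
    # smallest prime factor of n by trial division
    spf, d = n, 2
    while d * d <= n:
        if n % d == 0:
            spf = d
            break
        d += 1
    # index of spf among the primes
    def is_prime(k):
        return k >= 2 and all(k % i for i in range(2, int(k ** 0.5) + 1))
    return sum(1 for k in range(2, spf + 1) if is_prime(k))
```

Because $p_{m} > m$ for every $m$, this $\alpha$ satisfies $1 \leq \alpha(n) < n$, so it is indeed a member of $\mathscr{C}_{\delta}$.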

## Conclusion

This paper has extended the concept of dependent random sequences first put forth in the works of Korzeniowski [1] and Traylor [3] and developed a generalized class of vertical dependency structures. The sequential dependency structure was studied extensively, and a formula for its cross-covariance obtained. The class of dependency generators was defined and shown to always produce a unique dependency structure for any $\alpha \in \mathscr{C}_{\delta}$, in which the sequence of categorical random variables generated under that $\alpha$ is identically distributed but dependent. We provided a graphical interpretation of this class, and illustrated it with several key examples.

## References

1. Andrzej Korzeniowski. On correlated random graphs. Journal of Probability and Statistical Science, pages 43–58, 2013.
2. Joseph McKean, Robert Hogg, and Allen Craig. Introduction to Mathematical Statistics. Prentice Hall, 6th edition.
3. Rachel Traylor. A generalized multinomial distribution from dependent categorical random variables. Academic Advances of the CTO, 2017.
Summation Chains of Sequences Part 3: Sequence Chains from Linear Functions


### Abstract

(Editor’s note:) This paper represents the third installment of a masters thesis by Jonathan Johnson. The first two can be found here and here. This paper continues the development of the theory of summation chains of sequences. Since summation chains are doubly infinite, it is important to know how little information we actually need to define a chain. The linearity of the function rules that generate a summation chain helps to answer this question. The notion of unique completability is defined via sets of positions, and several important theorems are developed to determine when a set of positions is uniquely completable.

### Introduction

(Editor’s note:) In Part 2, Johnson notes that summation chains can be generated by the function rule $T_{\Sigma}: M^{\infty} \to M^{\infty}$,
$$T_{\Sigma}(x_{1}, x_{2}, x_{3},\ldots) = (x_{1}, x_{1} + x_{2}, x_{1} + x_{2} + x_{3},\ldots),$$ where $M^{\infty}$ is a $\mathbb{Z}$-module. This is a formal way to define the operator that generates the partial sums of a given sequence. Johnson proves in this installment that this function rule is a linear operator. The linearity of these function rules assists in determining how little information we need to know about the chain to be able to uniquely define it.
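On a finite truncation, $T_{\Sigma}$ is just the partial-sum map, and its linearity (the subject of the lemma below in the source) can be checked directly. A sketch (function names are ours):

```python
from itertools import accumulate

def t_sigma(x):
    """Finite truncation of the summation rule T_Sigma: partial sums."""
    return list(accumulate(x))

def combine(a, x, y):
    """Pointwise a*x + y, the vector-space operations on sequences."""
    return [a * xi + yi for xi, yi in zip(x, y)]
```

Linearity here means $T_{\Sigma}(a x + y) = a\,T_{\Sigma}(x) + T_{\Sigma}(y)$ entrywise on the truncation.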

Johnson introduces the notion of the set of positions, which becomes the smallest amount of information that can define a summation chain under certain conditions.

### Chains Generated by Linear Functions

Since vector-valued chains are vector-valued functions, they inherit the scalar multiplication and addition operations defined for functions on vector spaces. The linearity of the summation chain function rule leads to many interesting and useful results. The first of these is the closure of the set of summation chains under linear operations. This result holds for any function rule that is a linear operator.

#### Lemma.

Let M be a 1-dimensional vector space with scalar field F. The summation function rule $T_{\Sigma}$ defined in Remark 3.1 of [6] is a linear operator on $M^{\infty}$.

#### Proposition.

Let $T:V\to V$ be a function rule where V is a vector space with scalar field F.
$\mathcal{C}_T$ is closed under scalar multiplication and addition if and only if T is a linear operator.

#### Corollary 1.

Let $T:V\to V$ be a bijective linear operator on a vector space V with scalar field F. Then $\mathcal{C}_T$ is a subspace of the space of all functions from $\mathbb{Z}$ to V, $V^{\mathbb{Z}}$; $\Phi_{T}$ is an isomorphism; and $\mathcal{C}_T\cong V$.

#### Corollary 2.

Let M be a 1-dimensional vector space with scalar field F. $\mathcal{C}_{\Sigma[M]}$ is a vector space, and $\Phi_{\Sigma[M]}$ is an isomorphism from $M^{\infty}$ to $\mathcal{C}_{\Sigma[M]}$.

### Sets of Positions and Completeability

Viewing the summation operator as a linear operator is useful for answering the question on how little information must be known to define a chain.

Definition. A set of positions, U, is a nonempty subset of $\mathbb{Z}\times\mathbb{N}$. For $N\in\mathbb{N}$, let $U_N=\{(m,n)\in U:n\leq N\}$.

Definition. Let M be a 1-dimensional vector space with scalar field F. Let $T:M^{\infty}\to M^{\infty}$ be a bijective linear operator. A set of positions, U, is (uniquely) completable for T if for every function $d:U\to M$ there exists a (unique) T-chain, c, such that $c(m,n)=d(m,n)$ for all $(m,n)\in U$.

To say that a set of positions U is uniquely completable for T is the same as saying that every T-chain can be defined by defining each of the positions in U.

#### Lemma

Let M be a 1-dimensional vector space with scalar field F. Let $T:M^{\infty}\to M^{\infty}$ be a bijective linear operator, and let U be a set of positions. Then U is completable for T if and only if every nonempty subset of U is completable for T.

#### Lemma

Let M be a 1-dimensional vector space with scalar field F. Let $T:M^{\infty}\to M^{\infty}$ be a bijective linear operator. Let U be a set of positions, and let V be a set of positions containing U. If U is uniquely completable for T and V is completable for T, then V is uniquely completable for T.

How much information is needed to define a summation chain? The summation function rule has the additional property that it can be represented by an infinite-dimensional lower-triangular matrix, and this question will be addressed by studying matrices of this form. First, an ordering of the elements of $\mathbb{Z}\times\mathbb{N}$ is defined in order to create a consistent indexing of sets of positions.

Definition. Let $(m_1,n_1)$ and $(m_2,n_2)$ be in $\mathbb{Z}\times\mathbb{N}$. Define $(m_1,n_1)\leq(m_2,n_2)$ by the dictionary ordering: $n_1<n_2$, or both $n_1=n_2$ and $m_1\leq m_2$.

Remark. This definition defines a total order on $\mathbb{Z}\times\mathbb{N}$.[7]

#### Lemma

Let U be a set of positions such that $|U_N|$ is finite for every natural number N. Then U is well-ordered by the ordering in the above definition.

For the remainder of this section, given a set of positions, U, the elements of U are assumed to be indexed by the well-ordering induced by the above definition, with $u_i=(m_i,n_i)$ for all $u_i\in U$. Given a vector space M, a linear function rule $T:M^{\infty}\to M^{\infty}$, and a set of positions U, a linear function on $M^{\infty}$ can be defined that maps a sequence $x\in M^{\infty}$ to the values of $\Phi_T(x)$ at the positions in U.

Definition. Let M be a 1-dimensional vector space with scalar field F. Let $T:M^{\infty}\to M^{\infty}$ be a bijective linear operator with a lower-triangular matrix. Let U be a set of positions such that the cardinality of $U_N$ is finite for all natural numbers, N. The completion function for T with U is the function $S^T_U:M^{\infty}\to M^{|U|}$ defined, for all $x\in M^{\infty}$, by

$$S^T_U(x)=[(T^{m_i}(x))_{n_i}]^{|U|}_{i=1}$$
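For concreteness, the completion function can be evaluated mechanically when T is the summation operator on $\mathbb{R}^{\infty}$: apply partial sums (or differences, for negative powers of T) and read off the chosen positions. Below is a minimal Python sketch, not from the source, assuming sequences are truncated to finitely many terms (which is safe here because, when T is lower-triangular, the value at a position $(m,n)$ depends only on the first $n$ terms of the seed):

```python
from itertools import accumulate

def psums(x):
    # T: the summation operator, sending x to its sequence of partial sums
    return list(accumulate(x))

def diffs(x):
    # T^{-1}: the sequence of differences
    return [x[0]] + [x[i] - x[i - 1] for i in range(1, len(x))]

def apply_power(x, m):
    # Compute T^m(x) for any integer m (negative m iterates the inverse)
    for _ in range(abs(m)):
        x = psums(x) if m > 0 else diffs(x)
    return x

def completion_values(x, U):
    # S^T_U(x): values of the chain of x at the positions in U,
    # indexed by the dictionary ordering (by n first, then by m)
    U = sorted(U, key=lambda p: (p[1], p[0]))
    return [apply_power(x, m)[n - 1] for (m, n) in U]

# The positions from Example 2 below, applied to the seed e_1 = (1,0,0,0,...):
print(completion_values([1, 0, 0, 0], [(1, 2), (1, 3), (-1, 4), (0, 4)]))  # [1, 1, 0, 0]
```

Evaluating `completion_values` on the standard basis vectors $e_j$ recovers the columns of the matrix $\mathcal{M}(S^T_U)$ used in the examples below.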

The remaining results in this section show how to use completion functions to determine the completability of sets of positions.

#### Proposition

Let M be a 1-dimensional vector space with scalar field F. Let $T:M^{\infty}\to M^{\infty}$ be a bijective linear operator with a lower-triangular matrix. Let U be a set of positions such that the cardinality of $U_N$ is finite for all natural numbers, N.

1. U is completable for T if and only if $S^T_U$ is surjective.
2. U is uniquely completable for T if and only if $S^T_U$ is bijective.

Solving completion problems utilizes many properties of infinite matrices, which can be referenced in Cook [4]. The next proposition addresses sets of positions for which a completion function is not defined.

#### Proposition

Let M be a 1-dimensional vector space with scalar field F. Let U be a set of positions such that $N<|U_N|$ for some natural number N. Then U is not completable for any bijective linear operator $T:M^{\infty}\to M^{\infty}$ with a lower-triangular matrix.

#### Proposition

Let M be a 1-dimensional vector space with scalar field F. Let U be a set of positions such that $|U_N|=N$ for every natural number N. Then U is uniquely completable for any bijective linear operator $T:M^{\infty}\to M^{\infty}$ with a lower-triangular matrix.

### Examples of Determining Completability

Remark. Let S, T be linear transformations from vector space V to vector space W, and let $P:W\to W$ be a bijective linear operator such that $S=PT$. Then
$$\dim\text{Im}(S)=\dim\text{Im}(T),$$ and S is bijective if and only if T is bijective. [4, 8]

Example 1.
$$\begin{array}{c|cccccccc}m\backslash n& 1 & 2 & 3 & 4 & 5 & 6 & 7 & \cdots\\\vdots & & & & & & &\\2& & & & & & &\\1 & & & \bigcirc & & \bigcirc & & \bigcirc\\0& & & \bigcirc & & \bigcirc & & \bigcirc\\-1& & & & & & &\\\vdots & & & & & & &\end{array}$$

Let $T:\mathbb{R}^{\infty}\to\mathbb{R}^{\infty}$ where for $(x_n)\in\mathbb{R}^{\infty}$, $T(x_n)=(y_n)$ defined as follows:
$$y_n= \left\{\begin{array}{ll}x_1& \text{ if } n=1\\x_3+x_1&\text{ if } n=3\\x_n&\text{ if } n\text{ is even}\\x_n+x_{n-3}&\text{ if } n>4 \text{ and } n \text{ is odd}\\\end{array}\right.$$ Let $U=\{(0,n),(1,n):n\in\mathbb{N}, n>2, n \text{ is odd}\}$.
$$\mathcal{M}(S^T_U)=\left( \begin{array}{cccccccc}0 & 0 & 1 & 0 & 0 & 0 & 0 & \cdots\\1 & 0 & 1 & 0 & 0 & 0 & 0 & \cdots\\0 & 0 & 0 & 0 & 1 & 0 & 0 & \cdots\\0 & 1 & 0 & 0 & 1 & 0 & 0 & \cdots\\0 & 0 & 0 & 0 & 0 & 0 & 1 & \cdots\\0 & 0 & 0 & 1 & 0 & 0 & 1 & \cdots\\\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots\end{array}\right)$$

Let $A$ be the matrix row-equivalent to $\mathcal{M}(S^T_U)$ defined as follows.
\begin{aligned}A&=\left( \begin{array}{ccccccccc}-1 & 1 & 0 & 0 & 0 & 0 & 0 & \cdots\\0 & 0 & -1 & 1 & 0 & 0 & 0 & \cdots \\1 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots\\0 & 0 & 0 & 0 & -1 & 1 & 0 & \cdots \\0 & 0 & 1 & 0 & 0 & 0 & 0 & \cdots\\\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots\end{array}\right)\left( \begin{array}{cccccccc}0 & 0 & 1 & 0 & 0 & 0 & 0 & \cdots\\1 & 0 & 1 & 0 & 0 & 0 & 0 & \cdots\\0 & 0 & 0 & 0 & 1 & 0 & 0 & \cdots\\0 & 1 & 0 & 0 & 1 & 0 & 0 & \cdots\\0 & 0 & 0 & 0 & 0 & 0 & 1 & \cdots\\0 & 0 & 0 & 1 & 0 & 0 & 1 & \cdots\\\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \end{array}\right)\\&=Id\\&=\left( \begin{array}{cccccccc}0 & 0 & 1 & 0 & 0 & 0 & 0 & \cdots\\1 & 0 & 1 & 0 & 0 & 0 & 0 & \cdots\\0 & 0 & 0 & 0 & 1 & 0 & 0 & \cdots\\0 & 1 & 0 & 0 & 1 & 0 & 0 & \cdots\\0 & 0 & 0 & 0 & 0 & 0 & 1 & \cdots\\0 & 0 & 0 & 1 & 0 & 0 & 1 & \cdots\\\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots\end{array}\right)\left( \begin{array}{ccccccccc}-1 & 1 & 0 & 0 & 0 & 0 & 0 & \cdots\\0 & 0 & -1 & 1 & 0 & 0 & 0 & \cdots \\1 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots\\0 & 0 & 0 & 0 & -1 & 1 & 0 & \cdots \\0 & 0 & 1 & 0 & 0 & 0 & 0 & \cdots\\\vdots & \vdots & \vdots & \vdots & \vdots & \vdots &\vdots\end{array}\right)\\\end{aligned} $S^T_U$ has an inverse, so $S^T_U$ is bijective. Therefore, U is uniquely completable for T.

Example 2.
$$\begin{array}{c|cccccccc}m\backslash n& 1 & 2 & 3 & 4 & 5 & 6 & 7 & \cdots\\\vdots & & & & & & &\\2& & & & & & &\\1 & & \bigcirc &\bigcirc & & & & & \\0 & & & & \bigcirc & & & &\\-1& & & & \bigcirc & & & &\\-2& & & & & & &\\\vdots & & & & & & &\end{array}$$

Let T be the summation operator on $\mathbb{R}^{\infty}$ with a set of positions $U=\{(1,2),(1,3),(-1,4),(0,4)\}$.

$$\mathcal{M}(S^T_U)=\left(\begin{array}{cccccccc}1 & 1 & 0 & 0 & 0 & 0 & \cdots\\1 & 1 & 1 & 0 & 0 & 0 & \cdots\\0 & 0 & -1 & 1 & 0 & 0 & \cdots\\0 & 0 & 0 & 1 & 0 & 0 & \cdots\end{array}\right)$$

Let $A$ be the matrix row-equivalent to $\mathcal{M}(S^T_U)$ defined as follows.

\begin{aligned}A&=\left(\begin{array}{cccc}1 & 0 & 0 & 0 \\-1 & 1 & 1 & -1 \\0 & 0 & -1 & 1 \\0 & 0 & 0 & 1 \end{array}\right)\left(\begin{array}{cccccccc}1 & 1 & 0 & 0 & 0 & 0 & \cdots\\1 & 1 & 1 & 0 & 0 & 0 & \cdots\\0 & 0 & -1 & 1 & 0 & 0 & \cdots\\0 & 0 & 0 & 1 & 0 & 0 & \cdots\end{array}\right)\\&=\left(\begin{array}{cccccccc}1 & 1 & 0 & 0 & 0 & 0 & \cdots\\0 & 0 & 0 & 0 & 0 & 0 & \cdots\\0 & 0 & 1 & 0 & 0 & 0 & \cdots\\0 & 0 & 0 & 1 & 0 & 0 & \cdots\end{array}\right)\end{aligned} $\dim\text{Im}(S^T_U)=\text{rank}(A)=3<4$ so $S^T_U$ is not surjective. Therefore, U is not completable for T.
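The rank computation above can also be checked numerically. Since every position in U has $n\leq 4$, the matrix may be truncated to its first four columns without losing information; a short sketch using NumPy (the variable names are illustrative):

```python
import numpy as np

# Truncation of M(S^T_U) from Example 2 to its first four columns;
# every position in U has n <= 4, so the remaining columns are zero.
M = np.array([
    [1, 1,  0, 0],   # (1, 2):  x1 + x2
    [1, 1,  1, 0],   # (1, 3):  x1 + x2 + x3
    [0, 0, -1, 1],   # (-1, 4): x4 - x3
    [0, 0,  0, 1],   # (0, 4):  x4
])

print(np.linalg.matrix_rank(M))  # 3 < 4, so S^T_U is not surjective
```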

Example 3.
$$\begin{array}{c|cccccccc}m\backslash n& 1 & 2 & 3 & 4 & 5 & 6 & 7 & \cdots\\\vdots & & & & & & &\\4& & & & & & & &\\3& & & & & &\bigcirc & &\\2 & & & & \bigcirc & & & &\\1 & & \bigcirc & & & & & & \\0&&&&&& & &\\-1 & & \bigcirc & & & & & &\\-2 & & & & \bigcirc & & & &\\-3& & & & & &\bigcirc & &\\-4& & & & & & & &\\\vdots & & & & & & &\end{array}$$

Let T be the summation operator on $\mathbb{R}^{\infty}$ with a set of positions $U=\{(n,2n),(-n,2n):n\in\mathbb{N}\}$.

$$\mathcal{M}(S^T_U)=\left(\begin{array}{ccccccccccc}1 & 1 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & \cdots\\-1 & 1 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & \cdots\\4 & 3 & 2 & 1 & 0 & 0 & 0 &\cdots & 0 & 0 & \cdots\\0 & 2 & -2 & 1 & 0 & 0 & 0 & \cdots & 0 & 0 & \cdots\\21 & 15 & 10 & 6 & 3 & 1 & 0 & \cdots & 0 & 0 & \cdots\\0 & 0 & -1 & 3 & -3 & 1 & 0 & \cdots & 0 & 0 & \cdots\\\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \\& & & & & & & \cdots & n & 1 & \cdots\\& & & & & & & \cdots & -n & 1 & \cdots\\& & & & & & & & \vdots & \vdots &\end{array}\right)\begin{array}{l}\\\\\\\\\\\\\\\leftarrow \text{ row } 2n-1\\\leftarrow\text{ row }2n\\\\\end{array}$$

Let $A$ be the matrix row-equivalent to $\mathcal{M}(S^T_U)$ defined as follows.

\begin{aligned}A&=\left(\begin{array}{cccccccc}1 & -1 & 0 & 0 & 0 & 0 & \cdots\\0 & 1 & 0 & 0 & 0 & 0 & \cdots\\0 & 0 & 1 & -1 & 0 & 0 & \cdots\\0 & 0 & 0 & 1 & 0 & 0 & \cdots\\0 & 0 & 0 & 0 & 1 & -1 & \cdots\\0 & 0 & 0 & 0 & 0 & 1 & \cdots\\\vdots & \vdots & \vdots & \vdots & \vdots & \vdots\end{array}\right)\mathcal{M}(S^T_U)\\&=\left( \begin{array}{ccccccccccc}2 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & \cdots\\-1 & 1 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & \cdots\\4 & 1 & 4 & 0 & 0 & 0 & 0 &\cdots & 0 & 0 & \cdots\\0 & 2 & -2 & 1 & 0 & 0 & 0 & \cdots & 0 & 0 & \cdots\\21 & 15 & 11 & 3 & 6 & 0 & 0 & \cdots & 0 & 0 & \cdots\\0 & 0 & -1 & 3 & -3 & 1 & 0 & \cdots & 0 & 0 & \cdots\\\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \\& & & & & & & \cdots & 2n & 0 & \cdots\\& & & & & & & \cdots & -n & 1 & \cdots\\& & & & & & & & \vdots & \vdots & \end{array}\right)\\\end{aligned}

Since $A$ is lower-triangular with no zeros on the diagonal, $A$ is the matrix of a bijective linear operator. Since $A$ is equivalent to $\mathcal{M}(S^T_U)$, $S^T_U$ is bijective.
Therefore, $U$ is uniquely completable for $T$.

### Conclusion

Editor’s note: This installment examined sequence chains from linear functions with the goal of finding the minimum amount of information necessary to uniquely define a summation chain. The set of positions was defined, and the notion of unique completability is based on this definition. Several elegant theorems gave the conditions under which a set of positions is uniquely completable. The fourth and final installment will discuss the notion of convergence for complex-valued summation chains.

### References

1. Apostol, T., 1974, Mathematical Analysis 2nd Edition, Addison-Wesley Publishing Company
2. Brin, M. and Stuck, G., 2002, Introduction to Dynamical Systems, Cambridge University Press
3. Cameron, P. J., 1992, Combinatorics: Topics, Techniques, Algorithms, Cambridge University Press
4. Cook, R. G., 2014, Infinite Matrices and Sequence Spaces, Dover Publications, Inc.
5. Johnson, J. (2017) Summation Chains of Sequences Part 1: Introduction, Generation, and Key Definitions. AACTO, Vol. 2, Issue 2.
6. Johnson, J. (2017) Summation Chains of Sequences Part 2: Relationships Between Sequences via Summation Chains AACTO, Vol. 2, Issue 2.
7. Lee, J., 2011, Introduction to Topological Manifolds, Springer Science+Business Media LLC
8. Leon, S. J., 2002, Linear Algebra with Applications 6th Edition, Prentice Hall, Inc.
9. Scheinerman, E. R., 1996, Invitation to Dynamical Systems, Dover Publications, Inc.

## Summation Chains of Sequences Part 2: Relationships between Sequences via Summation Chains

### Abstract

Editor’s note: This paper represents the second installment of a master's thesis by Jonathan Johnson. This paper continues the development of the theory of summation chains of sequences. The concept of sum-related is defined: two sequences are sum-related if one sequence appears in the summation chain of the other. The main result is a theorem to determine if two given sequences are sum-related.

### Introduction

Editor’s note: Previously, the concept of a summation chain of sequences was defined. We reproduce the example and introduction here:

Given a complex-valued sequence $(a_n)^{\infty}_{n=1}$, the sequence of partial sums of $(a_n)$ is given by the sequence

$$(a_1,a_1+a_2,a_1+a_2+a_3,\ldots,\sum^n_{i=1}a_i,\ldots).$$

The sequence of differences of $(a_n)$ is given by the sequence

$$(a_1, a_2-a_1,a_3-a_2,\ldots,a_n-a_{n-1},\ldots).$$

The processes of finding the sequence of partial sums and finding the sequence of differences of a sequence are inverses of each other, so every sequence is the sequence of differences of its sequence of partial sums and the sequence of partial sums of its sequence of differences. Every sequence has a unique sequence of partial sums and a unique sequence of differences, so it is always possible to find the sequence of partial sums of the sequence of partial sums and repeat the process ad infinitum. Similarly, we can find the sequence of differences of the sequence of differences and repeat ad infinitum. The result is a doubly infinite sequence, or “chain,” of sequences where each sequence is the sequence of partial sums of the previous sequence and the sequence of differences of the following sequence.

Example
Let $a^{(0)}$ be the sequence defined by $a^{(0)}_n=(-1)^{n+1}$ for all $n\in\mathbb{N}$. For all integers $m>0$, let $a^{(m)}$ be the sequence of partial sums of $a^{(m-1)}$, and for all integers $m<0$, let $a^{(m)}$ be the sequence of differences of $a^{(m+1)}$.

$$\begin{array}{r|ccccccccc}\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots\\a^{(2)} & 1 & 1 & 2 & 2 & 3 & 3 & 4 & 4 & \cdots\\a^{(1)} & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & \cdots\\a^{(0)} & 1 & -1 & 1 & -1 & 1 & -1 & 1 & -1 & \cdots\\a^{(-1)} & 1 & -2 & 2 & -2 & 2 & -2 & 2 & -2 & \cdots\\a^{(-2)} & 1 & -3 & 4 & -4 & 4 & -4 & 4 & -4 & \cdots\\\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots\\\end{array}$$

Every sequence can be used to create a unique chain of sequences.

Editor’s note: This installment examines the relationship between sequences and their summation chains, and provides a way to determine whether two sequences are sum-related.

### Summation Chains and Sequences

#### Sum-Related Chains

Consider the following sequences with their $\Sigma$-chains.

$$s_1=(1,0,0,0,\ldots)$$ $$s_2=(1,1,1,1,\ldots)$$ $$s_3=(0,1,0,0,\ldots)$$ $\Phi_{\Sigma[\mathbb{R}]}(s_1) : \begin{array}{c|ccccc} \color{SteelBlue}{m} \backslash \color{LightSeaGreen}{n} & \color{LightSeaGreen}{1} &\color{LightSeaGreen}{2} & \color{LightSeaGreen}{3} & \color{LightSeaGreen}{4} &\color{LightSeaGreen}{\cdots}\\\color{SteelBlue}{\vdots} & \vdots & \vdots & \vdots & \vdots &\\\color{SteelBlue}{2} & 1 & 2 & 3 & 4 & \cdots\\\color{SteelBlue}{1} & 1 & 1 & 1 & 1 & \cdots\\\color{SteelBlue}{0} & 1 & 0 & 0 & 0 & \cdots\\\color{SteelBlue}{-1} & 1 &-1 & 0 & 0 & \cdots\\\color{SteelBlue}{-2} & 1 & -2 & 1 & 0 & \cdots\\\color{SteelBlue}{\vdots} & \vdots & \vdots & \vdots & \vdots &\\\end{array}\quad\Phi_{\Sigma[\mathbb{R}]}(s_2) : \begin{array}{c|ccccc} \color{SteelBlue}{m} \backslash \color{LightSeaGreen}{n} & \color{LightSeaGreen}{1} &\color{LightSeaGreen}{2} & \color{LightSeaGreen}{3} & \color{LightSeaGreen}{4} &\color{LightSeaGreen}{\cdots}\\\color{SteelBlue}{\vdots} & \vdots & \vdots & \vdots & \vdots &\\\color{SteelBlue}{2} & 1 & 3 & 6 & 10& \cdots\\\color{SteelBlue}{1} & 1 & 2 & 3 & 4& \cdots\\\color{SteelBlue}{0} & 1 & 1 & 1 & 1 & \cdots\\\color{SteelBlue}{-1} & 1 & 0 & 0 & 0 & \cdots\\\color{SteelBlue}{-2} & 1 & -1 & 0 & 0 & \cdots\\\color{SteelBlue}{\vdots} & \vdots & \vdots & \vdots & \vdots &\\\end{array}$ $$\Phi_{\Sigma[\mathbb{R}]}(s_3) : \begin{array}{c|ccccc} \color{SteelBlue}{m} \backslash \color{LightSeaGreen}{n} & \color{LightSeaGreen}{1} &\color{LightSeaGreen}{2} & \color{LightSeaGreen}{3} & \color{LightSeaGreen}{4} &\color{LightSeaGreen}{\cdots}\\\color{SteelBlue}{\vdots} & \vdots & \vdots & \vdots & \vdots &\\\color{SteelBlue}{2} & 0 & 1 & 2 & 3 & \cdots\\\color{SteelBlue}{1} & 0 & 1 & 1 & 1 & \cdots\\\color{SteelBlue}{0} & 0 & 1 & 0 & 0 & \cdots\\\color{SteelBlue}{-1} & 0 &1 &-1 & 0 & \cdots\\\color{SteelBlue}{-2} & 0 & 1 & -2 & 1 & \cdots\\\color{SteelBlue}{\vdots} & \vdots & \vdots & \vdots & \vdots &\\\end{array}$$

These chains, while distinct, contain the same pattern of numbers. The entries of $\Phi_{\Sigma[\mathbb{R}]}(s_2)$ are the entries of $\Phi_{\Sigma[\mathbb{R}]}(s_1)$ shifted up one row. This occurs because $s_2$ is the sequence of partial sums of $s_1$. The entries of $\Phi_{\Sigma[\mathbb{R}]}(s_3)$ are the entries of $\Phi_{\Sigma[\mathbb{R}]}(s_1)$ with an extra column of zeros prepended.

The remainder of this section focuses on these relationships.

Definition. Let M be a $\mathbb{Z}$-Module. Let $(a_n)$ and $(b_n)$ be sequences in M. $(a_n)$ is sum-related to $(b_n)$, denoted $(a_n)\sim_{\Sigma}(b_n)$, if $(a_n)$ is a sequence in $\Phi_{\Sigma} ((b_n))$.

If two sequences are sum-related, then one sequence can be obtained from the other by repeatedly taking sequences of partial sums or sequences of differences.

Remark. The trivial sequence $(0,0,0,\ldots)$ is sum-related only to itself.

#### Proposition.

Sum relation defines an equivalence relation.

#### Proposition.

Let c and d be $\Sigma$-chains. If there exists $m\in \mathbb{Z}$ such that $c(m,n)=d(m,n)$, for all $n\in \mathbb{N}$, then c=d.

#### Proposition.

Let M be a $\mathbb{Z}$-Module. Let c be a $\Sigma$-chain in M. Let $d:\mathbb{Z}\times\mathbb{N}\to M$ be defined such that there exists $k\in \mathbb{Z}$ such that $d(m,n)=c(m+k,n)$ for every $m\in \mathbb{Z}$ and $n\in \mathbb{N}$. Then d is a $\Sigma$-chain, and every sequence in c is sum-related to every sequence in d.

Definition. Let c be a $\Sigma$-chain, and let $k\in \mathbb{Z}$. The $\Sigma$-chain $c^{\,k}$ is defined, for all integers m and natural numbers n, by $c^{\,k}(m,n)=c(m+k,n)$. $c^{\,k}$ is said to be the $\Sigma$-chain c shifted by k.

Remark. If $s_1$ and $s_2$ are sequences generating $\Sigma$-chains $c_1$ and $c_2$ respectively, then saying $s_1 \sim_{\Sigma} s_2$ is equivalent to saying $c_1=c_2^k$ for some integer k.

When do two sequences appear in the same summation chain? By the above definition, this is the same as asking when sequences are sum-related. Consider the sequence $(x_k)=(0,0,1,-1,1,-1,\ldots)$. The sequences sum-related to $(x_k)$ are the sequences that are in $\Phi_{\Sigma[\mathbb{R}]}((x_k))$.

$$\Phi_{\Sigma[\mathbb{R}]}((x_k)) : \begin{array}{c|ccccccc} \color{SteelBlue}{m} \backslash \color{LightSeaGreen}{n} & \color{LightSeaGreen}{1} &\color{LightSeaGreen}{2} & \color{LightSeaGreen}{3} & \color{LightSeaGreen}{4} &\color{LightSeaGreen}{5} &\color{LightSeaGreen}{6} &\color{LightSeaGreen}{\cdots}\\\color{SteelBlue}\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots &\\\color{SteelBlue}{2} & 0 & 0 & 1 & 1 & 2 & 2 & \cdots\\\color{SteelBlue}{1} & 0 & 0 & 1 & 0 & 1 & 0 & \cdots\\\color{SteelBlue}{0} & 0 & 0 & 1 & -1 & 1 & -1 & \cdots\\\color{SteelBlue}{-1} & 0 & 0 & 1 & -2 & 2 & -2 & \cdots\\\color{SteelBlue}{-2} & 0 & 0 & 1 & -3 & 4 & -4 & \cdots\\\color{SteelBlue}{\vdots} & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots &\\\end{array}$$

Notice that each sequence in the chain starts with two zeros followed by a one. The sequences $(0,1,10,20,30,\ldots)$ and $(0,0,4,5,6,\ldots)$ cannot be sum-related to $(x_k)$ since neither begins with two zeros followed by a one. This strategy can eliminate most pairs of sequences.

#### Lemma.

Given two nontrivial sequences $(a_k)$ and $(b_k)$ such that $(a_k)\sim_{\Sigma}(b_k)$, define $N_a$ and $N_b$ to be the indices of the first nonzero terms of $(a_k)$ and $(b_k)$ respectively, then $N_a= N_b$, and $a_{N_a}=b_{N_b}$.

In general, even after exploiting the above lemma, determining whether two sequences are sum-related is difficult, since it is not possible to exhaustively check every sequence in a chain. However, when every nonzero element of the $\mathbb{Z}$-Module has infinite order, the corollary from the previous installment tells where to look for sum-related sequences, and a closed-form procedure for determining sum relation can be used.

#### Theorem.

Let M be a $\mathbb{Z}$-Module such that the order of every nonzero element in M is infinite. Let $(a_n)$ and $(b_n)$ be sequences in M, and let $c_a=\Phi_{\Sigma}((a_n))$. Let $N_1$ be the index of the first nonzero term of $(a_n)$, and let $N_2$ be the index of the first nonzero term of $(b_{n})$. Then $(a_{n})\sim_{\Sigma}(b_{n})$ if and only if the following conditions hold.

1. $N_1=N_2$ (if this holds, let $N=N_1=N_2$),
2. $a_N=b_N$,
3. There exists $m \in \mathbb{Z}$ such that $m\cdot a_N=b_{N+1}-a_{N+1}$,
4. $b_n=c_a(m,n), \forall n \in \mathbb{N}$.

Remark.  When the integer m exists in the theorem above, it is unique.

Example. Let $M=\mathbb{Z}$. Let $(a_n)=(1,-1,-1,1,0,0,0,0,0,\ldots)$ and $b_n=n^2$, $\forall n\in \mathbb{N}$. Let $c_a=\Phi_{\Sigma}((a_n))$.

The index of the first nonzero terms of both sequences is 1. Also, $a_1=b_1=1$ so (1) and (2) are satisfied. $b_2-a_2=5=5\cdot a_1$ so (3) holds. If there is a chain relation, $(b_n)$ will be the fifth sequence in $c_a$. (4) has already been shown for $n=1$ and $n=2$. Using formulas from the theorem, the computation for the other indices is as follows:

$$c_a(5,3)={6 \choose 4}(1)+{5 \choose 4}(-1)+{4 \choose 4}(-1)=15-5-1=9=b_3$$

For $n\geq 4$,
\begin{aligned}c_a(5,n)=&\sum^{n}_{k=1}{4+n-k \choose 4}a_k\\=&{n+3 \choose 4}(1)+{n+2 \choose 4}(-1)+{n+1 \choose 4}(-1)+{n \choose 4}(1)\\=&\frac{(n+3)(n+2)(n+1)n}{24}-\frac{(n+2)(n+1)n(n-1)}{24}\\&-\frac{(n+1)n(n-1)(n-2)}{24}+\frac{n(n-1)(n-2)(n-3)}{24}\\=&\, n^2\\=&\, b_n\\\end{aligned}

Therefore, by the theorem above, $(a_n)\sim_{\Sigma}(b_n)$.
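The conclusion can also be confirmed by brute force: condition (4) says $(b_n)$ is five rows above $(a_n)$ in the chain, so applying partial sums five times to $(a_n)$ should produce $n^2$. A short Python check, truncating to the first ten terms (which is safe, since each chain entry depends only on earlier terms of the row below):

```python
from itertools import accumulate

def psums(x):
    # one row up in the chain: sequence of partial sums
    return list(accumulate(x))

N = 10
a = [1, -1, -1, 1] + [0] * (N - 4)   # (a_n) = (1, -1, -1, 1, 0, 0, ...)

row = a
for _ in range(5):                   # climb to row m = 5 of the chain
    row = psums(row)

print(row)  # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```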

Example
Let $M=\mathbb{R}$. Let $a_n=\frac{1}{n}$ and $b_n=n$, $\forall n\in \mathbb{N}$. Let $c_a=\Phi_{\Sigma}((a_n))$. The index of the first nonzero terms of both sequences is 1. Also, $a_1=b_1=1$ so (1) and (2) are satisfied.
However, $b_2-a_2=\frac{3}{2}$, which cannot be obtained by multiplying $a_1$ by an integer.
Therefore, $(a_n)\nsim_{\Sigma}(b_n)$.

Example

Let $M=\mathbb{Z}$. Let $a_n=n^2-3n+2$ and $b_n=-\frac{n^4+3n^3+14n^2-48n+32}{7}$, $\forall n\in \mathbb{N}$. Let $c_a=\Phi_{\Sigma}((a_n))$. The index of the first nonzero terms of both sequences is 3. Also, $a_3=b_3=2$ so (1) and (2) are satisfied.
$b_4-a_4=-6=-3\cdot a_3$ so (3) holds. If there is a chain relation, $(b_n)$ will be the third sequence down in $c_a$. However, $c_a(-3,5)=0$ while $b_5=-\frac{108}{7}$.
Therefore, $(a_n)\nsim_{\Sigma}(b_n)$.

Example
Let $M=\mathbb{Z}[x]$. Let $a_n=\sum^{n-1}_{i=0}(i+1)x^{n-i}$, $\forall n\in \mathbb{N}$, and let $b_1=x$ and $b_n=x^n-x^{n-1}$, $n\geq 2$. Let $c_a=\Phi_{\Sigma}((a_n))$. The index of the first nonzero terms of both sequences is 1.
Also, $a_1=b_1=x$ so (1) and (2) are satisfied. $b_2-a_2=(x^2-x)-(x^2+2x)=-3x=-3\cdot a_1$ so (3) holds. If there is a chain relation, $(b_n)$ will be the third sequence down in $c_a$.
(4) has already been shown for $n=1$ and $n=2$.
\begin{aligned}c_a(-3,3)&={3 \choose 1}(x)-{3 \choose 2}(x^2+2x)+{3 \choose 3}(x^3+2x^2+3x)\\&=3x-3(x^2+2x)+x^3+2x^2+3x\\&=x^3-x^2\\&=b_3\\\end{aligned}

For $n\geq 4$,

\begin{aligned}c_a(-3,n)=&\sum^{3}_{k=0}(-1)^{3-k}{3 \choose 3-k}a_{n-3+k}\\=&-{3 \choose 0}\sum^{n-4}_{i=0}(i+1)x^{n-i-3}+{3 \choose 1}\sum^{n-3}_{i=0}(i+1)x^{n-i-2}\\&-{3 \choose 2}\sum^{n-2}_{i=0}(i+1)x^{n-i-1}+{3 \choose 3}\sum^{n-1}_{i=0}(i+1)x^{n-i}\\=&-\sum^{n-1}_{i=3}(i-2)x^{n-i}+3\sum^{n-1}_{i=2}(i-1)x^{n-i}-3\sum^{n-1}_{i=1}ix^{n-i}\\ &\quad+\sum^{n-1}_{i=0}(i+1)x^{n-i}\\=&-\sum^{n-1}_{i=3}(i-2)x^{n-i}+3\sum^{n-1}_{i=3}(i-1)x^{n-i}-3\sum^{n-1}_{i=3}ix^{n-i}\\&+\sum^{n-1}_{i=3}(i+1)x^{n-i}+3x^{n-2}-3x^{n-1}-6x^{n-2}+x^n+2x^{n-1}+3x^{n-2}\\=&\sum^{n-1}_{i=3}[-(i-2)+3(i-1)-3i+(i+1)]x^{n-i}+x^n-x^{n-1}\\=&\, x^n-x^{n-1}\\=&\, b_n\\\end{aligned} Therefore, $(a_n)\sim_{\Sigma}(b_n)$.

#### Chains of Sequences with Leading Zeros

Adding zeros to the beginning of a sequence adds columns of zeros to the beginning of the sequence's summation chain.

#### Lemma.

Let c be a nontrivial $\Sigma$-chain in $M$. Let $N=\min\{n \in \mathbb{N}:c(0,n)\neq 0\}$. For every integer, m, and for every natural number $n<N$, $c(m,n)=0$.

#### Proposition.

Let M be a $\mathbb{Z}$-Module. Let c be a nontrivial $\Sigma$-chain in $M$. Let the natural number $N=\min\{n \in \mathbb{N}:c(0,n)\neq 0\}$. Let $d:\mathbb{Z}\times\mathbb{N}\to M$ be defined such that $d(m,n) = c(m,n+N-1)$ for every $m\in \mathbb{Z}$ and $n\in \mathbb{N}$. Then d is a $\Sigma$-chain.

#### Corollary

Let M be a $\mathbb{Z}$-Module, and let $(a_n)$ and $(b_n)$ be sequences in M such that, for a positive integer k, $b_{n+k}=a_n$ for all natural numbers n and $b_n=0$ when $n\leq k$. Let $c_a=\Phi_{\Sigma}((a_n))$ and $c_b=\Phi_{\Sigma}((b_n))$. For every integer m and natural number $n$, $c_b(m,n+k)=c_a(m,n)$, and $c_b(m,n)=0$ when $n\leq k$.
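The corollary can be sanity-checked numerically for the rows $m=\pm 1$; a small Python sketch with an arbitrary choice of $(a_n)$ and $k$ (truncated sequences, illustrative names):

```python
from itertools import accumulate

def psums(x):
    # one row up in the chain: sequence of partial sums
    return list(accumulate(x))

def diffs(x):
    # one row down in the chain: sequence of differences
    return [x[0]] + [x[i] - x[i - 1] for i in range(1, len(x))]

k = 3
a = [1, -1, 1, -1, 1, -1, 1, -1]   # an arbitrary sequence (a_n)
b = [0] * k + a                    # (b_n): the same sequence with k leading zeros

# c_b(m, n+k) = c_a(m, n) and c_b(m, n) = 0 for n <= k, checked at m = 1 and m = -1
assert psums(b)[k:] == psums(a) and psums(b)[:k] == [0] * k
assert diffs(b)[k:] == diffs(a) and diffs(b)[:k] == [0] * k
print("corollary verified for rows m = 1 and m = -1")
```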

### Conclusion

Editor’s note: This paper represents the second installment of the concept of summation chains of complex-valued sequences. Here, the concept of sum relation between sequences was defined, and a theorem provided a way to determine whether two given sequences are sum-related. Part 3 will discuss sequence chains generated by linear function rules.

### References

1. Apostol, T., 1974, Mathematical Analysis 2nd Edition, Addison-Wesley Publishing Company
2. Brin, M. and Stuck, G., 2002, Introduction to Dynamical Systems, Cambridge University Press
3. Cameron, P. J., 1992, Combinatorics: Topics, Techniques, Algorithms, Cambridge University Press
4. Cook, R. G., 2014, Infinite Matrices and Sequence Spaces, Dover Publications, Inc.
5. Johnson, J. (2017) Summation Chains of Sequences Part 1: Introduction, Generation, and Key Definitions. AACTO, Vol. 2, Issue 2.
6. Lee, J., 2011, Introduction to Topological Manifolds, Springer Science+Business Media LLC
7. Leon, S. J., 2002, Linear Algebra with Applications 6th Edition, Prentice Hall, Inc.
8. Scheinerman, E. R., 1996, Invitation to Dynamical Systems, Dover Publications, Inc.
## Summation Chains of Sequences Part 1: Introduction, Generation, and Key Definitions

## Abstract

Editor’s note: This paper represents the first installment of a master's thesis by Jonathan Johnson. This work introduces the notion of summation chains of sequences. It examines the sequence of sequences generated by partial sums and differences of terms in each level of the chain, looks at chains generated by functions, then introduces a formal definition and key formulae in the analysis of such chains.

## Introduction

Given a complex-valued sequence $(a_n)^{\infty}_{n=1}$, the sequence of partial sums of $(a_n)$ is given by the sequence $(a_1,a_1+a_2,a_1+a_2+a_3,\ldots,\sum^n_{i=1}a_i,\ldots)$. The sequence of differences of $(a_n)$ is given by the sequence $(a_1, a_2-a_1,a_3-a_2,\ldots,a_n-a_{n-1},\ldots)$.

The processes of finding the sequence of partial sums and finding the sequence of differences of a sequence are inverses of each other, so every sequence is the sequence of differences of its sequence of partial sums and the sequence of partial sums of its sequence of differences. Every sequence has a unique sequence of partial sums and a unique sequence of differences, so it is always possible to find the sequence of partial sums of the sequence of partial sums and repeat the process ad infinitum. Similarly, we can find the sequence of differences of the sequence of differences and repeat ad infinitum. The result is a doubly infinite sequence, or “chain,” of sequences where each sequence is the sequence of partial sums of the previous sequence and the sequence of differences of the following sequence.

Example
Let $a^{(0)}$ be the sequence defined by $a^{(0)}_n=(-1)^{n+1}$ for all $n\in\mathbb{N}$.
For all integers $m>0$, let $a^{(m)}$ be the sequence of partial sums of $a^{(m-1)}$,
and for all integers $m<0$, let $a^{(m)}$ be the sequence of differences of $a^{(m+1)}$.

$$\begin{array}{r|ccccccccc}\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots\\a^{(2)} & 1 & 1 & 2 & 2 & 3 & 3 & 4 & 4 & \cdots\\a^{(1)} & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & \cdots\\a^{(0)} & 1 & -1 & 1 & -1 & 1 & -1 & 1 & -1 & \cdots\\a^{(-1)} & 1 & -2 & 2 & -2 & 2 & -2 & 2 & -2 & \cdots\\a^{(-2)} & 1 & -3 & 4 & -4 & 4 & -4 & 4 & -4 & \cdots\\\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots\\\end{array}$$
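The table above can be generated programmatically by iterating the partial-sum and difference operations from the seed. A short Python sketch, not from the source, with rows truncated to eight terms (safe, since each entry depends only on the first terms of the neighboring row):

```python
from itertools import accumulate

def psums(x):
    # next row up in the chain: sequence of partial sums
    return list(accumulate(x))

def diffs(x):
    # next row down in the chain: sequence of differences
    return [x[0]] + [x[i] - x[i - 1] for i in range(1, len(x))]

N = 8
a0 = [(-1) ** (n + 1) for n in range(1, N + 1)]   # seed: 1, -1, 1, -1, ...

chain = {0: a0}
for m in range(1, 3):
    chain[m] = psums(chain[m - 1])     # rows above the seed
    chain[-m] = diffs(chain[-m + 1])   # rows below the seed

for m in range(2, -3, -1):
    print(f"a^({m:+d}):", chain[m])
```

The printed rows reproduce the five rows of the table, from $a^{(2)}$ down to $a^{(-2)}$.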

Every sequence can be used to create a unique chain of sequences. This paper studies properties of these chains and explores the relationship between sequences and the chains they create. In particular, the following questions are investigated:

• How can the sequences in a summation chain be computed quickly? Clearly, every sequence in a chain of sequences can be computed by repeatedly finding sequences of partial sums of sequences of partial sums or finding sequences of differences of sequences of differences. It is useful, however, to be able to compute any sequence in a chain given the starting sequence, $a^{(0)}$, without having to compute all the sequences in between. Methods for computing chains are discussed below.
• When do two given sequences appear in the same summation chain? When two sequences appear in the same chain, one sequence can be obtained by repeatedly finding the sequences of partial sums of sequences of partial sums of the other sequence. This process could take a long time, and it is not able to determine if two sequences do not appear in the same chain. (Editor’s note: The next installment presents a process to determine with certainty whether or not two sequences are in the same chain.)
• How much information is needed to define a summation chain? Once a chain has been computed, it appears as an array of entries. Starting with a blank array, if some numbers are added to a blank array, can they be used to define the remaining entries uniquely? (Editor’s note: Later installments explore how much information in an array of numbers is needed to determine a chain.)
• How are the convergent behaviors of sequences in a summation chain related? Can every sequence in a chain diverge? Can every sequence in a chain converge? (Editor’s note: The final installment investigates the nature of the limits of sequences in a chain.)

## Chains Generated by Functions

The summation chains we have considered are doubly infinite sequences (of sequences) in which a fixed rule determines the chain of sequences recursively in both directions. The recursive rule in the backward direction is the inverse of the rule in the forward direction. Unlike single infinite sequences, in a chain there is no starting point to begin the recursive pattern. A sequence must be chosen to “start” the chain, and could be thought of as a “seed” of the chain. Starting with a given sequence, the next sequence in the chain can be obtained by applying a rule, and the previous sequence can be obtained by applying the rule’s inverse. However, this same pattern of sequences could be generated starting with any other sequence in the chain, together with the same generating rule. In order for this process to be well-defined, the rule used must have a well-defined inverse. The following definition captures abstractly what chains are. It also generalizes chains of sequences to chains of elements of more general sets.

Definition. Let $f:X\to X$ be a bijective function on a set X. We will call f a function rule. An f-chain is any function $c:\mathbb{Z} \to X$ such that $c(m)=f(c(m-1))$ for all $m \in \mathbb{Z}$. $c(0)\in X$ is called the seed of c, and c is generated by f with seed c(0). c(m) is called the mth term of the chain.

Each term of a chain is given uniquely as $c(m)=f^{m}(c(0))$.

Definition. Let f be a function rule on X. Define $\mathcal{C}_f$ to be the set of all f-chains. The f-chain seed mapping is the function $\Phi_f :X \to \mathcal{C}_f$ where $\Phi_f (x)$ is the chain generated by the seed x.

Remark. $\Phi_f$ is bijective.

Example.
Let $f:\mathbb{Z}_5 \to \mathbb{Z}_5$ where $f(n)=2n$.

This function has five chains generated by 0, 1, 2, 3, and 4. See below. Notice that the chains generated by 1, 2, 3, and 4 contain the same pattern differing only by a shift in starting point.

##### Chain Seeded by 0
$$\begin{array}{cccccccccc}m: & \cdots & -3 & -2 & -1 & 0 & 1 & 2 & 3 & \cdots \\ c(m): & \cdots & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\\end{array}$$
##### Chain Seeded by 1
$$\begin{array}{cccccccccc}m: & \cdots & -3 & -2 & -1 & 0 & 1 & 2 & 3 & \cdots \\ c(m): & \cdots & 2 & 4 & 3 & 1 & 2 & 4 & 3 & \cdots \\\end{array}$$
##### Chain Seeded by 2
$$\begin{array}{cccccccccc}m: & \cdots & -3 & -2 & -1 & 0 & 1 & 2 & 3 & \cdots \\ c(m): & \cdots & 1 & 2 & 4 & 3 & 1 & 2 & 4 & \cdots \\\end{array}$$

Definition. Let f be a function rule on X. Let $x_1$ and $x_2$ be in $X$. $x_1$ is f-chain related to $x_2$, denoted $x_1 \sim_{f} x_2$, if $x_1$ is in the image of the chain $\Phi_f (x_2)$.

#### Proposition.

An f-chain relation is an equivalence relation.

This relation is useful to find relationships between elements in the set X induced by f. In the example above, 1, 2, 3, and 4 are all f-chain related while 0 is related only to itself. This section ends with a few notable properties of chains that are useful, in particular when dealing with summation chains.

#### Proposition.

Let $c_1$ and $c_2$ be f-chains. If $c_1(m)=c_2(m)$ for some $m \in \mathbb{Z}$, then $c_1=c_2$.

#### Proposition.

Let X be a set with function rule f. Let c be an f-chain with co-domain X, and let $d:\mathbb{Z}\to X$ be a function such that, for some $k\in \mathbb{Z}$, $d(m)=c(m+k)$ for all $m \in \mathbb{Z}$. Then d is an f-chain, and for every pair $x\in \text{Im}(c)$ and $y\in \text{Im}(d)$, $x \sim_{f} y$.

A more detailed analysis of sequences generated by recursive functions can be found in Brin and Stuck (2). The remainder of this paper will focus on summation chains of sequences.

## Summation Chains of Sequences

### Key Definition and Formulas

Let M be a set, and let X be the set of sequences of elements of M. Given a function rule f on X and an f-chain c, we adopt the following notation to keep things clean: $c(m,n)$ denotes the nth term of the sequence $c(m)$.

Using this notation we give a clean definition of a summation chain of sequences.

Definition. Let M be a $\mathbb{Z}$-Module. A summation chain, denoted $\Sigma$-chain, in M is a function $c:\mathbb{Z} \times \mathbb{N} \to M$ where $c(m,n)=\sum_{k=1}^{n} c(m-1,k)$ for all $m\in \mathbb{Z}$ and for all $n\in \mathbb{N}$. $\Phi_{\Sigma[M]}$ is the seed mapping for summation chains in M; the notation $\Phi_{\Sigma}$ is used when the $\mathbb{Z}$-Module M is clear. $\mathcal{C}_{\Sigma[M]}$ denotes the set of all M-valued summation chains.

Remark. Summation chains are generated by the function rule $T_{\Sigma}:M^{\infty}\to M^{\infty}$ where $T_{\Sigma}(x_1,x_2,x_3,\ldots)=(x_1,x_1+x_2,x_1+x_2+x_3,\ldots)$.
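The rule $T_{\Sigma}$ and its inverse (first differences) can be sketched in Python on finite prefixes of sequences; the helper names `forward` and `backward` are ours.

```python
from itertools import accumulate

def forward(seq):
    """Apply T_Sigma to a finite prefix: running partial sums."""
    return list(accumulate(seq))

def backward(seq):
    """Inverse of T_Sigma: first differences recover the previous sequence."""
    return [seq[0]] + [seq[i] - seq[i - 1] for i in range(1, len(seq))]

x = [1, 2, 4, 8, 16]               # prefix of the seed (2^(k-1))
print(forward(x))                  # -> [1, 3, 7, 15, 31]
print(backward(forward(x)) == x)   # -> True: the rule is invertible
```

Invertibility of $T_{\Sigma}$ is exactly what makes the seed mapping well-defined in both directions along a chain.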

Example. Let $t:\mathbb{Z}\times \mathbb{N} \to M$ be the constant function $t(m,n)=0$.

$$\sum_{k=1}^{n} t(m-1,k) =\sum_{k=1}^{n} 0 =0=t(m,n),$$

so t is a $\Sigma$-chain. This chain is the trivial chain and is generated with the sequence of all zeros.

How can the sequences in a summation chain be computed quickly? The next two results address this. The proposition below computes each entry of a chain from the entries adjacent to it.

#### Proposition.

Let M be a $\mathbb{Z}$-Module. Let $c:\mathbb{Z} \times \mathbb{N} \to M$ be a function. c is a $\Sigma$-chain in M if and only if for all $m,m' \in \mathbb{Z}$ and $n \in \mathbb{N}$, the following equalities hold.

1. $c(m,1)=c(m',1)$,
2. If $n \geq 2$, then
$$c(m,n)=c(m,n-1)+c(m-1,n)=c(m+1,n)-c(m+1,n-1).$$

The next theorem provides a way to calculate any sequence in a summation chain directly given any other sequence in the chain without having to compute any of the sequences in between.

#### Lemma.

Let n and k be natural numbers with k < n. Then
$${n \choose k}={n-1 \choose k-1}+{n-1 \choose k}.$$

#### Theorem.

Let c be a $\Sigma$-chain in $\mathbb{Z}$-Module M. Let $m_1,m_2 \in \mathbb{Z}$ such that $m_1>m_2$, and let $n\in \mathbb{N}$. Define $m=m_1-m_2$.
$$c(m_1,n)=\sum_{k=1}^{n} {m+n-k-1 \choose m-1} c(m_2,k)$$

If $n\geq m+1$,

$$c(m_2,n)= \sum_{k=0}^{m} (-1)^{m-k}{m \choose m-k} c(m_1,n-m+k).$$

If $n\leq m+1$,

$$c(m_2,n)=\sum_{k=1}^{n} (-1)^{n-k}{m \choose n-k} c(m_1,k).$$

Remark. When $n=m+1$, the two equations above are the same.
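The theorem's first formula can be cross-checked against direct iteration of partial sums. In the sketch below (helper names `shift` and `jump_forward` are ours), `shift` moves along the chain step by step, while `jump_forward` jumps $m=m_1-m_2$ steps at once via the binomial formula.

```python
from math import comb
from itertools import accumulate

def shift(prefix, m):
    """Prefix of c(m, .) from c(0, .) = prefix by iterating partial sums
    (m > 0) or first differences (m < 0)."""
    seq = list(prefix)
    for _ in range(m):
        seq = list(accumulate(seq))
    for _ in range(-m):
        seq = [seq[0]] + [seq[i] - seq[i - 1] for i in range(1, len(seq))]
    return seq

def jump_forward(prefix, m, n):
    """c(m, n) directly from c(0, .) via the theorem's binomial formula."""
    return sum(comb(m + n - k - 1, m - 1) * prefix[k - 1]
               for k in range(1, n + 1))

seed = [1, 2, 4, 8, 16, 32, 64]   # c(0, k) = 2^(k-1)
# one binomial jump of 3 steps agrees with three iterated partial sums
print([jump_forward(seed, 3, n) for n in range(1, 8)] == shift(seed, 3))  # -> True
print(shift(seed, -5))            # -> [1, -3, 4, -2, 1, 1, 2]
```

The last line reproduces the values of $c(-5,n)$ computed in the example that follows.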

Example. Let $c=\Phi_{\Sigma[\mathbb{R}]}((2^{k-1}))$. The following is a computation of the sequences c(-5,n) and c(-6,n).

c(-5,1)=c(0,1)=1.
c(-5,2)=c(0,2)-5c(0,1)=2-5(1)=-3.
c(-5,3)=c(0,3)-5c(0,2)+10c(0,1)=4-5(2)+10(1)=4.
c(-5,4)=c(0,4)-5c(0,3)+10c(0,2)-10c(0,1)=8-5(4)+10(2)-10(1)=-2.
c(-5,5)=c(0,5)-5c(0,4)+10c(0,3)-10c(0,2)+5c(0,1)=16-5(8)+10(4)-10(2)+5(1)=1.

When $n\geq 6$,

\begin{aligned}c(-5,n)&=\sum_{i=0}^{5}(-1)^{i}{5 \choose i}c(0,n-i)\\&=c(0,n)-5c(0,n-1)+10c(0,n-2)\\&\kern{3em}-10c(0,n-3)+5c(0,n-4)-c(0,n-5)\\&=2^{n-1}-5\cdot 2^{n-2}+10\cdot 2^{n-3}-10\cdot 2^{n-4}+5\cdot 2^{n-5}-2^{n-6}\\&=2^{n-6}(32-5\cdot 16+10\cdot 8-10\cdot 4+5\cdot 2-1)\\&=2^{n-6}(1)\\&=2^{n-6}\end{aligned}

c(-6,1)=c(-5,1)=1
c(-6,2)=c(-5,2)-c(-5,1)=-3-1=-4
c(-6,3)=c(-5,3)-c(-5,2)=4-(-3)=7
c(-6,4)=c(-5,4)-c(-5,3)=-2-4=-6
c(-6,5)=c(-5,5)-c(-5,4)=1-(-2)=3
c(-6,6)=c(-5,6)-c(-5,5)=1-1=0

When $n\geq 7, c(-6,n)=c(-5,n)-c(-5,n-1)=2^{n-6}-2^{n-7}=2^{n-7}$

Example. Let $a_1=1, a_2=-1$, and $a_k=0$ when $k\geq 3$. Let $c=\Phi_{\Sigma[\mathbb{R}]}((a_k))$. The following is a computation of the sequence c(4,n).

c(4,1)=c(0,1)=a_1=1

When $n\geq 2$,
\begin{aligned}c(4,n)&=\sum^{n}_{k=1}{3+n-k \choose 3}c(0,k)\\&={n+2 \choose 3}-{n+1 \choose 3}\\&=\frac{n(n+1)(n+2)}{6}-\frac{(n-1)n(n+1)}{6}\\&=\frac{n(n+1)[(n+2)-(n-1)]}{6}\\&=\frac{n(n+1)}{2}\\\end{aligned}
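The closed form $c(4,n)=n(n+1)/2$ can also be confirmed by brute force: iterating partial sums four times on the seed $(1,-1,0,0,\ldots)$ produces the triangular numbers.

```python
from itertools import accumulate

# Seed a_1 = 1, a_2 = -1, a_k = 0 for k >= 3.
seq = [1, -1] + [0] * 8
for _ in range(4):                 # move four steps forward in the chain
    seq = list(accumulate(seq))
print(seq)  # -> [1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
```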

### Notable Corollaries

In general, there is no way to determine the exact distance between two sequences that occur in a summation chain.
Under certain conditions, however, the corollary below determines the location of sequences in relation to each other within a summation chain. This corollary is essential in determining whether two arbitrary sequences appear in the same summation chain.

#### Corollary

Let M be a $\mathbb{Z}$-Module such that the order of every nonzero element in M is infinite. Let c be a non-trivial $\Sigma$-chain in M. Let $N=\min\{n \in \mathbb{N}\::\:c(0,n)\neq 0\}$. Let $m^*,m_1,m_2 \in \mathbb{Z}$.
$$m^*\cdot c(m_1,N)=c(m_2,N+1)-c(m_1,N+1)$$

if and only if $m^*=m_2-m_1$.

Remark. When M is a field with unity $1_M$, the equation above can be written,
$$m^*\cdot(1_M)=\frac{c(m_2,N+1)-c(m_1,N+1)}{c(m_1,N)}$$

In the case of complex-valued summation chains, a chain never repeats a sequence unless it is trivial. This property comes from the fact that zero is the only complex number of finite additive order, as stated in the next corollary.

#### Corollary

Let M be a $\mathbb{Z}$-Module such that the order of every nonzero element in M is infinite. Let c be a $\Sigma$-chain in M. Let $m_1,m_2 \in \mathbb{Z}$ such that $m_1\neq m_2$. If $c(m_1,n)=c(m_2,n), \forall n\in \mathbb{N}$, then c is the trivial chain.

It has not been determined whether the corollary above remains true for all summation chains of sequences when the assumption on element orders is dropped.

## Conclusion

Editor’s note: This paper introduces the concept of summation chains of complex-valued sequences, presenting the key concepts and definitions that set the stage for further results in the development of this theory. Part 2 will discuss sum-related chains and the handling of leading zeros.