Browsed by
Month: January 2018

All the Same Opposites


Editor’s note: see this appendix for supporting proofs.

Fields are among the most convenient algebraic structures, preserving much of the arithmetic we know and love from familiar fields like the rationals \mathbb{Q} and the real numbers \mathbb{R}.

Now, it is unnecessary that a set possess infinitely many elements to possibly constitute a field (under the right binary operations). Indeed, Dr. Rachel Traylor recently invited readers to finite field theory by way of GF(4), the field with four elements. In this post, I propose to discuss the simplest of all possible fields, namely \text{GF}(2).

What is \text{GF}(2)?

As Dr. Traylor explains in her post on GF(4), a field is built via a nonempty set \mathbb{F} and two binary operations, typically denoted + and \cdot when no further specifying is needed. Speaking abstractly, \text{GF}(2) is the case when \mathbb{F} is taken to be any set of two elements, say, \{a,b\}, satisfying the operation tables below.

+ a b
a a b
b b a


\cdot a b
a a a
b a b


So, what makes \text{GF}(2) simplest?

Briefly put, there is no field with fewer elements than \text{GF}(2) has. Why is this so? The operations + and \cdot each require an identity element, and these must be distinct. As a result, any field must contain at least two elements. And, as it happens, those are enough to define a fully-fledged field.

As it is the most basic of fields, it might be expected that \text{GF}(2) is only trivially interesting, or only appears in coverage of the subject on the theoretical front. (No judgment here; sometimes the simplest examples of structures are illustrative of very little. See the trivial topology or trivial algebra of sets on a set X.) However, \text{GF}(2) is the mathematical representation of many dualities we encounter on a daily basis. We will take a look at some of the prominent cases.

Even & Odd

Let’s denote by {\bf e} and {\bf o} arbitrarily chosen even and odd integers, respectively. Truly, in this way, we are defining two equivalence classes on the set of integers, \mathbb{Z}, by way of congruence modulo 2.

As a reminder, we say an integer m is even to mean there exists an integer k such that m=2k, and m is odd to mean there exists an integer j such that m=2j+1. Collect these in a set: \{{\bf e},{\bf o}\}, and consider this set under ordinary addition + and multiplication \cdot .

These results are summarized in tables.  For instance, in the + table below, we can think of {\bf e}+{\bf o} as

(2k)+(2j+1)=2(k+j)+1\equiv{\bf o},

since k+j is also an integer.

+ {\bf e} {\bf o}
{\bf e} {\bf e} {\bf o}
{\bf o} {\bf o} {\bf e}


\cdot {\bf e} {\bf o}
{\bf e} {\bf e} {\bf e}
{\bf o} {\bf e} {\bf o}


From these tables, {\bf e} serves as additive identity for the field, and {\bf o} the multiplicative identity.


Even more readily encountered than even and odd is binary arithmetic. We have a series of posts here on the theory of coding, and all of it rests on \{0,1\} when taken with the operations of +_2 (addition modulo 2) and \cdot .

+_2 0 1
0 0 1
1 1 0


\cdot 0 1
0 0 0
1 0 1


The similarities to the tables for (\{{\bf e},{\bf o}\},+,\cdot) are evident.  With binary, 0 is the additive (modulo 2) identity, and 1 is the multiplicative identity.  This seems to follow naturally.  After all, 0 is an even integer, and 1 is an odd integer, so these elements are easily believed to work alike.  With that in mind, I move to an example with a less immediate connection.

Truth & Falsehood

Now I want to consider the set \{{\bf T},{\bf F}\} of truth values true and false, respectively.

It is worthwhile to stop and think on which operations make sense for this set. Two are needed to construct a field. Just as even and odd integers may be combined by adding and multiplying, mathematical statements may be combined via disjunction (“or,” \vee), conjunction (“and,” \wedge), and implication (“if, then,” \Rightarrow). For this case, I am interested in the special “exclusive or,” also called XOR, denoted by \oplus, and conjunction.

\oplus {\bf F} {\bf T}
{\bf F} {\bf F} {\bf T}
{\bf T} {\bf T} {\bf F}


\wedge {\bf F} {\bf T}
{\bf F} {\bf F} {\bf F}
{\bf T} {\bf F} {\bf T}

Opposite… in Precisely the Same Way

The only thing setting these examples apart is an exchange of symbols. Truly,

a, {\bf e}, 0, and {\bf F},

are interchangeable, just as are

b, {\bf o}, 1, and {\bf T}.

What matters is that these individual elements behave in exactly the same way with respect to their operations. In the language of algebra, it is said that

(\{{\bf e},{\bf o}\},+,\cdot),

(\{0,1\},+_2,\cdot), and 

(\{{\bf F},{\bf T}\},\oplus,\wedge)

are isomorphic, that is, structurally equivalent.  

Definition (Isomorphism of fields). We say two fields (\mathbb{F},+,\times) and (\mathbb{H},\boxplus,\boxtimes) are isomorphic to mean there exists a function \varphi\mathrel{:}\mathbb{F}\to\mathbb{H} such that \varphi is one-to-one, onto, and, for all x,y\in\mathbb{F},

\varphi(x+y)=\varphi(x)\boxplus\varphi(y);\quad\varphi(x\times y)=\varphi(x)\boxtimes\varphi(y).

In the case of \text{GF}(2), proving isomorphism amounts to simply defining the function that swaps the symbols as needed. For example, to show (\{{\bf e},{\bf o}\},+,\cdot) and (\{0,1\},+_2,\cdot) are isomorphic, define \varphi\mathrel{:}\{{\bf e},{\bf o}\}\to\{0,1\} by putting \varphi({\bf e})=0 and \varphi({\bf o})=1.
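The symbol-swapping argument can also be checked mechanically. The sketch below is only an illustration: the dictionary layout and the names PARITY, BINARY, TRUTH, and is_isomorphism are assumptions made here, not part of the post. It encodes the operation tables and tests that a candidate map is one-to-one, onto, and preserves both operations.

```python
from itertools import product

# Operation tables for three presentations of GF(2).
# The names and dictionary layout are illustrative choices.
PARITY = {
    "elements": ("e", "o"),
    "add": {("e", "e"): "e", ("e", "o"): "o", ("o", "e"): "o", ("o", "o"): "e"},
    "mul": {("e", "e"): "e", ("e", "o"): "e", ("o", "e"): "e", ("o", "o"): "o"},
}
BINARY = {
    "elements": (0, 1),
    "add": {(x, y): (x + y) % 2 for x, y in product((0, 1), repeat=2)},
    "mul": {(x, y): x * y for x, y in product((0, 1), repeat=2)},
}
TRUTH = {
    "elements": (False, True),
    "add": {(x, y): x != y for x, y in product((False, True), repeat=2)},  # XOR
    "mul": {(x, y): x and y for x, y in product((False, True), repeat=2)},  # AND
}


def is_isomorphism(phi, source, target):
    """Return True if phi is a bijection that preserves both operations."""
    elems = source["elements"]
    images = {phi[x] for x in elems}
    bijective = len(images) == len(elems) and images == set(target["elements"])
    preserves = all(
        phi[source["add"][(x, y)]] == target["add"][(phi[x], phi[y])]
        and phi[source["mul"][(x, y)]] == target["mul"][(phi[x], phi[y])]
        for x, y in product(elems, repeat=2)
    )
    return bijective and preserves


phi_parity_to_binary = {"e": 0, "o": 1}
phi_binary_to_truth = {0: False, 1: True}

print(is_isomorphism(phi_parity_to_binary, PARITY, BINARY))
print(is_isomorphism(phi_binary_to_truth, BINARY, TRUTH))
# Swapping the roles of the two symbols breaks the structure:
print(is_isomorphism({"e": 1, "o": 0}, PARITY, BINARY))
```

The last call shows that not every bijection will do: sending {\bf e} to 1 fails because {\bf e}+{\bf e}={\bf e}, while 1 +_2 1 = 0.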

Concluding Remarks

Mathematics in many respects is the art of extension and generalization. The goal is frequently to take an existing object and assume an aerial viewpoint to learn its structure and what makes it work. (This is often carried out by seeing how few pieces of the whole are really needed to accomplish those key characteristics.)

With the right perspective, even these sets of opposites can bear likeness.  I take comfort in that.  

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Extensions of the Single Server Efficiency Model


For the full paper, which includes all proofs, click here.


Editor’s note: this paper comprises the third chapter of the PhD dissertation by Rachel Traylor. Visit here and here to see chapters one and two, respectively. Herein we further generalize the single server model presented in [3]. In particular, we consider a multichannel server under the cases of both singular and clustered tasks. In the instance of singular tasks, we present a load balancing allocation scheme and obtain a stochastic breakdown rate process, as well as derive the conditional survival function as a result. The model of a multichannel server taking in clustered tasks gives rise to two possibilities, namely, independent and correlated channels. We derive survival functions for both of these scenarios.

Load Balancing Allocation for a Multichannel Server

Model Description

Previously, we had assumed that a web server functions as a single queue that attempts to process jobs as soon as they arrive. These jobs originally brought a constant stress \eta to the server, with the system stress reducing by \eta at the completion of each job.

Now, suppose we have a server partitioned into K channels. Denote each channel as Q_k, k = 1,\ldots,K. Jobs arrive via a nonhomogeneous Poisson process with rate \lambda(t). Upon arrival, each job falls (or is routed) to the channel with the shortest queue length. If all queue lengths are equal or multiple channels have the shortest length, the job will enter one of the appropriate queues with equal probability.

We retain the previous notation for the baseline breakdown rate, or hazard function. This is denoted by r_0(t) and is the hazard function under an idle system. We also retain the assumption that the arrival times \mathfrak{T} are independent. In addition, the service times \mathfrak{W} are i.i.d. with distribution G_W(w). We assume that all channels are serving jobs at the same time, i.e., a job can be completed from any queue at any time. We do not require load balancing for service. In other words, any queue can empty with others still backlogged. We also retain the FIFO service policy for each queue.

Since we have now “balanced,” or distributed, the load of jobs in the server, not all jobs will cause additional stress to the system. Suppose all jobs bring the same constant stress \eta upon arrival. Under load balancing, we will define the additional stress to the system as \eta\max_k|Q_k|. Figure 1 shows an example server with current stress of 4\eta.

Figure 1. Partitioned Server with Load Balancing.


Due to the dynamic nature of arrival times, allocation to queues, and service times, we have many possible configurations of jobs at any point in time. Therefore, the allocation scheme adds an additional layer of variation to the service times and order of service. The placement of jobs in the various queues (and thus the order of service and service times) is wholly dependent on all arrival times and service times of the prior arrivals. The following examples illustrate the effect on the workload stress added to the system in various scenarios.

Figure 2(a). Example 1 Load Balancing Queue Configuration

Figure 2(b). Breakdown Rate Process Trajectory

Example 1. Suppose for simplicity we have 2 channels. Suppose at the time of observation of the system, 3 jobs have arrived and none have finished. WLOG, suppose job 3 fell into Q_{1}. See Figure 2(a). The stress to the system at t=t_{\text{obs}} is r_{0}(t_{\text{obs}}) + 2\eta, as shown in Figure 2(b).

Note in Example 1 that Job 2 does not add any additional stress to the system. Job 1 sees an empty queue upon arrival, and \max_{k}|Q_{k}| = 1 when it falls into any particular queue. Job 2 arrives as Job 1 is still being processed, and thus the placement of Job 1 forces Job 2 into the empty channel. Since \max_{k}|Q_{k}| is still 1, the stress to the system doesn’t change. Job 3 arrives as Jobs 1 and 2 are in service, and thus its choice of queue is irrelevant due to the configuration of the two queues at T_{3}. Regardless of which queue Job 3 falls into, \max_{k}|Q_{k}| = 2. Thus the arrival of Job 3 increases the breakdown rate by \eta again.

The next example shows how the system stress changes from Example 1 when Job 1 has finished processing before T_{3}.

Figure 3(a). Example 2 Load Balancing Queue Configuration

Figure 3(b). Example 2 Breakdown Rate Process Trajectory

Example 2. Consider the same two-channel system from Example 1. However, now suppose that T_{3} > T_{1}+W_{1}. In other words, service for Job 1 was completed before Job 3 arrived. Hence Job 3 will fall into the queue opposite Job 2. The stress to the system at the time of observation would be r_{0}(t_{\text{obs}}) + \eta. See Figures 3(a) and 3(b).

In this scenario, the workload due to Job 3 does not contribute any additional stress to the server. Also observe that upon completion of Job 1, the workload stress to the server does not decrease, as Job 2 still resides in the system and is being served.
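The two examples can be replayed with a small simulation. This is a minimal sketch under assumptions that are not in the paper: each job j is taken to leave the server at T_{j}+W_{j}, ties between equally short queues go to the lowest-numbered channel instead of being broken at random, and the numeric arrival and service times are made up for illustration. The function name max_queue_length is likewise hypothetical.

```python
def max_queue_length(arrivals, sojourns, num_channels, t_obs):
    """Return max_k |Q_k| at time t_obs under shortest-queue routing.

    Assumptions for this sketch: job j departs at arrivals[j] + sojourns[j],
    and ties are resolved in favour of the lowest-numbered channel.
    """
    queues = [[] for _ in range(num_channels)]  # each entry is a departure time
    for t_arr, w in sorted(zip(arrivals, sojourns)):
        if t_arr > t_obs:
            break
        # Remove jobs that have already departed when this job arrives.
        queues = [[d for d in q if d > t_arr] for q in queues]
        # min() returns the first shortest queue, i.e. the lowest index on ties.
        shortest = min(range(num_channels), key=lambda k: len(queues[k]))
        queues[shortest].append(t_arr + w)
    # Remove jobs that departed before the observation time.
    queues = [[d for d in q if d > t_obs] for q in queues]
    return max(len(q) for q in queues)


# Example 1: three arrivals, none finished by t_obs.
example_1 = max_queue_length([0, 1, 2], [10, 10, 10], num_channels=2, t_obs=3)
# Example 2: Job 1 finishes (at time 1.5) before Job 3 arrives (at time 2).
example_2 = max_queue_length([0, 1, 2], [1.5, 10, 10], num_channels=2, t_obs=3)

print(example_1)  # multiplier of eta in Example 1
print(example_2)  # multiplier of eta in Example 2
```

With these inputs the additional stress is 2\eta in the first case and \eta in the second, matching the two trajectories described above.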

Contrast this behavior with the breakdown rate process given in [3]. In the single-channel, single-server model described in both [1] and [3], each job adds stress to the server upon arrival. Under the load balancing allocation scheme, the additional stress to the server depends on the arrival and service times of all prior jobs. From a stochastic perspective, this breakdown rate process has full memory.

The examples above illustrate that \max_{k}|Q_{k}| depends on the intersection of the intervals I_{j} = [T_{j}, T_{j} + W_{j}], j = 1,\ldots,N(t). The next section details the methodology to obtain the configuration of jobs in the server at time t by decomposition of \bigcup_{j=1}^{N(t)}I_{j} into disjoint atoms and derives the stochastic breakdown rate process under the load balancing allocation scheme.

Breakdown Rate Process and Conditional Survival Function

Let \epsilon = (\epsilon_{1},\ldots,\epsilon_{N(t)}) be an N(t)-tuple whose components \epsilon_{j}\in\{\emptyset, c\}, where the superscript \emptyset leaves a set unchanged (I_{j}^{\emptyset} = I_{j}) and c denotes the complement of the set. Let E = \{\epsilon \mathrel{:} \epsilon_{j}\in\{\emptyset, c\}\} denote the set of all possible \epsilon, excepting \epsilon = (c,\ldots,c). Then by Lemma 1 (see Appendix),

\bigcup_{j=1}^{N(t)}I_{j} = \bigcup_{\epsilon \in E}\bigcap_{j=1}^{N(t)}I_{j}^{\epsilon_{j}}

Remark. \bigcap_{j=1}^{N(t)}I_{j}^{\epsilon_{j}} indicates which jobs are still in the server at time t. The union is disjoint; thus only one \epsilon will describe the server configuration at any given time t. For example, if 3 jobs have arrived to the server at time t_{\text{obs}}, |E| = 2^{3} - 1 = 7. These may be enumerated:

    • I_{1} \cap I_{2} \cap I_{3}
    • I_{1} \cap I_{2} \cap I_{3}^{c}
    • I_{1} \cap I_{2}^{c} \cap I_{3}
    • I_{1}^{c} \cap I_{2} \cap I_{3}
    • I_{1} \cap I_{2}^{c} \cap I_{3}^{c}
    • I_{1}^{c} \cap I_{2} \cap I_{3}^{c}
    • I_{1}^{c} \cap I_{2}^{c} \cap I_{3}


As an illustration, refer to Example 1. All three jobs are in the system at t = t_{\text{obs}} (that is, none have completed service), and thus t_{\text{obs}} \in I_{1} \cap I_{2} \cap I_{3}. Expanding, t_{\text{obs}} \in [T_{1}, T_{1}+W_{1}], [T_{2}, T_{2}+W_{2}], and [T_{3}, T_{3}+W_{3}].

Compare the case with that of Example 2. In this case, three jobs have arrived at t = t_{\text{obs}}, but Job 1 has finished by t_{\text{obs}}. Thus t_{\text{obs}} \not\in I_{1}, but since Jobs 2 and 3 are still in the system, t_{\text{obs}} \in I_{2} \cap I_{3}. Thus t_{\text{obs}} \in I_{1}^{c} \cap I_{2} \cap I_{3}.

Now, since the additional workload stress to the server is a multiple of \eta\max_{k}|Q_{k}|, it remains to derive the appropriate multiplier that accounts for the number of jobs that contribute additional stress to the system.

Let n = \sum_{j=1}^{N(t)}\mathbf{1}(\epsilon_{j} = \emptyset | \epsilon_{j} \in \epsilon) for a particular \epsilon, and let \alpha_{\epsilon} be the multiplier that indicates the number of jobs that contribute stress \eta to the system. Under [1] and the generalization in [3], every uncompleted job in the system contributes stress, thus \alpha_{\epsilon} = n.

Under the load balancing scheme, \alpha_{\epsilon} = \lfloor\frac{n+1}{K}\rfloor, where K is the number of channels in the server. This is due to the allocation scheme’s attempts to evenly distribute jobs across channels. Thus, for Example 1, n=3 and K=2, meaning \alpha_{\epsilon} = \lfloor\frac{3+1}{2}\rfloor = 2, as illustrated in Figure 2(b), and for Example 2, n=2, so \alpha_{\epsilon} = \lfloor\frac{2+1}{2}\rfloor = 1, as in Figure 3(b).
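As a quick arithmetic check, the multiplier can be evaluated directly for the two-channel examples. The function name alpha is a hypothetical label for this sketch, and n is taken to be the number of jobs still in the system (three in Example 1, two in Example 2, since Job 1 has departed).

```python
def alpha(n, num_channels):
    """Multiplier floor((n + 1) / K) as stated in the text."""
    return (n + 1) // num_channels


alpha_example_1 = alpha(3, 2)  # three jobs in the system, two channels
alpha_example_2 = alpha(2, 2)  # two jobs in the system, two channels

print(alpha_example_1, alpha_example_2)
```

For K=2 this expression coincides with \lceil n/2 \rceil, the longest queue length when n jobs are spread as evenly as possible over two channels.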

Then, the stochastic breakdown rate process under the load balancing allocation scheme is given by

\mathcal{B}(t) = r_{0}(t) + \eta\sum_{\epsilon \in E}\alpha_{\epsilon}\mathbf{1}_{I_{1}^{\epsilon_{1}}\cap I_{2}^{\epsilon_{2}} \cap \cdots \cap I_{N(t)}^{\epsilon_{N(t)}}}(t)

Under this expression, only one indicator function will be nonzero at any given point in time, since all atoms are disjoint. Now, I_{1}^{\epsilon_{1}} \cap I_{2}^{\epsilon_{2}} \cap \ldots \cap I_{N(t)}^{\epsilon_{N(t)}} may be expressed as one interval [L_{\epsilon},R_{\epsilon}], where

\begin{aligned}L_{\epsilon}&=\max\left(\{T_{j}\mathrel{:}\epsilon_{j} = \emptyset\}_{j=1}^{N(t)}\right);\\R_{\epsilon}&=\min\left(\{T_{j} + W_{j}\mathrel{:}\epsilon_{j} = \emptyset \}_{j=1}^{N(t)},\{T_{j}\mathrel{:}\epsilon_{j} = c\}_{j=1}^{N(t)}\right)\end{aligned}

Thus, for a server with K channels under a load balancing routing scheme with all jobs bringing constant stress \eta, the breakdown rate process \mathcal{B}(t) may be expressed as

\mathcal{B}(t) = r_{0}(t) + \eta\sum_{\epsilon \in E}\alpha_{\epsilon}\mathbf{1}_{[L_{\epsilon}, R_{\epsilon}]}(t)\qquad(*)

Thus, the conditional survival function under the load balancing scheme is given by

\begin{aligned}S_{Y|\mathfrak{T},\mathfrak{W}, N(t)}(t|\mathfrak{t},\mathfrak{w}, n) &= e^{-\int_{0}^{t}\mathcal{B}(s)ds}\\&=\bar{F}_{0}(t)\exp\left(-\eta\int_{0}^{t}\sum_{\epsilon \in E}\alpha_{\epsilon}\mathbf{1}_{[L_{\epsilon}, R_{\epsilon}]}(s)ds\right)\\&=\bar{F}_{0}(t)\exp\left(-\eta\sum_{\epsilon\in E}\alpha_{\epsilon}\max\left(0,\min(t,R_{\epsilon})-L_{\epsilon}\right)\right) \end{aligned}


Finding the survival function of the single-channel environment relied on the independence of the set of arrival times and service times. From (*), the independence is clearly lost. As noted before, the random breakdown process has full memory, and thus is completely dependent upon the entire trajectory up to t = t_{\text{obs}}.

Clustered Tasks in a Multichannel Server


Figure 4. Clustered Tasks in a Multichannel Server

The previous multichannel server model in Section 1 implicitly assumed each job comes with one task, and all channels are identical in their ability to serve any task brought by a job. A classic illustration is a block of registers at a retail establishment. Each customer will survey the length of the various queues at each register before choosing the shortest queue. Viewing each of these separate registers as a channel in a single server under these conditions gave rise to the load balancing allocation model detailed in the previous section. This section presents a different interpretation of a multichannel, single-server model.

Suppose a server has multiple channels Q_{1},\ldots,Q_{K}, but each channel serves a different type of task. A customer arrives to the server and may select any number from 0 to K tasks for the server to perform. Said customer will select each possible task j with probability p_{j}. Figure 4 illustrates an example of such a situation in which three customers visit the server and each customer picks a different number and set of tasks at random. A customer is considered fully serviced (i.e. the job is complete) upon completion of the last task belonging to that particular customer.

Model Assumptions

The following mathematical assumptions are made for the multichannel server with clustered tasks:

  1. Customers arrive to the server with K channels via a nonhomogeneous Poisson process (NHPP) with intensity \lambda(t).
  2. The breakdown rate of the idle server is given by r_{0}(t).
  3. Each channel corresponds to a different task the server can perform.
  4. The selection of each task is a Bernoulli random variable with probability p_{k}. Thus the number of tasks selected by each customer is a binomial random variable.
  5. The workload stress to the server is a constant multiple \eta of the number of tasks requested by the customer, i.e. the additional stress is given by \eta N, where N is the number of tasks requested.
  6. The PDF of each channel’s service time is given by g_{i}(w), i = 1,\ldots ,K. Since the customer’s service is not complete until all requested tasks have finished, the service life distribution for the customers is given by \max_{i}G_{i}(w).

Under these assumptions, this model is a special interpretation of the random stress environment developed in [3]. In this case, the random workload stress is \eta N, where N is a binomial random variable, and the service life distribution G_{W}(w) = \max_{i}G_{i}(w), which may be easily obtained through the mathematical properties of order statistics. Two variations are considered in the  next section: independent channels and correlated channels.

Independent Channels in a Clustered Task Server

Suppose the selection probabilities for each task in the server are identical, that is, p_{1}=p_{2}=\ldots=p_{K}=p. Then N\sim\text{Bin}(K,p). Using Theorem 3 in [3], the survival function of the multichannel server is given in the following theorem.

Theorem 1 (Survival Function of Multichannel Server with Clustered Tasks and Independent Channels). Suppose model assumptions (1)-(6) above are satisfied. In addition, assume p_{1}=p_{2}=\ldots=p_{K}=p. Then the survival function of the server is given by
\begin{aligned}S_{Y}(t)&=\bar{F}_{0}(t)\exp\left(-K\eta\left[e^{-\eta t}\left(1-p+pe^{-\eta t}\right)^{K-1}-p(1-p)^{K-1}\right]\right.\\&\qquad\left.\times\int_{0}^{t}m(t-w)\bar{G}_{W}(w)dw\right)\end{aligned}

where m(x) = \int_{0}^{x}\lambda(s)ds, \bar{F}_{0}(t) = e^{-\int_{0}^{t}r_{0}(s)ds}, \bar{G}_{W}(w) = 1-G_{W}(w), and G_{W}(w) = \max_{i}G_{i}(w).

Correlated Channels in a Cluster Server

Now suppose the server tasks are correlated, in that the selection of one particular task may affect the selection of any or all of the other tasks. Thus the channels are a sequence of dependent Bernoulli random variables. The construction of dependent Bernoulli random variables is given in [2], and a summary is given below.

Dependent Bernoulli Random Variables and the Generalized Binomial Distribution

Korzeniowski [2] constructs a sequence of dependent Bernoulli random variables using a binary tree that distributes probability mass over dyadic partitions of [0,1]. Let 0\leq\delta\leq1, 0<p<1, and q=1-p. Then define the following quantities:
\begin{aligned}q^{+}\mathrel{:=}q+\delta p&\qquad p^{+}\mathrel{:=}p+\delta q\\q^{-}\mathrel{:=}q(1-\delta)&\qquad p^{-}\mathrel{:=}p(1-\delta)\end{aligned}

The quantities above satisfy the following conditions:
\begin{aligned}q^{+}+p^{-}&=q^{-}+p^{+}=q+p=1\\qq^{+}+pq^{-}&=q,\quad qp^{-}+pp^{+}=p\end{aligned}

Figure 5. Construction of Dependent Bernoulli Random Variables

Figure 5 shows the construction and the resulting dependencies. The following examples using coin flips illustrate the effect of the dependency coefficient \delta:

Example 1 (\delta=1). For \delta = 1, q^{+} = q+p=1, q^{-} = 0, p^{+} = p+q = 1, and p^{-} = 0. Suppose the first coin flip \epsilon_{1} = 1. Then every successive \epsilon_{i} will also be 1. Similarly if \epsilon_{1} = 0. Thus the result of the first coin flip completely determines the outcomes of all the rest.

Example 2 (\delta = 0). For \delta = 0, q^{+} = q^{-} = q, and p^{+} = p^{-} = p. Thus, the first coin flip (and all subsequent ones) has no effect on the ones that follow.

Example 3 (\delta = \frac{1}{4}). Suppose p=q=\frac{1}{2}. Then p^{+} = q^{+} = \frac{5}{8}, and p^{-} = q^{-} = \frac{3}{8}. Then the subsequent outcomes \epsilon_{i}, i \geq 2 are more likely to match the outcomes of \epsilon_{1} than not.

Now suppose p = \frac{1}{4}, q = \frac{3}{4}. Then p^{+} = \frac{7}{16}, p^{-} = \frac{3}{16}, q^{+} = \frac{13}{16}, and q^{-} = \frac{9}{16}. In this example of an unfair coin, the dependency coefficient \delta still attempts to skew the results following the first coin flip in favor of the outcome of \epsilon_{1}. However, the dependency here heightens the effect of \epsilon_{1} = 0 on subsequent flips, and cannot overcome the discrepancy between the probability of success and failure to skew \epsilon_{i}, i \geq 2 in favor of a 1 following the outcome of \epsilon_{1} = 1.
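The arithmetic in these coin-flip examples is easy to reproduce with exact fractions. The helper below is an illustrative sketch (the name dependent_probabilities is an assumption, not notation from [2]); it evaluates p^{+}, p^{-}, q^{+}, and q^{-} directly from their definitions.

```python
from fractions import Fraction


def dependent_probabilities(p, delta):
    """Return (p_plus, p_minus, q_plus, q_minus) for success probability p
    and dependency coefficient delta, using the definitions above."""
    p, delta = Fraction(p), Fraction(delta)
    q = 1 - p
    p_plus = p + delta * q
    p_minus = p * (1 - delta)
    q_plus = q + delta * p
    q_minus = q * (1 - delta)
    return p_plus, p_minus, q_plus, q_minus


fair = dependent_probabilities(Fraction(1, 2), Fraction(1, 4))
unfair = dependent_probabilities(Fraction(1, 4), Fraction(1, 4))

print(fair)    # quantities for p = 1/2, delta = 1/4
print(unfair)  # quantities for p = 1/4, delta = 1/4
```

Working in Fraction rather than floating point keeps the results directly comparable with the hand-computed values in Example 3.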

Using these dependent Bernoulli random variables, [2] presents a Generalized Binomial Distribution for identically distributed but dependent Bernoulli random variables.


Generalized Binomial Distribution

Let X = \sum_{i=1}^{n}\epsilon_{i}, where \epsilon_{i}, i = 1,\ldots ,n are identically distributed Bernoulli random variables with probability of success p and dependency coefficient \delta.  Then

P(X=k) = q\tbinom{n-1}{k}(p^{-})^{k}(q^{+})^{n-1-k} + p\tbinom{n-1}{k-1}(p^{+})^{k-1}(q^{-})^{n-1-(k-1)}
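The formula can be evaluated term by term: the first summand covers the case where the first trial fails, the second the case where it succeeds. The following sketch (function name generalized_binomial_pmf is hypothetical) implements it with exact fractions, so that one can confirm the probabilities sum to 1 and that \delta = 0 recovers the ordinary binomial distribution.

```python
from fractions import Fraction
from math import comb


def generalized_binomial_pmf(k, n, p, delta):
    """P(X = k) for 0 <= k <= n, following the formula above."""
    p, delta = Fraction(p), Fraction(delta)
    q = 1 - p
    p_plus, p_minus = p + delta * q, p * (1 - delta)
    q_plus, q_minus = q + delta * p, q * (1 - delta)
    total = Fraction(0)
    if k <= n - 1:
        # First trial fails: k successes among the remaining n - 1 trials.
        total += q * comb(n - 1, k) * p_minus**k * q_plus**(n - 1 - k)
    if k >= 1:
        # First trial succeeds: k - 1 successes among the remaining n - 1 trials.
        total += p * comb(n - 1, k - 1) * p_plus**(k - 1) * q_minus**(n - k)
    return total


n, p, delta = 5, Fraction(1, 4), Fraction(1, 4)
pmf = [generalized_binomial_pmf(k, n, p, delta) for k in range(n + 1)]
print(pmf)
print(sum(pmf))  # total probability
```

The guards on k simply drop the term whose binomial coefficient would be zero at the endpoints k=0 and k=n.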

Survival Function of Correlated Channels in a Cluster Server

Suppose the selection of tasks may be modeled by the dependent Bernoulli random variables given in the previous section. That is, suppose the customer selects Tasks 1-K in sequence, and the selection or rejection of Task 1 affects all subsequent tasks by a dependency coefficient \delta. From [2], the correlation between task selections \epsilon_{i} and \epsilon_{j} is given by
\rho = \text{Cor}(\epsilon_{i},\epsilon_{j})=\begin{cases}\delta,&i=1;j=2,\ldots,K\\\delta^{2},&i\neq j;i,j\geq 2\end{cases}

This illustrates the dependency of Tasks 2-K on the outcome of Task 1, and notes that while Tasks 2-K are still correlated with each other, the dependency is much lower. In a similar fashion to the independent channel server, the survival function is derived.
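The two correlation values can be confirmed by brute-force enumeration for a small number of tasks. The sketch below assumes, as the description above suggests, that every task after the first depends only on the outcome of Task 1; the function names joint_pmf and correlation are illustrative and indices are 0-based.

```python
from fractions import Fraction
from itertools import product


def joint_pmf(outcome, p, delta):
    """Probability of a 0/1 tuple when each later trial depends on the first."""
    q = 1 - p
    p_plus, p_minus = p + delta * q, p * (1 - delta)
    first, rest = outcome[0], outcome[1:]
    prob = p if first == 1 else q
    success = p_plus if first == 1 else p_minus  # P(later trial = 1 | first)
    for e in rest:
        prob *= success if e == 1 else 1 - success
    return prob


def correlation(i, j, length, p, delta):
    """Exact correlation between trials i and j (0-based) by enumeration."""
    mean_i = mean_j = mean_ij = Fraction(0)
    for outcome in product((0, 1), repeat=length):
        w = joint_pmf(outcome, p, delta)
        mean_i += w * outcome[i]
        mean_j += w * outcome[j]
        mean_ij += w * outcome[i] * outcome[j]
    cov = mean_ij - mean_i * mean_j
    # The trials are identically distributed, so both variances equal
    # mean_i * (1 - mean_i) and no square root is needed.
    return cov / (mean_i * (1 - mean_i))


p, delta = Fraction(1, 4), Fraction(1, 4)
rho_first_later = correlation(0, 1, 3, p, delta)   # Task 1 with Task 2
rho_later_later = correlation(1, 2, 3, p, delta)   # Task 2 with Task 3

print(rho_first_later, rho_later_later)
```

For \delta = \frac{1}{4} the enumeration returns \frac{1}{4} and \frac{1}{16}, that is, \delta and \delta^{2}.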

Theorem 2 (Survival Function of Multichannel Server with Clustered Tasks and Dependent Channels). Suppose model assumptions 1-6 are satisfied. In addition, suppose the selection of channels 1-K is determined by identically distributed Bernoulli random variables with dependency coefficient \delta as defined in [2]. Then the survival function of the server is given by

where m(x) = \int_{0}^{x}\lambda(s)ds, and

\begin{aligned}\mathscr{S}(w)&=\sum_{n=0}^{K}e^{-\eta n w}\sum_{j=0}^{K-n-1}\tbinom{K-1}{n-1, j,K-1-n-j}p^{K-1-j}(1-p)^{j+1}\delta^{K-1-n-j}(1-\delta)^{n}\\&\quad+\sum_{n=0}^{K}ne^{-\eta nw}\sum_{i=0}^{n-1}\tbinom{K-1}{K-1-n,i,n-1-i}p^{i+1}(1-p)^{K-n}\delta^{n-1-j}(1-\delta)^{K-n-j}\end{aligned}


The generalized model of a server under random workload proposed in [3] admits further expansion by way of relaxing the assumption that incoming tasks have exactly one queue to enter on arrival. In considering a server partitioned into several channels, a cost is incurred, namely that additional stress to the server is dependent upon arrival and service times of all previous jobs. However, even under these circumstances, we may obtain a breakdown rate process and satisfactory conditional survival function for the server, and the door is opened to further discussion. By examining the multichannel server, we consider the interrelations of the channels themselves, and derive survival functions to meet the case when the channels are independent as well as when they are correlated.



[1] Ki Hwan Cha and Eui Yong Lee. A stochastic breakdown model for an unreliable web server system and an optimal admission control policy. Journal of Applied Probability, 48(2):453–466, 2011.

[2] Andrzej Korzeniowski. On correlated random graphs. Journal of Probability and Statistical Science, 11:43–58, 2013.

[3] R. Traylor. Stochastic reliability of a server under random workload. Academic Advances of the CTO, 1, 2017.

The Red-Headed Step-Distributions


Almost every textbook in probability or statistics will speak of classifying distributions into two different camps: discrete (singular in some older textbooks) and continuous. Discrete distributions have either a finite or a countably infinite sample space (which is a set of Lebesgue measure 0), such as the Poisson or binomial distribution, or simply rolling a die. The probability of each point in the sample space is nonzero. Continuous distributions have a continuous sample space, such as the normal distribution. A distribution in these classes is characterized either by a probability mass function (pmf) or by a probability density function (pdf), the latter derived from the distribution function by taking a derivative. There is, however, a third kind.

One rarely talked about, or mentioned quickly and then discarded. This class of distributions is defined on a set of Lebesgue measure 0, yet the probability of any point in the set is 0, unlike discrete distributions. The distribution function is continuous, even uniformly continuous, but not absolutely continuous, meaning it’s not a continuous distribution. The pdf doesn’t exist, but one can still find moments of the distribution (e.g. mean, variance). They are almost never encountered in practice, and the only real example I’ve been able to find thus far is based on the Cantor set. This class is the set of red-headed step-distributions: the singular continuous distributions.

Back up, what is Lebesgue measure?

Measure theory itself can get extremely complicated and abstract. The idea of measures is to give the “size” of subsets of a space. Lebesgue measure is one type of measure, and is actually something most people are familiar with: the “size” of subsets of Euclidean space in n dimensions. For example, when n=1, we live in 1D space. Intervals. The Lebesgue measure of an interval [a,b] on the real line is just the length of that interval: b-a. When we move to two dimensions, \mathbb{R}\times \mathbb{R}, the Cartesian product of 1D space with itself, our intervals combine to make rectangles. The Lebesgue measure in 2D space is area; so a rectangle built from [a,b]\times [c,d] has Lebesgue measure (b-a)(d-c). Lebesgue measure in 3D space is volume. And so forth. 

Now, points are 0-dimensional in Euclidean space. They have no size, no mass. They have Lebesgue measure 0. Intuitively, we can simply see that Lebesgue measure helps us see how much “space” something takes up in the Euclidean world, and points take up no space, and hence should have measure 0.

In fact, any countable set of points has Lebesgue measure 0. Even an infinite but countable set. The union of countably many disjoint Lebesgue measurable sets has a measure equal to the sum of the measures of the individual sets. Points are certainly disjoint, and they each have measure 0, and summing 0 forever still yields 0. So, the set \{0,1,2\} has Lebesgue measure 0. But so do the natural numbers \mathbb{N} and the rational numbers \mathbb{Q}, even though the rational numbers contain the set of natural numbers.

It is actually possible to construct an uncountably infinite set that has Lebesgue measure 0, and we will need that in constructing our example of a singular continuous distribution. For now, we’ll examine discrete and continuous distributions briefly.

Discrete (Singular) Distributions

These are the ones most probability textbooks begin with, and most of the examples that are familiar.

Roll a fair die. 

The sample space for a roll of a fair die X is S =\{1,2,3,4,5,6\}. The PMF is P(X = x) = 1/6, where x \in S. The CDF is given by the function P(X\leq x) = \sum_{j\leq x}P(X=j). For example,


P(X \leq 4) = \sum_{j\leq 4}\frac{1}{6} = \frac{2}{3}
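The same sum is a one-liner in code. This small sketch (the function name die_cdf is made up for illustration) adds up the point masses of a fair die using exact fractions.

```python
from fractions import Fraction


def die_cdf(x, sides=6):
    """P(X <= x) for a fair die with faces 1, ..., sides."""
    return sum(Fraction(1, sides) for face in range(1, sides + 1) if face <= x)


print(die_cdf(4))  # the value computed above
```

Because the distribution is discrete, the CDF only changes at the faces themselves: die_cdf(4.5) returns the same value as die_cdf(4).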

Binomial Distribution

A binomial random variable X counts the number of “successes” or 1s in a binary sequence of n Bernoulli random variables. Think a sequence of coin tosses, and counting the number of heads. In this case, the sample space is finite: S = \{0,1,\ldots,n\}. If the probability of a 1, or “success” is p, then the PMF of X is given by 

P(X=x) = {n \choose x}p^{x}(1-p)^{n-x}

Note here again that the sample space is of Lebesgue measure 0, but the probability of any point in that space is a positive number. 

Continuous Distributions

Continuous distributions operate on a continuous sample space, usually an interval or Cartesian product of intervals or even a union of intervals. Continuous distribution functions F are absolutely continuous, meaning that (in one equivalent definition), the distribution function has a derivative f=F' almost everywhere that is Lebesgue integrable, and obeys the Fundamental Theorem of Calculus:

F(b)-F(a) = \int_{a}^{b}f(x)dx

for a< b. This f is the probability density function (PDF), derived by differentiating the distribution function. Let’s mention some examples of these:

The Continuous Uniform Distribution

Suppose we have a continuous interval [a,b], and the probability mass is spread equally along this interval, meaning that the probability that our random variable X lies in any subinterval of length s is the same, regardless of location. Suppose we do not allow the random variable to take any values outside the interval. The sample space is continuous but over a finite interval. The distribution function for this X is given by 

F(x) = \left\{\begin{array}{lr}0&x< a\\\frac{x-a}{b-a}&a\leq x \leq b\\1&x > b\end{array}\right.

This is an absolutely continuous function. Then we may easily derive the PDF by differentiating F:

f(x) = \mathbb{1}_{x \in [a,b]}\frac{1}{b-a}

where \mathbb{1}_{x \in [a,b]} is the indicator function that takes value 1 if x is in the interval, and 0 otherwise. 

This distribution is the continuous version of a die roll. The die roll is the discrete uniform distribution, and here we just allow for a die with uncountably many sides with values in [a,b]. The probability of any particular point is 0, however, even though it is possible to draw a random number from this interval. To see this, note that the probability that the random variable X lies between two points in the interval, say x_{1} and x_{2} is given by multiplying the height of the PDF by the length (Lebesgue measure) of the subinterval. The Lebesgue measure of a point is 0, so even though a value for the PDF exists at that point, the probability is 0. 

We don’t run into issues here mathematically because we are on a continuous interval. 
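As a quick illustration of the "height times length" idea, here is a small Python sketch. The helper names uniform_cdf and uniform_interval_prob, and the interval [2, 6], are assumptions of mine chosen for the example. Subintervals of equal length get equal probability wherever they sit, and a single point (a subinterval of length 0) gets probability 0.

```python
def uniform_cdf(x: float, a: float, b: float) -> float:
    """Distribution function F(x) of the continuous uniform distribution on [a, b]."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

def uniform_interval_prob(x1: float, x2: float, a: float, b: float) -> float:
    """P(x1 <= X <= x2), computed as F(x2) - F(x1)."""
    return uniform_cdf(x2, a, b) - uniform_cdf(x1, a, b)

a, b = 2.0, 6.0  # illustrative interval
print(uniform_interval_prob(3.0, 4.0, a, b))  # 0.25
print(uniform_interval_prob(5.0, 6.0, a, b))  # 0.25, same length so same probability
print(uniform_interval_prob(4.0, 4.0, a, b))  # 0.0, a single point has probability 0
```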

The Normal Distribution

Likely the most famous continuous distribution, the normal distribution is given by the famous “bell curve.” In this case, the sample space is the entire real line. The probability that a normally distributed random variable X lies between any two points a and b is given by 

P(a\leq X \leq b) = \int_{a}^{b}\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)dx

where \mu is the mean and \sigma^{2} is the variance. 
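The integral above has no elementary closed form, but it can be written in terms of the error function. Here is a hedged Python sketch using math.erf from the standard library; the function names are mine, and note that the code takes the standard deviation \sigma rather than the variance \sigma^{2}.

```python
from math import erf, sqrt

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Distribution function of a normal random variable with mean mu and std dev sigma."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def normal_interval_prob(a: float, b: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(a <= X <= b), i.e. the integral of the normal PDF from a to b."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

# Probability of landing within one standard deviation of the mean.
print(normal_interval_prob(-1.0, 1.0))  # about 0.6827
```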

Singular Continuous Distributions

We’re going to begin this section by discussing everyone’s favorite counterexample in mathematics: the Cantor set. 

The Cantor set

The Cantor set is given by the limit of the following construction:

  1. Take the interval [0,1]
  2. Remove the middle third: (1/3, 2/3), so you’re left with [0,1/3]\cup[2/3,1]
  3. Remove the middle third of each of the remaining intervals. So you remove (1/9,2/9) from [0,1/3] and (7/9,8/9) from [2/3,1], leaving you with the set [0,1/9]\cup[2/9,1/3]\cup[2/3,7/9]\cup[8/9,1]

Continue this process infinitely.

This is an example of a set that is uncountable, yet has Lebesgue measure 0. Earlier, when we discussed Lebesgue measure, we noted that all countable sets had measure 0. Thus we may conclude that only uncountable sets (like intervals) have nonzero Lebesgue measure. However, the Cantor set illustrates that not all uncountable sets have positive Lebesgue measure. To see why the Cantor set has Lebesgue measure 0, we will look at the measure of the sets that are removed (the complement of the Cantor set):

At the first step, we have removed one interval of size 1/3. At the second step, we remove two intervals of size 1/9. At the third step, we remove four intervals of size 1/27. Let’s call S_{n} the subset removed from the interval [0,1] by the nth step. By the end of the third step, we have removed a set of size

m(S_{3}) = \frac{1}{3} + \frac{2}{3^{2}} + \frac{4}{3^{3}}

By the nth step, 

m(S_{n}) = \sum_{j=0}^{n-1}\frac{2^{j}}{3^{j+1}}

This is the partial sum of a geometric series, so

m(S_{n}) = 1-\left(\frac{2}{3}\right)^{n}

Now, the Cantor set is formed when n \to \infty. The complement of the Cantor set in [0,1], which we will call S_{\infty}, then has measure

m(S_{\infty}) = \lim_{n \to \infty}m(S_{n}) = \lim_{n \to \infty}\left[1-\left(\frac{2}{3}\right)^{n}\right] = 1

But the original interval we started with had Lebesgue measure 1, and the union of the Cantor set with its complement S_{\infty} is the interval [0,1]. That means that the measure of the Cantor set plus the measure of its complement must add to 1, which implies that the Cantor set is of measure 0. However, since we removed open intervals during the construction, there must be something left; in fact, there are uncountably many points left. 
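The bookkeeping above is easy to check with exact rational arithmetic. Below is a short Python sketch (the function name removed_measure is my own) that adds up the lengths removed over the first n steps, one term 2^{j}/3^{j+1} per step, and compares the total with the closed form 1-(2/3)^{n}.

```python
from fractions import Fraction

def removed_measure(n: int) -> Fraction:
    """Total length removed from [0, 1] over the first n steps of the construction.

    Step j + 1 removes 2**j open intervals, each of length 1 / 3**(j + 1).
    """
    return sum((Fraction(2**j, 3 ** (j + 1)) for j in range(n)), Fraction(0))

print(removed_measure(3))  # 19/27, i.e. 1/3 + 2/9 + 4/27

# The partial sums agree with the closed form and creep up toward 1.
for n in (1, 2, 3, 10, 50):
    assert removed_measure(n) == 1 - Fraction(2, 3) ** n
    print(n, float(1 - removed_measure(n)))  # length still remaining after n steps
```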

Now we have an uncountable set of Lebesgue measure 0. We’re going to use this set to construct the only example I could find of a singular continuous distribution. It is very important that the Cantor set is an uncountable set of Lebesgue measure 0. 

Building the Cantor distribution

Update: Following a correction from an earlier version, I’m going to show how to construct this distribution directly and via the complement of the Cantor set. The latter was used in a textbook I found, and is a bit convoluted in its construction, but I’m going to leave it.

The direct construction is to look at the intervals left behind at each stage n of constructing the Cantor set. Assign a probability mass of \frac{1}{2^{n}} to each of the 2^{n} intervals left behind, spread uniformly within each interval; this gives the distribution at stage n. It’s basically a continuous uniform distribution, but on stages of the Cantor set construction. Sending n \to \infty yields the Cantor set, and in the limit all of the probability is concentrated on a set of measure 0. Thus, unlike the continuous uniform distribution, where the probability of any single point was 0 but the support has positive measure, we essentially have the continuous uniform distribution occurring on a set of measure 0. That is, we have a continuous distribution function whose support is singular (of measure 0) yet uncountable, and thus not discrete. This distribution is therefore neither absolutely continuous nor discrete. 

Another way to construct this is by complement, via Kai Lai Chung’s A Course in Probability Theory. 

(Note: after a second glance at this, I found this to be a relatively convoluted way of constructing this distribution, since it can be fairly easily constructed directly. However, I imagine the author’s purpose was to be very rigid and formal to cover all his bases, so I present a review of it here:)

Let’s go back to the construction of the Cantor set. At each step n we have removed in total 2^{n}-1 disjoint intervals. Let’s number those intervals, going from left to right as J_{n,k}, where k = 1,2,\ldots, 2^{n}-1

For example, at n=2 we have that J_{2,1} = (1/9,2/9),J_{2,2} = (1/3,2/3), and J_{2,3} = (7/9,8/9)

Now let the quantity c_{n,k} = \frac{k}{2^{n}}. This will be the value the distribution function takes on the interval J_{n,k}, that is, the total probability mass lying to the left of that interval. So we define the distribution function as 

F(x) = c_{n,k}, x \in J_{n,k}

Let U_{n} = \cup_{k=1}^{2^{n}-1}J_{n,k}, and U = \lim_{n\to\infty}U_{n}. The function F, extended by F(x) = 0 for x < 0 and F(x) = 1 for x > 1, can be shown to be uniformly continuous on the set D = (-\infty,0)\cup U \cup (1,\infty), so it extends to a continuous distribution function on the whole real line. However, none of the points in D is in the support of F, so the support of F is contained in the Cantor set (and in fact is the Cantor set). The support (the Cantor set) has measure 0, so it is singular, but the distribution function is continuous, so it cannot be a discrete distribution. This distribution fits nowhere in our previous two classes, so we must now create a third class: the singular continuous distribution.
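For readers who like to poke at things, here is a rough Python sketch of this distribution function. It is not Chung's construction verbatim. It uses the standard ternary-digit description of the Cantor function, which is a technique I am swapping in, and the name cantor_cdf and the depth cutoff are my own choices. It reads off base-3 digits of x, stops at the first digit 1 (which means x sits in a removed interval), and converts the digits 0 and 2 seen so far into binary digits 0 and 1. On the removed intervals J_{2,k} it returns exactly k/2^{2}.

```python
from fractions import Fraction

def cantor_cdf(x, depth: int = 40) -> Fraction:
    """Approximate the Cantor distribution function at x using `depth` ternary digits.

    Exact whenever x lies in (the closure of) a removed interval reached within
    `depth` digits; otherwise the result is truncated to within 2**-depth.
    """
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    total, scale = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x = x * 3
        digit = int(x)      # next ternary digit of x
        x = x - digit
        if digit == 1:      # x is in a removed middle third: F is flat here
            return total + scale
        if digit == 2:      # ternary digit 2 becomes binary digit 1
            total += scale
        scale /= 2
    return total

# Midpoints of J_{2,1}, J_{2,2}, J_{2,3} from the example above.
for mid in (Fraction(1, 6), Fraction(1, 2), Fraction(5, 6)):
    print(mid, cantor_cdf(mid))  # 1/4, 1/2, 3/4 respectively
```

Passing Fraction inputs keeps the digit extraction exact; floats also work, but rounding can matter for points very close to an interval endpoint.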

(By the way, even though the PDF doesn’t exist, the Cantor distribution still has mean of 1/2 and a variance of 1/8, but no mode. It does have a moment generating function.)
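The mean and variance can be sanity-checked numerically from the direct construction. In the sketch below, which is only an approximation under my own assumptions, I place the mass \frac{1}{2^{n}} of each stage-n interval at its left endpoint (a simplification; the function names are hypothetical) and compute the moments exactly with rational arithmetic. For this discretized version the mean works out to \frac{1}{2}(1-3^{-n}) and the variance to \frac{1}{8}(1-9^{-n}), which tend to 1/2 and 1/8 as n grows.

```python
from fractions import Fraction
from itertools import product

def stage_left_endpoints(n: int) -> list:
    """Left endpoints of the 2**n intervals left after n steps of the construction.

    Each endpoint has a ternary expansion of length n using only the digits 0 and 2.
    """
    return [
        sum((Fraction(2 * d, 3 ** (i + 1)) for i, d in enumerate(digits)), Fraction(0))
        for digits in product((0, 1), repeat=n)
    ]

def stage_moments(n: int):
    """Mean and variance when each stage-n interval's mass 1/2**n sits at its left endpoint."""
    points = stage_left_endpoints(n)
    weight = Fraction(1, 2**n)
    mean = sum(weight * x for x in points)
    variance = sum(weight * (x - mean) ** 2 for x in points)
    return mean, variance

mean, variance = stage_moments(8)
print(float(mean), float(variance))  # close to 0.5 and 0.125
```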

Any other examples?

With some help, I spent some time poring through quite a few probability books to seek further study and other treatment of singular continuous distributions. Most said absolutely nothing at all, as if the class didn’t exist. 

One book, Modern Probability Theory and Its Applications, has a rather grumpy approach:

There also exists another kind of continuous distribution function, called singular continuous, whose derivative vanishes at almost all points. This is a somewhat difficult notion to picture, and examples have been constructed only by means of fairly involved analytic operations. From a practical point of view, one may act as if singular continuous distribution functions do not exist, since examples of these functions are rarely, if ever, encountered in practice.

This notion has also led me to a couple of papers, which I intend to review as I continue presenting my findings. I happen to have a great fondness for these “edge cases” and forgotten areas of mathematics. I believe they are the most ripe for groundbreaking new material. 


(Commentary) Spectre and Meltdown: Spokes on a Wheel

There has been a flurry of articles and discussions related to Intel’s Spectre and Meltdown vulnerabilities. Many good writings discuss the technical nature and implications for hardware, and you can find a selection here, here, and here. As of this writing, many software developers and security experts are frantically trying to create patches to protect their infrastructure and customers from those who would exploit a 20-year-old design flaw, and we obviously wish them the best of luck.

The severity of the issue is in a large part due to a design flaw that dates back to 1995, when Bill Clinton was president, the DVD was first announced, eBay was founded, Braveheart won the Best Picture Academy Award, and Alanis Morissette’s Jagged Little Pill was released. That means that hundreds of thousands of programs, apps, and products were built on top of a fundamental design flaw that went unnoticed longer than some of our siblings have been alive. What happened?

Complexity happened. Not complexity in the sense of the human body, a finely tuned machine. Complexity born of rushed thinking, continuous development cycles, and a mentality we discouraged in our students when we were still academics: the notion that just turning in an assignment, even if it was not well done, was acceptable. We have been scolded in other jobs that “perfect is the enemy of good”, to “fail fast and fail often” and to “just get something delivered.” We at The Math Citadel fundamentally disagree with and reject all of these strategies and attitudes. Rushed thinking and a desperation to be seen as “first done” with the most hype has led to complexity born of brute force solutions, with patches to fix holes discovered after release. When those patches inevitably break something else, more patches are applied to fix the first patches.

“And on and on it spins, crushing those on the ground”, in the words of Daenerys Targaryen.

Lest it be thought that we are picking only on Intel, or that this is an isolated issue, let us explore other manifestations.

  • In November 2017, a developer found a security vulnerability in Apple’s High Sierra operating system that enables access to the root superuser account with a blank password on any Mac (local or remote) running macOS 10.13.1. Apple quickly released a patch meant to fix it, but another update ended up reintroducing the “root bug.”
  • When iOS 11.1 was released, autocorrect would change the letter “I” to “A” with a question mark in a box.

The gaming industry has had its share of problems from rushing releases that weren’t complete. (One might almost be forgiven for assuming it’s a business strategy.)

  • No Man’s Sky was released to much hype, but the first release had very few of the promised features, generating a huge player backlash. The company released further features as DLC and patches, but the damage was done.
  • Call of Duty: WWII had server issues at launch that took the game offline, random disconnects from matches, and some reports of gamer rankings being reset. After two patches, users reported improvements but no real fixes.
  • Batman: Arkham Knight released a version for the PC, and it became a disaster. Players had to turn off textures and move graphics qualities to “low” to even make the game playable, regardless of how nice their graphics card was. 

The machine learning/“artificial intelligence” space has quite a few examples, and these range from amusing to sinister.

  • Algorithmic pricing without a sanity check leads to a $23 million book price on Amazon
  • Automatic t-shirt slogan generator causes a business to fold after the owner’s algorithm generates a t-shirt saying “Keep calm and rape on.”
  • Automated re-pricing software RePricer Express erroneously changes the prices of thousands of items on Amazon to a penny. Compounding the problem is the automatic order fulfillment from Amazon, making it impossible to retract the order. One small business owner cites a $150,000 loss.
  • Accusations of price-gouging on flights out of Florida prior to Hurricane Irma are more likely due to the automatic pricing algorithms than active price gouging. Nonetheless, it was a PR nightmare.

We can list many more examples, enough to provide clear evidence of a pattern. There have already been those calling for a re-examination of machine learning and data science in particular in response to these issues. The real problem, however, goes much deeper.

Entire companies are based around the notion of scrum development, a continued cycle of “sprints” that last a couple of weeks and end in some deliverable. The original methodology may be good for a prototype, but when scaled to company operations, it inspires a culture of “just get it done.” It leads to a toxic environment, where both leaders and individual contributors are driven by a fury to “turn it in” and release before a competitor or by the time VMworld comes around. It means products are being built on top of other products that were just barely good enough to ship with a shiny marketing veneer.

In the physical world, this would be akin to building a bridge by throwing stones wantonly into the water in order to hop across. Yes, you can get across the river quickly, but misplacement of any one of those stones can mean you may slip and fall into the water, or someone coming behind you who distributes his weight differently may fall. Worse, if the rocks seem stable enough for a long time, people begin constructing a hasty bridge using those stones as a foundation. The bridge holds for a while, but one day the cumulative effect of poor materials and high traffic volume causes the bridge to collapse, and people get hurt.

If civil engineers designed and built bridges the way tech develops and releases products, people would die. If aerospace engineers rushed the design of a commercial airliner and patched issues the way tech does, people would die.

If mathematicians developed their theories and equations the way tech develops and releases products, your world would crumble.

Let’s run a thought experiment. Suppose George Boole, the inventor of Boolean algebra, rushed his theories out so he could beat an academic rival. He didn’t really prove everything or make sure it was airtight. There was maybe that funny edge case, but he couldn’t see it ever arising in practice, so he just ignored it. Unbeknownst to him, that edge case was a counterexample that showed all of his notions to be false overall.

Boolean algebra is the fundamental theory by which your computers work today, and will be such until and unless quantum computing takes off. If what seemed like an edge case 150 years ago became the foundation for the development of computers, the ramifications would be so vast as to be unrecoverable. It would require a whole new redesign of how computers worked. Let that effect snowball in your mind.

That’s one topic in mathematics. Imagine hundreds of mathematicians developing the foundations of your world the same way. But we don’t. We study the river, the banks, and the earth carefully. Only when we are sure do we begin constructing a bridge, starting with the foundation. Stone by stone, making sure the bottom ones are perfect and unbreakable. The work takes years, decades, lifetimes sometimes, but the results are built to last. Mathematics is the only discipline that has never had to “reinvent” itself upon the discovery of new knowledge. All of mathematics builds, expands, and generalizes.

What does this have to do with business? To fix the attitudes that ultimately led to Spectre, Meltdown, and the patches to fix them, and the patches to fix those, companies need to think like mathematicians. To fix the ideologies that rushed out a new macOS with a serious security vulnerability, companies need to think like mathematicians. To avoid the PR nightmares from “AI gone wrong”, companies need to think like mathematicians.

Leaders and individual contributors need to think like mathematicians, searching deliberately for elegant, simple solutions that are provable, explainable, and fundamentally strong. The elegance and simplicity will allow for other things to be built on top that won’t break the foundation. Even when something is built on top of a foundation, it is carefully examined as to its stability. Provable solutions mean no surprises later when something fails.

This requires lateral thinking, creativity, and most importantly, a willingness to take a bit longer in product development and business decisions. It’s a difficult thing to do, when all your competitors move so fast you think you would only hear the Doppler effect as they scream by you. Adopting a mathematician’s outlook takes longer. However, the results are simpler, with less maintenance, less need for software janitors to clean up the mess from the frantic development party, and stronger, more resilient products. Every one of these things yields cost savings, long term revenue, and perhaps most importantly, customer trust.

We at The Math Citadel are mathematicians, refusing the siren song of scrum-like mentalities. We’re here to help the companies who want to look past the hypes, who want to carve their own paths rather than be the leader on a paved course. We’re here for the companies who say “enough” to shortsightedness and continuous patching. Spectre and Meltdown are just spokes on a wheel. We don’t intend to stop the wheel, we intend to break it.