Browsed by
Category: Set Theory

Sequences & Tendency: Topology Basics Pt. 2

Sequences & Tendency: Topology Basics Pt. 2

Introduction

In my previous post I presented abstract topological spaces by way of two special characteristics. These properties are enough to endow a given set with vast possibilities for analysis. Fundamental to mathematical analysis of all kinds (real, complex, functional, etc.) is the sequence.

We have covered the concept of sequences in some of our other posts here at The Math Citadel. As Rachel pointed out in her post on Cauchy sequences, one of the most important aspects of the character of a given sequence is convergence.

In spaces like the real numbers, there is convenient framework available to quantify closeness and proximity, and which allows naturally for a definition of limit or tendency for sequences. In a general topological space missing this skeletal feature, convergence must be defined.

This post will assume only some familiarity with sequences as mathematical objects and, of course, the concepts mentioned in Part 1. For a thorough treatment of sequences, I recommend Mathematical Analysis by Tom M. Apostol.

Neighborhoods

Suppose (X,\mathscr{T}) is a given topological space, and nothing more is known. At our disposal so far are only open sets (elements of \mathscr{T}), and so it is on these a concept of vicinity relies.

Definition. Given a topological space (X,\mathscr{T}), a neighborhood of a point x\in X is an open set which contains x.

That is, we say an element T\in\mathscr{T} such that x\in T is a neighborhood1 of x. To illustrate, take the examples from my previous post.

The Trivial Topology

When the topology in question is the trivial one: \{\emptyset,X\}, the only nonempty open set is X itself, hence it is the only neighborhood of any point x\in X.

The Discrete Topology

Take X=\{2,3,5\} and \mathscr{T} to be the collection of all subsets of X:

  \emptyset  
\{2\} \{3\} \{5\}
\{2,3\} \{2,5\} \{3,5\}
  \{2,3,5\}  

 

Then, for, say x=5, neighborhoods include \{5\}, \{2,5\}, \{3,5\}, and \{2,3,5\}.

The Standard Topology on \mathbb{R}

The standard topology on \mathbb{R} is defined to be the family of all sets of real numbers containing an open interval around each of its points. In this case, there are infinitely2 many neighborhoods of every real number. Taking x=\pi for instance, then (3,4), (-2\pi,2\pi), and even

\bigcup_{n=1}^{\infty}\left(\pi-\frac{1}{n},\pi+\frac{1}{n}\right)

are all neighborhoods of \pi.

Remark. A special type of neighborhood in the standard topology is the symmetric open interval. Given a point x and a radius r>0, the set

(x-r,x+r)=\{y\in\mathbb{R}\mathrel{:}|x-y|<r\}

is a neighborhood of x. These sets form what is referred to as a basis for the standard topology and are important to definition of convergence in \mathbb{R} as a metric space.

Convergence

“…the topology of a space can be described completely in terms of convergence.” —John L. Kelley, General Topology

At this point in our discussion of topological spaces, the only objects available for use are open sets and neighborhoods, and so it is with these that convergence of a sequence are built3.

Definition. A sequence (\alpha_n) in a topological space (X,\mathscr{T}) converges to a point L\in X if for every neighborhood U of L, there exists an index N\in\mathbb{N} such that \alpha_n\in U whenever n\geq N. The point L is referred to as the limit of the sequence (\alpha_n).

Visually, this definition can be thought of as demanding the points of the sequence cluster around the limit point L. In order for the sequence (\alpha_n) to converge to L, it must be the case that after finitely many terms, every one that follows is contained in the arbitrarily posed neighborhood U.

As you might expect, the class of neighborhoods available has a dramatic effect on which sequences converge, as well as where they tend. Just how close to L are the neighborhoods built around it in the topology?

We will use the example topologies brought up so far to exhibit the key characteristics of this definition, and what these parameters permit of sequences.

The Trivial Topology

In case it was to this point hazy just how useful the trivial topology is, sequences illustrate the issue nicely. For the sake of this presentation, take the trivial topology on \mathbb{R}. There is precisely one neighborhood of any point, namely \mathbb{R} itself. As a result, any sequence of real numbers converges, since every term belongs to \mathbb{R}. Moreover, every real number is a limit of any sequence. So, yes, the sequence (5,5,5,\ldots) of all 5‘s converges to \sqrt{2} here.

The Discrete Topology

Whereas with the trivial topology a single neighborhood exists, the discrete topology is as packed with neighborhoods as can be. So, as the trivial topology allows every sequence to converge to everything, we can expect the discrete topology to be comparatively restrictive. Taking the set \{2,3,5\} with the discrete topology as mentioned above, we can pinpoint the new limitation: every set containing exactly one point is a neighborhood of that point. Notice the sets4 \{2\}, \{3\}, and \{5\} are all open sets.

What does this mean? Any sequence that converges to one of these points, say 3, must eventually have all its terms in the neighborhood \{3\}. But that requires all convergent sequences to be eventually constant! This seems to be a minor issue with the finite set \{2,3,5\}, but it presents an undesirable, counter-intuitive problem in other sets.

Take \mathbb{R} with the discrete topology, for example. Under these rules, the sequence

(\alpha_n)=\left(\frac{1}{n}\right)=\left(1,\frac{1}{2},\frac{1}{3},\frac{1}{4},\ldots\right),

though expected to converge to 0, does not converge at all.

So, the discrete topology is too restrictive, and the trivial topology lets us get away with anything. Fortunately, a happy middle ground exists by being a little more selective with neighborhoods.

The Standard Topology

By requiring an open set to contain an open interval around each of its points, it is impossible that a singleton be an open set. Therefore a singleton cannot be a neighborhood, and we eliminate the trouble posed by the discrete topology. Yet every open interval around a real number L contains a smaller one, and each of these is a neighborhood.

This effectively corrals the points of any convergent sequence, requiring the distance between the terms and the limit to vanish as n increases. Take again the sequence

(\alpha_n)=\left(\frac{1}{n}\right)=\left(1,\frac{1}{2},\frac{1}{3},\frac{1}{4},\ldots\right).

We suspect (\alpha_n) converges to 0, but this requires proof. Therefore, we must consider an arbitrary neighborhood of 0, and expose the index N\in\mathbb{N} such that all terms, from the Nth onward, exist in that neighborhood.

Suppose U is a given neighborhood of 0, so that U contains an open interval surrounding 0. Without loss of generality, we may assume this interval is symmetric; that is, the interval has the form (-r,r) for some radius r>0. Take N to be any integer greater than \tfrac{1}{r}. Then, whenever n\geq N,

\alpha_n = \frac{1}{n} \leq \frac{1}{N} < \frac{1}{1/r} = r.

But this means \alpha_n\in(-r,r)\subset U so long as n\geq N. Since we chose U arbitrarily, it follows (\alpha_n) converges to 0.

Conclusion

The behavior of a sequence in a given set can change rather drastically depending on the network of neighborhoods the topology allows. However, with careful construction, it is possible to have all the sequential comforts of metric spaces covered under a briefly put definition.

My next post in this series will push the generalization of these concepts much further, by relaxing a single requirement. In order to define convergence in the preceding discussion, the set of indices \mathbb{N} was important not for guaranteeing infinitely many terms, but instead for providing order. This allows us to speak of all terms subsequent to one in particular. It turns out that if we simply hold on to order, we can loosen the nature of the set on which it is defined. That is the key to Moore-Smith Convergence, to be visited next.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Building a Ground Floor: Topology Basics Pt. 1

Building a Ground Floor: Topology Basics Pt. 1

Like some other terms in mathematics (“algebra” comes to mind), topology is both a discipline and a mathematical object. Moreover like algebra, topology as a subject of study is at heart an artful mathematical branch devoted to generalizing existing structures like the field of real numbers for their most convenient properties. It is also a favorite subject of mine, ever since my first introduction to it. This is due in large part to its exceedingly simple first principles, which make the wealth of expansion they allow all the more impressive.

It is my intent to discuss some of these starting points here, in the first of a short series of posts toward the goal of presenting one of my favorite results arising in topology: Moore-Smith convergence, a vast extension of the notion of the limit of a sequence. My representation here follows the explanation given by John L. Kelley in his classic text General Topology, which I recommend to all curious readers.

What is a topology?

Definition. By topology is meant any collection \mathscr{T} of sets satisfying two conditions:

\begin{array}{lrcl}\text{(1)}&A,B\in\mathscr{T}&\Rightarrow&A\cap B\in\mathscr{T};\\\text{(2)}&\mathscr{C}\subset\mathscr{T}&\Rightarrow&\bigcup\{C\in\mathscr{C}\}\in\mathscr{T}\end{array}

It is worthwhile to break this definition down. Condition (1) requires that the intersection of any two elements of the collection \mathscr{T} must itself be a member of \mathscr{T}. Condition (2) states that the union of any subcollection of \mathscr{T} must also belong to \mathscr{T}. These are referred to as closure to finite intersection and closure to arbitrary union, respectively, in some texts.

Notably, the definition speaks only of a collection of sets with no specification beyond the two conditions. Yet, even with these, one can deduce some further characteristic properties.

Corollary. If \mathscr{T} is a topology, then

\begin{array}{ll}\text{(i)}&\emptyset\in\mathscr{T};\\\text{(ii)}&\bigcup\{T\in\mathscr{T}\}\in\mathscr{T}.\end{array}

Since \emptyset\subset S for every set S, and \mathscr{T}\subset\mathscr{T}, it is enough to apply (2) to both of these cases to prove the corollary. In fact, many texts make the definition X\mathrel{:=}\bigcup\{T\in\mathscr{T}\}, and refer to the pair (X,\mathscr{T}) as the topological space defined by \mathscr{T}.

This way, the space is given its character by way of the scheme that builds \mathscr{T}, rather than the set X. It is an important distinction, for many topologies are possible on a given set. With that, we can look at some examples.

From Trivial to Complicated

1. The Trivial Topology

Based on the corollary just presented, it is enough to gather a given set X and the empty set \emptyset into a collection \{\emptyset,X\} and have created a topology on X. Because X and \emptyset are its only members, the collection is easily closed to arbitrary union and finite intersection of its elements. This is known as the trivial or indiscrete topology, and it is somewhat uninteresting, as its name suggests, but it is important as an instance of how simple a topology may be. As per the corollary, every topology on X must contain \emptyset and X, and so will feature the trivial topology as a subcollection.

2. The Discrete Topology

For this example, one can start with an arbitrary set, but in order to better illustrate, take the set of the first three primes: \{2,3,5\}. Suppose we consider the collection of all possible subsets of \{2,3,5\}. This is also referred to as the power set of \{2,3,5\}, and denoted \wp(\{2,3,5\}). Fortunately, the set is small enough to list exhaustively1. Here they are listed from top-to-bottom in order of increasing inclusion:

  \emptyset  
\{2\} \{3\} \{5\}
\{2,3\} \{2,5\} \{3,5\}
  \{2,3,5\}  

 

Note these are all possible subsets of \{2,3,5\}. It is clear any union or intersection of the pieces in the table above exists as an entry, and so this meets criteria (1) and (2). This is a special example, known as the discrete topology. Because the discrete topology collects every existing subset, any topology on \{2,3,5\} is a subcollection of this one.

For example, taking the sets

\emptyset,\quad\{5\},\quad\{2,3\},\quad\{2,3,5\}

from the collection in the table is enough to produce a topology2.

Remark. Given a topological space (X,\mathscr{T}), the elements of \mathscr{T} are referred to as open sets. This nomenclature is motivated in the next example.

3. \mathbb{R} and Open Intervals

This example will be more constructive than the previous ones. Consider the set of real numbers, \mathbb{R}. Let us define a special collection \mathscr{T} of subsets of real numbers the following way: a set T belongs to \mathscr{T} if, and only if, for every x\in T, there exist real numbers a and b such that x\in(a,b) and (a,b)\subset T. That is, we say T\in\mathscr{T} to mean T contains an open interval around each of its elements.

It is good practice to take the time to prove this collection defines a topology on \mathbb{R}. To do so, it must be shown that \bigcup\{T\in\mathscr{T}\}=\mathbb{R}, and that \mathscr{T} meets conditions (1) and (2).

Proof. To show \bigcup\{T\in\mathscr{T}\}=\mathbb{R}, it must be verified that \bigcup\{T\in\mathscr{T}\}\subset\mathbb{R} and \mathbb{R}\subset\bigcup\{T\in\mathscr{T}\}. The first containment follows by defining every T\in\mathscr{T} as a subset of \mathbb{R} to begin with, so the reverse containment is all that is left. Let x\in\mathbb{R} be given. Then certainly x\in(x-1,x+1), and surely (x-1,x+1)\in\mathscr{T}, as it contains an open interval around all its points by its very design. Thus x\in\bigcup\{T\in\mathscr{T}\}.

On to proving \mathscr{T} satisfies (1) and (2). For (1), let A,B\in\mathscr{T} be given and suppose3 x\in A\cap B. This holds if, and only if, x\in A and x\in B. Since A and B both belong to \mathscr{T}, there exist real numbers a, b, c, and d such that x\in(a,b)\subset A, and x\in(c,d)\subset B. But this means x\in(a,b)\cap(c,d). Fortunately, two intervals of real numbers may only overlap in one way: this means either c<b or a<d. Without loss of generality, suppose it is the former case, that c<b. Then (a,b)\cap(c,d)=(c,b), and it is so that x\in(c,b), an open interval contained in A\cap B (precisely as desired), and it follows A\cap B\in\mathscr{T}.

To show (2) is much easier. Let \{T_\alpha\}_{\alpha\in\mathscr{A}} be a collection4 of sets belonging to \mathscr{T}, and suppose x\in\bigcup_{\alpha\in\mathscr{A}}T_\alpha. Then there exists an index, say \alpha_0\in\mathscr{A}, such that x\in T_{\alpha_0}. Since T_{\alpha_0}\in\mathscr{T}, there exist real numbers a and b such that x\in(a,b)\subset T_{\alpha_0}. But this means x\in(a,b)\subset\bigcup_{\alpha\in\mathscr{A}}T_\alpha. Since x was chosen arbitrarily, it follows \bigcup_{\alpha\in\mathscr{A}}T_\alpha\in\mathscr{T}.

The proof above shows (\mathbb{R},\mathscr{T}) is a topological space; the collection \mathscr{T} is referred to as the standard topology on \mathbb{R}. The open sets in this space are all the subsets of real numbers contained in open intervals. Fittingly, then, open intervals are open sets in the standard topology.

Conclusion

This first post is meant to express the simple starting points of topology as a subject of study. It only takes the two criteria mentioned here to define a topology of sets, and yet an entire realm of theory rests upon them. This is a recurring theme in topology, algebra, and mathematics in general. Building the fully-featured universes that hold the answers for more specific inquiry: the complete ordered field of real numbers5 \mathbb{R}, the space \mathcal{C}^{\infty}(\mathbb{C}) of infinitely differentiable functions f\mathrel{:}\mathbb{C}\to\mathbb{C}, the class of all real-valued Lebesgue-integrable functions on \mathbb{R}, each of these requires a well-made foundation.

The next post in this series will cover the nature of sequences in topological spaces, particularly those instances where the convenient features afforded by the real numbers are no longer available. With the metric space structure stripped away, how does one define convergence and limit of sequences? What does it mean for elements in topological spaces to be close when distance is otherwise without definition?

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

All the Same Opposites

All the Same Opposites

Editor’s note: see this appendix for supporting proofs.

Fields are among the most convenient algebraic structures, preserving much of the arithmetic we know and love from familiar fields like the rationals \mathbb{Q} and the real numbers \mathbb{R}.

Now, it is unnecessary that a set possess infinitely many elements to possibly constitute a field (under the right binary operations). Indeed, Dr. Rachel Traylor recently invited readers to finite field theory by way of GF(4), the field with four elements. In this post, I propose to discuss the simplest of all possible fields, namely \text{GF}(2).

What is \text{GF}(2)?

As Dr. Traylor explains in her post on GF(4), a field is built via a nonempty set \mathbb{F} and two binary operations, typically denoted + and \cdot when no further specifying is needed.  Speaking abstractly, \text{GF}(2) is the case when \mathbb{F} is taken to be any set of two elements, say, \{a,b\}, satisfying the operation tables below1.

+ a b
a a b
b b a

 

\cdot a b
a a a
b a b

 

So, what makes \text{GF}(2) simplest?

Briefly put, there is no field with fewer elements than has \text{GF}(2). Why is this so? The operations + and \cdot each require an identity element, and these must be distinct. As a result, any field must contain at least two elements. And, as it happens, those are enough to define a fully-fledged field.

As it is the most basic of fields, it might be expected that \text{GF}(2) is only trivially interesting, or only appears in coverage of the subject on the theoretical front. (No judgment here; sometimes the simplest examples of structures are illustrative of very little. See the trivial topology or trivial algebra of sets on a set X.) However, \text{GF}(2) is the mathematical representation of many dualities we encounter on a daily basis. We will take a look at some of the prominent cases.

Even & Odd

Let’s denote by {\bf e} and {\bf o} arbitrarily chosen even and odd integers, respectively. Truly, in this way, we are defining two equivalence classes on the set of integers, \mathbb{Z}, by way of modulo 2 addition.

Reminder, we say an integer m is even to mean there exists an integer k such that m=2k, and m is odd to mean there exists an integer j such that m=2j+1. Collect these in a set: \{{\bf e},{\bf o}\}, and consider this set under ordinary addition + and multiplication \cdot .

These results are summarized in tables.  For instance, in the + table below, we can think of {\bf e}+{\bf o} as

(2k)+(2j+1)=2(k+j)+1\equiv{\bf o},

since k+j is also an integer.

+ {\bf e} {\bf o}
{\bf e} {\bf e} {\bf o}
{\bf o} {\bf o} {\bf e}

 

\cdot {\bf e} {\bf o}
{\bf e} {\bf e} {\bf e}
{\bf o} {\bf e} {\bf o}

 

From these tables, {\bf e} serves as additive identity for the field, and {\bf o} the multiplicative identity.

Binary

Even more readily encountered than even and odd is binary arithmetic. We have a series of posts here on the theory of coding, and all of it rests on \{0,1\} when taken with the operations of +_2 (addition modulo 2) and \cdot .

+_2 0 1
0 0 1
1 1 0

 

\cdot 0 1
0 0 0
1 0 1

 

The similarities to the tables for (\{{\bf e},{\bf o}\},+,\cdot) are evident.  With binary, 0 is the additive (modulo 2) identity, and 1 is the multiplicative identity.  This seems to follow naturally.  After all, 0 is an even integer, and 1 is an odd integer, so these elements are easily believed to work alike.  With that in mind, I move to an example with a less immediate connection.

Truth & Falsehood

Now I want to consider the set \{{\bf T},{\bf F}\} of truth values2 true and false, respectively. 

It is worthwhile to stop and think on which operations make sense for this set. Two are needed to construct a field. Just as even and odd integers may be combined by adding and multiplying, mathematical statements may be combined via disjunction (“or,” \vee), conjunction (“and,” \wedge), and implication (“if, then,” \Rightarrow).  For this case, I am interested in the special “exclusive or,” also called XOR3, denoted by \oplus, and conjunction.

\oplus {\bf F} {\bf T}
{\bf F} {\bf F} {\bf T}
{\bf T} {\bf T} {\bf F}

 

\wedge {\bf F} {\bf T}
{\bf F} {\bf F} {\bf F}
{\bf T} {\bf F} {\bf T}

Opposite… in Precisely the Same Way

The only thing setting these examples apart is an exchange of symbols. Truly,

a, {\bf e}, 0, and {\bf F},

are interchangeable, just as are

b, {\bf o}, 1, and {\bf T}.

What matters is that these individual elements behave in exactly the same way with respect to their operations.  In the language of algebra, it is said 

(\{a,b\},+,\cdot)

(\{{\bf e},{\bf o}\},+,\cdot)

(\{0,1\},+_2,\cdot), and 

(\{{\bf F},{\bf T}\},\oplus,\wedge)

are isomorphic, that is, structurally equivalent.  

Definition (Isomorphism of fields). We say two fields (\mathbb{F},+,\times) and (\mathbb{H},\boxplus,\boxtimes) are isomorphic to mean there exists a function \varphi\mathrel{:}\mathbb{F}\to\mathbb{H} such that \varphi is one-to-one, onto, and, for all x,y\in\mathbb{F},

\varphi(x+y)=\varphi(x)\boxplus\varphi(y);\quad\varphi(x\times y)=\varphi(x)\boxtimes\varphi(y).

In the case of \text{GF}(2), proving isomorphism amounts to simply defining the function that swaps the symbols as needed.  For example, to show (\{{\bf e},{\bf o}\},+,\cdot) and (\{0,1\},+_2,\cdot) are isomorphic, define4\varphi\mathrel{:}\{{\bf e},{\bf o}\}\to\{0,1\} by putting \varphi({\bf e})=0 and \varphi({\bf o})=1.

Concluding Remarks

Mathematics in many respects is the art5 of extension and generalization. The goal is frequently to take an existing object, assume an aerial viewpoint to learn its structure and what makes it work. (This is often carried out by seeing how few pieces of the whole are really needed to accomplish those key characteristics.)  

With the right perspective, even these sets of opposites can bear likeness.  I take comfort in that.  

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

The Red-Headed Step-Distributions

The Red-Headed Step-Distributions

Almost every textbook in probability or statistics will speak of classifying distributions into two different camps: discrete (singular in some older textbooks) and continuous. Discrete distributions have either a finite or a countable sample space (also known as a set of Lebesgue measure 0), such as the Poisson or binomial distribution, or simply rolling a die. The probability of each point in the sample space is nonzero. Continuous distributions have a continuous sample space, such as the normal distribution. A distribution in either of these classes is either characterized by a probability mass function (pmf) or probability distribution function (pdf) derived from the distribution function via taking a derivative. There is, however, a third kind.

One rarely talked about, or mentioned quickly and then discarded. This class of distributions is defined on a set of Lebesgue measure 0, yet the probability of any point in the set is 0, unlike discrete distributions. The distribution function is continuous, even uniformly continuous, but not absolutely continuous, meaning it’s not a continuous distribution. The pdf doesn’t exist, but one can still find moments of the distribution (e.g. mean, variance). They are almost never encountered in practice, and the only real example I’ve been able to find thus far is based on the Cantor set. This class is the set of red-headed step-distributions– the singular continuous distributions.

Back up, what is Lebesgue measure?

Measure theory itself can get extremely complicated and abstract. The idea of measures is to give the “size” of subsets of a space. Lebesgue measure is one type of measure, and is actually something most people are familiar with: the “size” of subsets of Euclidean space in n dimensions. For example, when n=1, we live in 1D space. Intervals. The Lebesgue measure of an interval [a,b] on the real line is just the length of that interval: b-a. When we move to two dimensions, \mathbb{R}\times \mathbb{R}, the Cartesian product of 1D space with itself, our intervals combine to make rectangles. The Lebesgue measure in 2D space is area; so a rectangle built from [a,b]\times [c,d] has Lebesgue measure (b-a)(d-c). Lebesgue measure in 3D space is volume. And so forth. 

Now, points are 0-dimensional in Euclidean space. They have no size, no mass. They have Lebesgue measure 01. Intuitively, we can simply see that Lebesgue measure helps us see how much “space” something takes up in the Euclidean world, and points take up no space, and hence should have measure 0. 

In fact, any countable set of points has Lebesgue measure 0. Even an infinite but countable set. The union of disjoint Lebesgue measurable sets has a measure equal to the sum of the individual sets. Points are certainly disjoint, and they each have measure 0, and summing 0 forever still yields 0.2 So, the set \{0,1,2\} has Lebesgue measure 0. But so do the natural numbers \mathbb{N}and the rational numbers \mathbb{Q}, even though the rational numbers contain the set of natural numbers.

It is actually possible to construct an uncountable infinite set that has Lebesgue measure 0, and we will need that in constructing our example of a singular continuous distribution. For now, we’ll examine discrete and continuous distributions briefly.

Discrete (Singular) Distributions

These are the ones most probability textbooks begin with, and most of the examples that are familiar.

Roll a fair die. 

The sample space for a roll of a fair die X is S =\{1,2,3,4,5,6\}. The PMF is P(X = x) = 1/6, where x \in S. The CDF is given by the function P(X\leq x) = \sum_{j\leq x}P(X=j) 

Example:

P(X \leq 4) = \sum_{j\leq 4}\frac{1}{6} = \frac{2}{3}

Binomial Distribution

A binomial random variable X counts the number of “successes” or 1s in a binary sequence of n Bernoulli random variables. Think a sequence of coin tosses, and counting the number of heads. In this case, the sample space is infinite, but countable: S = \{0,1,2,\ldots\}. If the probability of a 1, or “success” is p, then the PMF of X is given by 

P(X=x) = {n \choose x}p^{x}(1-p)^{n-x}

Note here again that the sample space is of Lebesgue measure 0, but the probability of any point in that space is a positive number. 

Continuous Distributions

Continuous distributions operate on a continuous sample space, usually an interval or Cartesian product of intervals or even a union of intervals. Continuous distribution functions F are absolutely continuous, meaning that (in one equivalent definition), the distribution function has a derivative f=F' almost everywhere that is Lebesgue integrable, and obeys the Fundamental Theorem of Calculus:

F(b)-F(a) = \int_{a}^{b}f(x)dx

for a< b. This f is the probability distribution function (PDF), derived by differentiating the distribution function. Let’s mention some examples of these:

The Continuous Uniform Distribution

Suppose we have a continuous interval [a,b], and the probability mass is spread equally along this interval, meaning that the probability that our random variable X lies in any subinterval of size s has the same probability, regardless of location. Suppose we do not allow the random variable to take any values outside the interval. The sample space is continuous but over a finite interval. The distribution function for this X is given by 

F(x) = \left\{\begin{array}{lr}0&x< a\\\frac{x-a}{b-a}&a\leq x \leq b\\1&x > b\end{array}\right.

This is an absolutely continuous function. Then we may easily derive the PDF by differentiating F:

f(x) = \mathbb{1}_{x \in [a,b]}\frac{1}{b-a}

where \mathbb{1}_{x \in [a,b]} is the indicator function that takes value 1 if x is in the interval, and 0 otherwise. 

This distribution is the continuous version of a die roll. The die roll is the discrete uniform distribution, and here we just allow for a die with uncountably many sides with values in [a,b]. The probability of any particular point is 0, however, even though it is possible to draw a random number from this interval. To see this, note that the probability that the random variable X lies between two points in the interval, say x_{1} and x_{2} is given by multiplying the height of the PDF by the length (Lebesgue measure) of the subinterval. The Lebesgue measure of a point is 0, so even though a value for the PDF exists at that point, the probability is 0. 

We don’t run into issues here mathematically because we are on a continuous interval. 

The Normal Distribution

Likely the most famous continuous distribution, the normal distribution is given by the famous “bell curve.” In this case, the sample space is the entire real line. The probability that a normally distributed random variable X lies between any two points a and b is given by 

P(a\leq X \leq b) = \int_{a}^{b}\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)dx

where \mu is the mean and \sigma^{2} is the variance. 

Singular Continuous Distributions

We’re going to begin this section by discussing everyone’s favorite counterexample in mathematics: the Cantor set. 

The Cantor set

The Cantor set is given by the limit of the following construction:

  1. Take the interval [0,1]
  2. Remove the middle third: (1/3, 2/3), so you’re left with [0,1/3]\cup[2/3,1]
  3. Remove the middle third of each of the remaining intervals. So you remove (1/9,2/9) from [0,1/3] and (7/9,8/9) from [2/3,1], leaving you with the set [0,1/9]\cup[2/9,1/3]\cup[2/3,7/9]\cup[8/9,1]

Continue this process infinitely.

This is an example of a set that is uncountable, yet has Lebesgue measure 0. Earlier, when we discussed Lebesgue measure, we noted that all countable sets had measure 0. Thus we may conclude that only uncountable sets (like intervals) have nonzero Lebesgue measure. However, the Cantor set illustrates that not all uncountable sets have positive Lebesgue measure. To see why the Cantor set has Lebesgue measure 0, we will look at the measure of the sets that are removed (the complement of the Cantor set):

At the first step, we have removed one interval of size 1/3. At the second step, we remove two intervals of size 1/9. At the third step, we remove four intervals of size 1/27. Let’s call S_{n} the subset removed from the interval [0,1] by the nth step. By the end of the third step, we have removed a set of size

m(S_{3}) = \frac{1}{3} + \frac{2}{3^{2}} + \frac{4}{3^{3}}

By the nth step, 

m(S_{n}) = \sum_{j=0}^{n}\frac{2^{j}}{3^{j+1}}

This is the partial sum of a geometric series, so

m(S_{n}) = 1-\left(\frac{2}{3}\right)^{n}

Now, the Cantor set is formed when n \to \infty. The measure of the complement of the Cantor set, which we called S_{\infty} then has measure

m(S_{\infty}) = \lim_{n \to \infty}m(S_{n}) = \lim_{n \to \infty}1-\left(\frac{2}{3}\right)^{n} = 1

But the original interval we started with had Lebesgue measure 1, and the union of the Cantor set with its complement S_{\infty} is the interval [0,1]. That means that the measure of the Cantor set plus the measure of its complement must add to 1, which implies that the Cantor set is of measure 0. However, since we removed open intervals during the construction, there must be something left; in fact, there are uncountably many points left. 

Now we have an uncountable set of Lebesgue measure 0. We’re going to use this set to construct the only example I could find of a singular continuous distribution. It is very important that the Cantor set is an uncountable set of Lebesgue measure 0. 

Building the Cantor distribution

Update: Following a correction from an earlier version, I’m going to show how to construct this distribution directly and via the complement of the Cantor set. The latter was used in a textbook I found, and is a bit convoluted in its construction, but I’m going to leave it.

The direct construction is to look at the intervals left behind at each stage n of constructing the Cantor set. Assign a probability mass of \frac{1}{2^{n}} to each of the 2^{n} intervals left behind, and this is your distribution function. It’s basically a continuous uniform distribution, but on stages of the Cantor set construction. Sending n \to \infty yields the Cantor set, but the probability distribution moves to 0 on a set of measure 0. Thus, unlike the continuous uniform distribution, where the probability of any single point was 0, but the support has positive measure, we essentially have the continuous uniform distribution occurring on a set of measure 0, which means we have a continuous distribution function on a singular support of measure 0 that is uncountable and thus not discrete. This distribution is therefore neither continuous nor discrete. 

Another way to construct this is by complement, via Kai Lai Chung’s A Course in Probability Theory. 

(Note: after a second glance at this, I found this to be a relatively convoluted way of constructing this distribution, since it can be fairly easily constructed directly. However, I imagine the author’s purpose was to be very rigid and formal to cover all his bases, so I present a review of it here:)

Let’s go back to the construction of the Cantor set. At each step n we have removed in total 2^{n}-1 disjoint intervals. Let’s number those intervals, going from left to right as J_{n,k}, where k = 1,2,\ldots, 2^{n}-1

For example, at n=2 we have that J_{2,1} = (1/9,2/9),J_{2,2} = (1/3,2/3), and J_{2,3} = (7/9,8/9)

Now let the quantity c_{n,k} = \frac{k}{2^{n}}. This will be the probability mass assigned to interval J_{n,k}. So we define the distribution function as 

F(x) = c_{n,k}, x \in J_{n,k}

Let U_{n} = \cup_{k=1}^{2^{n}-1}J_{n,k}, and U = \lim_{n\to\infty}U_{n} The function F is indeed a distribution function and can be shown to be uniformly continuous on the set D = (-\infty,0)\cup U \cup (1,\infty). However, none of the points in D is in the support of F, so the support of F is contained in the Cantor set (and in fact is the Cantor set).  The support (the Cantor set) has measure 0, so it is singular, but the distribution function is continuous, so it cannot be a discrete distribution. This distribution fits nowhere in our previous two classes, so we must now create a third class — the singular continuous distribution.

(By the way, even though the PDF doesn’t exist, the Cantor distribution still has mean of 1/2 and a variance of 1/8, but no mode. It does have a moment generating function.)

Any other examples?

With some help, I spent some time poring through quite a few probability books to seek further study and other treatment of singular continuous distributions. Most said absolutely nothing at all, as if the class didn’t exist. 

One book, Modern Probability Theory and Its Applications has a rather grumpy approach:

There also exists another kind of continuous distribution function, called singular continuous, whose derivative vanishes at almost all points. This is a somewhat difficult notion to picture, and examples have been constructed only by means of fairly involved analytic operations. From a practical point of view, one may act as if singular continuous distribution functions do not exist, since examples of these functions are rarely, if ever, encountered in practice.

This notion also has led me to a couple papers, which I intend to review and continue presenting my findings. I happen to have a great fondness for these “edge cases” and forgotten areas of mathematics. I believe they are the most ripe for groundbreaking new material. 

 

A Note on Russell’s Paradox

A Note on Russell’s Paradox

Topics in mathematics very frequently rely on set theory (I am hard-pressed to quickly think of one that doesn’t). Set theory itself is a very abstract area of study. Even so, mathematicians built it on axioms (self-evident truths taken as fundamental starting points), and, of course, debate continues to this day as to which axioms are the best/most efficient.

As mathematicians delight in doing, one can begin with these axioms and search for boundaries. That is, ask the question, “What can I get away with while still following these rules?” And paradoxes can follow. Here set theory is no exception. But paradoxes aren’t the end of the world; on the contrary, they can be nicely illuminating. Taking just one of the first principles of axiomatic set theory, one can arrive at a substantial fact of mathematics, while in the course of uncovering a well-worn conundrum.

This post, of course, concerns Russell’s Paradox, as covered in Naïve Set Theory by Paul Halmos. I highly recommend the book to readers who enjoy the discussion to follow; it is a wonderfully readable treatment of axiomatic set theory.

A Few Definitions & Notations

Throughout, the word set refers to a general collection of objects; note these objects may themselves be sets. The symbol \in denotes belonging to a set, and will translate x\in A to mean any of the following equivalent expressions in English:

  • x is an element of A;
  • x belongs to A;
  • x is a member of A.

Similarly, \notin denotes not belonging. A set will also be noted with curly braces, e.g., \{all real numbers\}. Sometimes variables are used as well, as in 

\{x\in S\mathrel{:}x>5\},

read as the set of all x belonging to S such that x is greater than 5.

Finally, a contradiction is any mathematical statement which is always false. (Contradictions are not allowed to exist.1) An example of the ‘canonical contradiction’ insists a statement and its negation are true all at once. Take the clause ‘n>8 and n\leq8,’ for instance. These properties cannot simultaneously hold for any number n, otherwise said the statement is false for all numbers n.

Axiomatic Set Theory

As with most branches of mathematics, set theory has been axiomatized, more than once, as mentioned earlier. In order to come to the mathematical understanding I focus on in this post, we only need one of these building blocks.

Axiom (Specification). Given a set A and a property p, there is a set B whose elements are exactly those elements of A for which p is true.

This axiom lays the ground for defining sets in terms of existing ones. All partitioning, categorizing, classifying, and so forth make use of this. As an example, we can speak of the set of all positive integers, \mathbb{N}=\{1,2,3,\ldots\}, and specify a subset like the set of all even positive integers:

\{2,4,6,\ldots\}=\{n\in\mathbb{N}\mathrel{:}n\text{ is even}\}.

Put plainly, given a set, you can create a new one (referred to as a subset) meeting criteria of your choosing. And it’s the choosing that makes set theory so fun. With that, let’s make a choice, and see what we can get away with while following this rule.

Russell’s Paradox

First we suppose a set A is given; here A can be any set you like (odd positive integers, irrational numbers, even the set of your long-term goals).

Next define the property p(x) to mean x\notin x. That is, p(x) is true if, and only if, x\notin x.

Let’s stop for a moment and consider p(x). You might think it nonsensical, but it can be seen to hold even in the physical sense, e.g., a box is not contained in itself, even when it contains several smaller boxes. For a math example, the set of natural numbers is not itself a natural number, so \mathbb{N}\notin\mathbb{N}. (Perhaps less reasonable is the opposite x\in x, but that can be saved for another discussion.

With a touch of grandeur, you may now INVOKE the Axiom of Specification to create a new set

B=\{x\in A\mathrel{:}p(x)\}=\{x\in A\mathrel{:}x\notin x\}.

To clarify, this means that in order for it to be true that y\in B, then it must be the case that y\in A and y\notin y. As you might have guessed, the definition of B is less interesting than what we can do with it. On that note, a query:

Is it possible that B\in A?

This question, relatively easily posed, quickly leads to trouble. Let’s examine.

By specifying the new set B by way of the property p(x), we split membership in A. If y\in A, then either y\in B or y\notin B, by definition.

Now we apply that to B, and investigate both cases. If we suppose B\in A:

  • Case 1. If B\in B, then p(B) is untrue. By definition of p(B), it follows B\notin B, a contradiction.  
  • Case 2. If B\notin B, then p(B) is true. But this means B\in B, a contradiction again.

As all our options yield contradictions, the assumption that B\in A in the first place must have been a faulty one2. That is, it can only be the case that B\notin A.

The arrival at a contradiction under all possible cases above is known as Russell’s Paradox, attributed to its first recorded discoverer, the logician Bertrand Russell.

Conclusion (Why does this matter?)

The abstract nature of set theory makes it somewhat easy to regard Russell’s Paradox as more a minor mathematical curiosity/oddity than, say, The Fundamental Theorem of Calculus. In particular, the discussion above only makes use of nondescript sets and minimally defined elements thereof.

Unlikely as it may seem, the importance lies in leaving the set A arbitrary. Remember, we said let A be any set you like. And still we found something that cannot possibly belong to that arbitrary set. Even leaving almost nothing to specifics, we created something necessarily separate.

What this means is that no set which contains everything can exist. There is no ‘universal’ set in mathematics. A frame of reference cannot be truly complete. Indeed, that speaks even to the axiomatic system by which we derived this very fact!3 No matter how ‘large’ the set, there exists something apart from it. On a further philosophical lean, there is always room to take a grander view and think outside the present mode, to pick up something new.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Concatenation as an Operation

Concatenation as an Operation

Mathematics is like any activity, sport, or skill: it must be honed and practiced. With that in mind, I have been bolstering up my abilities in algebra with a fantastic book A Book of Abstract Algebra, by Charles C. Pinter.1.

As I go through the chapters, I will be posting and discussing selected relevant exercises that have applications. This first post is a short one on operations and concatenation. I discussed operators here, but will revisit the topic.

 

What’s an operation?

First, we define an operation. From Pinter:

 

Definition (operation): An operation on a given set A, often denoted \ast is a rule which assigns a pair of elements (a,b) in A to exactly one element denoted a\ast b in A

 

Pinter stresses several things about this definition that should not just be read and glossed over. First, a\ast b has to be defined for every single pair of elements in the set A. Some obvious examples include multiplication on integers, or addition on real numbers.

Some rules that try to sneak by, but are not operations including division on the real numbers. Why does this fail? Remember that 0 is a real number. I can divide 0 by anything I like, but can I divide anything by 0? No. A real number divided by 0 is not defined.2.

Secondly, the rule sends a pair of elements to exactly one element. Mathematically, this means a rule must be well defined to be an operation. It can’t take the same pair of elements and get two possible answers.

Lastly, a rule must be closed to be an operation. Whatever (a,b) gets mapped to must also be in the set A. Here again, addition of real numbers is closed. Adding two real numbers yields a real number. Division also becomes our counterexample here, but let’s look at division on the space of just integers. Dividing 2 by 3 is certainly defined, but the result is not an integer; it’s a rational number.3.

 

On to Concatenation

Operations are not just defined on numbers, and they don’t necessarily have to just be your standard arithmetic examples (addition, subtraction, multiplication). We can define operations on any space we want, provided we satisfy the definition. Here we will talk about concatenation as an operation on the space of sequences of symbols. If you’re a programmer, you probably assume we’re looking at binary sequences, but we concatenate English words too. (We just call these compound words, like “lifetime” or “backbone”.)

Let’s call A the alphabet. If we are living in binary, then A = \{0,1\}. If we are speaking of English letters, the alphabet is A = \{a,b,c,\ldots,z\}. Now we can call A^{\ast} the set of all sequences of symbols in the alphabet A.4.

Now we will define an operation on A^{*}: concatenation. Many of you already know what this is, so we’ll formally define it here:

If a and b are two sequences in A^{*}, where \mathbf{a} = a_{1}a_{2}\ldots a_{n}, and \mathbf{b} = b_{1}b_{2},\ldots b_{m} (and each a_{i}, b_{i} are drawn from the alphabet), then the concatenation of  and is given by

\mathbf{ab} = a_{1}a_{2}\ldots a_{n}b_{1}b_{2}\ldots b_{m}

That is, just stick b to the end of a. Quick example, if we live in binary, then for \mathbf{a} = 1010 and \mathbf{b} = 001, then

\mathbf{ab} = 1010001

Let’s also note that there is an empty sequence or NULL sequence \lambda that consists of nothing at all. It’s pretty easy to check that concatenation meets the definition of an operation. Pinter asks us to do three simple things here:

1. Prove that the operation defined above is associative.

Associativity is the property of an operation that allows us to group however we like. Addition is associative:

(1+2) + 3 = 1+ (2+3)

In general, to show associativity of an operation, we need to show that for \mathbf{a,b,c} \in A^{*},

(\mathbf{ab})\mathbf{c} = \mathbf{a}(\mathbf{bc})

First, we will define three generic sequences in our A^{*}. Let

\begin{aligned}\mathbf{a} &= a_{1}a_{2}\ldots a_{m}\\\mathbf{b} &=b_{1}b_{2}\ldots b_{n}\\\mathbf{c} &= c_{1}c_{2}\ldots c_{p}\end{aligned}

Notice here that I did not specify an alphabet, and that the length of \mathbf{a,b} and \mathbf{c} are different. We need to keep this as general as possible to prove the statement. Every restriction we place (same length, specific alphabet, etc) weakens the argument.

Now we will show associativity formally:

\begin{aligned}(\mathbf{ab})\mathbf{c}&=(a_{1}a_{2}\ldots a_{m}b_{1}b_{2}\ldots b_{n})c_{1}c_{2}\ldots c_{p}\\&= a_{1}a_{2}\ldots a_{m}b_{1}b_{2}\ldots b_{n}c_{1}c_{2}\ldots c_{p}\end{aligned}

Here we can see that we can put parentheses any way we want to, and it doesn’t affect the end word. That is,

\begin{aligned}(\mathbf{ab})\mathbf{c}&=(a_{1}a_{2}\ldots a_{m}b_{1}b_{2}\ldots b_{n})c_{1}c_{2}\ldots c_{p}\\&= a_{1}a_{2}\ldots a_{m}b_{1}b_{2}\ldots b_{n}c_{1}c_{2}\ldots c_{p}\\&=a_{1}a_{2}\ldots a_{m}(b_{1}b_{2}\ldots b_{n}c_{1}c_{2}\ldots c_{p})\\&=\mathbf{a}(\mathbf{bc})\end{aligned}

So we’ve concluded that the grouping in which you perform the operation on several elements doesn’t matter, and thus we have concluded that concatenation is associative.

2. Explain why the operation is not commutative.

commutative operation is an operation where the order of the elements in the operation doesn’t matter. That is, for an operation \ast, a\ast b = b\ast a. Examples of commutative operations are addition and multiplication on real numbers ( 2+5 = 5+2, for example). An example of a noncommutative operation is matrix multiplication. In general, the order in which you multiply matrices matters. As an explicit illustration,

\begin{bmatrix}1&2\\1&1\end{bmatrix}\begin{bmatrix}1& 1\\ 1 & 2\end{bmatrix} =\begin{bmatrix} 3 & 5 \\ 2 & 3\end{bmatrix}\neq \begin{bmatrix}2 & 3\\3 & 4\end{bmatrix}=\begin{bmatrix}1& 1\\ 1 & 2\end{bmatrix}\begin{bmatrix}1&2\\1&1\end{bmatrix}

We can see now why concatenation is not commutative. If you append \mathbf{b} to \mathbf{a}, you will definitely not get the same result as appending \mathbf{a} to \mathbf{b}.  “Townhome” and “hometown” are certainly two different words, but the two pieces: “home” and “town” are the same. Switching the order of concatenation gave a different result. Since I have produced a single counter example, concatenation cannot be commutative in general.5

3. Prove there is an identity element for this operation.

An identity element is an element that lives in A^{*} that, when applied via the operation in question to any other element in the set, in any order, returns that other element. Formally, \mathbf{e} \in A^{*} is an identity element for operation \ast if and only if \mathbf{ea} = \mathbf{ae} = \mathbf{a} for every single element in \mathbf{A^{*}}.

Some examples:

  • the identity element for regular addition on the real numbers is 0. 0 + x = x + 0 = x for any real number x
  • the identity element for multiplication on the real numbers is 1.

For concatenation, we seek an element that, when concatenated with any other element, in any order, returns that element. Proving existence has several strategies. The simplest one (in theory) is to find a candidate element, and show it meets the criteria.6. Here, since concatenation involves essentially “compounding” two things together, the only way we could keep an element “unchanged” is to concatenate nothing to it. The NULL element \lambda (which is certainly a word, just a NULL word. Computer scientists are very familiar with this concept) fits our bill here. Try it out. If you take nothing and concatenate a word to it, you just get that word. Conversely, concatenating nothing to a given word doesn’t change the word. So the NULL element is our identity element.

 

Conclusion

Operations and rules aren’t equivalent. A rule is anything we make up for some purpose, but an operation has to meet specific criteria to be called one. This post was meant to show that operations are not restricted to just the space of numbers that we deal with every day, but that we can look at spaces of objects, matrices, functions, words, sequences…anything we like. In addition, we can define operations, as long as we satisfy the definition. This represents the power of abstract algebra: we can take structure we thought only belongs to a very restrictive space (like numbers, or even matrices), and look at other things, like concatenation, in a different light.
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

The Rigor of Fuzzy Sets

The Rigor of Fuzzy Sets

Perhaps “fuzzy set theory”, “fuzzy arithmetic”, and “fuzzy rules” could have been named something a bit less mock-worthy. The word “fuzzy” has English synonyms “blurred”, “unfocused”, and the worst “ill-defined”. However, fuzzy set theory is anything but fuzzy. Maybe cuddly and fun, but certainly not ill-defined. We’ll take a look in this post at what exactly fuzzy sets are. Fuzzy set theory is an extension of classical set theory, so we’ll first take a look at the basics of a classical set.

 

Classical sets

A classical set is one we all deal with explicitly on a daily basis, even if we don’t use that terminology. An item or element is either a member of a classical set, or it is not. Examples are

  • Even numbers, denoted 2\mathbb{Z}
  • A list of members of a particular country club
  • Wavelengths that are classified as UV light
  • The set of perfect squares S = \{ k^2 : k \in \mathbb{N}\}

If we take an element in the relevant universe for each example, we are faced with the binary decision of whether or not it is a member of the set. 3 is not a member of 2\mathbb{Z}; 4 is. I am not a member of any country club, so given a particular one, I do not fall in that set. The electromagnetic spectrum is a continuum, but the cutoffs are strict. A wavelength between 10nm and 400 nm is ultraviolent. Any wavelength less than 10nm or greater than 400 nm is not. Finally, a the square of a rational number cannot be a perfect square, because a rational number is not a natural number. 3 is a natural number, but there is no natural number whose square gives 3.

For all of these sets, we can define a characteristic function of a set  A (denoted \chi_{A}(x)) that takes an input and returns a 1 if the input is an element of the set A and 0 if it is not. Mathematically, we write

\chi_{A}(x) = \left\{ \begin{array}{lr} 1, & x \in A \\ 0, &x \notin A \end{array}\right.

That is, if A = 2\mathbb{Z}, then \chi_{A}(3) = 0, and \chi_{A}(4) = 1.

Now, what if we relax the requirement that you either be in or out? Here or there? Yes or no? What if I allow “shades of grey” to use a colloquialism? Then we extend classical sets to fuzzy sets.

Partial Membership is well-defined

Many things we encounter daily aren’t strict or crisp as is used in mathematics. Someone might tell you that you can either be in or out of a room, but what if you stand with one foot on either side of the threshold? It doesn’t make sense to then say you are neither in nor out, because part of you is certainly inside the room. You can’t “pick one”, because the whole you is not wholly in either place all at once.

Fuzzy sets allow for partial membership in a set. That is, you can be “a little bit” in the room (perhaps a toe is sticking in), “halfway” in the room (straddling the threshold), “most of the way” in the room (only a foot sticking out of the room), and everything in between. We formally define this notion by extending the characteristic function \chi_{A}(x) to include the possibility of fractional membership.

The membership function of a fuzzy set A (denoted \mu_{A}(x)) takes an element x in a universe X and maps it somewhere in the interval [0,1] according to the level of membership it has in the set A. That is,

\mu_{A}(x): X \to [0,1]

 

Example (Defining a bacon lover)

Let x \in \mathbb{Q} be the number of slices of bacon a person eats per day. (We will allow for partial consumption of bacon slices.) Suppose for simplicity that no one would ever eat more than 5 slices per day. Then a possible membership function for classifying someone as a “bacon-lover” could be

\mu_{A}(x) = \left\{\begin{array}{lr}\frac{x-1}{2}, &1\leq x \leq 3 \\ 1, &3 \leq x\leq 5\\0, & \text{ otherwise}\end{array}\right.

 

A possible membership function to classify as a bacon-lover

So given the number of bacon slices you eat per day, we can use the membership function \mu_{A}(x) to determine your level of commitment to bacon. If you eat 2 slices of bacon per day, \mu_{A}(2) = 0.5, so you are a “halfway” bacon-lover.

 Level sets

We can also look at fuzzy sets in terms of their level sets, as a way to “backsolve” to determine what values of x are at least an \alpha level of membership in the set, where 0\leq \alpha\leq 1.  Denoted A_{\alpha}, we define the \alpha-level set as

A_{\alpha} = \{x \in X : \mu_{A}(x) \geq \alpha\}

Example

For the bacon set, we can use the graph above.

  • A_{1} = [3,5]
  • A_{1/2} = [2, 5]
  • A_{3/4} = [2.5,5]

In general, we can find the \alpha-level set by inverting the membership function. In this case, the right endpoint is always 5, and the left endpoint is given by 2\alpha + 1.

So, A_{1/8} = \left[ 2\cdot \frac{1}{8} + 1, 5\right] = [1.25 , 5]

Some special level sets are the A_{1}, called the core, and A_{0}, called the support. The core tells us the interval that represents full membership, and the support tells us how “wide” the entire fuzzy set actually is.

Now that we understand what fuzzy sets are, let’s look at doing some things with them.

Fuzzy set operations are identical to classical set operations

We’re going to dive a little further into fuzzy set theory and discuss operations on fuzzy sets such as union, intersection, and complementation. Since the fuzzy set is a nice extension of the classical set, the definitions of union, intersection, and complementation actually remain the same for fuzzy sets as for classical sets. The only difference is that the result is a fuzzy set rather than a classical set.1 We’ll discuss each one in turn.

 

Union

The union is represented by the phrase “or”– that is, given two sets A and B, an element is a member of the union of A and B if it is in either A or B.2

In classical set theory, the characteristic function of the union of two sets A and B is given by

\chi_{A \cup B}(x) = \text{max}\left(\chi_{A}(x),\chi_{B}(x)\right)

Let’s look at classical sets again, and let A be the set of prime numbers, and B be the set of odd numbers.

  • \chi_{A \cup B}(3) = \text{max}(A(3), B(3)) = \text{max}(1,1) = 1 so 3 is in A \cup B
  • \chi_{A \cup B}(2) = \text{max}(A(2), B(2)) = \text{max}(1,0) = 1 so 2 is in   A \cup B
  • \chi_{A \cup B}(4) = \text{max}(A(4), B(4)) = \text{max}(0,0) = 0 so 4 is not in   A \cup B

For fuzzy sets, the formula for the membership function of the union of two fuzzy sets  A \vee B3 remains exactly the same as for the characteristic function of the union of two classical sets.

\mu_{A \cup B}(x) = \text{max}\left(\mu_{A}(x),\mu_{B}(x)\right)

Let’s keep our bacon set A and define B as the fuzzy set that classifies a “cheese-aficionado” with membership function

\mu_{B}(x) = \left\{\begin{array}{lr}\frac{x}{5}, &0\leq x\leq5\\0, &\text{otherwise}\end{array}\right.
The black line is the bacon set A from before. The blue line is the new cheese set B

 

We can now find \mu_{A \vee B}(x) = \max(\mu_{A}(x),\mu_{B}(x)). For unions, the left endpoint of the support will be the smaller of the two left endpoints of the supports of A and B, and the right endpoint of the union will be the larger of the two right endpoints of the supports of A and B.  The support of “bacon-lover” union “cheese aficionado” will then be [0,5], which you can see from the graph.

The membership function is given by

\mu_{A \vee B}(x) = \left\{\begin{array}{lr}\frac{x}{5}, & 0\leq x \leq \frac{5}{3}\\\frac{x-1}{2}, & \frac{5}{3}\leq x \leq 3\\ 1, & 3\leq x \leq 5\end{array}\right.
The pink line gives the fuzzy set that is the union of bacon-lovers and cheese lovers

 

Intersection

The intersection of two sets denoted A \cap B is represented in language by the word “and”– an element must be a member of both sets simultaneously to be in the intersection of two sets. For the classical set example, we’ll look again at A as the set of prime numbers, and B as the set of odd numbers. The membership function of the intersection of two classical sets, denoted \chi_{A \cap B} is given by

\chi_{A \cap B} = \min\left(\chi_{A}(x), \chi_{B}(x)\right)
  • \chi_{A \cap B}(3) = \min(A(3), B(3)) = \min(1,1) = 1 so 3 is in A \cap B
  • \chi_{A \cap B}(2) = \min(A(2), B(2)) = \min(1,0) = 0 so 2 is not in   A \cap B
  • \chi_{A \cap B}(4) = \min(A(4), B(4)) = \min(0,0) = 0 so 4 is not in   A \cap B

Just like for the union, the membership function of the intersection of two fuzzy sets (denoted A \wedge B) has the same formula as that for the classical counterpart.

\mu_{A \wedge B} = \min\left(\mu_{A}(x), \mu_{B}(x)\right)

The intersection of our fuzzy bacon set and our fuzzy cheese set then is given by

\mu_{A \wedge B} = \left\{\begin{array}{lr}0, & 0\leq x \leq 1\\\frac{x-1}{2}, & 1\leq x \leq \frac{5}{3}\\\frac{x}{5}, & \frac{5}{3}\leq x \leq 5\end{array}\right.
The purple line gives the fuzzy set that is the intersection of the bacon set and the cheese set.

 

Complement

The complement of a set (denoted A^{c} \text{ or } \overline{A}) is represented by the English word “not”, and indicates negation. Everything outside a given set is the complement of a set. The membership function for the complement of a fuzzy set is identical to the characteristic function of the complement of a classical set:

\mu_{A^{c}}(x) = 1-A(x)

The complement of a “bacon-lover”, (a bacon-hater?) is then

\mu_{A^{c}}(x) = \left\{\begin{array}{lr}\frac{3-x}{2}, & 1\leq x \leq 3 \\ 0, & 3\leq x \leq 5\end{array}\right.
The blue line is the complement of the bacon set

 

Conclusion

Fuzzy sets are simply an extension, or generalization of the classical sets we all implicitly knew already. A good definition, especially one that intends to extend a concept should contain the original concept as a special case. Fuzzy sets do indeed contain classical sets inside their definition. Moreover, operations on fuzzy sets should not need wholly new definitions. We saw that union, intersection, and complementation still hold for fuzzy sets just as they did before, so the theory of fuzzy sets is well defined and represents a consistent extension of classical set theory. The next post will extend the concept of fuzzy sets to fuzzy numbers, and then investigate how arithmetic works for fuzzy numbers.
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.