## A Note on Russell’s Paradox

Topics in mathematics very frequently rely on set theory (I am hard-pressed to quickly think of one that doesn’t). Set theory itself is a very abstract area of study. Even so, mathematicians built it on axioms (self-evident truths taken as fundamental starting points), and, of course, debate continues to this day as to which axioms are the best/most efficient.

As mathematicians delight in doing, one can begin with these axioms and search for boundaries. That is, ask the question, “What can I get away with while still following these rules?” And paradoxes can follow. Here set theory is no exception. But paradoxes aren’t the end of the world; on the contrary, they can be nicely illuminating. Taking just one of the first principles of axiomatic set theory, one can arrive at a substantial fact of mathematics, while in the course of uncovering a well-worn conundrum.

This post, of course, concerns Russell’s Paradox, as covered in Naïve Set Theory by Paul Halmos. I highly recommend the book to readers who enjoy the discussion to follow; it is a wonderfully readable treatment of axiomatic set theory.

## A Few Definitions & Notations

Throughout, the word set refers to a general collection of objects; note these objects may themselves be sets. The symbol $\in$ denotes belonging to a set, and will translate $x\in A$ to mean any of the following equivalent expressions in English:

• $x$ is an element of $A$;
• $x$ belongs to $A$;
• $x$ is a member of $A$.

Similarly, $\notin$ denotes not belonging. A set will also be noted with curly braces, e.g., $\{$all real numbers$\}$. Sometimes variables are used as well, as in

$$\{x\in S\mathrel{:}x>5\},$$

read as the set of all $x$ belonging to $S$ such that $x$ is greater than 5.

Finally, a contradiction is any mathematical statement which is always false. (Contradictions are not allowed to exist.1) An example of the ‘canonical contradiction’ insists a statement and its negation are true all at once. Take the clause ‘$n>8$ and $n\leq8$,’ for instance. These properties cannot simultaneously hold for any number $n$, otherwise said the statement is false for all numbers $n$.

## Axiomatic Set Theory

As with most branches of mathematics, set theory has been axiomatized, more than once, as mentioned earlier. In order to come to the mathematical understanding I focus on in this post, we only need one of these building blocks.

Axiom (Specification). Given a set $A$ and a property $p$, there is a set $B$ whose elements are exactly those elements of $A$ for which $p$ is true.

This axiom lays the ground for defining sets in terms of existing ones. All partitioning, categorizing, classifying, and so forth make use of this. As an example, we can speak of the set of all positive integers, $\mathbb{N}=\{1,2,3,\ldots\}$, and specify a subset like the set of all even positive integers:

$$\{2,4,6,\ldots\}=\{n\in\mathbb{N}\mathrel{:}n\text{ is even}\}.$$

Put plainly, given a set, you can create a new one (referred to as a subset) meeting criteria of your choosing. And it’s the choosing that makes set theory so fun. With that, let’s make a choice, and see what we can get away with while following this rule.

First we suppose a set $A$ is given; here $A$ can be any set you like (odd positive integers, irrational numbers, even the set of your long-term goals).

Next define the property $p(x)$ to mean $x\notin x$. That is, $p(x)$ is true if, and only if, $x\notin x$.

Let’s stop for a moment and consider $p(x)$. You might think it nonsensical, but it can be seen to hold even in the physical sense, e.g., a box is not contained in itself, even when it contains several smaller boxes. For a math example, the set of natural numbers is not itself a natural number, so $\mathbb{N}\notin\mathbb{N}$. (Perhaps less reasonable is the opposite $x\in x$, but that can be saved for another discussion.

With a touch of grandeur, you may now INVOKE the Axiom of Specification to create a new set

$$B=\{x\in A\mathrel{:}p(x)\}=\{x\in A\mathrel{:}x\notin x\}.$$

To clarify, this means that in order for it to be true that $y\in B$, then it must be the case that $y\in A$ and $y\notin y$. As you might have guessed, the definition of $B$ is less interesting than what we can do with it. On that note, a query:

Is it possible that $B\in A$?

This question, relatively easily posed, quickly leads to trouble. Let’s examine.

By specifying the new set $B$ by way of the property $p(x)$, we split membership in $A$. If $y\in A$, then either $y\in B$ or $y\notin B$, by definition.

Now we apply that to $B$, and investigate both cases. If we suppose $B\in A$:

• Case 1. If $B\in B$, then $p(B)$ is untrue. By definition of $p(B)$, it follows $B\notin B$, a contradiction.
• Case 2. If $B\notin B$, then $p(B)$ is true. But this means $B\in B$, a contradiction again.

As all our options yield contradictions, the assumption that $B\in A$ in the first place must have been a faulty one2. That is, it can only be the case that $B\notin A$.

The arrival at a contradiction under all possible cases above is known as Russell’s Paradox, attributed to its first recorded discoverer, the logician Bertrand Russell.

## Conclusion (Why does this matter?)

The abstract nature of set theory makes it somewhat easy to regard Russell’s Paradox as more a minor mathematical curiosity/oddity than, say, The Fundamental Theorem of Calculus. In particular, the discussion above only makes use of nondescript sets and minimally defined elements thereof.

Unlikely as it may seem, the importance lies in leaving the set $A$ arbitrary. Remember, we said let $A$ be any set you like. And still we found something that cannot possibly belong to that arbitrary set. Even leaving almost nothing to specifics, we created something necessarily separate.

What this means is that no set which contains everything can exist. There is no ‘universal’ set in mathematics. A frame of reference cannot be truly complete. Indeed, that speaks even to the axiomatic system by which we derived this very fact!3 No matter how ‘large’ the set, there exists something apart from it. On a further philosophical lean, there is always room to take a grander view and think outside the present mode, to pick up something new.

Welcome to GF(4)

## Welcome to GF(4)

Everyone has solved some version of a linear system in either high school or college mathematics. If you’ve been keeping up with some of my other posts on algebra, you know that I’m about to either take something familiar away, or twist it into a different form. This time is no different; we’re going to change the field we operate over, and solve a basic linear system in a Galois field called GF(4).

On Permuted First-Kind Dependence of Categorical Random Variables

## On Permuted First-Kind Dependence of Categorical Random Variables

This paper discusses the notion of horizontal dependency in sequences of first-kind dependent categorical random variables. We examine the necessary and sufficient conditions for a sequence of first-kind dependent categorical random variables to be identically distributed when the conditional probability distribution of subsequent variables after the first are permuted from the identity permutation used in previous works.

A Partition by any Other Name

## A Partition by any Other Name

I promise I’m actually a probability theorist, despite many of my posts being algebraic in nature. Algebra, as we’ve seen in several other posts, elegantly generalizes many things in basic arithmetic, leading to highly lucrative applications in coding theory and data protection.  Some definitions in mathematics may not have obvious “practical use”, but turn out to yield theorems and results so powerful we can use them to send image data cleanly from space.

Time Series Analysis Part 1: Regression with a Twist

## Time Series Analysis Part 1: Regression with a Twist

We’re surrounded by time series. It’s one of the more common plots we see in day-to-day life. Finance and economics are full of them – stock prices, GDP over time, and 401K value over time to name a few. The plot looks deceptively simple; just a nice univariate squiggle. No crazy vectors, no surfaces, just one predictor – time. It turns out time is a tricky and fickle explanatory variable, which makes analysis of time series a bit more nuanced than first glance. This nuance is obscured by the ease of automatic implementation of time series modeling in languages like R1 As nice as this is for practitioners, the mathematics behind this analysis is lost. Ignoring the mathematics can lead to improper use of these tools. This series will examine some of the mathematics behind stationarity and what is known as ARIMA (Auto-Regressive Integrated Moving Average) modeling. Part 1 will examine the very basics, showing that time series modeling is really just regression with a twist.

Commentary: Infrastructure Considerations for Machine Learning

## Commentary: Infrastructure Considerations for Machine Learning

Welcome to another brief commentary and departure from the heavier mathematics. I have been endeavoring to expand the breadth of my knowledge on the tech side of things, and chronicling some things I’ve learned and observed from speaking with different companies, both as an independent and as a Tech Field Day delegate. Many of these articles have focused on considerations for a practitioner rather than a mathematician, but occasionally we theorists have to show some business value, so I try to keep current on the tools and methods utilized in the corporate world.

It’s fairly common knowledge now that most machine learning and deep learning algorithms are highly data-dependent. That is, the data you feed something like a neural network heavily affects the results. Since the common analogy for machine learning, artificial intelligence, and neural networks is one of a biological learning process, let me continue that analogy. These algorithms are like small children; they’re sponges. They learn based on the type and amount of data given, and in surprising ways. If you want to teach a child (or baby Terminator, perhaps one named John Henry) what a cow looks like, you must be very careful what you give him. If you only give him pictures of birds and cows, he may decide that a cow is identified by the number of legs he has. Then what happens when he is given a picture of a cat?

Perhaps you think of this and throw in pictures of dogs too. Aha! So a cow has four legs and hoofed feet! Until John Henry sees a zebra. This silly example illustrates just how long we took to learn even simple things as children, and how important large amounts of repetitive and varied data were to us converging on how to recognize a cow. These AI/ML/NN algorithms are designed to mimic this learning process, and thus require vast amounts of highly varied data. Good performance by an algorithm on a subset of the data may not hold up to the expanse of the real world data, just like the example of learning to recognize a cow. Thus, these algorithms are not ergodic, to borrow a term from dynamics and probability. The models and methods are not independent of the initial data you feed them. In other words, if two different people feed the same algorithm different datasets and let the algorithm “learn”, the end results can be vastly different.

To get around this, most practitioners of data science want to throw as much data as possible, ideally the entirety of everything. If you’re wanting to learn the shopping habits on an e-commerce site, you’d prefer to let your John Henry learn on the whole database rather than just a subset.1

However, your IT department would likely be unhappy with your request to run tests on a production environment for a multitude of reasons: security and performance being two of those. Having a bunch of copies floating around takes up massive amounts of storage, not to mention the security risks. A mistake in the code run against the production environment can take the whole e-store down due to a bad query.2 I spoke twice with Actifio about their Sky Infrastructure, first hearing from them at Tech Field Day 15, then interviewing them again to get some more details about use cases rather than an overview of the infrastructure itself.

As a quick overview (Mr. Achilles does a great job on the tech details in this video here), Actifio basically creates what they term a “golden copy” of your data, after which updates are done incrementally to save storage space, and everyone can get their own virtual copy (which are really more like pointers) to interact with. Now a data scientist can’t affect the production database when he/she queries against it, and can also use far more data in testing than before. This should shorten a data science development cycle, because the workaround for using subsets of data to train is to sample many subsets and train the algorithm over and over again, which takes time. In addition, the data scientist can find out very quickly if the code that worked for 100,000 rows will hold up against 10 million (guilty as charged of writing unscalable code in my past experience as a data scientist).

Being more of a theoretician, I don’t tend to step out of my bubble to consider the various infrastructures that are necessary to provide me my sandbox. To fix that, I endeavor to occasionally speak with various tech companies about their work. I like the way Actifio has streamlined a good solution that aims to satisfy the IT gatekeepers and the developers/data scientists/users of the data. Overall, I’m not exactly a fan of the semi-blind deep-learning approaches to make all business decisions, but those methods do have their uses, particularly in exploration and discovery. This platform definitely has a good potential to help a data science team in their development.

[Disclaimer: I have never been compensated by Actifio or any other company for my commentary articles.]