
On the Essential Nature of Foundations


We are frequently asked a valid question: why fund our research? Why fund mathematics research, when I can’t see what the finished product will be, and you can’t guarantee me a code library next quarter?

I’m going to resist my usual temptation to use math analogies and reach for a building analogy instead. Every house built (well, every house that will last more than 5 years) sits on a foundation. That foundation is unseen, unsexy, and isn’t part of the curb appeal a realtor uses to sell the house. However, no amount of curb appeal can overcome a crumbling or poorly built foundation, and one good natural disaster ends the house.

If one is going to build a house, one should ensure the foundation is built well and funded appropriately; otherwise, any work built upon it doesn’t stand a chance in the long run. When we’re asked the “value prop” of our research, many of the questions come from a misconception about research funding. Funding research isn’t like funding a quick consulting project, or an app for the App Store. What is really being asked for is the value proposition of the foundation of a house. You’re not funding flashy curb appeal and a new stone facade, you’re funding the very foundation the house will stand on. A good foundation can support many stories, up to the highest skyscrapers.

The next question one would ask when building a house (or funding research) is: “When will the house be completed?”

I’ll illustrate with some examples. We have a project that proposes a different method to detect anomalies in time series based on using fuzzy numbers. (Our new “project pages” feature is coming soon, and we’ll update here with links.) In this case, enough infrastructure has been laid (like electrical lines and water and sewer pipes), that we can not only immediately begin pouring a foundation, but we can also clearly see how the house will look at completion. This is like building in a city on an empty lot. In this case, I can give a reasonable completion date for certain stages of the project of around 1-2 years, with the framework in 6 months or less. Of course, just as in building a house, some small things may come up to adjust the exact timeframe, but the path to completion is fairly clear.

Now, suppose you’re a bit bolder. You desire to fund something less certain and less concrete, so to speak, such as pure research in, say, dependency theory. This would be analogous to purchasing land in rural Wyoming and building a house there. Because no one already resides there, you can purchase quite a bit of land (a broad research direction) reasonably inexpensively. However, in this location, there are no sewer lines, no electrical wires, no city-style infrastructure to work with. The land needs to be surveyed and understood before selecting a building site, and wells need to be drilled, a septic tank installed, and electrical lines extended, in addition to the foundation of the house itself. There’s a reason most lack the courage to build here. This is a multi-year commitment, but fortune favors the bold.

One thing this type of building allows for is surprises. Perhaps during the survey of the land, you find valuable mineral deposits that will net you an unexpectedly large profit when you sell or lease the mineral rights to a mining company, thus recuperating your initial land purchase investment almost immediately and providing decades of passive income. You decide to develop that in addition to building the house, which delays the house building slightly.

It’s also possible to find an underground river close to the surface on the site you originally intended to build, thus forcing you to relocate the site of the house 1000 feet north. Those are the less-desirable surprises, but now you can drill a well in that original spot. All of this is part of the exploratory nature of mathematical and fundamental research so many tend to scoff at as “wandering” and “impractical.”

One cannot employ the same “laser focus” building techniques in an unknown area as one would on a city lot. While the prospecting may not be obviously applicable to building the house, it’s essential in remote areas. Finding pitfalls and potential issues for a building site can save hundreds of thousands of dollars later. Finding mineral deposits can bring enormous unexpected wealth.

Funding pure math research is purchasing land in rural Wyoming. It takes patience and courage, but can bring incalculable wealth and a set of foundations sturdy enough to build an entire city. Funding applied mathematics is building the foundation of a house on a city lot. The Math Citadel works in both areas, and also commits to helping you build the framework as well, which is the second most important feature of a good house.

Fancy windows, pretty shingles, nice facades, and landscaping are all features that make a house beautiful and attractive to passersby, and increase its value. But all of those are meaningless if the foundation is cracked and the framework rotten. We encourage you to consider the real value proposition of mathematics research. Our business is in building foundations that are immortal. No matter how many times you update the landscaping or other surface features to stay current and modern, the foundation will still stand.

We offer pricing far below traditional funding methods, with greater flexibility and agility. Our foundations don’t come with heavy administrative overhead, and our passion guides us to leave no stone unturned while we prospect your site for any possible additional value.

Please feel free to contact us to set up a discussion on research projects that are of interest to your business. We are happy to discuss our current directions, and we’re also happy to forge a path in an entirely new direction as well.

Paper Review: Active Queue Management with Non-Linear Packet Dropping Function


As promised in the previous article, I plan to review Reference 2, Active Queue Management with Non-Linear Packet Dropping Function, by D. Augustyn, A. Domanski, and J. Domanska, published in HET-NETs 2010. The paper proposes a change in the structure of the packet drop probability function that uses the average queue length in a buffer. I mentioned previously that choosing a linear function of the average queue length can be viewed as a bit of an arbitrary choice, since we’re designing a control mechanism here, and this paper attempts to define a new form of this packet drop probability function.

In summary, the best lesson one can take from this paper is that publication in a journal or conference proceedings does not guarantee that the paper withstands scrutiny. The paper is linked above for interested readers to peruse for themselves and to investigate the claims.

Summary

The paper intended to give a new function to calculate the probability of proactively dropping a packet in a queue in order to prevent a full buffer. It seemed to be presented as an alternative to RED, described in my previous article. The authors define this new function, then set up a simulation in order to examine the effects.

When Orthogonal is Abused

The authors describe using a finite linear combination of orthogonal basis polynomials defined on a finite interval as the underlying mathematical structure.

First, we should discuss what we mean by orthogonal in the context of functions. Orthogonal is most commonly understood in terms of vectors, and when we’re in two dimensions, orthogonal becomes our familiar perpendicular.

Orthogonality

Beginning with the familiar notion of perpendicular, we can generalize this to understand orthogonality. The geometric interpretation of two vectors  being perpendicular is that the angle between them is $90^{\circ}$. Once we leave two and three dimensions (or jump to the space of polynomials, as we’ll do soon), the concept of an angle isn’t as helpful.

Another way to define perpendicular is through an operation known as the dot product. Suppose we take two 2D vectors, $\mathbf{x}$ and $\mathbf{y}$. Each vector will have coordinates: $\mathbf{x} = (x_{1},x_{2})$ and $\mathbf{y} = (y_{1}, y_{2})$. The dot product is a special type of multiplication defined on vectors, and denoted $\cdot$:

$$\mathbf{x}\cdot\mathbf{y} = x_{1}y_{1} + x_{2}y_{2}$$

The dot product can be described in words as the sum of the component-wise multiplication of the coordinates of the two vectors.

Now, we can say that two vectors are perpendicular if their dot product is 0. That is, $\mathbf{x}$ and $\mathbf{y}$ are perpendicular if $\mathbf{x}\cdot\mathbf{y} = 0$. (This article gives a nice overview of how we move from the algebraic to the geometric definition of perpendicular and orthogonal.)

Remember, perpendicular doesn’t make sense once we get into higher dimensions. Orthogonal is a more general notion of vectors being perpendicular, and is defined for two vectors (of any dimension) as their dot product equaling zero.
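As a concrete sketch (a few lines of Python, with helper names of my own choosing), the dot product and the orthogonality check look like this:

```python
def dot(u, v):
    # Sum of the component-wise products of the coordinates
    return sum(x * y for x, y in zip(u, v))

def orthogonal(u, v):
    # Two vectors (of any dimension) are orthogonal if their dot product is 0
    return dot(u, v) == 0

print(dot((1, 2), (-2, 1)))         # 1*(-2) + 2*1 = 0
print(orthogonal((1, 2), (-2, 1)))  # True: these two are perpendicular
print(orthogonal((1, 2), (3, 4)))   # False: 1*3 + 2*4 = 11
```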

From Dot Product to Inner Product

The dot product is defined on vectors, and gives us a type of product different from the scalar multiplication we know. In fact, we can generalize the notion of a dot product to something called an inner product, which can be defined on many spaces besides vectors. We can define operations and products however we like, but for our definition to qualify as an inner product (denoted $\langle \cdot, \cdot\rangle$), it must meet certain criteria.

For instance, on the set of real valued functions with domain $[a,b]$, we define the inner product of two functions $f(x)$ and $g(x)$ as

$$\langle f, g\rangle := \int_{a}^{b}f(x)g(x)dx$$

The concept of orthogonality generalizes to an inner product as well. If the inner product of two functions is zero (as defined above), we say the functions are orthogonal.
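A minimal sketch using sympy (the `inner` helper is my own) makes this concrete: the first two Legendre polynomials, $P_{0} = 1$ and $P_{1} = x$, are orthogonal on $[-1,1]$ under this inner product, while the same two functions on $[0,1]$ are not:

```python
import sympy as sp

x = sp.symbols('x')

def inner(f, g, a, b):
    # <f, g> = integral of f(x) g(x) over [a, b]
    return sp.integrate(f * g, (x, a, b))

# P0 = 1 and P1 = x are orthogonal on [-1, 1]:
print(inner(sp.Integer(1), x, -1, 1))  # 0
# The same pair is not orthogonal on [0, 1]:
print(inner(sp.Integer(1), x, 0, 1))   # 1/2
```

Note that orthogonality depends on the interval as much as on the functions themselves.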

Back to the paper

The authors claim to be using a set of orthogonal polynomials to define their drop probability function, and they give the structure of such functions. For a domain $[a,b]$, and for $\phi_{j}$ in the set of polynomials, they define $\phi_{j}(x) = (x-a)^{j-1}(b-x)$. So, for example, $\phi_{1}(x) = (b-x)$, and $\phi_{5}(x) = (x-a)^{4}(b-x)$.

Now, in order for a set of functions to be an orthogonal basis for a space, the functions must be pairwise orthogonal. That is, I need to be able to take the inner product of any two functions $\phi_{i}$ and $\phi_{j}$ and get 0. If that isn’t true for even one pair, then the set is not orthogonal.

As it turns out, if we take the inner product of any two functions in the basis set over the domain given, we find that there are no pairs that are orthogonal. To do this in general, we compute the integral

$$\int_{a}^{b}(x-a)^{i-1}(b-x)\cdot (x-a)^{j-1}(b-x)dx$$

The integral computation is one of simple polynomial integration, and can be done either by hand or using your favorite software (Mathematica, for instance). What we find is that this set of functions, defined this way in general, is never orthogonal, yet the paper claims it is.
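This check is easy to reproduce. Here is a minimal sympy sketch (taking $a=0$, $b=1$ for concreteness; the conclusion holds for any $a < b$) computing the pairwise inner products of the first few $\phi_{j}$:

```python
import sympy as sp

x = sp.symbols('x')
a, b = 0, 1  # a concrete interval for illustration

def phi(j):
    # The paper's basis functions: phi_j(x) = (x - a)^(j-1) * (b - x)
    return (x - a)**(j - 1) * (b - x)

def inner(f, g):
    # <f, g> = integral of f(x) g(x) over [a, b]
    return sp.integrate(f * g, (x, a, b))

for i in range(1, 4):
    for j in range(i + 1, 4):
        print(f"<phi_{i}, phi_{j}> = {inner(phi(i), phi(j))}")
# Every pairwise inner product is strictly positive, so no pair is orthogonal.
```

(The positivity is no accident: each product $\phi_{i}\phi_{j}$ is nonnegative on $[a,b]$ and positive on the interior, so its integral can never vanish.)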

Applying this to the particular situation of designing a drop probability function, they give the following for average queue length thresholds $T_{\min}$ and $T_{\max}$:

$$p(x,a_{1},a_{2}) = \left\{\begin{array}{lr}0, &x < T_{\min}\\\phi_{0}(x) + a_{1}\phi_{1}(x) + a_{2}\phi_{2}(x),&T_{\min}\leq x \leq T_{\max}\\1,&x > T_{\max}\end{array}\right.$$

where the basis functions are

$$\begin{aligned}\phi_{0}(x) &= p_{m}\frac{x-T_{\min}}{T_{\max}-T_{\min}}\\\phi_{1}(x) &= (x-T_{\min})(T_{\max}-x)\\\phi_{2}(x) &= (x-T_{\min})^{2}(T_{\max}-x)\end{aligned}$$

The reader will recognize $\phi_{0}$ as the original drop probability function from the RED algorithm. Despite the authors’ claim, these functions are absolutely not orthogonal, and a simple check as we did above will verify it.
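For what it’s worth, the drop function itself is straightforward to implement. This sketch uses my own parameter names, plus a clamp to $[0,1]$ that the paper does not address (nothing in their construction guarantees the polynomial stays in that range):

```python
def drop_probability(x, a1, a2, t_min, t_max, p_m):
    """Piecewise drop probability from the paper: the RED linear term
    phi_0 plus two polynomial correction terms."""
    if x < t_min:
        return 0.0
    if x > t_max:
        return 1.0
    phi0 = p_m * (x - t_min) / (t_max - t_min)
    phi1 = (x - t_min) * (t_max - x)
    phi2 = (x - t_min) ** 2 * (t_max - x)
    p = phi0 + a1 * phi1 + a2 * phi2
    return min(max(p, 0.0), 1.0)  # clamp: my addition, not the paper's

# With a1 = a2 = 0 this reduces to the RED drop function:
print(drop_probability(15, 0, 0, t_min=10, t_max=20, p_m=0.1))  # 0.05
```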

Other issues

Another issue is that these mysterious coefficients $a_{1}$ and $a_{2}$ need to be determined. How, you ask? The authors do not say, other than to note that one can define “a functional” implicit in the unknown $p(x,a_{1},a_{2})$ that can be minimized to find the optimal values for those coefficients. They write that this mysterious functional can be based on either the average queue length or the average waiting time, yet provide no details whatsoever as to the functional they have chosen for this purpose. They provide a figure with a sample function, but give no further details as to how it was obtained.

One other issue I have with their methodology concerns the order of estimation. For those familiar with the many ways to estimate unknown functions, from Taylor series, to splines, to Fourier series, we know that a function is exactly equal only to the infinite sum of such terms. Any finite sum is an estimate, and the number of terms needed for a desired accuracy can change with the function being estimated.

For instance, if I want to use a Taylor series (a linear combination of polynomials) to estimate a really icky function, how many terms should I include to ensure my accuracy at a point is within 0.001 of the real value? It depends on the function.
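To see the point numerically, here is a small sketch (my own code, not the paper’s) counting how many Maclaurin-series terms of $e^{x}$ are needed for an error under $0.001$ at two different points:

```python
import math

def taylor_exp(x, n):
    # Partial sum of the Maclaurin series of e^x with n terms
    return sum(x**k / math.factorial(k) for k in range(n))

def terms_needed(x, tol=1e-3):
    # Smallest number of terms giving error below tol at the point x
    n = 1
    while abs(taylor_exp(x, n) - math.exp(x)) >= tol:
        n += 1
    return n

print(terms_needed(1.0))  # 7 terms suffice near the center of expansion...
print(terms_needed(5.0))  # ...but more than twice as many are needed farther out
```

The required order depends on the function and on where you evaluate it; there is no universal “second order is enough.”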

The authors simply claim that a second order term is appropriate in this case. The issue I take with that is that these queue management drop probability functions are designed. We’re not estimating a function describing a phenomenon; we are designing a control policy to seek some sort of optimal behavior. Fundamentally, the authors’ framing of this as an estimation problem for some unknown drop probability function is incorrect. This isn’t a law of nature we seek to observe; it’s a control policy we seek to design, implement, and optimize. Using language that implies the authors are estimating some “true function” is misleading.

Regarding the simulation itself: since the design was arbitrary, and not based on sound mathematical principles, I cannot give any real comment on the simulations and the results. The authors briefly discuss and cite some papers that explore the behavior of network traffic, and claim to take this into account in their simulations. On those, I cannot comment yet.

Conclusion

Always verify a paper for yourself, and don’t accept anything at face value. Research and technical publications should be completely transparent as to choices and methodologies (and obviously free of mathematical inaccuracies), and derivations and proofs should be present when necessary, even if in an appendix.

Commentary: White Papers Don’t Impress Me Much


I spent the last week at an event called Tech Field Day (my second time). In a nutshell, it’s a traveling panel of 12-15 delegates who are generally IT professionals (and me) that visits 8-10 companies over three days to hear various presentations on their technology. Sometimes it’s storage tech, sometimes networking, or cloud, or a mixture of all sorts of things. The common thread, in theory, is that these presentations are supposed to be “deep dives”, to use an industry buzzword. The delegates around the table are all highly proficient in their fields, and are expected to ask questions to drill into claims made and get more details about various IT architectures presented. In my case, I am obviously interested in uncovering the interesting mathematics behind various enterprise technologies. From erasure coding to graph theory to the statistics underneath the vague “analytics” every company claims to do, my interest lies in discussing how they’re employing mathematics to make their tech better or drive business decisions.

Typically, most companies release white papers that claim to detail their architecture (or math, as one claimed). In reality, and with rare exception (Datrium actually comes to mind here), they’re little more than five to seven pages of marketing-style technical claims with no citations or justification. As an overview, I understand keeping the lengths shorter, but references to more detailed publications and reports should be available when making certain claims. Therefore, as part of the Tech Field Day panel, I felt a responsibility to press the presenters on some of these claims, earnestly hoping for more details. My thought was that they were putting out a “teaser”, so to speak, and just waiting excitedly for someone interested to ask about technology they built and are proud of. For the most part, my initial thought was wrong. From dismissing my questions to hiding behind the curtain of “secret sauces” and “proprietary” code, the presenters mostly left me disappointed.

My frustration can be traced to the very Silicon Valley-style idea that flashy marketing must pervade everything, which blurs opinion and fact. White papers that should contain technical details and references become little more than press releases disguised as objective reports. I debated how to really articulate my opinion, and decided to do something a bit out of character for my typical article. With apologies to Shania Twain, I present my version of the song “That Don’t Impress Me Much”:

That Don’t Impress Me Much (Tech Edition)

I’ve noticed in tech they think they’re pretty smart
They’ve clearly got their marketing down to an art.
The white papers are “genius”; it drives me up a wall
There’s nothing original, not at all

Oh-oo-oh, you think you’re special
Oh-oo-oh you think you’re something else

Okay, so the erasure coding’s novel
That don’t impress me much
So you made the claim, but have you got the proof?
Don’t get me wrong, yea, I think you’re all right
But that won’t give me inspiration in the night
That don’t impress me much.

Every white paper says they’re the best on the market
“Independently verified”—just in case
Writing uncited claims, publishing as fact (I want to vomit)
Cause we all know tech’s really a private arms race

Oh-oo-oh, you think you’re special
Oh-oo-oh you think you’re something else

Okay, so it’s “secret sauce”
That don’t impress me much
So you got some code, but have you got some proof?
Don’t get me wrong, yea, I think you’re all right
But that won’t give me inspiration in the night
That don’t impress me much.

So you’re one of those firms using learning machines
But you’ve no earthly clue what’s going on underneath
I can’t believe you think that it’s all right
Come on baby tell me, you must be joking right?

Oh-oo-oh, you think you’re special
Oh-oo-oh you think you’re something else

Okay, so you’ve got analytics
That don’t impress me much
So you can “predict” but have you got some proof?
Don’t get me wrong, yea, I think you’re all right
But that won’t give me inspiration in the night
That don’t impress me much.

Carnival of Mathematics 154


The Math Citadel is happy to host Carnival of Mathematics edition $154$, where we take a look at math blog posts from all around. For all those interested in Carnival of Mathematics future and past, visit The Aperiodical.

To get things started, we present a few interesting properties of $154$, in keeping with Carnival of Mathematics custom.  Firstly, $154$ is a palindrome when written in bases $6$, $8$, $9$, and $13$:

$$154=414_6=232_8=181_9=\text{BB}_{13}.$$
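These are easy to verify with a quick Python sketch (helper of my own making):

```python
def to_base(n, b):
    # Digits of n written in base b, most significant digit first
    digits = []
    while n > 0:
        digits.append(n % b)
        n //= b
    return digits[::-1]

for b in (6, 8, 9, 13):
    d = to_base(154, b)
    print(b, d, d == d[::-1])  # each of these bases yields a palindrome
```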

The nearest primes to 154 are 151 and 157.  Since 154 is equidistant from these, that is,

$$|154-151|=3=|154-157|,$$

154 is referred to as an interprime number.

Finally, a figure: a triangle rotated internally $154$ times.

Now on to this month’s post selection.

First off, Anthony Bonato discusses the hierarchy of infinities and formative study in axiomatic set theory that brought about The Continuum Hypothesis, in a post by the same name at The Intrepid Mathematician. Included is a brief account of mathematicians Gödel and Cohen and their respective constructions of set theory to uphold or prove false the hypothesis.

Next, Arvind Rao examines the case of defining a circle by three non-collinear points in the plane in 3 Points Make a Circle. The evidence in the affirmative that any such three points do indeed define a circle is presented via straightedge and compass as well as programmatically.

In abstract algebra, we include a post of our own, All the Same Opposites, which demonstrates some of the many forms taken by $\text{GF}(2)$ and the significance of their structural equivalence.

At Quanta Magazine, Patrick Honner presents the mathematical reasoning behind vaccines in How Math (and Vaccines) Keep You Safe From the Flu, comparing linear and exponential growth rates as a basis for understanding how illnesses spread. And Kevin Hartnett discusses the Navier-Stokes equations and why these merit a spot on the list of Millennium Problems in mathematics in What Makes the Hardest Equations in Physics So Difficult?

We close with probability theory. Our own Dr. Rachel Traylor writes about the rare (and historically neglected) “singular continuous distributions” in The Red-Headed Step-Distributions. Lastly, an older post: Will Kurt at Count Bayesie gives a detailed look into Kullback-Leibler divergence and its information theoretic roots in Kullback-Leibler Divergence Explained.

(Commentary) Spectre and Meltdown: Spokes on a Wheel


There has been a flurry of articles and discussions related to Intel’s Spectre and Meltdown vulnerabilities. Many good writings discuss the technical nature and implications to hardware, and you can find a selection here, here, and here. As of this writing, many software developers and security experts are frantically trying to create patches to protect their infrastructure and customers from those who would exploit a 20 year-old design flaw, and we obviously wish them the best of luck.

The severity of the issue is in a large part due to a design flaw that dates back to 1995, when Bill Clinton was president, the DVD was first announced, eBay was founded, Braveheart won the Best Picture Academy Award, and Alanis Morissette’s Jagged Little Pill was released. That means that hundreds of thousands of programs, apps, and products were built on top of a fundamental design flaw that went unnoticed longer than some of our siblings have been alive. What happened?

Complexity happened. Not complexity in the sense of the human body, a finely tuned machine. Complexity born of rushed thinking, continuous development cycles, and a mentality we discouraged in our students when we were still academics — the notion that just turning in an assignment, even if it was not well-done, was acceptable. We have been scolded in other jobs that “perfect is the enemy of good”, told to “fail fast and fail often” and to “just get something delivered.” We at The Math Citadel fundamentally disagree with and reject all of these strategies and attitudes. Rushed thinking and a desperation to be seen as “first done” with the most hype have led to complexity born of brute-force solutions, with patches to fix holes discovered after release. When those patches inevitably break something else, more patches are applied to fix the first patches.

“And on and on it spins, crushing those on the ground”, in the words of Daenerys Targaryen.

Lest it be thought that we are picking only on Intel, or that this is an isolated issue, let us explore other manifestations.

• In November 2017, a developer found a security vulnerability in Apple’s High Sierra operating system that enables access to the root superuser account with a blank password on any Mac (local or remote) running OS 10.13.1. Apple quickly released a patch meant to fix it, but another update ended up reintroducing the “root bug.”
• When iOS 11.1 was released, autocorrect would change the letter “I” to “A” with a question mark in a box.

The gaming industry has had its share of problems from rushing releases that weren’t complete. (One might almost be forgiven for assuming it’s a business strategy.)

• No Man’s Sky was released to much hype, but the first release had very few of the promised features, generating a huge player backlash. The company released further features as DLC and patches, but the damage was done.
• Call of Duty: WWII had server issues at launch that took the game offline, caused random disconnects from matches, and led to some reports of gamer rankings being reset. After two patches, users reported improvements but no real fixes.
• Batman: Arkham Knight released a version for the PC, and it became a disaster. Players had to turn off textures and move graphics quality to “low” just to make the game playable, regardless of how nice their graphics card was.

The machine learning/“artificial intelligence” space has quite a few examples, and these range from amusing to sinister.

• Algorithmic pricing without a sanity check leads to a $23 million book price on Amazon.
• An automatic t-shirt slogan generator causes a business to fold after the owner’s algorithm generates a t-shirt saying “Keep calm and rape on.”
• Automated re-pricing software RePricer Express erroneously changes the prices of thousands of items on Amazon to a penny. Compounding the problem is the automatic order fulfillment from Amazon, making it impossible to retract the orders. One small business owner cites a $150,000 loss.
• Accusations of price-gouging on flights out of Florida prior to Hurricane Irma are more likely due to the automatic pricing algorithms than active price gouging. Nonetheless, it was a PR nightmare.

We can list many more examples, enough to provide clear evidence of a pattern. There have already been those calling for a re-examination of machine learning and data science in particular in response to these issues. The real problem, however, goes much deeper.

Entire companies are based around the notion of scrum development, a continued cycle of “sprints” that last a couple weeks and end in some deliverable. The original methodology may be good for a prototype, but when scaled to company operations, it inspires a culture of “just get it done.” It leads to a toxic environment, where both leaders and individual contributors are driven by a fury to “turn it in” and release before a competitor or by the time VMWorld comes around. It means products are being built on top of other products that were just barely good enough to ship with a shiny marketing veneer.

In the physical world, this would be akin to building a bridge by throwing stones wantonly into the water in order to hop across. Yes, you can get across the river quickly, but misplacement of any one of those stones can mean you may slip and fall into the water, or someone coming behind you who distributes his weight differently may fall. Worse, if the rocks seem stable enough for a long time, people begin constructing a hasty bridge using those stones as a foundation. The bridge holds for a while, but one day the cumulative effect of poor materials and high traffic volume cause the bridge to collapse, and people get hurt.

If civil engineers designed and built bridges the way tech develops and releases products, people would die. If aerospace engineers rushed the design of a commercial airliner and patched issues the way tech does, people would die.

If mathematicians developed their theories and equations the way tech develops and releases products, your world would crumble.

Let’s run a thought experiment. Suppose George Boole, the inventor of Boolean algebra, rushed his theories out so he could beat an academic rival. He didn’t really prove everything or make sure it was airtight. There was maybe that funny edge case, but he couldn’t see it ever arising in practice, so he just ignored it. Unbeknownst to him, that edge case was a counterexample that showed all of his notions to be false overall.

Boolean algebra is the fundamental theory by which your computers work today, and will be such until and unless quantum computing takes off. If what seemed like an edge case 150 years ago became the foundation for the development of computers, the ramifications would be so vast as to be unrecoverable. It would require a whole new redesign of how computers worked. Let that effect snowball in your mind.

That’s one topic in mathematics. Imagine hundreds of mathematicians developing the foundations of your world the same way. But we don’t. We study the river, the banks, and the earth carefully. Only when we are sure do we begin constructing a bridge, starting with the foundation. Stone by stone, making sure the bottom ones are perfect and unbreakable. The work takes years, decades, lifetimes sometimes, but the results are built to last. Mathematics is the only discipline that has never had to “reinvent” itself upon the discovery of new knowledge. All of mathematics builds, expands, and generalizes.

What does this have to do with business? To fix the attitudes that ultimately led to Spectre, Meltdown, and the patches to fix them, and the patches to fix those, companies need to think like mathematicians. To fix the ideologies that rushed out a new macOS with a serious security vulnerability, companies need to think like mathematicians. To avoid the PR nightmares from “AI gone wrong”, companies need to think like mathematicians.

Leaders and individual contributors need to think like mathematicians, searching deliberately for elegant, simple solutions that are provable, explainable, and fundamentally strong. The elegance and simplicity will allow for other things to be built on top that won’t break the foundation. Even when something is built on top of a foundation, it is carefully examined as to its stability. Provable solutions mean no surprises later when something fails.

This requires lateral thinking, creativity, and most importantly, a willingness to take a bit longer in product development and business decisions. It’s a difficult thing to do, when all your competitors move so fast you think you would only hear the Doppler effect as they scream by you. Adopting a mathematician’s outlook takes longer. However, the results are simpler, with less maintenance, less need for software janitors to clean up the mess from the frantic development party, and stronger, more resilient products. Every one of these things yields cost savings, long term revenue, and perhaps most importantly, customer trust.

We at The Math Citadel are mathematicians, refusing the siren song of scrum-like mentalities. We’re here to help the companies who want to look past the hypes, who want to carve their own paths rather than be the leader on a paved course. We’re here for the companies who say “enough” to shortsightedness and continuous patching. Spectre and Meltdown are just spokes on a wheel. We don’t intend to stop the wheel, we intend to break it.

Commentary: Infrastructure Considerations for Machine Learning


Welcome to another brief commentary and departure from the heavier mathematics. I have been endeavoring to expand the breadth of my knowledge on the tech side of things, and chronicling some things I’ve learned and observed from speaking with different companies, both as an independent and as a Tech Field Day delegate. Many of these articles have focused on considerations for a practitioner rather than a mathematician, but occasionally we theorists have to show some business value, so I try to keep current on the tools and methods utilized in the corporate world.

It’s fairly common knowledge now that most machine learning and deep learning algorithms are highly data-dependent. That is, the data you feed something like a neural network heavily affects the results. Since the common analogy for machine learning, artificial intelligence, and neural networks is one of a biological learning process, let me continue that analogy. These algorithms are like small children; they’re sponges. They learn based on the type and amount of data given, and in surprising ways. If you want to teach a child (or baby Terminator, perhaps one named John Henry) what a cow looks like, you must be very careful what you give him. If you only give him pictures of birds and cows, he may decide that a cow is identified by the number of legs it has. Then what happens when he is given a picture of a cat?

Perhaps you think of this and throw in pictures of dogs too. Aha! So a cow has four legs and hoofed feet! Until John Henry sees a zebra. This silly example illustrates just how long we took to learn even simple things as children, and how important large amounts of repetitive and varied data were to us converging on how to recognize a cow. These AI/ML/NN algorithms are designed to mimic this learning process, and thus require vast amounts of highly varied data. Good performance by an algorithm on a subset of the data may not hold up to the expanse of the real-world data, just like the example of learning to recognize a cow. Thus, these algorithms are not ergodic, to borrow a term from dynamics and probability. The models and methods are not independent of the initial data you feed them. In other words, if two different people feed the same algorithm different datasets and let the algorithm “learn”, the end results can be vastly different.
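To make that data-dependence concrete, here is a toy sketch of my own (all feature data invented for illustration, and a deliberately simple nearest-centroid learner standing in for a neural network): the same learning procedure, trained on two different datasets, classifies the same animal differently.

```python
# Toy illustration of data-dependent learning: identical training code,
# different training data, contradictory models. Features are (legs, hoofed?).

def centroid(points):
    """Mean of a list of equal-length feature vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def train(examples):
    """Nearest-centroid classifier: one centroid per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(pts) for label, pts in by_label.items()}

def predict(model, features):
    """Label of the nearest centroid (squared Euclidean distance)."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(features, model[label]))
    return min(model, key=dist)

# Entirely made-up datasets, echoing the birds/cows/dogs example above.
birds_and_cows = [((2, 0), "bird"), ((2, 0), "bird"), ((4, 1), "cow")]
dogs_and_cows  = [((4, 0), "dog"), ((4, 0), "dog"), ((4, 1), "cow")]

model_a = train(birds_and_cows)
model_b = train(dogs_and_cows)

cat = (4, 0)  # four legs, no hooves
print(predict(model_a, cat))  # the birds/cows model calls the cat a "cow"
print(predict(model_b, cat))  # the dogs/cows model calls it a "dog"
```

The two models disagree about the very same cat, purely because of what each was shown during training.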

To get around this, most practitioners of data science want to throw as much data as possible at the problem, ideally the entirety of everything. If you’re wanting to learn the shopping habits on an e-commerce site, you’d prefer to let your John Henry learn on the whole database rather than just a subset.1

However, your IT department would likely be unhappy with your request to run tests on a production environment for a multitude of reasons, security and performance being two of them. Having a bunch of copies floating around takes up massive amounts of storage, not to mention the security risks. A mistake in code run against the production environment can take the whole e-store down due to a bad query.2 I spoke twice with Actifio about their Sky Infrastructure, first hearing from them at Tech Field Day 15, then interviewing them again to get some more details about use cases rather than an overview of the infrastructure itself.

As a quick overview (Mr. Achilles does a great job on the tech details in this video here), Actifio basically creates what they term a “golden copy” of your data, after which updates are done incrementally to save storage space, and everyone can get their own virtual copy (which are really more like pointers) to interact with. Now a data scientist can’t affect the production database when he/she queries against it, and can also use far more data in testing than before. This should shorten a data science development cycle, because the workaround for using subsets of data to train is to sample many subsets and train the algorithm over and over again, which takes time. In addition, the data scientist can find out very quickly if the code that worked for 100,000 rows will hold up against 10 million (guilty as charged of writing unscalable code in my past experience as a data scientist).
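The “golden copy” idea can be sketched in a few lines. What follows is my own toy illustration of the general copy-on-write pattern, not Actifio’s actual implementation: each virtual copy records only its own changes, reading through to the shared base for everything else.

```python
# Minimal copy-on-write sketch (hypothetical, for illustration only):
# many cheap virtual copies over one shared, untouched golden copy.

class VirtualCopy:
    def __init__(self, base):
        self._base = base      # shared, read-only golden copy
        self._overlay = {}     # this copy's private changes

    def read(self, key):
        """Prefer this copy's own changes, else fall through to the base."""
        return self._overlay.get(key, self._base.get(key))

    def write(self, key, value):
        """Record the change locally; never touch the golden copy."""
        self._overlay[key] = value

golden = {"row1": "original", "row2": "original"}
analyst_a = VirtualCopy(golden)
analyst_b = VirtualCopy(golden)

analyst_a.write("row1", "mutated by A")
print(analyst_a.read("row1"))  # "mutated by A"
print(analyst_b.read("row1"))  # "original" -- B's view is unaffected
print(golden["row1"])          # "original" -- production data untouched
```

Each overlay stores only the deltas, which is why many simultaneous “copies” cost far less than full duplicates.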

Being more of a theoretician, I don’t tend to step out of my bubble to consider the various infrastructures necessary to provide me my sandbox. To fix that, I endeavor to occasionally speak with various tech companies about their work. I like the way Actifio has streamlined a good solution that aims to satisfy both the IT gatekeepers and the developers/data scientists/users of the data. Overall, I’m not exactly a fan of semi-blind deep-learning approaches to making all business decisions, but those methods do have their uses, particularly in exploration and discovery. This platform definitely has good potential to help a data science team in their development.

[Disclaimer: I have never been compensated by Actifio or any other company for my commentary articles.]

Commentary: Technical Debt in Machine Learning

I recently had the opportunity to be a guest on an episode of the On-Premise IT Roundtable podcast, the topic of which was technical debt. (You can listen to the twenty minute episode here, or watch the video version here.) The conventional definition of technical debt, for both consumer and enterprise technology, is the lagging of upgrades, potentially causing issues down the line when the software or hardware is no longer supported. For example, should you immediately upgrade the operating system on your smartphone, or wait with an older version? If you wait, how long should you wait? The discussion was lively, pleasant, and informative.

In particular, one panelist, Mr. James Green of Actual Tech Media, brought up an interesting and different notion of technical debt: that of the “quick and dirty” solution to put out a fire or to be seen as a player in the latest buzzword space.

From Mr. Green, technical debt can also be “…when I start building something small and narrow to address a problem I have right now without thinking about three years from now, what are my needs going to be?…You create technical debt when you have to build something that meets your new needs…”

This struck me in a slightly different context than the data centers and storage technologies that comprised the undercurrent of the discussion. I see companies and people scrambling to board the data science/machine learning/AI train, and quickly implement, well, anything that shows they use these concepts and tools. It’s great for marketing, and in marketing, timing is everything. You have to be at the forefront, but not the front. What happens is “research” on a development cycle: new dashboards and “analytics” churned out every few weeks, and slight twists on the same models every time the problem or dataset changes.

It creates technical debt from an advanced development standpoint to wait too long to begin looking into a promising area or crafting a solution using what could be the next-big-thing. You risk jumping on the bandwagon as everyone else jumps to a new one. But perhaps more costly long-term is lurching from bandwagon to bandwagon, desperately trying to use that shiny new buzzword-hammer you bought into in your organization, whether or not it’s really the right tool. When you deploy a canned algorithm, or an unsupervised learning algorithm you don’t fully understand, how will you know when it begins to fail? How will you identify slow changes to the model as the data slowly changes, and how will you know if those changes are due to truly evolving circumstances or simply corrupt data? Could there have been a simpler, more elegant, more mathematical solution to the problem that would have applied across verticals, saving you months of retraining the same neural network on a different dataset?
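On the question of noticing slow changes as the data drifts, even a very simple statistical check beats flying blind. Here is a hedged sketch of one such check, comparing a recent window of a feature against a training-time baseline; the three-sigma threshold and all numbers are illustrative, not a recommendation.

```python
# A simple drift monitor (toy example): flag when a recent window's mean
# wanders more than a few baseline standard deviations from the training mean.

from statistics import mean, stdev

def drift_alert(baseline, recent, n_sigmas=3.0):
    """True if the recent mean is far from the baseline mean."""
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(mean(recent) - mu) > n_sigmas * sigma

baseline = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0]  # values seen at training time
steady   = [10.1, 9.9, 10.0, 10.2]             # recent data, still on course
drifted  = [13.5, 13.8, 14.1, 13.9]            # data has quietly shifted

print(drift_alert(baseline, steady))   # False
print(drift_alert(baseline, drifted))  # True
```

A check this crude won’t distinguish genuinely evolving circumstances from corrupt data, but it tells you when to pause and look up, which is exactly the point.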

I’ll make an analogy with open-water swimming. Unlike swimming in a pool, where you have a nice, straight, black line at the bottom of the pool to guide you, open water comes with a new set of hurdles: current, waves, and murky depths. The technique for an open water swim is very different from a pool swim. In the pool, you can look down, let go, and cruise, knowing you’ll never get off course. Out in open water, you pause every 5-7 strokes briefly to look up and reorient yourself with the landmarks and buoys to make sure you’re still on course. You also spend time studying the course and terrain for days prior to the start of the swim. If you fail to do so, you can get hundreds of meters off course and have to correct1, which adds minutes to your race time. In open water, the winner isn’t always the fastest swimmer charging ahead, but the smartest one whose pace is a little slower, but who pauses to ensure he is on course.

Frantically deploying the latest neural network, machine learning technique, or black-box AI solution is just like an open water swimmer treating the race like a pool swim. We’re in uncharted or barely charted territory here. Plowing full speed ahead deploying ultimately uninterpretable solutions may cause an organization to look up 3 years later and realize it’s way off course, with little to show for it. The conversation around the utility or cost of conventional technical debt converged on the conclusion that there are more nuances than simply deeming it “good” or “bad”. Obviously, it’s bad to look up every single stroke when swimming; that will slow you down unnecessarily. As with most things, moderation is key. A pause may seem ill-advised this week, this month, or even this year, but globally, those occasional pauses ensure you’re on course to deliver business value long-term. Study the problem at hand carefully. It may generalize abstractly to a problem in another vertical or industry that has already been solved elegantly, and deploying that solution may save millions of dollars and hundreds of hours over the next 5 years.

Commentary: On Straight As and Salaries

(Fair warning: this is a personal account.)

The systems were designed well, I think. When we were in school or college, passing was supposed to mean you knew the material, basically. A B showed you were pretty good, and an A was only for the smartest students. Not relatively the smartest, but objectively the smartest. Most people could at least pass, fewer were B students, very few were A students, and a tiny fraction were straight A students. At least, that’s how it was supposed to work.

I was a straight A student all through high school. Actually, I didn’t just average As; I was almost never handed back a piece of paper with less than a 90% mark. Honestly, I didn’t feel the sense of accomplishment I thought I was supposed to.

Fast forward to college. Oh man, Georgia Tech hit me hard. Not only did I experience my first B, I got my first C. Advanced Linear Algebra, or what was meant to be the first proofs class. I was told by the professor to quit mathematics, which was devastating to hear. Under the way grades were supposed to work, he was right: it meant I was barely treading water in a field that was only going to get harder.

I kept going. In the next course, the first in real analysis, all those ideas that didn’t click during Advanced Linear Algebra just made sense. I started excelling, and I learned one of my life-changing lessons: Grades aren’t indicative of your performance, nor do they indicate the ability to apply that knowledge elsewhere or retain the lessons learned.

Grades are just a barometer for what a school or a professor deems worthy.

That course challenged me, fundamentally. I slipped and fell over and over again, and didn’t find my footing until the next semester. But once I realized As didn’t really matter; once I realized that the letters on my transcript were due to someone else’s value system I didn’t agree with, I lost inhibition and fear. I grew by taking on things I wasn’t sure I could do, and finished my PhD in mathematics about a year and a half ago.

Once we leave school, the measure of a person’s value becomes job title and salary. It makes sense, right? We’re willing to pay more for something or someone we need very much. But is that really true?

Airline pilots go through grueling training for years, and amortized over their careers, make very little. EMTs are paid far less than a director of marketing. None of these observations are new, nor is complaining about them.

Sticking strictly to the business world, the highest paid individuals are mostly in executive management, marketing, and sales. But CEOs of large companies are ultimately pretty interchangeable; in fact, they are swapped out all the time. Even comparing branches within a company, businesses spend billions on marketing, and very little by comparison on research.

Is the system of salaries and spending really broken? Maybe it’s not. Maybe it’s a barometer of what businesses value. The word “value” is quite vague; we all have different values. Marketers, salespeople, and CEOs are highly valued because good ones produce high stock prices, and favorable quarterly profits. If a business’s value system ranks these things more highly than long term exploration, then it makes perfect sense that they structure their spending and salaries the way they do.

It means that the vast majority of what companies value1 differs from what I or other individuals value.

It means that salaries are just a barometer for what corporations deem most worthy.

And there is my next lesson, the “business version” of what I learned regarding grades. Salaries are ultimately a metric for someone else’s value system. Most corporations look quarter-by-quarter, maybe even a couple years out. What if you see further than that? Then the things you think money should be spent on will differ.

If you were ever laid off, if you are making less than you think you should2, then your values didn’t align with your employer’s. That’s all. Straight-A students are not necessarily better than students who made a C or three. Derive your value internally, by looking in a mirror and evaluating yourself objectively, not relative to someone else.

I realize this is hard. I also realize that my idealistic notions of not caring about salaries or grades sound a little new age. Obviously, there are penalties for these attitudes, especially in the short run. My undergraduate GPA wasn’t a 4.0 – that cost me admission into some of the more prestigious PhD programs, which would have led to good academic jobs. Refusing to align with what businesses deem valuable today means that you will have less discretionary income.

Honestly, after doing both, I’m happier for it. I’m not afraid anymore. I don’t have the best academic pedigree, but someone recognized the work I did, and for a while, I had my dream job. It came from an unexpected place, after I was rejected from almost every national lab in the country. Now again, I believe the same thing – uncompromised vision. I believe in fair trades – I’ll find someone whose value system is analogous to mine. Until then, my salary doesn’t matter.

Commentary: High Level Data Filtration

The consensus over the last five or so years has converged on a conclusion regarding data: we’re drowning in it. We have more than we can possibly monitor with our own eyeballs, and certainly more than we know what to do with intelligently. The motto for data scientists has been “More is better.” Well, ask and ye shall receive. The image I see is a virtual version of someone looking at the end of a firehose as it’s turned on full blast, then trying to spot an occasional speck in the stream of water. More data is good only if it’s good data.

The issue is that when a real alert comes in between the ones you prefer to dismiss, you’ll likely ignore the new one as well. Enter the need for filtration. We have so much data (much of it repeated) in enterprise scenarios that we need a way to filter these streams by eliminating the obvious and the duplicated, as it were. To focus on a very real illustration of this need, take a look at enterprise network traffic. Enterprises have thousands of devices sending massive amounts of data within the network. They also deal with massive volumes of traffic moving into and out of the network. The amount of packet data and metadata you could capture in an attempt to monitor all this is difficult to fathom. We need to decide what data is useful and what isn’t, and we need to decide this in real time, as the information flows.

An intelligent way to approach data filtration is similar to how we look at water filtration. The first thing you want to do is get the obvious large chunks of…whatever…out. Then you apply progressively tighter layers of filtration (plus some chemical treatment) until you get clean water. Data cleaning can be a bit like that. Data scientists2 recognize that they will never be handed clean water to work with, and that they’ll have to do some cleaning themselves. But rarely does anyone who is actually tasked with developing data science or cybersecurity solutions want to be the ones removing the obvious big garbage.

I watched, as part of Tech Field Day 153, a presentation by Ixia (video can be found here) on what is effectively a real-time filtration system for network threat intelligence. The idea is to leverage their database of known obvious issues and “bad players”, as it were, to quickly filter out and mitigate these “large chunks” before passing the data to the more refined and advanced cybersecurity monitoring products. Ixia’s products also look for duplicate data and remove that as well. I like that they stepped in to do the “dirty work”, as it were, and offer a solution to help with that high level filtration in real time. They were very clear about what their analytics did and did not do, which I respect.
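The coarse-filtration idea (drop traffic from known-bad sources, discard exact duplicates, pass the rest downstream) can be sketched in a few lines. This is a toy of my own, not Ixia’s product; the blocklist, addresses, and packet fields are invented for illustration.

```python
# Toy coarse filter: remove known-bad and duplicated traffic in one pass
# before any expensive downstream analysis sees it.

def coarse_filter(packets, blocklist):
    """Yield packets that are neither from known-bad sources nor duplicates."""
    seen = set()
    for pkt in packets:
        if pkt["src"] in blocklist:
            continue                      # known bad player: drop immediately
        key = (pkt["src"], pkt["dst"], pkt["payload"])
        if key in seen:
            continue                      # exact duplicate: drop
        seen.add(key)
        yield pkt                         # pass downstream for deeper analysis

blocklist = {"203.0.113.7"}               # hypothetical known-bad source
stream = [
    {"src": "198.51.100.2", "dst": "10.0.0.5", "payload": "GET /"},
    {"src": "203.0.113.7",  "dst": "10.0.0.5", "payload": "exploit"},
    {"src": "198.51.100.2", "dst": "10.0.0.5", "payload": "GET /"},  # duplicate
]

survivors = list(coarse_filter(stream, blocklist))
print(len(survivors))  # 1 -- only the clean, non-duplicate packet remains
```

Everything downstream now works on a third of the original stream, which is the whole point of doing the coarse pass first.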

The benefit of clearing out these known issues and duplicated data is clear whenever someone downstream, as it were, feeds data into some variation of a predictive machine learning algorithm that is meant to monitor, evaluate, and alert. The algorithms can run more efficiently with less data flowing through them, and unnecessary alerts can be eliminated, allowing a security monitoring system to deal only with data that suggests potentially new threats requiring human intervention, since the known threats were identified and mitigated many steps earlier.

The world needs all kinds. Everyone wants to be the surgeon, because he’s famous, but the surgeon cannot perform well without good technicians and nurses to help with the prep work. Ixia has stepped in to offer a technician for the job of filtering and prepping network traffic for better analysis and better security monitoring, keeping the surgeon from fatigue.