Commentary: Technical Debt in Machine Learning

Commentary: Technical Debt in Machine Learning

I recently had the opportunity to be a guest on an episode of the On-Premise IT Roundtable podcast, the topic of which was technical debt. (You can listen to the twenty minute episode here, or watch the video version here.) The conventional definition of technical debt, for both consumer and enterprise technology, is the lagging of upgrades, potentially causing issues down the line when the software or hardware is no longer supported. For example, should you immediately upgrade the operating system on your smartphone, or wait with an older version? If you wait, how long should you wait? The discussion was lively, pleasant, and informative. 

In particular, one panelist, Mr. James Green of Actual Tech Media, brought up an interesting and different notion of technical debt: that of the “quick and dirty” solution to put out a fire or to be seen as a player in the latest buzzword space.

From Mr. Greene, Technical debt can also be “…when I start building something small and narrow to address a problem I have right now without thinking about three years from now, what are my needs going to be?…You create technical debt when you have to build something that meets your new needs…”

This struck me in a slightly different context than the data centers and storage technologies that comprised the undercurrent of the discussion. I see companies and people scrambling to board the data science/machine learning/AI train, and quickly implement, well, anything that shows they use these concepts and tools. It’s great for marketing, and in marketing, timing is everything. You have to be at the forefront, but not the front. What happens is “research” on a development cycle: new dashboards and “analytics” churned out every few weeks, and slight twists on the same models every time the problem or dataset changes. 

It creates technical debt from an advanced development standpoint to wait too long to begin looking into a promising area or crafting a solution using what could be the next-big-thing. You risk jumping on the bandwagon as everyone else jumps to a new one. But perhaps more costly long-term is lurching from bandwagon to bandwagon, desperately trying to use that shiny new buzzword-hammer you bought into in your organization, whether or not it’s really the right tool. When you deploy a canned algorithm, or an unsupervised learning algorithm you don’t fully understand, how will you know when it begins to fail? How will you identify slow changes to the model as the data slowly changes, and how will you know if those changes are due to truly evolving circumstances or simply corrupt data? Could there have been a simpler, more elegant, more mathematical solution to the problem that would have applied across verticals, saving you months of retraining the same neural network on a different dataset? 

I’ll make an analogy with open-water swimming. Unlike swimming in a pool, where you have a nice, straight, black line at the bottom of the pool to guide you, open water comes with a new set of hurdles. Current, waves, and murky depths. The technique for an open water swim is very different from a pool swim. In the pool, you can look down, let go, and cruise, knowing you’ll never get off course. Out in open water, you pause every 5-7 strokes briefly to look up and reorient yourself with the landmarks and buoys to make sure you’re still on course. You also spend time studying the course and terrain for days prior to the start of the swim. If you fail to do so, you can get hundreds of meters off course and have to correct1, which adds minutes to your race time. In open water, the winner isn’t always just the fastest swimmer charging ahead, but the smartest one whose pace is a little slower, but pauses to ensure he is on his course. 

Frantically deploying the latest neural network, machine learning technique, or black-box AI solution is just like an open water swimmer treating the race like a pool swim. We’re in uncharted or barely charted territory here. Plowing full speed ahead deploying ultimately uninterpretable solutions may cause an organization to look up 3 years later and realize it’s way off course, with little to show for it. The conversation around the utility or cost of conventional technical debt converged on the conclusion that there are more nuances than simply deeming it “good” or “bad”. Obviously, it’s bad to look up every single stroke when swimming; that will slow you down unnecessarily. As with most things, moderation is key. A pause may seem ill-advised this week, this month, or even this year, but globally, those occasional pauses ensure you’re on course to deliver business value long-term. Study the problem at hand carefully. It may generalize abstractly to a problem in another vertical or industry that has already been solved elegantly, and deploying that solution may save millions of dollars and hundreds of hours over the next 5 years. 
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.


  1. I’ve had to do this. I ended up at least 500 meters off course during a 5K swim when I was a teenager. It sucked.