Welcome to the (Data) Jungle

We plot fun and games

I imagine the early days of big data were similar to the early days of the California Gold Rush. You did some relatively easy work (stick your bare hand into a river) and got something valuable (gold). The first time an individual or organization goes from a system that relies on no big data to one that does, the gains relative to the effort required are massive. Imagine, as an individual, going from managing your personal budget without even knowing how much you’re spending to a world in which you can see your spending by merchant or category in less than a second. Imagine the difference between a Netflix service that suggests random video content to each user (regardless of genre, type (TV show vs. movie), or, hell, even language) and one with even a simple recommendation system (you watched a sci-fi show most recently, so let’s, at the bare minimum, recommend more sci-fi). Both of these improved systems are made possible by the availability of big data.

After those quick wins, though, things get a little thornier. (“Ok Google, do jungles have thorns? Because if not, my analogy is breaking down real fast.”) Individuals need to figure out what additional data and techniques are necessary to make decisions, or whether any are needed at all. Organizations face the same problem, along with further questions: what needs to be measured and what that measurement costs, how best to organize their data, how to develop their data teams, and how to avoid data abuse (e.g., gaming internal metrics).

Relying on my experiences as a quantitative analytics leader and those of my colleagues, I propose that data organizations within companies and data literacy among individuals are currently in an adolescent phase. High school was like a jungle, right? (Heads up: video link.) We’re past the simplicity of childhood, but haven’t mastered the challenges of adulthood. I don’t think this should come as a surprise, as the data ecosystem as we know it is a recent development. We didn’t figure out how to store big data cost-effectively until the 1990s.

This new blog is dedicated to exploring the challenges that individuals and organizations face when it comes to using their data effectively. This field is best known as data strategy, and the blog will also include elements of data translation. I’ll address the fields of data engineering, data analytics, and data science, and perhaps give a quick nod to machine learning engineering. This blog will not be a technical one, however. There are plenty of folks writing blog posts on how to implement various data science techniques, many of which can be found in Medium’s Towards Data Science publication.

My hope is that my advice will be helpful to those starting in the space, those working their way up, and those who don’t work with data for a living but want to better understand how they can use it both personally and professionally. To all of you I say, “welcome to the jungle!”