Why do we care about data?
It’s a deceptively simple and equally provocative question. It’s tempting to roll one’s eyes and point out that data is the world’s most valuable resource or “the new oil”, that data scientist is the sexiest job of the 21st century, or that artificial intelligence¹ startups saw 82% year-over-year growth in funding in 2020 (calculation, report). But we should not give in to that temptation. Had we always relied on others’ predictions and reasoning, we might also have concluded that “the horse is here to stay [as the] automobile is a fad”, that televisions would never catch on because “people will soon get tired of staring at a plywood box every night”, and that “there is a world market for maybe five computers.”
To judge whether data deserves the hype, I think we need to understand, on a more fundamental level, why we care about it. Before I share my answer, I encourage you to reflect on why businesses and individuals should care about data.
Struggling to come up with anything at all? It’s not your fault. A lot of historical coverage of the space took for granted that data is important, and instead focused on its sheer volume or on the eye-wateringly high compensation packages offered to people in the field.
Is the best you can offer a vague term like “learnings” or “insights”? Again, not your fault. People in the data space who don’t really know what they’re doing love to lean on these terms to distract others from the fact that they aren’t actually accomplishing anything.
Got a laundry list of ideas like “producing recommendations”, “optimization” and “guiding strategy”? Not bad! But I think all of these examples fit under a larger umbrella that serves as a unifying theory.
Only after a few years of working in the data space, followed by a few more years of asking myself whether and why data matters, did I finally arrive at an answer. It is heavily informed by this article by Cassie Kozyrkov:
We care about data because it can be used to improve decisions.
I strongly believe in defining the important terms I use, so here’s a quick refresher I wrote on decisions. I’ve spent hours trying to find exceptions to the above in business or individual (read: non-academic) contexts, and have been unable to find any. Think back to the list of ideas above: all three examples fall under the umbrella of improving decisions, which I’ll refer to less formally as “choices” in the examples below.

To avoid confusion, I’d like to clarify that we typically think of a decision as a selection among alternatives that requires conscious thought. Compare “I decided to leave behind my friends and family, renounce my worldly possessions, shave my head, and become a monk” with “I chose the turkey sub for lunch.” The distinction gets murky when a machine or algorithm makes a choice for us: did it decide, or merely choose? To sidestep a tangent into theories of consciousness, I’ll use “choice” in the examples below, but a broader definition of “decision” would work as well.
Producing Recommendations: Algorithmic recommendation engines are trained on large datasets to personalize and improve the recommendations they make to users. Netflix does this to suggest other shows you may want to watch; Amazon does this to suggest other items to buy; Facebook does this to pick the posts it thinks you’re most interested in seeing; and Stitch Fix does this to figure out what clothes you want to wear. In each case, a partially or fully automated model is choosing which content or items to recommend to you out of a very large universe of possibilities.
Optimization: Businesses and individuals face tons of optimization decisions, which I’d define as choices about incremental improvements to existing systems, and historical data can be used to inform them. Examples include Uber’s dynamic pricing strategy and Vanguard’s portfolio allocation models.
Guiding Strategy: You may have heard about different products’ “aha moments”: the moment a user understands the value of a product and is therefore likely to keep using it. Facebook famously shared that their “aha moment” was getting new users to add 7 friends in 10 days. They arrived at this by looking at historical user data and examining which behaviors were most correlated with future usage of their product. This allowed them to greatly simplify their decision space: instead of having to brainstorm, debate, and vet all sorts of initiatives that might improve long-term user retention, they could focus on those that got new users to add friends.
The proposition that the purpose of data and analyses is to improve decisions also has a very important consequence, one that hit me hard when I realized it. If the value of data to organizations and individuals lies in improving decisions, then data or analyses that people try to use in any other way are not only useless, they’re distracting. Think about that the next time somebody presents data or analytical “insights” or “learnings” to leadership that aren’t tied to a decision down the line². That person should be paying the company for wasting its time and energy, instead of being paid themselves.
I plan to explore the simple and elegant idea that the purpose of data is to improve decisions in 8K Ultra HD in future newsletter posts. Below is a list of other topics I plan to address, which you can chew on in the meantime:
What exactly is a decision? What types of decisions exist within and across organizations? Who gets to make them and why?
What is data? In what ways does data analysis fit, and not fit, into the decision-making process?
How objective is data analysis? How automatable is it as a consequence?
How can one practice using data for decision making when not in a decision-maker role?
What are the characteristics of mature data organizations? How does one build one?
How should data teams be organized so that they contribute positively to the decision-making process?
And if I still have your attention, I want to answer some questions you might have:
Is this what I can expect of future posts in terms of length and content?
Yes! I bet you’re busy; I know I am. My hope is to write a bunch of bite-sized posts that take 5 minutes or less to read. Each should contain a data-related nugget of wisdom that you can start applying immediately. Pick and choose the ones that interest you, and with their powers combined³, you’ll have a data strategy playbook for business or life in general!
You seem to be a big fan of decision intelligence. Why doesn’t this newsletter focus on that?
It’s a new(ish) and growing field. All of my experience is in using data for decisions, so I figured that was a better place to start, and there is certainly more interest in and demand for data strategy than decision strategy at the moment. I would not be surprised if decision intelligence became the new data science over the next few years, and one could argue that it is simply a formalization and broadening of what strategy teams currently own. Regardless, I plan to share any relevant decision intelligence resources in this newsletter.
I started this newsletter in part to encourage more discussions and knowledge sharing around data strategy. I’d love to get your feedback, questions, and comments!
¹ A technology that relies heavily on data to build.
² It’s important to note that we need to think about impact over the long term. An individual datapoint or analysis may not affect a decision in the short term, but when viewed together with other datapoints or analyses over time, it can.
³ Yes, this is a Captain Planet reference.