You’ve probably heard it countless times: “data is the new oil.” It conjures the image of a prospector striking it rich by discovering “black gold.” Or you might think of Jed Clampett from The Beverly Hillbillies, finding that “bubbling crude” on his land and the financial windfall that ensued for him and his family.

It’s true: like oil, data is an immensely valuable resource that companies may not realize they are sitting on top of right now. But there’s another side of the “data as oil” comparison that is just as important: both must be extracted, refined, and put to use to be truly valuable.

There’s a term for the “crude oil” of the data landscape: “dark data.” Dark data is, very simply, any data you have that you aren’t putting to use for your business.

It could be because it’s hard to get to.

Or, you might have it in raw form but just aren’t using it for analytics or decision making.

Maybe it’s unclean, incomplete, or unreliable.

But it’s there, and it’s all potential, waiting to be tapped. A 2022 report from the Enterprise Strategy Group notes that nearly half of all data can be considered “dark.”

One unfortunate view of “dark data” is that it’s a nuisance. After all, data isn’t free, not even your own. There’s a real cost to storing, securing, and maintaining it: as much as 52% of an organization’s budget for data storage, according to a report by Veritas. Unless there is a compelling regulatory compliance reason, why keep data you don’t need? Admittedly, it’s a good argument, but one can just as easily ask:

Why not use data if you have it?

So how can you refine dark data to fuel your business? Here are a few examples of how dark data can be illuminated.


360-Degree Customer View

I can’t recall coming across a company where all of its customer data originated from a single place. Often you’ll have a CRM and perhaps a separate email marketing platform; social media interactions; logged-in web traffic; sales data in some place, which could be a point-of-sale system, ERP, or eCommerce platform; and countless other systems and applications.

The problem is getting all of these tools to talk to one another so that you can have a total view of a specific customer at any given time. Some of these are true examples of dark data, as they are not used at all; others might be used, but not to their fullest capacity. Light integration can bring these sources together, break down silos, and allow you to enhance the customer experience by getting a truly holistic view of your customer that everyone in the organization can see.
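As a rough illustration of what light integration can look like, here is a minimal Python sketch that stitches records from three systems into one profile per customer. The sample records, the field names, and the choice of email address as the shared key are all assumptions made for the example; real systems often need fuzzier matching than this.

```python
from collections import defaultdict

# Hypothetical exports from three separate systems (illustrative data only).
crm_records = [
    {"email": "Pat@Example.com", "name": "Pat Lee", "segment": "wholesale"},
]
email_events = [
    {"email": "pat@example.com", "campaign": "spring-sale", "opened": True},
    {"email": "pat@example.com", "campaign": "summer-sale", "opened": False},
]
orders = [
    {"email": "pat@example.com ", "order_id": "A100", "total": 120.50},
    {"email": "sam@example.com", "order_id": "A101", "total": 35.00},
]


def normalize_key(email: str) -> str:
    """Trim whitespace and lowercase so the same customer matches across systems."""
    return email.strip().lower()


def build_customer_view(crm, emails, sales):
    """Combine per-system records into one profile per customer email."""
    profiles = defaultdict(lambda: {"profile": {}, "campaigns": [], "orders": []})
    for row in crm:
        key = normalize_key(row["email"])
        profiles[key]["profile"] = {"name": row["name"], "segment": row["segment"]}
    for row in emails:
        profiles[normalize_key(row["email"])]["campaigns"].append(
            {"campaign": row["campaign"], "opened": row["opened"]}
        )
    for row in sales:
        profiles[normalize_key(row["email"])]["orders"].append(
            {"order_id": row["order_id"], "total": row["total"]}
        )
    return dict(profiles)


customer_view = build_customer_view(crm_records, email_events, orders)
print(customer_view["pat@example.com"])
```

Note that the second customer in the sample has orders but no CRM entry. Gaps like that are often the first thing a combined view surfaces.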


Customer Feedback Data

Once upon a time, when people thought of “data” they thought of numbers. Something that can be plotted on a graph. To the extent that “words” were data, they were descriptors (or dimensions, in business intelligence parlance) or, occasionally, you’d see a word cloud.

Semi-structured or unstructured data was either difficult or impossible to analyze. One example is narrative text, which is usually classed as unstructured. I like to call this “gray data.” It’s not quite dark, because it’s easy to find, but it’s also not fully illuminating as you have to look it up on an individual basis.

Fast-forward to the present day, where you can actually analyze this kind of text and boil it down to empirical numbers. Consider customer service responses. Whether they are pulled from emails received directly from customers, chatbot logs, or even just notes entered by contact center personnel, a natural language processing technique called sentiment analysis can score each response on whether it is positive or negative about a certain product or service. Now, even emails that come in at a rate of hundreds per minute can be distilled into a real-time number that serves as a gauge of customer perception at the current moment, rather than a week after the fact, when it might be too late to do anything about it.
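To make the idea concrete, here is a deliberately simple Python sketch of scoring messages and keeping a rolling gauge. It is a toy: the word lists are made up for illustration, and counting keywords is a stand-in for the trained language models that real sentiment analysis tools typically rely on. The class and function names are hypothetical as well.

```python
import re
from collections import deque

# Toy word lists, assumed for illustration; not a real sentiment lexicon.
POSITIVE = {"great", "love", "fast", "helpful", "excellent"}
NEGATIVE = {"broken", "slow", "refund", "terrible", "late"}


def score_message(text: str) -> float:
    """Return a score in [-1, 1]: (positive hits - negative hits) / total hits."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    hits = pos + neg
    return 0.0 if hits == 0 else (pos - neg) / hits


class SentimentGauge:
    """Rolling average of scores over the most recent `window` messages."""

    def __init__(self, window: int = 100):
        self.scores = deque(maxlen=window)

    def add(self, text: str) -> float:
        """Score a new message and return the updated gauge reading."""
        self.scores.append(score_message(text))
        return self.current()

    def current(self) -> float:
        return sum(self.scores) / len(self.scores) if self.scores else 0.0


gauge = SentimentGauge(window=3)
for message in [
    "Love the new app, support was helpful",
    "My order arrived late and the box was broken",
    "Where can I find my invoice?",
]:
    print(round(gauge.add(message), 2), message)
```

The rolling window is what turns individual scores into a “right now” reading: old messages fall out as new ones arrive, so the number reflects only recent sentiment.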


Abandoned Carts

If your business has an eCommerce platform, storing the contents of a user’s cart is essentially table stakes. There is no excuse in 2023 for a shopping cart that empties if a user goes idle or logs out. The customer may want to do a little more research or even delay immediate gratification before making a purchase. Returning to an empty cart because the data wasn’t saved anywhere is a fast way to lose any chance of that sale converting.

However, what do you do with this data other than just keep it around in case the user comes back? If that data is just filed away untouched, it becomes dark. One way to put it to use is to make it part of the customer experience journey. Gently remind the person they have an item waiting to be purchased. Maybe even offer a discount on a cross-sell. And if all else fails and the cart is fully abandoned, that data needn’t be purged. Use it to identify trends in why people are abandoning carts. Can it be combined with metadata from the website about user activity (also dark data) to understand if there is a user experience issue? Are there patterns among the types of customers who tend to abandon carts that can inform marketing approaches?
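Here is one way that analysis might start, sketched in Python. The cart records, the field names, and the 24-hour idle threshold are assumptions made for the example; your platform’s definition of “abandoned” may well differ.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical saved-cart records joined with the last page each user visited.
carts = [
    {"cart_id": 1, "last_activity": datetime(2023, 5, 1, 9, 0),
     "purchased": False, "last_page": "shipping"},
    {"cart_id": 2, "last_activity": datetime(2023, 5, 1, 10, 0),
     "purchased": True, "last_page": "confirmation"},
    {"cart_id": 3, "last_activity": datetime(2023, 5, 2, 8, 0),
     "purchased": False, "last_page": "shipping"},
    {"cart_id": 4, "last_activity": datetime(2023, 5, 3, 11, 0),
     "purchased": False, "last_page": "cart"},
]


def find_abandoned(carts, now, idle_after=timedelta(hours=24)):
    """Return unpurchased carts with no activity for at least `idle_after`."""
    return [
        cart for cart in carts
        if not cart["purchased"] and now - cart["last_activity"] >= idle_after
    ]


def exit_page_counts(abandoned):
    """Count which page users were on when they left, to hint at UX issues."""
    return Counter(cart["last_page"] for cart in abandoned)


now = datetime(2023, 5, 3, 12, 0)
abandoned = find_abandoned(carts, now)
print([cart["cart_id"] for cart in abandoned])
print(exit_page_counts(abandoned).most_common())
```

In this made-up sample, both abandoned carts were last seen on the shipping page, which is the kind of pattern that would prompt a closer look at shipping costs or that step of the checkout flow. The same list of abandoned carts could also feed a reminder email.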


Internet of Things

IoT, or Internet of Things, is a term that encompasses a number of technologies like sensors, wearables, smart home appliances, and more, that provide streams of real-time data. There are many non-dark applications for data originating from IoT devices: wearable devices providing real-time streams of medical information, sensors in manufacturing equipment providing diagnostic information, or temperature sensors in supply chain transportation ensuring that refrigerated cargo remains at the appropriate temperature.

With readings streaming in every second or even more frequently, it’s easy to see how this can lead to major storage bloat. However, if used properly, IoT can offer tremendous insight, predicting when equipment needs to be repaired or uncovering patterns in human behavior. One way to deal with the storage challenges is to have a well-thought-out data strategy around IoT. Do you really need to maintain second-to-second detail or is aggregation okay? Is a longer time interval acceptable? Instead of storing full readings, can you maintain changes from point to point?
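Two of those questions can be sketched in a few lines of Python. The readings, the 60-second interval, and the function names are assumptions for illustration, and storing a reading only when the value changes is just one interpretation of keeping “changes from point to point.”

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (seconds_since_start, temperature_c) readings from one sensor.
readings = [(0, 4.0), (1, 4.0), (2, 4.0), (59, 4.5), (60, 4.5), (61, 5.0), (119, 5.0)]


def aggregate(readings, interval_s=60):
    """Average readings into fixed-width time buckets keyed by bucket start."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[ts - ts % interval_s].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


def changes_only(readings, tolerance=0.0):
    """Keep a reading only when it differs from the last kept value by more than `tolerance`."""
    kept = []
    for ts, value in readings:
        if not kept or abs(value - kept[-1][1]) > tolerance:
            kept.append((ts, value))
    return kept


print(aggregate(readings))      # one averaged value per minute
print(changes_only(readings))   # only the points where the temperature moved
```

Here seven readings shrink to two per-minute averages, or to three change points. Which trade-off is acceptable depends on the question you want the data to answer: averages hide short spikes, while change-only storage preserves them but loses the evenly spaced timeline.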



From Darkness to Light

Dark data should not be seen as a liability. Instead, look at it as untapped potential.

Before discarding any data because it’s not being used, make an honest assessment of the reason for that. Is it not being used because it isn’t useful? Or does it have value if looked at differently? You’d be surprised how small the first steps are to bring that data into the light.