Your Data Deserves Better Than Band-Aid Solutions
Imagine trying to fill a cracked glass with wine. The more you pour, the more spills through the fractures, leaving behind a mess instead of a full, satisfying result. This metaphor illustrates the current state of data management in many organizations—a fragile system of interconnected tools, patched together with temporary fixes, unable to fully contain or utilize the data pouring into it. But why is this the case? To answer that, we must examine the environment that creates these cracks in the first place.
The Cracked Glass of Data Management
In this analogy, the glass represents your core systems (ERP, MES, CRM, databases, and so on), the key technology that runs your business. The band-aids are the APIs, connectors, scripts, and one-off solutions intended to make these systems communicate. While these fixes might work temporarily, they cannot handle the volume, velocity, and variety of today’s data demands. Data is the lifeblood of modern organizations, but a disjointed tech stack prevents it from flowing effectively. A few culprits:
1. Legacy Systems Are Still at the Core
Many companies still rely on decades-old ERP, MES, and CRM systems as the backbone of their operations. These systems were never designed to handle the scale, speed, or diversity of modern data demands. Their limitations create silos where valuable information becomes trapped, accessible only through cumbersome processes.
According to BCG, 56% of managers struggle with data-operating costs driven by "spider web" architectures of fragmented data stacks. This complexity increases the maintenance burden on APIs and integrations.
2. The API Band-Aid Culture
In response to growing data needs, organizations often resort to APIs and connectors as quick fixes. While these tools enable disparate systems to communicate, they are often fragile, poorly maintained, and only address surface-level integration challenges. The result: data flows are incomplete, error-prone, or dependent on constant manual intervention. APIs, while useful, are not a substitute for a unified data strategy.
3. The Explosion of Data Sources
The volume and variety of data sources have skyrocketed. From IoT devices to social media, from machine logs to customer interactions, companies collect more data than ever before. Yet, most of this data is unstructured, making it challenging to analyze or integrate into existing systems.
4. Vendor Proliferation and Stack Fragmentation
The rise of specialized tools for every conceivable business need has led to an explosion in vendor options. While this provides flexibility, it also fragments the stack: overlapping tools create redundancy, increase costs, and make integration harder.
According to BCG, mature companies now manage over 150 unique data tools, compared to fewer than 50 a decade ago.
So Much Opportunity Wasted!
Your tech stack is like a fragile glass, hastily patched together to hold the flood of data pouring into it. As wine (data) is poured, much of it leaks through the cracks—the dismissed data, the vast majority of information that’s discarded without being stored or analyzed. What remains in the glass is what you’ve managed to collect, but even then, only a portion of that data is used. The rest sits untouched—this is dark data, the untapped portion of your stored information, waiting to reveal its potential.
What is Dark Data?
Dark data is the portion of your collected data that you never use for analysis, insights, or decision-making. It includes stored but untouched customer behavior logs, machine sensor data, unstructured documents, and more. Unlike dismissed data, which is never retained, dark data lingers in your systems—undiscovered, inaccessible, or simply ignored.
Imagine the wine left in the patched glass. Some of it is poured into usable containers (your operational and analytical systems), but much of it remains in the glass, unreachable and forgotten, its potential wasted. Dark data persists because it’s buried in silos, stored in unstructured formats, or overlooked due to a lack of strategy or resources.
In 2019, Splunk found that 60% of companies reported that half or more of their organization’s data was dark, with 33% putting the figure at 75% or more. Three years later, a separate ESG study found that 42% of companies said at least half of their data was "dark" (unused or unknown), with an estimated mean of 47%. Alarmingly, 21% of respondents reported over 71% of their data as dark. No matter how you look at it, that is a lot of dark data!
BCG highlights that 95% of the data generated globally today is unstructured, and most organizations lack the tools to process it effectively. This inability to handle unstructured data feeds directly into the accumulation of dark data.
What is Dismissed Data?
While dark data is stored but unused, dismissed data is never retained at all—it’s the wine spilling through the cracks in the glass. BCG reports that only about 6-7% of the 84 zettabytes of data generated globally in 2021 was stored, with the rest dismissed outright. This includes transient IoT sensor readings, ephemeral logs, and other data deemed too costly or irrelevant to keep.
The distinction is critical. Dismissed data represents lost opportunities that cannot be reclaimed. Dark data, on the other hand, is already within your systems and can be accessed if the right tools, strategies, and priorities are applied.
Unlocking Data’s Potential: Balancing Strategy and Opportunity
Both dark data (collected but unused) and dismissed data (generated but not stored) hold enormous potential, yet many manufacturers struggle to harness it. The solution isn’t to collect or analyze everything; doing so is expensive and impractical. Instead, companies should unlock the value of their data by blending a strategic approach with an opportunistic one. This lets manufacturers extract actionable insights while avoiding unnecessary cost and complexity.
Strategic Approach: Identify and Capture the Data That Drives Value
The strategic approach begins by asking: What data do we need to make better decisions or create more value? Instead of trying to collect everything, manufacturers can focus on specific areas where data supports critical outcomes.
For example, a manufacturer aiming to minimize unplanned downtime might first identify the data it needs to predict equipment failures. This could include vibration, temperature, and pressure data from sensors on production line machines. The company doesn’t need to collect every possible metric from every piece of equipment. Instead, it can prioritize the machines that are most critical to production and the parameters most likely to signal issues.
Similarly, dismissed data—such as short-lived IoT signals that are currently ignored—could become valuable if it supports strategic goals. For instance, a factory might decide to start capturing brief spikes in electricity usage from machinery if those spikes correlate with future failures. Strategic planning can help identify which dismissed data streams to begin retaining.
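To make the electricity-spike example concrete, here is a minimal sketch of selective retention: instead of storing every reading, keep only the spikes plus a little surrounding context. The function name, the 1.5x threshold, the one-reading context window, and the sample readings are all illustrative assumptions, and using the median of a whole batch as the baseline is a simplification of what a live stream would require.

```python
from statistics import median


def retain_spike_windows(readings, spike_ratio=1.5, context=1):
    """Return (index, value) pairs worth storing.

    A reading counts as a spike when it exceeds the batch median by
    `spike_ratio`. Each spike is kept together with `context` readings
    on either side; everything else is treated as dismissible.
    """
    if not readings:
        return []
    baseline = median(readings)
    keep = set()
    for i, value in enumerate(readings):
        if value > baseline * spike_ratio:
            lo = max(0, i - context)
            hi = min(len(readings) - 1, i + context)
            keep.update(range(lo, hi + 1))
    return [(i, readings[i]) for i in sorted(keep)]


# Hypothetical power draw in kW for one machine (made-up numbers).
power_kw = [10.0, 10.2, 9.9, 18.5, 10.1, 10.0, 9.8, 10.3]
retained = retain_spike_windows(power_kw)
print(retained)  # [(2, 9.9), (3, 18.5), (4, 10.1)]
```

Here three of eight readings are retained. Whether a threshold like this is the right trigger depends on the equipment; the point is only that the retention rule follows from the strategic goal.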
Strategic actions manufacturers can take:
Map data to specific goals: Engage engineering, operations, and leadership teams to identify key business priorities, such as reducing downtime, improving yield, or enhancing quality control. Then, determine the data needed to achieve those goals.
Prioritize high-impact areas: Focus on capturing data from critical assets or processes, such as high-value production lines or frequently failing machines, rather than trying to cover every system.
Implement scalable systems: Adopt modern architectures, like a data mesh, that allow selective collection and integration of data across the factory without overwhelming legacy systems.
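One simple way to decide where "high-impact" starts is to rank assets by a rough expected annual downtime cost. The sketch below assumes you can estimate failures per year, hours lost per failure, and cost per hour for each asset. The field names, the scoring formula, and the figures are hypothetical, not taken from any study cited here.

```python
def rank_assets(assets, top_n=2):
    """Rank assets by expected annual downtime cost, highest first.

    Expected cost = failures/year * hours lost per failure * cost per hour.
    """
    def expected_cost(asset):
        return (
            asset["failures_per_year"]
            * asset["hours_per_failure"]
            * asset["cost_per_hour"]
        )

    ranked = sorted(assets, key=expected_cost, reverse=True)
    return [(a["name"], expected_cost(a)) for a in ranked[:top_n]]


# Hypothetical plant assets with made-up estimates.
assets = [
    {"name": "press_line_1", "failures_per_year": 6,
     "hours_per_failure": 4, "cost_per_hour": 5000},
    {"name": "packaging_cell", "failures_per_year": 12,
     "hours_per_failure": 1, "cost_per_hour": 800},
    {"name": "cnc_mill_3", "failures_per_year": 3,
     "hours_per_failure": 10, "cost_per_hour": 2000},
]

priorities = rank_assets(assets)
print(priorities)  # [('press_line_1', 120000), ('cnc_mill_3', 60000)]
```

Note that the asset that fails most often (the packaging cell) ranks last here, because each failure is short and cheap. Frequency alone can be a misleading guide to where data capture pays off.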
Opportunistic Approach: Extract Insights from What You Already Have
While strategic planning focuses on capturing the right data moving forward, the opportunistic approach looks at the data manufacturers already have but aren’t using. Dark data—such as logs from machines, historical production records, or maintenance reports—often contains valuable insights waiting to be uncovered.
For example, a manufacturer might already collect data from machine logs that track run times and error codes but hasn’t analyzed it to identify trends. By applying machine learning (ML) models, they could detect patterns that predict equipment failures, allowing for proactive maintenance. Similarly, analyzing historical production data could reveal correlations between environmental factors, such as humidity, and product defects—insights that could lead to better environmental controls on the factory floor.
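Before reaching for ML, a first pass over historical records can be as simple as a correlation check. The sketch below computes a Pearson correlation coefficient between humidity and defect rate using only the standard library. The data points are invented for illustration, and a strong correlation is a prompt for investigation, not proof that humidity causes defects.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with 2+ points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical daily records: relative humidity (%) and defects per 1,000 units.
humidity_pct = [35, 40, 45, 50, 55, 60]
defects_per_1000 = [2, 3, 3, 5, 6, 8]

r = pearson(humidity_pct, defects_per_1000)
print(f"humidity vs. defects: r = {r:.2f}")
```

With these made-up numbers the coefficient comes out strongly positive, which is the kind of signal that would justify a closer look at environmental controls.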
Manufacturers can also use opportunistic analysis to guide future data collection. For instance, if analysis of existing machine logs reveals that certain metrics (e.g., temperature spikes) correlate strongly with breakdowns, they can start capturing those same metrics from similar assets where the data is currently dismissed.
Opportunistic actions manufacturers can take:
Audit existing data: Review stored data like maintenance logs, sensor readings, or production metrics to identify underutilized sources of information.
Apply AI and ML models: Use advanced analytics tools to analyze unstructured or semi-structured data, such as images from quality inspections or vibration data from sensors, to uncover hidden patterns.
Run pilot projects: Focus on one production line or asset class to test insights derived from dark data. For instance, predict maintenance needs for a specific type of machinery before expanding to others.
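The audit step can start small. The sketch below assumes you can export a catalog of stored datasets with their size and the date they were last read, then flags anything untouched for a year as dark. The `Dataset` fields, the 365-day cutoff, and the sample catalog are assumptions for illustration; real storage systems expose access metadata in different ways, if at all.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Dataset:
    name: str
    size_gb: float
    last_read: Optional[date]  # None means never read since it was written


def dark_share(catalog, today, stale_after_days=365):
    """Return (fraction of stored GB that is dark, names of dark datasets)."""
    cutoff = today - timedelta(days=stale_after_days)
    dark = [d for d in catalog if d.last_read is None or d.last_read < cutoff]
    total_gb = sum(d.size_gb for d in catalog)
    dark_gb = sum(d.size_gb for d in dark)
    share = dark_gb / total_gb if total_gb else 0.0
    return share, [d.name for d in dark]


# Hypothetical catalog export (names, sizes, and dates are made up).
catalog = [
    Dataset("maintenance_logs", 120.0, None),
    Dataset("vibration_sensor_archive", 300.0, date(2022, 3, 1)),
    Dataset("production_metrics", 80.0, date(2024, 5, 20)),
]

share, dark_names = dark_share(catalog, today=date(2024, 6, 1))
print(f"{share:.0%} of stored data is dark: {dark_names}")
```

The list of dark datasets doubles as a shortlist for pilot projects: each entry is data you already pay to store and could start analyzing.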
References:
Enterprise Strategy Group (Mike Leone, ESG Senior Analyst; Keir Walker, Senior Market Research Analyst) - 2022 State of Data Governance and Empowerment, July 2022: https://www.erwin.com/docs/2022-esg-state-of-data-governance-and-empowerment-report-analyst-reports-30424.pdf
Boston Consulting Group - A New Architecture to Manage Data Costs and Complexity, February 2023: https://www.bcg.com/publications/2023/new-data-architectures-can-help-manage-data-costs-and-complexity
Splunk - The State of Dark Data, 2019: https://www.splunk.com/en_us/form/the-state-of-dark-data.html