Why the traditional approach to unstructured data has been storing up problems

As NVIDIA CEO Jensen Huang recently pointed out, “about 90% of the data generated every single year is unstructured data,” adding that much of it has, until recently, been “useless” because it could not be easily queried or indexed.

It’s a strong assessment, and as he went on to explain, AI is now changing this by interpreting the meaning of unstructured data and converting it into formats that can be searched and analysed more effectively.

While this is correct, for many organizations, the more pressing reality is that unstructured data remains poorly understood and largely unmanaged. That doesn’t stop businesses everywhere from collecting and storing it by default, in huge volumes, across a diverse mix of on-premises systems and multiple cloud environments.

The common denominator in this collective data addiction is that organizations typically lack visibility into what data they hold, where it resides, how it is used, and whether it still has value. This leaves large volumes of data stored without a clear purpose or active management. Rather than addressing the issue, many technology leaders simply keep adding more storage hardware.

A perfect storage storm

To say this has significant cost implications is putting it mildly. In early March, some NAND Flash prices doubled overnight as demand from the AI industry surged, which in turn impacted cloud storage costs, with all of the big hyperscalers increasing their prices. These pressures feed directly into storage and infrastructure investment across every enterprise. 

This represents a perfect storm for those responsible for managing storage environments, who are being asked to accommodate rapidly growing data volumes while operating under competing cost/performance demands.

Costs aside, the issue is not simply the volume of unstructured data, but the lack of visibility and control over how it is stored and used. Without good visibility, organizations have no reliable way to determine which data is active, which is redundant, and which can be archived or deleted. Even if storage infrastructure were currently very cheap, this lack of insight would still prevent organizations from managing their environments efficiently or ensuring that data is governed and used in a meaningful way.

Adding to the cost is an approach to storage in which large volumes of inactive or low-value data continue to occupy high-performance, high-cost tiers. Over time, this drives unnecessary infrastructure expansion and increases governance challenges, particularly where unclear ownership and inconsistent data handling raise the risk of compliance and security issues.

What needs to change?

Addressing these challenges is becoming increasingly urgent, irrespective of whether rising infrastructure prices prove to be a spike or a long-term ‘new normal’. What’s needed is a fundamental realignment in how organizations view storage, moving away from a narrow focus on availability and capacity utilization towards a stronger emphasis on better data management.

If we accept that simply adding more storage is no longer viable, even as a short-term fix to the unstructured data challenge, organizations need the ability to understand what data exists across their environments, supported by detailed metadata insights such as age, activity, and ownership.

This level of insight allows organizations to distinguish between active, high-value data and information that is no longer relevant or in use. Data can then be managed properly and economically across its lifecycle, including opportunities to archive or even delete parts of the estate.
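As a rough illustration of what this kind of metadata-driven classification can look like in practice, the sketch below (in Python, using entirely hypothetical age thresholds and an example path) walks a file tree and groups files into active, archive and deletion-review tiers based on last-access time. It is an illustrative sketch, not a reference implementation of any particular product.

```python
import os
import time
from collections import defaultdict

# Hypothetical thresholds -- real values would come from governance policy.
ARCHIVE_AFTER_DAYS = 180    # not accessed in ~6 months -> archive candidate
DELETE_AFTER_DAYS = 1095    # not accessed in ~3 years  -> deletion/review candidate

def classify_tree(root: str) -> dict:
    """Group files under `root` into lifecycle tiers by last-access age."""
    now = time.time()
    tiers = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # unreadable or vanished file; skip it
            # Note: atime can be unreliable on volumes mounted with noatime.
            age_days = (now - st.st_atime) / 86400
            if age_days >= DELETE_AFTER_DAYS:
                tiers["delete_review"].append((path, st.st_size))
            elif age_days >= ARCHIVE_AFTER_DAYS:
                tiers["archive"].append((path, st.st_size))
            else:
                tiers["active"].append((path, st.st_size))
    return tiers

if __name__ == "__main__":
    report = classify_tree("/data/shares/projects")  # hypothetical share path
    for tier, files in report.items():
        total_gb = sum(size for _, size in files) / 1e9
        print(f"{tier}: {len(files)} files, {total_gb:.1f} GB")
```

In a real estate this scan would be driven from a metadata index rather than a live file walk, and would also capture ownership and business context, but the principle is the same: classify first, then act.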

The value of good governance

Given the nature of modern IT infrastructure, this approach will only work if it is applied across heterogeneous environments, including the various storage platforms, locations, and cloud providers businesses routinely use. Without these capabilities in place, organizations will continue to struggle to control costs and optimize performance.

Effective storage optimization of this kind also depends on establishing clear governance, including defined ownership and accountability. While good governance is always valuable, in this context it addresses the problem of data that becomes orphaned over time, left with no clear owner or defined purpose and therefore harder to manage.

Introducing consistent management policies ensures that data is handled in line with both operational requirements and regulatory obligations. These policies must be applied across all environments; there is little benefit in governance implemented on a piecemeal basis. Regular auditing and monitoring are also essential to ensure that data remains aligned with defined policies as environments evolve.
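To make the idea of consistently applied policies and regular auditing slightly more concrete, the sketch below (again Python, with hypothetical policy names, retention limits and file records) expresses a retention policy as data and checks an inventory against it, the kind of check a scheduled audit might automate.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy: each data class gets a retention limit and an ownership rule.
POLICY = {
    "financial": {"max_retention_days": 2555, "owner_required": True},   # ~7 years
    "project":   {"max_retention_days": 1095, "owner_required": True},
    "scratch":   {"max_retention_days": 90,   "owner_required": False},
}

@dataclass
class FileRecord:
    path: str
    data_class: str
    age_days: int
    owner: Optional[str]  # None means the file is orphaned

def audit(records: list[FileRecord]) -> list[str]:
    """Return human-readable policy violations for a batch of file records."""
    findings = []
    for rec in records:
        rules = POLICY.get(rec.data_class)
        if rules is None:
            findings.append(f"{rec.path}: unclassified data, no policy applies")
            continue
        if rules["owner_required"] and rec.owner is None:
            findings.append(f"{rec.path}: orphaned, owner required for '{rec.data_class}'")
        if rec.age_days > rules["max_retention_days"]:
            findings.append(f"{rec.path}: exceeds retention for '{rec.data_class}'")
    return findings

if __name__ == "__main__":
    sample = [
        FileRecord("/shares/finance/q1.xlsx", "financial", 900, "finance-team"),
        FileRecord("/shares/tmp/export.csv", "scratch", 400, None),
    ]
    for finding in audit(sample):
        print(finding)
```

The point is not the code itself but the pattern: when policy lives in one place and is applied mechanically across every environment, audits become repeatable rather than ad hoc.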

Put simply, the approach many businesses have taken to unstructured data and storage infrastructure is no longer sustainable. With costs rising, businesses are increasingly reluctant to maintain previous levels of investment in storage infrastructure. While these are unwelcome circumstances, they also present an opportunity to rethink how storage environments are managed, shifting the focus from expanding capacity to managing data more effectively over the long term.