Modern Data Archiving: Managing Explosive Unstructured Data Growth

Unstructured data creation rates have soared, driven by edge computing, IoT systems, machine-generated data, and, let’s not forget, GenAI. As a result, the timeframe in which data is actively used has shrunk. Today that period is typically around 30 to 90 days, after which the flood of new data makes the existing data less useful or even redundant.

The constant flood of incoming data requires near-relentless storage system expansion in an increasingly futile attempt to keep up. A modern archiving strategy is therefore essential to managing storage and hybrid-cloud systems.

The key benefits of a modern archiving solution are:

  • Cost Management/Reduction
  • Operational Efficiency
  • Sustainability Improvements

Archiving Versus the Rest

Archiving is an often-misunderstood term, which causes confusion for everyone involved. It is important to have a clear understanding of exactly what archiving is and what it is not.

Archiving Versus Tiering

Tiering and archiving are two very different things. Think of archiving as a removal company that packs physical documents from file cabinets into boxes and transports them to an offsite facility for long-term storage, after which the company leaves and is no longer needed. When the files need to be retrieved, anyone with permission can access them.

Think of tiering as a specialized librarian, permanently in place, who moves one file at a time into a unique filing system from which only the librarian knows how to get each file back. Tiering is essentially another name for Hierarchical Storage Management (HSM).

Tiering or HSM solutions have been tried in multiple forms over many years, and they tend to cause more pain than benefit.

Archiving Versus NAS Cloud Gateways

NAS cloud gateways provide a global file system and, therefore, global access via the public cloud to files traditionally stored in on-premises NAS storage systems. Because the NAS gateway device maintains all file-related metadata in its global file system, it can be claimed that the device also serves as an archive front end.

But while the NAS gateway considers the content archived, this is not true archiving. The metadata is stored in the NAS gateway, so any access to that data in the event of a recall is arbitrated by the gateway, much as it is with tiering.

Both tiering and gateways present companies with an important concern. What if that solution gets decommissioned or the vendor goes out of business? How do you retrieve data when the app is no longer available?

Archiving Versus the Archive Storage Platform

The “act” of archiving data and the platform on which the archived data will be stored are two different things. While the choice of platform is an important decision when creating archiving policies, there is more to archiving than just the target platform.

The “act” of archiving data involves making critical decisions about what to archive and then moving it based on policies. The right data needs to be found among billions of files based on criteria such as the length of time since a file was last accessed, the length of time since it was last modified, or ownership by a specific user ID.

Identifying these proverbial needles in the haystack, moving them to a suitable archiving platform, and having a vendor-neutral way to quickly retrieve the data are key to a modern archiving strategy.
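As a rough illustration of how such criteria can be expressed, here is a minimal Python sketch. The names FileRecord, ArchivePolicy, and is_archive_candidate are hypothetical and invented for this example, as are the sample paths, dates, and user IDs. It is not how any particular product implements policies; it only shows the three criteria named above combined into one test.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical metadata record for one file."""
    path: str
    last_accessed: datetime
    last_modified: datetime
    owner_id: int


@dataclass(frozen=True)
class ArchivePolicy:
    """Hypothetical policy; a criterion left as None is not applied."""
    min_days_since_access: Optional[int] = None
    min_days_since_modify: Optional[int] = None
    owner_ids: Optional[frozenset] = None


def is_archive_candidate(rec: FileRecord, policy: ArchivePolicy, now: datetime) -> bool:
    """Return True only if the file meets every criterion the policy sets."""
    if policy.min_days_since_access is not None:
        if now - rec.last_accessed < timedelta(days=policy.min_days_since_access):
            return False
    if policy.min_days_since_modify is not None:
        if now - rec.last_modified < timedelta(days=policy.min_days_since_modify):
            return False
    if policy.owner_ids is not None and rec.owner_id not in policy.owner_ids:
        return False
    return True


# Made-up sample records, for illustration only.
now = datetime(2025, 1, 1)
records = [
    FileRecord("/projects/a.dat", datetime(2020, 3, 1), datetime(2019, 6, 1), 1001),
    FileRecord("/projects/b.dat", datetime(2024, 12, 1), datetime(2019, 6, 1), 1001),
    FileRecord("/home/c.doc", datetime(2021, 1, 1), datetime(2021, 1, 1), 2002),
]
policy = ArchivePolicy(min_days_since_access=365, min_days_since_modify=365)
candidates = [r.path for r in records if is_archive_candidate(r, policy, now)]
```

Here b.dat is left out because it was accessed recently, even though it has not been modified in years. Keeping the policy as plain data, separate from the code that scans files, makes it easier to review and change the thresholds later.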

A Modern Archiving Strategy

When creating a modern archiving strategy, it is important to consider whether to leverage an active archive, a deep archive, or a combination of the two.

An active archive is used for data that has a modest or reasonable chance of needing recall. A deep archive is used for data that has very little chance of needing recall but must be retained for regulatory compliance or internal governance reasons. The deep archive can also become the next location for data in the active archive once that data passes a threshold, defined by corporate policy, that dictates its movement into the deep archive.
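The progression from primary storage to active archive to deep archive can be sketched as a simple threshold check. The example below is a simplification that assumes idle time since last access is the only input; the Tier names and the two threshold values are hypothetical placeholders, since real thresholds come from corporate policy and may also depend on retention requirements.

```python
from datetime import datetime, timedelta
from enum import Enum


class Tier(Enum):
    PRIMARY = "primary"
    ACTIVE_ARCHIVE = "active archive"
    DEEP_ARCHIVE = "deep archive"


# Hypothetical thresholds; real values are set by corporate policy.
ACTIVE_AFTER = timedelta(days=365)
DEEP_AFTER = timedelta(days=7 * 365)


def choose_tier(last_accessed: datetime, now: datetime) -> Tier:
    """Pick a location based only on how long the data has been idle."""
    idle = now - last_accessed
    if idle >= DEEP_AFTER:
        return Tier.DEEP_ARCHIVE
    if idle >= ACTIVE_AFTER:
        return Tier.ACTIVE_ARCHIVE
    return Tier.PRIMARY
```

Checking the deep archive threshold first means data that has aged past both thresholds goes straight to the deep archive rather than stopping in the active archive.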

Clever use of the active and deep archives can generate higher cost savings, but access requirements need to be considered. Essentially, there is a balancing act between the frequency of access, the cost of storage, and the performance of the recalls.

Figure: Frequent Access vs Recall Performance vs Low Cost

Modern Archiving Starts with Insights – Use Your Metadata

The first step in archiving is to get insights into the profile of your files. Luckily, there is ample metadata assigned by the storage system that can indicate when content was created, when it was most recently accessed, when it was last modified, and even whether the file is owned by an active or inactive user (files owned by inactive users are known as orphaned files).
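To make this concrete, here is a minimal Python sketch that walks a directory tree and totals file sizes by how long ago each file was last accessed. The bucket boundaries and the function names bucket_for and profile_tree are assumptions made for this example. It relies on the file system's access timestamp, which can be unreliable on volumes mounted with options that limit access-time updates, and a single-threaded walk like this would not scale to billions of files.

```python
import os
import time
from collections import Counter
from typing import Optional

DAY = 86400  # seconds per day

# Hypothetical age buckets: (label, upper bound in days, exclusive).
BUCKETS = [("<90d", 90), ("90d-1y", 365), ("1y-3y", 3 * 365)]


def bucket_for(age_days: float) -> str:
    """Map an age in days to the label of the first bucket it fits in."""
    for label, upper in BUCKETS:
        if age_days < upper:
            return label
    return ">3y"


def profile_tree(root: str, now: Optional[float] = None) -> Counter:
    """Sum file sizes in bytes per last-access age bucket under root."""
    now = time.time() if now is None else now
    totals: Counter = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            age_days = (now - st.st_atime) / DAY
            totals[bucket_for(age_days)] += st.st_size
    return totals
```

Calling profile_tree on a share's mount point returns a mapping from bucket label to bytes. The same loop could bucket by st_mtime or group by st_uid to profile modification age or ownership instead.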

Policies can be created to dictate when files that meet certain criteria get relocated to the archive platform. For example, files that have been neither accessed nor modified within the last 3 years might be relocated to the archive platform.

Many organizations are surprised to learn that upwards of 60% of their stored data falls into this category. With petabytes of data and billions of files being stored, that share represents a very large volume of data.
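To see what share of your own data would fall under a three-year rule, the calculation itself is straightforward. The sketch below assumes an inventory of (size, last accessed, last modified) entries is already available; the function name archivable_share and the sample inventory are invented for illustration, and the numbers are made up rather than measured.

```python
from datetime import datetime, timedelta

THREE_YEARS = timedelta(days=3 * 365)


def archivable_share(files, now):
    """files: iterable of (size_bytes, last_accessed, last_modified).

    Returns the fraction of total bytes neither accessed nor modified
    within the last three years (0.0 if there are no bytes at all).
    """
    total = 0
    cold = 0
    cutoff = now - THREE_YEARS
    for size, accessed, modified in files:
        total += size
        if accessed < cutoff and modified < cutoff:
            cold += size
    return cold / total if total else 0.0


# Made-up inventory for illustration; sizes are in arbitrary units.
now = datetime(2025, 1, 1)
inventory = [
    (500, datetime(2019, 5, 1), datetime(2018, 1, 1)),   # idle on both counts
    (300, datetime(2024, 11, 1), datetime(2018, 1, 1)),  # accessed recently
    (200, datetime(2020, 1, 1), datetime(2023, 6, 1)),   # modified recently
]
share = archivable_share(inventory, now)
```

Measuring by bytes rather than by file count matters here, because a small number of very large files can dominate the capacity that archiving would free up.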

Modern Archiving Doesn’t Lock You In

Once the data to be archived has been identified, it is important to move the data to the new platform in the most efficient and least disruptive manner.

This means using a solution that is fast, scalable, and most importantly, doesn’t lock you into keeping it in place permanently for recall and migration purposes.

In most circumstances, each archiving event is akin to a migration. It is a point-in-time activity that should be a one-and-done event, not an arduous ongoing trial with unacceptable future consequences.
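The one-and-done idea can be sketched as a copy, verify, and release sequence. The example below is a simplified illustration, not a description of any product's data mover. It assumes the archive target is reachable as an ordinary file system path, and the function names sha256_of and archive_file are hypothetical. Because the archived copy is a plain file under its original relative path, no special software is needed to read it back later.

```python
import hashlib
import os
import shutil


def sha256_of(path: str) -> str:
    """Hash a file in 1 MiB chunks so large files are not read into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_file(src: str, source_root: str, archive_root: str) -> str:
    """Move one file into the archive, keeping its relative path."""
    rel = os.path.relpath(src, source_root)
    dst = os.path.join(archive_root, rel)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copy2(src, dst)  # copies data plus timestamps where possible
    if sha256_of(src) != sha256_of(dst):
        os.remove(dst)
        raise OSError(f"checksum mismatch while archiving {src}")
    os.remove(src)  # the source is released only after the copy is verified
    return dst
```

A production data mover would also need to handle permissions, open files, retries, and parallelism, all of which this sketch leaves out.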

Conclusion

In this blog post, we’ve covered why tiering is not the same as archiving, why NAS gateways do not fully meet the need for archiving, why the “act” of archiving is not the same as the archiving platform, and the benefits of modern archiving.

For IT decision makers, adopting a modern archiving strategy can significantly benefit the organization by reducing storage costs while ensuring fast, vendor-neutral access to archived data.

To realize these benefits, you need a modern archiving software solution that can handle the scale and complexity of enterprise environments.

StorageMAP delivers the benefits of true modern archiving by providing the analytics necessary to find the archive candidates lurking among billions of files based on a variety of criteria. StorageMAP’s Unstructured Data Mobility Engine (uDME) has the power to handle the relocation of data to the desired archive storage platform without locking you in.

The net result is a far more manageable environment with the ability to continue to identify and relocate data as it continues to age and cross the archiving policy thresholds.