Ensuring Data and Metadata Accuracy in the AI Era: Key Takeaways from SNIA’s “Intelligent Data Management: Shaping the Future of AI Workloads”

The convergence of AI, multi-cloud environments, and decades of accumulated data has created unprecedented challenges for organizations managing unstructured data. At the recent SNIA webinar on “Intelligent Data Management: Shaping the Future of AI Workloads”, Carl D’Halluin, CTO of Datadobi, explained why getting data and metadata right matters more than ever, and why it’s way more complicated than most people think.

AI Is Breaking Down Data Silos—And Exposing Hidden Risks

For 40–50 years, enterprises have been accumulating data across countless applications, each writing information in its own way. As they work to organize this sprawl to maximize insights, strengthen compliance, and reduce risk, AI has emerged as a significant catalyst, suddenly operating across environments that were never designed to interact. This creates scenarios where data might be written by one application, metadata by another, while AI training or inference engines read both. That convergence exposes a fundamental challenge: ensuring that data and its metadata remain accurate and interpretable across different systems, vendors, and protocols.

Welcome to Multi-Vendor, Multi-Protocol Chaos

Today’s organizations face a complex storage landscape. Customers rely on numerous vendors to store unstructured data, some on-site, some in the cloud, mixing file and object storage. The result is a tangled web of protocols: SMB, NFS, multi-protocol NAS, S3, Azure Blob, Google Cloud Storage, Swift, and more. As Carl puts it, the result is “different dialects and different ways to work.”
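To make the dialect problem concrete, here is a minimal Python sketch. The mount path, bucket, and key are hypothetical, and it assumes the boto3 SDK with configured credentials; it simply shows how the same file surfaces very different metadata depending on whether you reach it over a POSIX-style (NFS/SMB) mount or an S3-style object API.

```python
# Sketch: the same logical file, two metadata "dialects".
# Paths, bucket, and key below are hypothetical placeholders.
import os
import stat

import boto3  # assumes the AWS SDK for Python (boto3) is installed and configured

# File-side view (e.g., an NFS or SMB mount): metadata lives in the inode.
st = os.stat("/mnt/nas/projects/report.docx")
print("size:", st.st_size)
print("mtime (epoch seconds):", st.st_mtime)
print("owner uid / mode:", st.st_uid, stat.filemode(st.st_mode))

# Object-side view (e.g., S3): metadata lives in HTTP-style headers, and POSIX
# attributes such as uid/gid/mode do not exist unless a copy tool stores them
# explicitly as user metadata.
s3 = boto3.client("s3")
head = s3.head_object(Bucket="example-archive", Key="projects/report.docx")
print("size:", head["ContentLength"])
print("LastModified (timezone-aware):", head["LastModified"])
print("user metadata:", head.get("Metadata", {}))  # e.g. {"uid": "1042"} if a copy tool stored it
```

Nothing in the object-side view tells you who owned the file or what its permission bits were; that context survives only if the tool doing the copy deliberately carries it across.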

Datadobi ensures that both data and metadata are accurately interpreted and copied across all these different protocols and vendors, a discipline it has honed over 15 years of working in the data migration and management space.

Why “Just Moving Bytes” Is Way Harder Than It Sounds

You’d think copying data would be simple; we’re just moving bytes from point A to point B, right? Wrong. Carl’s presentation showed exactly why that assumption can get you into serious trouble. The complexity shows up in three areas:

  1. Access Methods: Different authentication systems, parallel access needs, and performance quirks create instant headaches. Pick the wrong target system or use incompatible access methods, and your migration dies before it even starts.
  2. Data Itself: Beyond the raw content, you’ve got file format variations, encoding differences (ever tried moving US character sets to German systems?), and content size limits that change by platform (AWS just bumped its max object size from 5TB to 50TB, something competing object storage platforms have offered for a long time). Then come file and object name encoding, case sensitivity issues, illegal characters, and path length restrictions (see the pre-flight sketch after this list). Even something as basic as how many files you stuff in one directory can create bottlenecks that tank your performance.
  3. Metadata: This is where things really go sideways. What metadata actually means varies wildly from system to system. You’ve got format and encoding inconsistencies, timestamps and timezones that don’t play nice together, security attributes that don’t translate, and versioning nightmares—all of which can corrupt your data or lose critical context. As Carl put it, “For applications to keep reading data, we need more common themes.”
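To illustrate the kind of trouble items 2 and 3 describe, here is a rough pre-flight sketch in Python. It is not any vendor’s actual tooling; the rejected character set, path limit, and directory threshold are illustrative assumptions you would replace with your target platform’s real rules.

```python
# Rough pre-flight scan before a cross-platform copy: flag names, paths, and
# directories that commonly break migrations, and show how naive timestamps
# lose context. Limits below are illustrative assumptions only.
import os
import unicodedata
from datetime import datetime, timezone

ILLEGAL_TARGET_CHARS = set('<>:"\\|?*')   # characters many SMB/Windows targets reject
MAX_PATH_LENGTH = 1024                    # assumed path-length budget for the target
MAX_ENTRIES_PER_DIR = 100_000             # assumed directory-size threshold

def preflight(root: str):
    issues = []
    for dirpath, dirnames, filenames in os.walk(root):
        if len(dirnames) + len(filenames) > MAX_ENTRIES_PER_DIR:
            issues.append((dirpath, "directory may be too large to enumerate efficiently"))
        for name in filenames:
            full = os.path.join(dirpath, name)
            if len(full) > MAX_PATH_LENGTH:
                issues.append((full, "path exceeds assumed target limit"))
            if any(c in ILLEGAL_TARGET_CHARS for c in name):
                issues.append((full, "name contains characters some protocols reject"))
            # Same visible name, different bytes: NFD vs NFC Unicode normalization
            # (common when macOS-written names land on Linux or object storage).
            if unicodedata.normalize("NFC", name) != name:
                issues.append((full, "name is not NFC-normalized"))
    return issues

# Usage: issues = preflight("/mnt/nas/projects")

# Timestamps are just as slippery: a naive local time and an explicit UTC time
# can describe the same instant yet look different after a copy.
naive = datetime(2024, 6, 1, 12, 0, 0)                       # no timezone recorded
aware = datetime(2024, 6, 1, 12, 0, 0, tzinfo=timezone.utc)  # explicit UTC
print(naive.isoformat(), "vs", aware.isoformat())
```

The point isn’t this particular checklist; it’s that every one of these checks encodes an assumption about the target system, and those assumptions differ by vendor and protocol.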

Five Critical Data Management Scenarios

Carl outlined five scenarios where data and metadata portability are essential:

  1. Data Migration: Storage servers don’t last forever. Every few years, data needs to be moved to newer systems, often from different vendors with different protocols, without losing any metadata along the way.
  2. Data Replication: When you’re creating backup copies for vaults or disaster recovery, you need accuracy and control, especially when you’re copying between different systems and vendors to protect yourself from vulnerabilities.
  3. NAS to Object in Cloud: Shifting from old-school file storage to cloud object storage without breaking anything (a rough sketch of this follows the list).
  4. SSD-Based Systems and Cross-Company Collaboration: Modern flash systems and sharing data between companies leave no room for sloppy data handling.
  5. Data Archival: When you’re archiving data for eDiscovery and compliance, it needs to stay perfect for potentially decades.
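As a rough illustration of scenario 3, the sketch below copies one file from a file share into an S3-style bucket while stashing its POSIX attributes as user metadata, since object storage has no native home for them. The bucket, paths, and metadata keys are hypothetical, and this is a simplified approach for illustration, not a description of how any particular product works.

```python
# Sketch: NAS-to-object copy that carries POSIX metadata along as S3 user
# metadata. Bucket, paths, and metadata key names are hypothetical.
import os

import boto3  # assumes AWS credentials are configured

def copy_file_to_object(local_path: str, bucket: str, key: str) -> None:
    st = os.stat(local_path)
    extra = {
        "Metadata": {
            # S3 user metadata is string-valued, so everything is stringified.
            "src-mtime": str(int(st.st_mtime)),
            "src-uid": str(st.st_uid),
            "src-gid": str(st.st_gid),
            "src-mode": oct(st.st_mode),
        }
    }
    boto3.client("s3").upload_file(local_path, bucket, key, ExtraArgs=extra)

# Example:
# copy_file_to_object("/mnt/nas/projects/report.docx",
#                     "example-archive", "projects/report.docx")
```

A real migration tool also has to decide how to map ACLs, alternate data streams, symlinks, and sparse regions, which is exactly where the multi-vendor, multi-protocol complexity described above comes back in.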

Preparing for the New World

If you’re sitting on decades of data, you need to make data and metadata portability across systems, applications, and vendors a top priority. With AI tearing down old data silos and creating new ways to access information, the cost of getting this wrong has never been higher.

This complexity isn’t going anywhere; it’s only getting worse. But if you take the right approach and use proven cross-platform data management solutions, you can keep your data accurate, accessible, and valuable for decades to come.