All For One And One For All: Improving Data Mobility With Version 5.12

If you’re already familiar with Datadobi, you know that we focus on unstructured data. Unstructured data presents its own data mobility challenges, distinct from those of SAN-based block storage datasets.

For Data Mobility, It’s All in the Architecture 

Both DobiMigrate and DobiProtect are architected to separate orchestration activities from data mobility activities. The component in our architecture that handles data mobility is called a proxy, and these proxies do all the heavy lifting required when copying data between any source and any destination.

Proxies scan the source and destination to create the file system maps required for syncing the systems. They then execute worklists consisting of the delta and verification operations required to bring the source and destination systems to parity.
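
To make the map-and-worklist idea concrete, here is a minimal Python sketch. It is an illustration only and not how DobiMigrate or DobiProtect is implemented: the `Entry` fields, the operation names, and the `build_worklist` helper are all hypothetical, and the "maps" are plain dictionaries rather than the result of a real scan.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Entry:
    """Hypothetical metadata recorded for one file during a scan."""
    size: int
    mtime: int


def build_worklist(source: Dict[str, Entry],
                   destination: Dict[str, Entry]) -> List[Tuple[str, str]]:
    """Return (operation, path) pairs that would bring destination to parity."""
    worklist = []
    for path in sorted(source):
        if path not in destination:
            worklist.append(("copy", path))      # missing on the destination
        elif source[path] != destination[path]:
            worklist.append(("update", path))    # metadata differs: a delta
        else:
            worklist.append(("verify", path))    # looks identical: verify only
    # Anything left only on the destination is no longer on the source.
    for path in sorted(destination.keys() - source.keys()):
        worklist.append(("delete", path))
    return worklist


# Assumed example maps; a real scan would populate these from the file systems.
source_map = {
    "/a.txt": Entry(10, 100),
    "/b.txt": Entry(20, 200),
    "/c.txt": Entry(30, 300),
}
destination_map = {
    "/a.txt": Entry(10, 100),
    "/b.txt": Entry(20, 150),
    "/old.txt": Entry(5, 50),
}
worklist = build_worklist(source_map, destination_map)
for operation, path in worklist:
    print(operation, path)
```

In this sketch the comparison uses only size and modification time. A real tool would likely compare more attributes and verify content, but the shape of the problem is the same: two maps in, one list of operations out.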

When you look at the protocols involved in migrating and/or protecting unstructured data, you’ll see four options staring back at you: 

  • NFS (v3, v4.x)
  • SMB (1.x, 2.x, 3.x)
  • S3 (multiple vendor implementations along with AWS) 
  • The Azure Blob API 

There are several versions and dialects of these protocols, and in some cases there are protocol clients built into the operating system to make life easy for all of us. 

Several years ago, we relied on the native OS and its kernel-based clients for handling NFS and SMB communication. This worked fine in the beginning, but our software generates a specific pattern of I/O requests, and low-level analysis showed that we were not reaching the levels of efficiency that we knew were possible.

The inefficiencies were caused by the OS kernel-based clients that are included with each release of Linux and Windows. These clients perform admirably for most use cases, and since they run in the OS kernel, they have near-direct access to hardware.

Yet, almost by definition these kernel-based clients have to be a one-size-fits-all solution. It’s not fair to ask the OS developers to predict the needs of every application in existence, and to have a magical SMB or NFS client that performs optimally for each of those applications. 

Addressing Generic, Non-Optimized Client Behavior 

Datadobi has invested in creating our own userspace SMB and NFS clients. You may be asking: what is the difference between the kernel and userspace?

Simply put, an OS can be viewed as a set of concentric rings:

  • Ring 0 is at the center and is where the OS kernel runs.
  • Rings 1 and 2 are intermediate privilege levels originally intended for device drivers and similar services. Most modern operating systems leave them largely unused and rely on Rings 0 and 3.
  • Ring 3 is userspace, where all applications run. These applications interact with the kernel in Ring 0 via syscalls, and the kernel manages the applications to ensure they don’t interfere with each other. By running a protocol client in userspace we can bypass the kernel-based client.
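
As a rough illustration of what "a protocol client in userspace" means, the Python sketch below builds an NFSv3 NULL call (the protocol's no-op "ping") entirely in application code, following the ONC RPC message layout. It is a minimal example under stated assumptions and not Datadobi's client: the `build_null_call` helper is hypothetical, it uses no authentication, and it only encodes the request without sending it.

```python
import struct

NFS_PROGRAM = 100003  # ONC RPC program number assigned to NFS
NFS_VERSION = 3
PROC_NULL = 0         # procedure 0: does nothing, used as a ping
RPC_CALL = 0          # message type: call (as opposed to reply)
RPC_VERSION = 2
AUTH_NONE = 0


def build_null_call(xid: int) -> bytes:
    """Encode an RPC call header for NFSv3 NULL, framed for TCP."""
    body = struct.pack(
        ">10I",
        xid, RPC_CALL, RPC_VERSION, NFS_PROGRAM, NFS_VERSION, PROC_NULL,
        AUTH_NONE, 0,  # credentials: flavor and length
        AUTH_NONE, 0,  # verifier: flavor and length
    )
    # Over TCP, each record is prefixed with a 4-byte marker: the top bit
    # flags the last fragment and the remaining bits hold the length.
    record_mark = struct.pack(">I", 0x80000000 | len(body))
    return record_mark + body


message = build_null_call(xid=1)
print(len(message), message.hex())
```

A userspace client would write these bytes to an ordinary TCP socket. The kernel still moves the packets, which is a syscall like any other, but the NFS logic itself (what to send, when, and how to interpret the reply) lives in the application, where it can be tuned.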


What this Means for Data Mobility and Datadobi Customers 

It might appear that we simply reinvented the wheel and are running it as an application that is further removed from the kernel. It’s not unreasonable to think we’ll pay a performance penalty, so why bother?

It’s a fair thought, but the discussion isn’t that simple, especially when you weigh one-size-fits-all behavior against that of a purpose-built client that knows exactly what I/O profile it will need to handle. That knowledge allows the purpose-built client to exceed the speed benefit of the kernel-based client.

Let’s examine this further. 

Examining Kernel-Based Client Behavior 

Since the kernel-based clients have to support unknown workloads, assumptions about things such as caching behavior and whether or not requests can be pipelined are built into the code. The default behaviors work really well for a variety of workloads, but they are not optimal for specialized use cases such as DobiMigrate and DobiProtect.

With userspace clients, we can tailor our client behavior to the exact requirements of the I/O patterns we generate, and this results in a number of benefits for our customers, including:

  • Enhanced performance – Kernel-based clients have behavior we don’t want. For example, they cache content on read because they assume the user will re-read the file. We don’t want that because it wastes time and memory. Datadobi also pipelines requests, sending multiple requests bundled as one to get a better response from the SMB server.
  • Better instrumentation and metrics – With a userspace client, we can probe performance behavior in our own stack far more effectively than by monitoring with generic OS-level utilities.
  • Accurate error reporting – Kernel-based clients are somewhat notorious for reporting issues that are symptoms rather than the true root cause. With our own client, the net result is quicker diagnosis of the true issue instead of “red herrings” that draw attention away from the root cause.
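
The benefit of bundling requests can be shown with simple arithmetic. The Python sketch below is a deliberately simplified model, not a measurement: the request count, batch size, and round-trip time are made-up numbers, and the model ignores server processing time and bandwidth limits.

```python
import math


def round_trips(request_count: int, batch_size: int) -> int:
    """Number of network round trips when requests are sent in batches."""
    return math.ceil(request_count / batch_size)


def elapsed_ms(request_count: int, batch_size: int, round_trip_ms: float) -> float:
    """Time spent waiting on the network, assuming latency dominates."""
    return round_trips(request_count, batch_size) * round_trip_ms


# Hypothetical workload: 1,000 small requests over a 0.5 ms round trip.
one_at_a_time_ms = elapsed_ms(1000, batch_size=1, round_trip_ms=0.5)
bundled_ms = elapsed_ms(1000, batch_size=8, round_trip_ms=0.5)
print(one_at_a_time_ms, bundled_ms)
```

Under these assumptions, bundling eight requests per round trip cuts the time spent waiting on the network from 500 ms to 62.5 ms. Real gains depend on the server and the workload, but this is why a client that knows its own I/O pattern can decide to bundle where a general-purpose client might not.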

Perhaps the biggest benefit to highlight is breaking any dependency on the OS itself. It’s far more convenient to deploy a single proxy that can speak multiple protocols than to deploy several proxies and match the OS of each to the type of client needed.

Deploying a couple of Linux proxies that each speak both SMB and NFS is certainly cheaper and easier than deploying two Linux proxies specifically for NFS and two additional Windows instances specifically for SMB traffic. Two proxies speaking both protocols simultaneously are easier to deploy, monitor, and manage, and they save a little bit on Windows licensing. Depending on the size and complexity of your environment, a deployment could be as small as a single VM.

As for the full file access layers, we had already developed a userspace NFS client and introduced it in a previous version. Since then, we have focused our attention on developing a userspace SMB client to complement it.

With the 5.12 release, we have rounded out our offering of userspace client implementations: SMB, NFS, S3, and Azure Blob.

Learning More about Datadobi’s Engine

Userspace SMB is only one aspect of the new 5.12 release. For more details on the other enhancements available in this release, I encourage you to take a look at our release notes.