Exploring DQL – The Hidden Jewel of StorageMAP

Over the past few months, our blog has detailed some of the analytics features of StorageMAP, our multi-vendor, multi-cloud unstructured data management platform. Today, I want to shine a spotlight on one of the lesser-known features of the product: Datadobi Query Language (DQL).

Why DQL Was Created

As digital transformation continues to increase the amount of unstructured data on their networks, enterprises have begun asking us for access to the detailed scan metadata so they can analyze and reorganize their unstructured data lakes. Dissecting the composition of that data, however, requires serious reduction and aggregation across a set of billions of files.

We saw the need for a tool that could query, aggregate, and reduce the information about the data lake until it is consumable by an IT administrator. As a result, we developed the query framework now known as DQL.

How DQL Works Within StorageMAP

When a StorageMAP scan runs against a file server, a large amount of metadata is collected. While this data is important, only a portion of it is needed to populate the StorageMAP graphs and tables that an end user sees in the user interface. DQL provides a mechanism to both store this metadata in the database and retrieve it.

A small sample of items DQL can identify within a data lake:

  • Cold data sets — data that is infrequently accessed
  • Old data sets — data that was created or modified some time ago
  • Shares, exports, or directory trees that are homogeneous and can be handled as one data set
  • Data sets that are owned by users or groups that no longer work at the company
  • And more
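
To make the first two categories concrete, here is a minimal Python sketch of how "cold" and "old" might be told apart from a file's timestamps. This is not DQL and not how StorageMAP classifies data internally; the function name and both thresholds are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds; pick values that match your own retention policy.
COLD_AFTER = timedelta(days=365)      # not accessed for a year
OLD_AFTER = timedelta(days=3 * 365)   # not modified for three years


def classify(accessed: datetime, modified: datetime, now: datetime) -> set[str]:
    """Return the labels ("cold", "old") that apply to one file's timestamps."""
    labels = set()
    if now - accessed > COLD_AFTER:
        labels.add("cold")
    if now - modified > OLD_AFTER:
        labels.add("old")
    return labels
```

A file can carry both labels, one, or neither, which is why the two categories are listed separately above.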

How Does This Help IT Leaders?

In short, DQL lets you view the extensive scanned metadata in whichever ways your particular environment requires.

Let’s look at some examples. Say you are tasked with finding every DWG file on a particular file system that holds millions of files, all used by your drafting department. The results need to be limited to DWG files modified in the past 24 months by a particular set of employees.

Or what if you were tasked with finding every “orphaned” file (data that has no defined business owner) on a large file system that houses all your users’ home folders, and needed to know the paths to the directories where these files exist?

For both situations, a DQL query could be created to provide exactly this information and more.

Here’s how.

DQL uses a format similar to standard SQL queries, so you or your DevOps team can script intricate queries for whatever data mining task is needed. There are over 70 fields to query against, including path, name, size, shares/exports, a variety of timestamps, and owner/group details, to name a few. Queries are not limited to files and directories; they can also be built for symlinks, sockets, pipes, and mountpoints. The results can be parsed and sorted in a variety of ways. The power of this tool is limited only by your imagination.
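
To give a feel for the shape of such a query, here is a rough sketch of the DWG example written in plain SQL and run against an in-memory SQLite database from Python. To be clear, this is not DQL syntax and not the StorageMAP schema: the table name `files`, its columns, the sample rows, and the owner names are all hypothetical stand-ins.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical stand-in for scanned metadata; not the StorageMAP schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (path TEXT, name TEXT, size INTEGER, owner TEXT, mtime TEXT)"
)
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?, ?, ?)",
    [
        ("/drafting/a", "plan.dwg", 1000, "alice", "2024-01-15"),
        ("/drafting/b", "old.dwg", 2000, "alice", "2020-03-01"),
        ("/drafting/c", "notes.txt", 10, "bob", "2024-02-01"),
        ("/drafting/d", "site.DWG", 3000, "carol", "2023-11-20"),
        ("/drafting/e", "roof.dwg", 500, "bob", "2023-09-09"),
    ],
)

# "Past 24 months" approximated as 730 days before a fixed reference date.
now = datetime(2024, 6, 1)
cutoff = (now - timedelta(days=730)).strftime("%Y-%m-%d")
owners = ("alice", "bob")  # the particular set of employees

placeholders = ",".join("?" for _ in owners)
query = f"""
    SELECT path, name, owner
    FROM files
    WHERE lower(name) LIKE '%.dwg'
      AND mtime >= ?
      AND owner IN ({placeholders})
    ORDER BY path
"""
matches = conn.execute(query, (cutoff, *owners)).fetchall()
print(matches)
```

The three conditions in the WHERE clause map directly onto the task: file type, modification window, and owner. A real DQL query would express the same filters using its own field names.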

After the query is run within StorageMAP, the metadata is saved to an embedded relational database that is accessible on the dropzone of the StorageMAP Core server. From the Core, you can view and sort this data using standard SQL statements. The data can also be exported to a CSV file and then imported into tools such as Excel for additional custom reporting or analysis. This gives admins and developers the ability to work with the data on whatever platform they are most comfortable with.
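
As an illustration of that last step, here is a small Python sketch that reads a CSV export and counts orphaned files per directory, along the lines of the home-folder example above. The column names (`path`, `owner`, `size`), the sample rows, and the idea of comparing owners against a list of active users are assumptions made for this sketch; an actual export will have whatever columns your query selected.

```python
import csv
import io
import posixpath
from collections import Counter

# Hypothetical export contents; a real export would be read from a file.
SAMPLE_EXPORT = """path,owner,size
/home/ann/report.docx,ann,1200
/home/zed/old.zip,zed,50000
/home/zed/notes/todo.txt,zed,300
/home/zed/notes/plan.txt,zed,700
"""

# Assumption: owners missing from this set have no defined business owner.
ACTIVE_OWNERS = {"ann"}


def orphaned_dirs(csv_file, active_owners):
    """Count files per parent directory whose owner is not an active user."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        if row["owner"] not in active_owners:
            counts[posixpath.dirname(row["path"])] += 1
    return counts


result = orphaned_dirs(io.StringIO(SAMPLE_EXPORT), ACTIVE_OWNERS)
for directory, count in sorted(result.items()):
    print(directory, count)
```

To run this against a real export, replace the `io.StringIO` object with an opened file, for example `open("export.csv", newline="")`, where the file name is again a placeholder.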

Unlock the Possibilities of Unstructured Data with DQL

The combination of DQL and the metadata indexes it queries is quite powerful. This article only scratches the surface of what is possible with this technology.

Please reach out to Datadobi Sales at [email protected] if you would like to learn more about DQL, its benefits for your organization, and how to move from unstructured data stored to unstructured data managed.