Filesystem Analytics and Discovery
Yancy Blum, Senior System Engineer
The Simplicity Illusion
Have you ever had to solve a file migration problem without the faintest clue about where to begin? Storage admins and IT staff are often asked to handle seemingly simple data migrations, only to find that the work is harder than it first appeared. From copying ACLs, DACLs, and other metadata to discovering that your files are accessed over multiple protocols, the pitfalls are many. The ability to quickly ascertain the makeup of your datasets is key to a successful migration.
Although file migration has been around for decades, the toolsets used for migrating data are often disappointing: they require constant monitoring and faith that the data will indeed make it from one system to the next. DobiMigrate was purpose-built for file migration and fills the void of a true enterprise migration tool.
Visualizing the layout and characteristics of your files and filesystems is critical in planning your next migration. As data and metadata continue to grow, we need ways to quickly understand what we are moving, how long it will take, and whether its integrity will be affected along the journey. Graphs and charts are excellent at conveying complex data quickly and effectively. Judging file relevancy, or working out which storage system houses which datasets and whether those datasets are still in use, can bring even the most seasoned veteran to their knees, because outdated legacy tools provide no information on the state of the source system’s data landscape. Fortunately, the purpose-built tools included in DobiMigrate have been developed to tackle these and other challenges, making the process of planning and executing your migration considerably easier.
As files fade into obscurity, it is often difficult to know what is and is not important. Knowing when it’s time to put your files out to pasture (archive, tape, Amazon Glacier) becomes challenging when dealing with larger datasets. Using the reporting module in DobiMigrate, we can quickly look at files based on many attributes, one of which is the last modification time. When performing your data migration, why not move those older files that rarely get touched to an appropriate storage tier? The image below shows files that have not been modified in over three years! Might be time for a file catharsis.
Determining file age using our DobiMigrate reporting module
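The same question can be approximated by hand on a mounted filesystem. The sketch below is not how DobiMigrate's reporting module works; it is a minimal illustration that uses only the Python standard library, and the helper name find_stale_files and the three-year default are assumptions chosen to mirror the example above.

```python
import os
import time

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # approximation; ignores leap years


def find_stale_files(root, min_age_years=3.0, now=None):
    """Return (path, age_in_years) for files under root whose last
    modification time is older than min_age_years.

    Hypothetical helper for illustration only.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_years * SECONDS_PER_YEAR
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat reads the entry itself and does not follow symlinks
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime < cutoff:
                stale.append((path, (now - mtime) / SECONDS_PER_YEAR))
    return stale
```

Calling find_stale_files on the root of a share returns the candidates for an archive tier. A single-threaded walk like this is slow on very large trees, which is part of why a dedicated scanning tool is useful at scale.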
A Thousand Paper Cuts
Small files (less than 128KB) en masse are often the bane of any large migration. Byte for byte, small files consume vastly more resources than larger files (256KB and greater) and never get to take full advantage of the optimum throughput of your infrastructure. Let’s take a look at why. Whenever a file is transferred over a network, it incurs a certain amount of overhead from the network protocol, and that overhead applies regardless of the size of the file. It is true that a single small file takes far less time to migrate than a single large file, but one must look at the big picture to understand the effect small files have on a migration. Small files never achieve optimal throughput, as shown in the graph: the transfer is over before the connection reaches full speed, and the per-file overhead is paid again for the next file. That process repeats millions or billions of times and is ultimately far less efficient than moving larger files. Let’s compare a 1KB file to a 1GB file. It takes roughly 1 million 1KB files, and therefore roughly 1 million trips, to equal 1GB of data. The benefit of moving a larger file becomes apparent when factoring in overhead and time: the 1KB files never reach optimal throughput, but the 1GB file does.
Per-size analytics options using our DobiMigrate reporting module
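The 1KB versus 1GB comparison can be made concrete with a simple cost model: each file pays a fixed overhead, plus the time to push its bytes over the link. The figures below (1 ms of overhead per file, a link that sustains 1GB per second, decimal units) are assumptions picked for illustration, not measurements, and the function name transfer_seconds is hypothetical.

```python
def transfer_seconds(total_bytes, file_size_bytes, per_file_overhead_s, link_bytes_per_s):
    """Time to move total_bytes as equally sized files, paying a fixed
    overhead for every file. A deliberately simple model: it ignores
    parallelism, latency variation, and protocol ramp-up.
    """
    file_count = -(-total_bytes // file_size_bytes)  # ceiling division
    return file_count * per_file_overhead_s + total_bytes / link_bytes_per_s


KB = 1_000
GB = 1_000_000_000
OVERHEAD_S = 0.001   # assumed fixed cost per file, in seconds
LINK = 1 * GB        # assumed sustained link speed, in bytes per second

small = transfer_seconds(GB, KB, OVERHEAD_S, LINK)  # one million 1KB files
large = transfer_seconds(GB, GB, OVERHEAD_S, LINK)  # one 1GB file

print(f"1KB files: {small:.1f} s, effective {GB / small / 1e6:.1f} MB/s")
print(f"1GB file:  {large:.3f} s, effective {GB / large / 1e6:.1f} MB/s")
```

Under these assumptions the million small files take about 1,001 seconds while the single large file takes about 1 second, even though the same amount of data moves in both cases. Real tools reduce the gap by copying many files in parallel, but the per-file cost never disappears.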
Setting Better Expectations with Analytics
There is one question that is asked more times than I can count: “About how long will the migration take?” Of course, the answer is always, “It depends.” In a “perfect environment,” with all things being equal, minimal latency, consistent read/write I/O performance from the source and target, and an optimal number of files and average file size, we could expect your migration to take x days or x hours. As we know, the “perfect environment” rarely exists when performing a migration. Things change all the time during a migration as bandwidth utilization ebbs and flows and files grow, shrink, and vanish. The change rate of your data is a living and breathing digital animal that sometimes needs to be kept on a short leash. That metaphorical leash comes in many forms, such as internal and external SLAs, user permissions, and bandwidth limitations, to name a few.
DobiMigrate helps you set migration goals that are both accurate and consistent. The more details we gather about our file data before and during a migration, the better positioned we are to create a migration plan that will meet expectations.
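To see why the change rate matters so much to the answer, consider one simple way to model it: the first pass copies everything, and each later pass copies only what changed while the previous pass was running. The sketch below is an assumption-laden illustration rather than DobiMigrate's estimation method; the function name estimate_days, the constant throughput, the constant change rate, and the cutoff threshold are all hypothetical.

```python
def estimate_days(total_bytes, throughput_bytes_per_day, change_bytes_per_day,
                  cutoff_days=0.05, max_passes=100):
    """Estimate total copy time when data keeps changing during the migration.

    Returns (total_days, passes) once a pass is short enough to fit in a
    cutover window of cutoff_days, or None if the passes never shrink
    that far (for example when data changes as fast as it is copied).
    """
    if throughput_bytes_per_day <= 0:
        raise ValueError("throughput must be positive")
    remaining = float(total_bytes)
    total_days = 0.0
    for passes in range(1, max_passes + 1):
        pass_days = remaining / throughput_bytes_per_day
        total_days += pass_days
        if pass_days <= cutoff_days:
            return total_days, passes
        # Data modified while this pass ran must be copied by the next pass.
        remaining = change_bytes_per_day * pass_days
    return None


TB = 1_000_000_000_000
# Assumed figures: 100TB of data, 10TB/day throughput, 1TB/day of change.
print(estimate_days(100 * TB, 10 * TB, 1 * TB))
```

With these assumed figures the passes take roughly 10 days, 1 day, 0.1 days, and 0.01 days, so the estimate is a little over 11 days across four passes. If the change rate approaches the throughput, the passes stop shrinking and the function returns None, which is the model's way of saying the leash is too long.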