The goal of the CTA has always been pretty straightforward: keep data that is meaningful to the business on high-speed, high-cost storage, and use policies to transparently tier older, less-valuable data off to cheaper commodity storage (and, later, cloud storage). CTA accomplishes this by truncating the original file down to a very small pointer, often referred to as a stub. The stub is drastically smaller because it contains only the metadata of the original file.
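Conceptually, the stubbing process looks something like the sketch below. This is a simplified, hypothetical illustration (real CTA stubs use a proprietary on-disk format, not JSON): the full file is copied to the archive tier, and the original is replaced with a tiny pointer carrying the original metadata and the archive location.

```python
import json
import os
import shutil

def stub_file(path: str, archive_dir: str) -> None:
    """Move a file's contents to the archive tier and leave a tiny stub behind.

    Hypothetical illustration only -- not CTA's actual stub format.
    """
    info = os.stat(path)
    archive_path = os.path.join(archive_dir, os.path.basename(path))

    # 1. Copy the full file down to cheaper archive storage.
    shutil.copy2(path, archive_path)

    # 2. Replace the original with a small pointer holding only the
    #    original file's metadata and the location of the archived copy.
    stub = {
        "archive_location": archive_path,
        "original_size": info.st_size,
        "original_mtime": info.st_mtime,
    }
    with open(path, "w") as f:
        json.dump(stub, f)
```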
The Rainfinity File Management Appliance, later shortened to FMA and then renamed the Cloud Tiering Appliance (CTA), was most often implemented behind EMC Celerra/VNX, or behind NetApp 7-Mode using FPolicy.
While these solutions certainly accomplished their goal of driving down the total cost of storing archive data, they came with a trade-off: an archiving appliance is expected to be reliable, but not fast. That is fine most of the time, when you pull a few files back from the archive. But when you need to retire the archive, it can quickly become a nightmare.
The Pain
A frequent frustration for customers migrating data off CTA is that retrieval in support of the migration is prohibitively slow. It is not unusual for a CTA migration to take weeks, sometimes months or even years, when performed by legacy migration tools or by tools that must communicate with the CTA in order to retrieve the stubbed data.
Real world examples
- CustomerA, a large multinational bank with VNX/CTA/Centera, was moving to Isilon, with 500TB of data archived to Centera. Copying through the front-end data mover on the VNX, they were getting 500GB per day. That is 1,000 days, or 2.74 years, just for the first full copy, and the array in question had only six months (~180 days) of maintenance left.
- CustomerB, a global pharmaceutical company with Celerra/CTA, was moving to Isilon, with 200TB of data in the Atmos archive. Copying through the front-end Celerra data mover, they were getting 100GB per day. That is 2,000 days, or roughly 5.5 years, just for the first full copy.
What Causes This?
There is a common misconception that this is due to slow performance of the back-end storage (the CTA target) the data has been archived to. The reality is quite different. In most cases, the bottleneck is the rapid succession of stubbed-data requests generated by the migration; the CTA is simply not designed to accommodate random, high-frequency requests. With the CTA in the data path, accessing and migrating thousands to millions of files will always be slow.
Three main causes of this delay:
- When a copy job hits a stub file and has to read it, it performs what is called a pass-through read. A service on the NAS head/filer/data mover itself has to reach into the archive, pull the file back into the RAM of that NAS head, and then send it over the wire to the requesting client. This recall service/daemon eats up CPU cycles, and temporarily holding the file to send to the client consumes a large amount of memory.
- Because these HSM solutions were designed for random and infrequent file recall, they typically have a fairly low maximum concurrency setting. In the VNX/CTA/Centera use case, for instance, only about 28 concurrent recall threads are permitted. When migrating this data, not only are you facing delays from recalling it, you also have to contend with this bottleneck.
- One VNX/Celerra data mover will never use more than one CTA or CTA-HA at a time for recall operations, because the solution uses round-robin DNS for load balancing. Embedded in the stub is a pointer to a DNS A-record that has an entry for each CTA or CTA-HA in the environment, as the lookup sketch after this list illustrates.
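The round-robin behavior is easy to picture with a plain DNS lookup. The sketch below uses a hypothetical hostname: the A-record may carry one entry per CTA/CTA-HA node, but each individual resolution hands the data mover a single address, so that data mover's recalls funnel through one appliance at a time.

```python
import socket

# Hypothetical DNS name for the CTA A-record; in a real deployment the
# record holds one entry per CTA / CTA-HA node.
CTA_RECORD = "cta.example.com"

# getaddrinfo returns every address in the A-record...
all_nodes = sorted({ai[4][0] for ai in
                    socket.getaddrinfo(CTA_RECORD, 80, proto=socket.IPPROTO_TCP)})
print("A-record entries:", all_nodes)

# ...but a single resolution, which is what the data mover performs,
# yields just one of those addresses for the recall traffic.
print("Address used for recall:", socket.gethostbyname(CTA_RECORD))
```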
Can DobiMigrate Fix This? (Yes.)
Welcome to the “CTA Bypass” method. With this type of migration, DobiMigrate detects the presence of CTA-stubbed content and copies it directly from the CTA target (Centera, an SMB or NFS target, etc.) without going through the recall process. This alone saves an enormous amount of time during the copy process while significantly reducing load on the overall solution, including the production storage controller.
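A rough way to picture the bypass (this is an illustrative heuristic, not DobiMigrate's actual implementation): identify files that are stubs, for example by noticing that their logical size is far larger than the space actually allocated on disk, and read those files from the archive target instead of triggering a recall through the NAS head.

```python
import os

def looks_like_stub(path: str) -> bool:
    """Heuristic stub check: logical size much larger than space on disk.

    Illustrative only -- a real tool would inspect the stub's own metadata
    (e.g. offline attributes or the embedded archive pointer).
    """
    st = os.stat(path)
    allocated = st.st_blocks * 512  # st_blocks is in 512-byte units (POSIX)
    return st.st_size > 0 and allocated < st.st_size // 2

def source_for_copy(path: str, archive_root: str) -> str:
    """Return the path to copy from: the archive target for stubs,
    the production filesystem for regular files."""
    if looks_like_stub(path):
        # Hypothetical archive layout, keyed by file name.
        return os.path.join(archive_root, os.path.basename(path))
    return path
```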
To make things even better, by reading the archived data directly DobiMigrate overcomes the limitation on maximum concurrent recall threads. This means hundreds of concurrent copy threads and even greater copy performance.
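Since the archive target is being read directly, the copy engine is no longer bound by the ~28-thread recall cap. A minimal sketch of what wide concurrency looks like, assuming a hypothetical per-file copy_one() worker:

```python
import shutil
from concurrent.futures import ThreadPoolExecutor

def copy_one(src: str, dst: str) -> None:
    # Hypothetical per-file worker: reads from the archive target
    # (or the production share) and writes to the destination.
    shutil.copy2(src, dst)

def copy_all(pairs, max_workers=256):
    # Hundreds of concurrent copy threads, versus the ~28 recall threads
    # the pass-through recall path permits.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(copy_one, src, dst) for src, dst in pairs]
        for f in futures:
            f.result()  # propagate any copy errors
```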
And that is just the baseline copy process. With DobiMigrate’s highly scalable scanning techniques, long cut-over windows are a relic of the past.
Results?
- 30x increase in items/second
- 4x increase in bandwidth utilization
- An overall 87% reduction in duration of the initial data copy
| CTA migration with 83.5 million files | CTA Recall | Bypass |
| --- | --- | --- |
| First copy | 89 days | 11 days |
| Final sync | 12 hrs 53 mins | 47 mins |
Those are the results from a recently completed CTA migration for a customer on the East Coast. To put that in perspective, a first full copy that was previously scoped to take 80+ days completed in 11 days. Not only did the customer realize a drastic reduction in completion time, the project resources, the project manager, and many others were freed from weekly status calls, not to mention change control. Better still, the customer was able to avoid signing an additional support contract, save money on power consumption, and get onto their newly purchased hardware two months earlier than expected.
Want to learn more?
Please contact us.
Customer environment referenced above:
- Dataset = 83.5 million files
- 1 Gbit/s bandwidth between all devices, roughly 30% utilized
- Two SMB proxies implemented
- 250 Mbit/s throttling configured