Reducing Cassandra Backup Storage Footprint & Simplifying Recovery


This is the second in a series on customer deployments with Imanis Data for Cassandra backup and recovery. The first, which describes a customer deployment scenario in AWS, is highlighted here.


Summary: A large business and financial management software company turned to Imanis Data to simplify backup and recovery in their large multi-datacenter Cassandra deployment.


Industry: Technology (Computer Software). A Fortune 1000 software company that provides business and financial management solutions for consumers, small businesses, and accounting professionals.


Big Data Environment: This Imanis Data customer has standardized on DataStax Enterprise (DSE) as the underlying NoSQL database. All databases and applications are hosted internally in their own data centers. The customer currently has 32 terabytes of Cassandra data in a 75-node DSE cluster spread across four data centers.


Challenges: The customer was using snapshots for Cassandra backup and recovery. Their DevOps team had developed scripts to automate periodic snapshots, but the process was time-intensive and a drain on storage resources.


Specifically, Cassandra snapshots created a few challenges for this customer because of their large and distributed environment, including:


  • Due to regular Cassandra compaction, snapshots end up consuming a lot of storage space on the production cluster. Even after allocating around 50% of the production storage for snapshots, the customer frequently ran out of space on the production cluster. This directly impacted the applications supported by the Cassandra environment, resulting in downtime. Engineers would have to be called in to remove some of the older snapshots to free up space.


  • Cleaning out older snapshots limited the number of snapshots that the customer could keep for recovery purposes. This was a big limitation since the company had internal policies to retain older Cassandra backups to meet their compliance needs.


  • Recovering data from snapshots was an onerous, manual process. The database administrator would have to browse the different snapshot directories, identify the files associated with the tables and keyspaces that needed to be recovered, and copy them from the snapshots to the appropriate Cassandra directories. Not only did this take time and effort, but it was also error-prone.
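To make the manual work described above more concrete, here is a minimal Python sketch of what such a script might look like. It is an illustration only, not the customer's tooling or anything from Imanis Data. It assumes the usual Cassandra on-disk layout of `<data_dir>/<keyspace>/<table>-<id>/snapshots/<snapshot_name>/`; the function names `find_snapshots` and `snapshot_sizes` are hypothetical. The sizes it reports are apparent file sizes. Because snapshots are hard links to SSTables, they only consume additional disk space once compaction has removed the live SSTable they point to.

```python
from collections import defaultdict
from pathlib import Path


def find_snapshots(data_dir):
    """Yield (keyspace, table, snapshot_name, path) for each snapshot directory.

    Assumes the layout <data_dir>/<keyspace>/<table>-<id>/snapshots/<name>/.
    """
    root = Path(data_dir)
    for snap_dir in sorted(root.glob("*/*/snapshots/*")):
        if not snap_dir.is_dir():
            continue
        table_dir = snap_dir.parent.parent
        keyspace = table_dir.parent.name
        # Drop the trailing "-<id>" suffix from the table directory name, if any.
        table = table_dir.name.rsplit("-", 1)[0]
        yield keyspace, table, snap_dir.name, snap_dir


def snapshot_sizes(data_dir):
    """Return {snapshot_name: apparent size in bytes} summed across all tables."""
    totals = defaultdict(int)
    for _keyspace, _table, name, path in find_snapshots(data_dir):
        for f in path.rglob("*"):
            if f.is_file():
                totals[name] += f.stat().st_size
    return dict(totals)
```

A script like this would have to run on every node, and the administrator would still need to copy the right files back into the live table directories by hand, which is where the time and the errors come from.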


Cassandra Backup Solution: 

The customer has deployed an Imanis Data cluster to back up their entire 75-node DSE cluster. Deployment and configuration of the Imanis Data software was quick and easy, and the customer was backing up data in their production environment within an hour. The customer is using our multi-datacenter capabilities to significantly reduce the amount of data that has to be transferred across a wide area network. All backups are de-duplicated, compressed, and stored on direct-attached storage outside the DSE cluster.
