Backing Up and Restoring a Very Large Cassandra Database


In this post, we highlight how Imanis Data helped an industrial manufacturer protect a very large Cassandra database environment that is hosted in AWS.


Summary: A leading industrial manufacturer makes Imanis Data a core component of their business-critical testing infrastructure


Industry: Manufacturing. A Fortune 500 manufacturing company is a leading provider of industrial machines and components for commercial and military use. Recently the company embarked on a digital transformation project to unlock the value of the large streams of data generated from these components to provide incredible insights and operational value to their customers.


Big Data Environment: To realize its vision, the organization has deployed a NoSQL platform powered by DataStax Enterprise (DSE) to capture machine test data. The database is hosted in Amazon Web Services (AWS) and continuously ingests data from the machines as they undergo testing. The customer has two 32-node Cassandra databases, one for production and one for testing, and all data is stored in a single 48-terabyte table that grows every day as new data is ingested.


Challenges: Their Cassandra databases consist of one keyspace that holds a single large 48 TB table. Backing up and recovering this table was very unreliable because failures occurred frequently while the backup was running. The customer was using Amazon EBS storage as the backup destination, and that cost was rising so quickly that the customer began looking for alternatives to their backup and recovery strategy. In addition, the customer needed to copy data from the production database to the test cluster every six hours so that the testing team had access to fresh production data. This was done manually with internally developed scripts, and a lot of engineering time went into developing, running, and debugging the process to make sure test data reached the test team on schedule.


Solution: The customer has deployed an Imanis Data cluster in AWS to address their backup and recovery needs and to copy data from the production database to the test database. Because the environment contains a single large 48-terabyte table, Imanis Data’s incremental-forever technology proved very efficient and resilient: after the initial backup, only data changes are moved. This made the backup process fast and reliable for the customer. To reduce backup storage costs, the customer uses the native Imanis Data Amazon S3 integration and storage optimization capabilities, and can now store backup data on Amazon S3 at a significantly lower cost than on Amazon EBS. In addition, the Imanis Data de-duplication technology reduces the amount of data that needs to be copied to and stored in Amazon S3, further reducing backup storage costs.
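To make the incremental-forever and de-duplication ideas concrete, here is a minimal Python sketch of the general technique. It is not Imanis Data's implementation: the BackupStore class, its methods, and the file names are hypothetical, and an in-memory dictionary stands in for object storage such as Amazon S3. It relies on the fact that Cassandra SSTable files are immutable once written, so new data shows up as new files rather than as changes to existing ones.

```python
import hashlib


def chunk_digest(data: bytes) -> str:
    """Content hash used as the storage key for a file."""
    return hashlib.sha256(data).hexdigest()


class BackupStore:
    """Hypothetical content-addressed store: each unique file is kept once."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}          # digest -> file contents
        self.manifests: list[dict[str, str]] = []   # per run: file name -> digest

    def backup(self, files: dict[str, bytes]) -> int:
        """Record one backup run and return the bytes actually transferred."""
        transferred = 0
        manifest: dict[str, str] = {}
        for name, data in files.items():
            digest = chunk_digest(data)
            if digest not in self.chunks:
                # Only content the store has never seen is moved.
                self.chunks[digest] = data
                transferred += len(data)
            manifest[name] = digest
        self.manifests.append(manifest)
        return transferred

    def restore(self, run: int) -> dict[str, bytes]:
        """Rebuild the full file set of any run from its manifest."""
        return {name: self.chunks[d] for name, d in self.manifests[run].items()}


store = BackupStore()
first = store.backup({"sstable-1": b"aaaa", "sstable-2": b"bbbb"})
second = store.backup(
    {"sstable-1": b"aaaa", "sstable-2": b"bbbb", "sstable-3": b"cc"}
)
print(first, second)  # 8 2
```

The first run transfers everything; the second transfers only the new file, yet either run can be restored in full because each manifest lists every file that belonged to it.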


The same Imanis Data cluster mirrors production data to the test DataStax database environment. The customer automates the entire process with a mirroring workflow that copies the large table from the production database to the test database every six hours. After the first full copy, all subsequent transfers to the test cluster are incremental only, resulting in much faster transfers and significantly lower network bandwidth utilization.
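The difference between the first full copy and the later incremental transfers can be illustrated with a short Python sketch. This is an assumption-laden illustration rather than the product's mirroring workflow: plan_mirror, the file names, and the digest strings are all hypothetical, and each cluster is reduced to a map of file name to content digest.

```python
def plan_mirror(
    source: dict[str, str], target: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Compare file-name -> digest maps for the two clusters.

    Returns (files to copy to the target, files to delete from the target).
    """
    to_copy = sorted(n for n, d in source.items() if target.get(n) != d)
    to_delete = sorted(n for n in target if n not in source)
    return to_copy, to_delete


# First run: the test side is empty, so everything is copied.
prod = {"sstable-1": "d1", "sstable-2": "d2"}
first_copy, _ = plan_mirror(prod, {})

# Six hours later: production has one new file, the test side has the old two.
test_side = dict(prod)
prod["sstable-3"] = "d3"
next_copy, next_delete = plan_mirror(prod, test_side)
print(first_copy, next_copy, next_delete)
```

In this sketch the second run moves a single file instead of the whole table, which is where the bandwidth savings described above would come from.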
