Data Protection For An Analytics Data Warehouse


This is a continuation of our series highlighting why and how customers are deploying Imanis Data.


Summary: A leading online travel brand deployed Imanis Data to protect their business-critical consumer data and travel metrics data warehouse.


Industry: Online travel. This company simplifies travel planning and trip management with a variety of innovative tools and features, such as budget-based trip exploration and price forecasting, available both online and through its mobile app.


Big Data Environment: To enable these capabilities, the company deployed the Vertica data warehouse and the Hadoop platform. The data warehouse captures large, fast-growing volumes of travel data and makes it available to query-intensive applications and users. The Hadoop platform captures transaction data and supports deep data analytics. The customer runs two 10-node Vertica production clusters and a large Hadoop cluster in its on-premises data center.


Challenges: In the consumer-facing online travel business, application uptime and data integrity are critical. Any data loss or downtime would be unacceptable, resulting in lost revenue, missed opportunities and damage to the brand. Protecting the data in their Vertica and Hadoop environments became even more pressing because they were struggling with a homegrown solution assembled from the native tools provided by the big data platforms.


Their existing solution was inflexible and unreliable, consumed a lot of network and storage capacity, and demanded significant operational effort. Backups were tightly coupled to the cluster topology: any topology change made after a backup was taken rendered that backup unrestorable. They could only restore the entire database, with no ability to restore individual tables or partitions. Because their methodology combined a weekly full backup with incrementals during the week, every full backup copied the entire database again, so backup storage consumption grew along with the production environment and network utilization stayed very high. Finally, backups and restores were manual and onerous for the IT staff, especially given the frequent failures of the backup and restore process.


Solution: The customer deployed a five-node Imanis Data cluster in its production environment to meet its backup and recovery needs for both Vertica and Hadoop. Imanis Data addresses all of the backup and recovery challenges the customer faced with its Vertica data warehouse. The customer can now implement granular backup and recovery workflows, down to the table or partition level. The Imanis Data software is topology-agnostic and can restore backups to Vertica clusters of a different size. All Imanis Data backups are incremental-forever, which significantly shortens the backup window and reduces the storage and network resources consumed. Because each incremental backup is fully materialized (that is, it is a virtual full backup), restoring data is a single-step process, which makes recoveries fast, simple and reliable. This automation and reliability significantly reduced the operational burden on the IT team.
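To make the idea of an incremental-forever backup with virtual full backups more concrete, here is a minimal, hypothetical sketch. It is not Imanis Data's implementation, and the class and method names (`BackupStore`, `backup`, `restore`) are illustrative assumptions. Each backup copies only the blocks that changed since the previous one, but it also records a complete catalog that points at every block. A restore therefore reads a single catalog in one step, and passing a subset of names models a granular, table-level restore.

```python
from dataclasses import dataclass, field


@dataclass
class BackupStore:
    """Hypothetical incremental-forever store with materialized catalogs."""
    # Each unique block version is stored exactly once: chunk_id -> bytes.
    chunks: dict = field(default_factory=dict)
    # One complete catalog per backup: block name -> chunk_id.
    catalogs: list = field(default_factory=list)
    _next_id: int = 0

    def backup(self, source: dict) -> int:
        """Back up `source` (block name -> bytes) incrementally.

        Only changed blocks are copied, but the new catalog references
        every block, so it behaves like a full backup (a virtual full)."""
        prev = self.catalogs[-1] if self.catalogs else {}
        catalog = {}
        for name, data in source.items():
            old_id = prev.get(name)
            # A real system would use change tracking or hashes; direct
            # comparison keeps this sketch simple.
            if old_id is not None and self.chunks[old_id] == data:
                catalog[name] = old_id  # unchanged: reuse the existing chunk
            else:
                chunk_id = self._next_id
                self._next_id += 1
                self.chunks[chunk_id] = data  # changed or new: store a copy
                catalog[name] = chunk_id
        self.catalogs.append(catalog)
        return len(self.catalogs) - 1

    def restore(self, backup_id: int, names=None) -> dict:
        """Restore in one step from a single catalog.

        Passing `names` restores only those blocks (a granular restore)."""
        catalog = self.catalogs[backup_id]
        wanted = catalog if names is None else {n: catalog[n] for n in names}
        return {name: self.chunks[cid] for name, cid in wanted.items()}


store = BackupStore()
b0 = store.backup({"bookings": b"v1", "fares": b"v1"})
b1 = store.backup({"bookings": b"v1", "fares": b"v2"})  # only "fares" changed
print(len(store.chunks))              # 3: the second backup stored one new chunk
print(store.restore(b1, ["fares"]))   # {'fares': b'v2'}
```

In this sketch the second backup stores only one new chunk, yet restoring it never requires replaying a chain of incrementals on top of an earlier full backup. That is the property the paragraph above credits for faster, single-step recoveries.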


The same Imanis Data cluster also protects the Hadoop NameNode metadata. With regular backups of this metadata, the customer can rapidly recover its Hadoop environment if the NameNode metadata becomes corrupted.


We have plenty of resources that explain how Imanis Data can help you prevent data loss for Vertica, Hadoop and other big data platforms. Drop us a line with any questions.
