The Collision of Machine Learning and Ransomware

Europe’s recent wave of ransomware attacks put ransomware on the map for many and heightened concerns for others. The latest attack, which goes by several names including Petya, PetrWrap and GoldenEye, is far more malicious than its predecessor WannaCry: it poses as ransomware but gives victims no way to recover their ransomed data. Here in the U.S., a Philadelphia health care clinic was hit with an attack that compromised the personal and health data of 300,000 patients.

 

While it’s impossible to predict the next attack or the form it will take, an actionable recovery plan makes it easier to rebound with as little downtime and loss as possible. It’s the insurance policy you hope you’ll never need, but the truth is that more than 90% of companies have been impacted by breaches, threats and other malicious behavior.

 

That’s why the Imanis Data software incorporates unique machine learning algorithms designed to combat ransomware and identify anomalous data loss. We have dubbed this functionality ThreatSense.

 

Imanis Data ThreatSense includes:

 

  • Intelligent Monitoring. The ThreatSense architecture uses machine learning to identify patterns of data movement and churn, and builds custom predictive models to immediately flag anomalous events as possible threats or intrusions.

 

  • Predictive Analytics and Point-in-time Recovery. By providing intelligent, ongoing predictive inputs, the Imanis Data software can also help an organization recover its data to a previous, pristine point in time, enabling it to resume operations with minimal downtime.

 

  • Smart Alerting and Reporting. Users benefit from immediate email alerts and reports that include details such as date, time and data moved, so they can act quickly to address potential attacks and accidental data deletions.

 

 

Our customers understand the need to lower their RTO/RPO thresholds, and ThreatSense gives them added capabilities to support those business goals. It benefits from the underlying Imanis Data architecture, which easily scales with production data and leverages application-aware technology that monitors nearly 50 data and metadata attributes to establish baseline patterns.

 

ThreatSense also accepts user input: if there is a rational explanation for an anomalous data loss, the user can adjust the ThreatSense findings, providing additional training for the underlying algorithm.

 

We’re pleased to offer these intelligent features within our platform. Machine learning and predictive innovations are going to help protect organizations from the dangers of ransomware and other anomalous data losses. What was once considered futuristic or a “nice-to-have” is now the new baseline. And it’s just one of the areas that set us and our innovative customers apart. Drop us a line so we can demo these new capabilities to you.

Introducing Imanis Data

When my co-founders and I first came together in 2013 to discuss what would become Talena, our mission was clear: help companies minimize the impact of data loss in today’s modern data ecosystem, a big data world built around NoSQL databases, Hadoop and newer data warehouse technologies. Much has changed since those early days, and we’re excited to reveal our new name: Imanis Data. The Latin root, immanis, means immense, huge or vast. Our new name and our new website better reflect who we are today, the vast modern data challenges we’re helping leading enterprises address, and where we’re collectively headed as massive amounts of data, and the insights we can glean from them, push us forward.

 

We’re also thrilled to unveil our next-generation data management software platform; you can learn more about it in our official press release and our blog post. A company’s data, and how it’s managed, protected, analyzed and optimized, is strategic and a top-of-mind concern across the C-suite. Companies are building critically important modern data applications at a rapid pace, and minimizing data loss is an increasingly urgent priority.

 

Like many start-ups before us, we started with a simple idea to address a complex problem. The rise in new business-critical applications exposes companies to greater risk of human and application errors as well as ransomware, and losing critical data would mean weeks of downtime for some and irreversible loss for others. What’s more, we knew that companies would need infinite scale and unmatched performance to protect these enormous data sets in this new era.

 

Our success is in large part thanks to the support of our earliest customers and partners who took a chance on a start-up with a game-changing promise. Would it prove too good to be true? We are proud to say no. We delivered, and the market responded.

 

An industry first when we launched in 2015, the Imanis Data software remains the fastest data backup and recovery solution available, with built-in machine intelligence to handle massive data sets, effectively manage them, and quickly recover data in the event of corruption or loss. Today we protect thousands of production nodes across dozens of Fortune 500 companies. What’s more, we’re minimizing downtime costs that average $750K per incident and saving our customers up to 80% on secondary storage.

 

We’ve seen rapid growth across our customer base in nearly every sector with significant traction in financial services, retail, manufacturing and technology. We’re also proud to have built a robust partner ecosystem that includes more than a dozen of the world’s leading technology, consulting and reseller partners.

 

We’re committed to constantly enhancing our high-performance data management architecture that gives our customers peace of mind allowing them to put all that business-critical data to good use. We’re excited about introducing many more companies to the unique and critical capabilities of Imanis Data—to helping them prevent data loss and realize the benefits of strategic backup and recovery.

Imanis Data 3.0: Machine Learning, New Data Platforms & Even Faster Performance

When we launched two years ago, the need to support companies building mission-critical applications on top of modern data platforms like Cassandra, Couchbase, Hadoop, MongoDB and Vertica was growing fast. Today, these applications continue to expand rapidly within every large enterprise. As they increase in popularity, so do the risks of human and application errors and the threat of critical data loss. The risk is all too real and increasingly common: 3 out of 4 companies have experienced a loss over the last year, a loss that carries an average cost of $900K and weeks of downtime.

 

That’s why we’re pleased to introduce our latest platform, Imanis Data 3.0.

 

A game-changer when we first launched, Imanis Data 3.0 remains the fastest backup and recovery platform on the market. Our cloud-ready elastic architecture backs up, recovers and replicates data sets of terabyte and petabyte scale and beyond up to 10 times faster than any other solution on the market. That speed greatly minimizes the impact of data loss, reducing costly days and weeks of downtime to minutes and hours, and cuts secondary storage costs by up to 80%. What’s more, our intelligent software enables early detection of ransomware attacks and proactively identifies accidental data loss, while also offering unmatched scale and performance.

Over the last year, we’ve backed up more than 3 billion documents in a single Couchbase bucket for a leading gaming company; handled over 100 terabytes in a single Cassandra keyspace for a top manufacturing company; and recovered data 10 times faster than the native database utility for one of the world’s leading banks.

 

New 3.0 features include:

 

  • Broadest database support: With this release, we’ve added support for MongoDB and for Oracle test data management, as well as support for Cassandra 3.0 and DataStax Enterprise 5.0, further extending the broadest range of modern data platform support in the industry.

 

  • Deep integration with cloud platforms: We remain the only cloud-ready data management platform with multi-tenancy support and native integration with Microsoft Azure Blob Storage and HDInsight. In June, we announced our availability on the Azure Marketplace, enabling companies to launch an Azure-certified Imanis Data cluster via a single-click installation to quickly migrate workloads to HDInsight. In addition, for Amazon Web Services, we have native integration with Amazon S3 and Amazon Glacier.

 

  • Unique machine learning for ransomware: Imanis Data 3.0 delivers built-in and expanded machine learning to proactively identify ransomware, notify about accidental data deletion, and ensure business SLAs for recovery time.

 

 

  • Fastest backup: We offer seamless, granular and rapid backups, helping companies greatly minimize downtime. No one comes close. We are also the only solution that can back up Hadoop NameNode metadata.

 

We are proud to be unveiling our new platform today and look forward to supporting our existing and growing number of Fortune 500 businesses in nearly every market. Why are so many successful companies across data-centric industries like retail, manufacturing and banking choosing Imanis Data? Because they understand the value of their data and what a loss will cost their organizations in time, money and reputation. And because Imanis Data is the fastest backup and recovery platform that also delivers extreme scale, rapid recovery, machine intelligence and smart storage optimization. It’s a win-win.

 

I hope we have the opportunity to introduce you to the power of Imanis Data 3.0 soon.

Cloud Data Management at Exabyte Scale

Earlier this year I hosted a webinar that discussed the key data management challenges enterprises face when running or migrating big data workloads to the cloud. These workloads are moving in greater numbers to the cloud for well-known reasons: business agility, flexibility, and minimizing capital expenditures, among others. Yet, often left unanswered is the question of how companies optimize data storage and data management for these new workloads.

 

This post highlights how the Imanis Data architecture optimizes cloud data management. Our philosophy from the very beginning has been to be compatible with any infrastructure deployment architecture: exclusively on-premises, exclusively in the cloud, or hybrid. This flexible approach enables us to support a wide variety of data backup, mirroring and recovery use cases, and it is made possible by how the Imanis Data file system handles these different storage requirements.

 

The Imanis Data File System

The Imanis Data file system is built on a storage tiering model: it can transparently federate data over multiple tiers of storage based on user-defined policies. The first tier is typically block storage; examples include Elastic Block Store (EBS), managed disks, direct-attached drives, and storage area network (SAN) or network-attached storage (NAS) devices. The second tier is an object storage platform such as Amazon S3 or Azure Blob Storage. The final tier is a cold storage platform such as Amazon Glacier. A user could define a policy to keep data in the block tier for 5 days, in the object tier for 25 days and in the cold tier for 6 months, and the Imanis Data file system transparently migrates the data between tiers. Data access is equally transparent, since the file system natively retrieves data from each tier using the supported access protocols.
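
To make the tiering idea concrete, here is a minimal sketch of how such a policy might be modeled. The names and structure here are hypothetical illustrations of the concept, not actual Imanis Data policy syntax.

    from dataclasses import dataclass

    # Hypothetical policy model illustrating the tiering described above;
    # the names are ours, not Imanis Data's actual policy syntax.
    @dataclass
    class TierRule:
        tier: str            # "block", "object", or "cold"
        retention_days: int  # how long data stays on this tier

    # Keep data 5 days on block storage, then 25 days on object storage,
    # then roughly 6 months (180 days) on cold storage.
    POLICY = [TierRule("block", 5), TierRule("object", 25), TierRule("cold", 180)]

    def tier_for_age(age_days: int) -> str:
        """Return the tier a file of the given age should live on."""
        threshold = 0
        for rule in POLICY:
            threshold += rule.retention_days
            if age_days < threshold:
                return rule.tier
        return "expired"  # past total retention; eligible for deletion

    print(tier_for_age(3), tier_for_age(12), tier_for_age(100))  # block object cold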

 

Data Backup and Mirroring in the Cloud

The storage in Imanis Data is unbounded. What this means is that the Imanis Data file system can automatically span the different types of storage highlighted above. For example, Imanis Data can back up a directory where two files sit on local storage while the remaining six files have been moved to S3. More importantly, where the files are stored is completely transparent to the user, because our file system presents a unified namespace across storage tiers. If a user migrates data from local storage to cloud storage using a policy they have created, the data movement happens asynchronously, without the user needing to be aware of the underlying migration process. As soon as the migration is done, the space occupied by the original files is freed and immediately available for reuse.

Let’s take this one step further. If your company deploys workloads in a multi-cloud environment, say AWS and Azure, Imanis Data can designate specific backup workflows to go to S3 and others to Azure Blob Storage. Because we separate the compute and storage layers, Imanis Data can handle huge amounts of storage relative to the compute requirements. Furthermore, the Imanis Data storage optimization engine can store de-duplicated data on S3 or Azure Blob Storage, reducing your storage footprint even further.

 

Data Recovery from the Cloud

During recovery, the Imanis Data file system immediately determines whether the data is, for example, on the local file system or on the object storage tier, and reads it from the appropriate location. All data restore operations are completely transparent to users, and the data is seamlessly fetched from the storage tier where it resides. This is in stark contrast to a script-based approach, in which you would need specific scripts for each storage location.
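
Conceptually, the restore path looks something like the sketch below: a catalog records each file’s current tier, and reads are dispatched to the matching access protocol. The catalog and reader functions are hypothetical stand-ins, shown only to illustrate why no per-location scripts are needed.

    # Hypothetical sketch: the file system's catalog maps each file in the
    # unified namespace to the tier where it currently resides.
    CATALOG = {
        "/backups/cassandra/f1.db": "block",   # still on local/EBS storage
        "/backups/cassandra/f2.db": "object",  # migrated to S3/Azure Blob Storage
        "/backups/cassandra/f3.db": "cold",    # aged out to Amazon Glacier
    }

    def read_block(path):
        print(f"reading {path} from the local block tier")

    def read_object(path):
        print(f"fetching {path} via the object-store client (e.g., S3)")

    def read_cold(path):
        print(f"initiating a cold-tier retrieval for {path}, then downloading")

    READERS = {"block": read_block, "object": read_object, "cold": read_cold}

    def restore_file(path):
        # The caller never specifies a tier; the catalog lookup decides,
        # which is what makes the restore transparent to the user.
        READERS[CATALOG[path]](path)

    for f in CATALOG:
        restore_file(f)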

 

Conclusion

The Imanis Data architecture is flexible enough to address the key issues of scale, bandwidth, and cost of cloud data management. As a result, we’re used by some of the largest enterprises running the most demanding big data workflows. We encourage you to check out this video of how Imanis Data works and contact us to learn more about the ideal big data cloud management solution.

Backing Up and Restoring Very Large Data Streams

In this post, we highlight how Imanis Data helped an industrial manufacturer protect a very large DataStax Enterprise environment that is hosted in AWS.

 

Summary: A leading industrial manufacturer makes Imanis Data a core component of their business-critical testing infrastructure

 

Industry: Manufacturing. A Fortune 500 manufacturing company is a leading provider of industrial machines and components for commercial and military use. Recently the company embarked on a digital transformation project to unlock the value of the large streams of data generated by these components and deliver deeper insights and operational value to its customers.

 

Big Data Environment: To realize this vision, the organization has deployed a NoSQL platform powered by DataStax Enterprise (DSE) to capture machine test data. The database is hosted in Amazon Web Services (AWS) and continuously ingests data as machines undergo testing. The customer runs two 32-node Cassandra clusters, one for production and one for testing, and all data is stored in a single 48-terabyte table that grows every day as new data is ingested.

 

Challenges: The Cassandra database consists of one keyspace with one large 48TB table, and backing up and recovering that single large table was unreliable due to frequent failures during backup. The customer was using EBS storage as a backup destination, and its cost was rising to the point where the customer began looking for alternatives to their backup and recovery strategy. In addition, the customer needed to copy data from the production database to the test cluster every 6 hours so that the testing team always had fresh production data. This process was performed manually using internally developed scripts, and many engineering cycles went into developing, running, and debugging it to ensure test data reached the test team on time.

 

Solution: The customer deployed an Imanis Data cluster in AWS to address their backup and recovery needs and to copy data from the production database to the test database. Given a database environment built around a single large 48-terabyte table, Imanis Data’s incremental-forever technology proved efficient and resilient by moving only changed data during the backup process, making backups fast and reliable. To reduce backup storage costs, the customer used the native Imanis Data Amazon S3 integration and storage optimization capabilities, and can now store backup data on Amazon S3 at a significantly lower cost than on Amazon EBS. In addition, Imanis Data de-duplication reduces the overall amount of data that must be copied to and stored in Amazon S3, lowering backup storage costs further.

 

The same Imanis Data cluster mirrors production data to the test Cassandra environment. The customer automated the entire process by creating a mirroring workflow that copies the large table from the production database to the test database every six hours. After the first full copy, all subsequent data transfers to the test cluster are incremental, resulting in much faster transfers and significantly lower network bandwidth utilization.

What’s Next With Big Data: A Q&A with ONSET’s Shomit Ghose

ONSET Ventures General Partner Shomit Ghose has witnessed and invested in a variety of major technological disruptions ranging from SaaS and cloud, to Big Data and next generation security. In this post, he discusses some of the key trends in Big Data and how enterprises can take advantage of them.

 

1. A few years ago you said that big data was the only business model tech had left. Do you still feel that way? And why?

 

Big Data today is even more firmly entrenched as the only business model that remains in tech. By now, hardware, software and bandwidth have been completely commoditized and are ubiquitous, with the consequence that the supply of data being produced is exploding at ever-faster rates. With everything else commoditized, the only sustainable business value is in monetizing that vast supply of Big Data.

 

2. As a venture capitalist investing in software companies, what big data trends are you most excited about? IoT, machine learning/AI, deep analytics, others?

 

The Big Data trend that excites me the most is the ability to apply unsupervised machine learning to enormous volumes of seemingly sparse data and then find monetizable business signals within that data. This capability is independent of any specific vertical domain, of course. The investing opportunity lies in combining the insights that machine learning can bring with a disruptive business model within a large vertical domain.

 

3. The volume of data continues to skyrocket just as companies are increasingly aware of the tremendous business value their data holds. What are some of the most common mistakes customers make around their big data infrastructure?

 

The biggest mistake companies make when embarking on a Big Data strategy is the failure to clearly understand their own business. Without a well-defined, antecedent business use case, it’s impossible to know what Big Data is required, how to acquire it, how it should be interpreted, or whether in the end the Big Data initiative has been successful in improving the bottom line. First and foremost, companies need to begin by understanding the types of insights their business needs.

 

4. Which markets will see the next wave of big data innovation and why? Agriculture, aviation, health care, eCommerce or others?

 

I think every industry will be deconstructed and redefined by the advent of data-driven business models; it’s only a matter of time until this happens across the board. But today I’m most excited about the new business models afforded by the absolutely vast streams of data that will be produced by the connected car. The connected car is going to revolutionize automobile usage, and this revolution is going to be driven entirely by data. So just as your cell phone is no longer about the physical device itself but about the applications and data flowing through it, your automobile is about to undergo the same transition. The transition from car-as-your-largest-mechanical-device to car-as-your-largest-computing-device is already underway and will open an entirely new universe of opportunity for the software industry.

The Unique Capabilities of the Imanis Data HBase Connector

In this post we compare the Imanis Data HBase Connector with existing and proposed data protection solutions for Apache HBase.

Imanis Data HBase connector design

As with all Imanis Data connectors, the HBase connector automatically provides all the benefits of the Imanis Data platform: metadata catalog; Imanis Data FastFind, for rapid object discovery; storage optimization, to reduce capital and operating expenses; a scale-out architecture, to handle any size of production workload; and more. Review all the platform features here.

A few different capabilities, explained below, uniquely separate the Imanis Data connector design from that of other systems.

Agentless architecture

The new HBase connector design follows the same principles that govern other connectors and has an agentless architecture. That means customers need not change any configuration or install any software on their production cluster.

Imanis Data HBase backup

The Imanis Data HBase connector leverages HBase snapshots to take backups. HBase snapshots guarantee data consistency by flushing all in-memory data, committing it to persistent storage.

The process is as follows (a brief code sketch follows the steps):

1. Take a full backup: Create an HBase table snapshot and copy all the files contained in the snapshot to Imanis Data. Do this only once for any backup workflow.

2. Take an incremental backup by creating a new HBase table snapshot.

3. Look up the Imanis Data catalog and compare the files contained in the new snapshot with the files that were present in the previous iteration of the backup.

4. Copy the incremental data to Imanis Data.

5. After the backup is completed, immediately delete snapshots taken on the production cluster.
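
Here is a minimal sketch of the snapshot-diff logic behind steps 2 through 5. The helper functions are hypothetical stand-ins for HBase admin calls and file transfers, simulated so the flow can run end to end; treat this as an illustration of the technique, not the actual connector code.

    # Hypothetical sketch of steps 2-5: diff the new snapshot's file list
    # against the catalog and copy only what is new.
    def take_snapshot(table):
        return f"snap-{table}"  # stand-in for creating an HBase table snapshot

    def list_snapshot_files(snapshot):
        return {"f1", "f2", "f6"}  # stand-in: HFiles referenced by the snapshot

    def copy_to_imanis(hfile):
        print(f"copying {hfile} to the Imanis Data cluster")

    def delete_snapshot(snapshot):
        print(f"deleting {snapshot} from the production cluster")

    def incremental_backup(table, catalog):
        snapshot = take_snapshot(table)          # step 2: new snapshot
        current = list_snapshot_files(snapshot)
        previous = catalog.get(table, set())     # step 3: consult the catalog
        for hfile in current - previous:         # step 4: copy only new HFiles
            copy_to_imanis(hfile)
        catalog[table] = current                 # remember for the next run
        delete_snapshot(snapshot)                # step 5: clean up production

    catalog = {"orders": {"f1", "f2"}}     # files captured by the previous backup
    incremental_backup("orders", catalog)  # copies only f6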

It’s possible that incremental backups capture extra data because of compaction on the production cluster. But thanks to Imanis Data’s data-aware de-duplication, that extra data does not consume extra space on the secondary cluster.

Imanis Data backup example

Below is a tabulation for a backup job in which we took a backup snapshot every night. We assumed that 20GB of data were added to the cluster every day, that the table contained duplicate data, and that Imanis Data would achieve a 5x reduction in data size.

Notice that on Day 4, files f3, f4, and f5 were combined by compaction and a new file f6 was created by the daily addition of data. The incremental backup on Day 4 therefore copied the recombined files as well as the new data: 80GB were copied instead of the daily 20GB. But once de-duplication runs on the Imanis Data platform, all duplicate data is eliminated and only unique chunks are retained.

Additional space savings

HBase works on top of the Hadoop Distributed File System (HDFS). Typically, the HDFS replication factor on production clusters is set to three, so a 100GB data set on the production cluster will occupy 300GB of disk space.

Example 1 highlights the storage efficiency of the Imanis Data platform. When data is backed up to the Imanis Data system, only unique data is saved after our data-aware de-duplication. A data set that takes 300GB of disk space on a production cluster can end up taking just 20GB of disk space on the Imanis Data platform.
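
The mechanics behind that reduction are content-based: identical chunks are stored once, no matter how often they recur across backups. The sketch below illustrates the general technique with fixed-size chunks and content hashes; Imanis Data’s data-aware de-duplication is more sophisticated, so treat this only as the underlying idea.

    import hashlib

    CHUNK = 4 * 1024 * 1024  # an illustrative 4MB chunk size

    def dedup_store(data, store):
        """Store each unique chunk once, keyed by its content hash.
        Returns the ordered list of hashes needed to rebuild the data."""
        recipe = []
        for i in range(0, len(data), CHUNK):
            piece = data[i:i + CHUNK]
            digest = hashlib.sha256(piece).hexdigest()
            if digest not in store:  # a repeated chunk costs no extra space
                store[digest] = piece
            recipe.append(digest)
        return recipe

    store = {}
    recipe = dedup_store(b"A" * (3 * CHUNK), store)  # three identical chunks
    print(len(recipe), "chunks referenced;", len(store), "chunk actually stored")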

Incremental forever

Our backups are incremental forever, and the platform provides a restore-centric design: our architecture optimizes a company’s recovery time objective (RTO). Unlike traditional backup-and-recovery methods that take periodic full backups and apply incrementals to them, every incremental backup image is a fully recoverable, independent snapshot of the production data. This allows Imanis Data to deliver a single-step restore process.

The following scenario and its results (shown in Example 2) illustrate the principle.

A customer creates a new backup job, which takes a nightly backup of 10 critical tables. Backup images are retained for 90 days. Assume an original 1-terabyte data set and 50 gigabytes of daily changes.

After 80 days, the customer has one full backup and 79 incremental backups. On day 81, user error corrupts data in some of the tables, and the customer needs to recover all the data immediately!

A traditional recovery approach would restore the first full backup and then apply changes from each of the 79 incremental backups. Imanis Data, however, maintains a virtualized copy of the production data set in each restore point, so our restores are speedy and involve moving just a fraction of the original data. In this example, Imanis Data’s restore algorithms restore from the virtualized restore point, moving only 1.2TB of data.

Data recovered by other solutions: 1TB + 50GB × 79 = 4.95TB

Data recovered by Imanis Data: 1.2TB (exactly the size of data on Day 80)

Granular Restores

With the Imanis Data HBase connector, customers can select data sets for backup and restore at the namespace or table level. A customer can select a complete namespace or a set of tables when creating a new backup workflow, but during a restore can select any individual table or set of tables to be recovered, either to the same HBase cluster or to an alternate HBase cluster in the data center.

Even though Imanis Data uses an incremental-forever approach to backup, all restore points are completely virtualized on the Imanis Data cluster. A customer therefore need not restore the first full backup and then apply incrementals; the restore point is instantly available to complete the restore.

Imanis Data HBase connector vs. other HBase backup strategies

We compared the Imanis Data HBase backup strategy against two other HBase backup offerings.

Backups with the Write-Ahead Log (WAL)

The WAL technique uses the HBase snapshot capability to take a full backup and then uses the WAL to take incremental backups. In HBase, all transactions are first written to the WAL before they are committed to actual HFiles, and a WAL is maintained for each region server. That makes for a workable scheme, but one with certain limitations.

One limitation is that the restore procedure follows the traditional approach of full plus incremental restores, so it suffers from the same storage bloat problems discussed above. Moreover, the captured WAL files must be converted to HFiles before they can be restored, which pushes RTOs significantly higher.

Another limitation is that incremental backups rely on the WAL, and because the WAL is shared by all regions hosting various tables on a single region server, incremental backups include data for all tables in the deployment. Even if a customer selects just a single table for backup, the changes for all tables are captured, extending the backup window and including unnecessary data. That extra data must be purged at the receiving cluster so that only the relevant data set is stored. And when two tables must be backed up at different frequencies, say, once every hour versus once every 12 hours, the WAL has to be copied multiple times.

We think these are two major disadvantages of a WAL-based backup.

Backup snapshot management

Simple snapshot management leverages HBase snapshots for data protection: a backup utility takes periodic snapshots according to a predefined policy. The snapshots are saved locally on the HBase cluster, but to assist in disaster recovery they can be copied to different locations in the data center or to cloud storage environments like Amazon S3 or Azure Blob Storage. Recovery involves copying the snapshot’s files to a temporary location on the HBase cluster and using HBase bulk load to recover the lost data.

That procedure is used by backup utilities provided with some HBase distributions, but it is too simple for today’s complex data protection needs. Some of its limitations:

  • Backups are not incremental; the whole snapshot is copied in every backup, resulting in long backup windows.
  • Secondary storage is needed to keep multiple restore points.
  • The number of restore points that can be kept on backup storage is limited because of excessive space consumption.
  • Snapshots cannot be recovered to another cluster in the data center because of the backup utility’s limitations.

Conclusion

Imanis Data provides a highly scalable solution to protect against accidental data loss in an HBase environment, encompassing key functional attributes such as an agentless model, incremental-forever backup, and extremely rapid recovery aided by our metadata catalog. Watch our product video and review our architecture white paper to better understand how we bring technical and business value to the world of big data management.

Data Protection in a Complex Hadoop Deployment

Summary: One of the world’s largest financial services companies deploys Imanis Data to protect their Hadoop data lake and support business-critical analytics capabilities.

 

Industry: Financial Services. One of the largest financial services companies in the world, it provides personal and commercial banking, corporate and investment banking, wealth management services, insurance, and transaction processing services globally. The company has embarked on a Big Data journey to enable next-generation analytics and smarter data management.

 

Hadoop Environment: The company has standardized on Hadoop as its Big Data platform and created a Data Lake. All customer transactions (~400M per day) are archived to the Data Lake for compliance and retention needs. Machine learning, cognitive computing, and predictive analytics run against the archived data to drive next-generation analytics for clients. The company has set up three distinct Hadoop clusters for specific use cases: the first consists of 40 nodes hosting 425 TB of transaction data; the second is a 16-node cluster with 180 TB of data for fraud detection; and the third is an 8-node cluster with 50 TB of data for customer insights.

 

Challenges: Availability and reliability of the data lake are critical to the company, as any downtime or data loss would be disastrous to their Big Data initiatives and would damage the business and consumer trust. However, managing this large, complex Hadoop environment has been a challenge. The admin team was under constant pressure handling hardware failures, software updates, security issues, and capacity management, and on top of that was writing and maintaining many different scripts to back up the Hadoop clusters. The backup operations team was responsible for running the scripts nightly and addressing data recovery requests on an ad hoc basis. The scripts were fragile and failed often, requiring late-night calls to the admin team. Data recovery was also a fire drill, involving multiple steps such as locating the right backups, performing the recovery, and verifying that the right data was recovered. As the Hadoop environment grew in size and complexity, backup and recovery costs spiraled and the customer was unable to meet the recovery time objectives (RTO) mandated by the business. At that point the customer started looking for commercial alternatives.

 

Solution: The customer chose and deployed a 10-node Imanis Data cluster in their production environment to address backup and recovery for their Hadoop infrastructure. Imanis Data’s scale-out architecture gives them plenty of room to grow as their primary Hadoop cluster and its data expand, or as they bring additional Hadoop clusters online. With Imanis Data in production, the admin team is freed from what was becoming an unmanageable situation, and backup and recovery is now the sole responsibility of the backup operations team. Using a single UI to manage backup and recovery across the entire Hadoop environment, that team can focus on providing better, faster service when a business user requests a recovery. Backups are completely centralized and automated, with no reliance on fragile scripts and no need for constant monitoring. Backup storage costs have also gone down significantly thanks to Imanis Data storage optimization (de-duplication and compression) and the incremental-forever backup methodology.

 

The customer is also now able to meet and exceed their Recovery Point Objective (RPO) and Recovery Time Objective (RTO), since Imanis Data allows them to back up their data more aggressively on an hourly schedule and recover it very quickly using the Imanis Data one-step restore process.

Two Under-Appreciated Imanis Data Capabilities

Recently I have spent a lot of time working with existing customers to understand their requirements and help them take full advantage of the Imanis Data software for their modern data management needs. During these conversations I’ve realized that even customers we’ve been working with for some time have not always internalized the full power of our product. Two topics crop up more often than others, so I want to discuss them in more depth.

 

Incremental Forever Backup

 

“No, you never have to create another full backup after your first one”

“No, this does not impact your ability to recover quickly from a data loss incident”

“No, Imanis Data does not have any issues finding the one specific data object across terabytes or petabytes of data”

 

This is a fairly accurate representation of my conversations with customers when we discuss the Imanis Data backup architecture and philosophy. I’ve written in the past about why an incremental-forever architecture is an absolute necessity in the big data world.

 

Let’s use a concrete example to highlight why this approach works. Assume that you use Imanis Data to back up your Cassandra database daily and that you keep seven days’ worth of restore points. A full backup is done the very first time the backup workflow runs; all subsequent backups are incremental. On day eight, Imanis Data deletes the very first backup that was created. And this is the important point: even though the very first full backup has been deleted, Imanis Data never needs to execute a full backup again. All the relevant data is still available on the Imanis Data storage cluster, because we create a “virtualized full” image for each incremental backup we take. This is in stark contrast to the traditional approach of periodically taking a full backup with incrementals in between, which creates significantly greater overhead and is uneconomical at big data sizes.
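
One way to picture a “virtualized full” is as a complete manifest written at every incremental backup, with entries pointing at data chunks already on the backup cluster; expiring the oldest restore point only removes chunks no surviving manifest references. The sketch below is a simplified model of that idea with hypothetical structures, not the actual Imanis Data internals.

    # Simplified model: every backup records a complete manifest (file ->
    # chunk ids), so each restore point is self-contained even though only
    # new chunks were transferred.
    manifests = {}  # day -> {file: [chunk ids]}
    chunks = {}     # chunk id -> bytes, each stored exactly once

    def backup(day, manifest, new_chunks):
        chunks.update(new_chunks)  # only the changed data lands on the cluster
        manifests[day] = manifest  # but the manifest is always complete

    def expire(day):
        """Drop a restore point; keep chunks any surviving manifest needs."""
        del manifests[day]
        live = {c for m in manifests.values() for ids in m.values() for c in ids}
        for stale in set(chunks) - live:
            del chunks[stale]

    backup(1, {"users": ["c1", "c2"]}, {"c1": b"...", "c2": b"..."})
    backup(2, {"users": ["c1", "c2", "c3"]}, {"c3": b"..."})  # incremental
    expire(1)                            # day 2 remains restorable in one step
    print(manifests[2], sorted(chunks))  # c1 and c2 survive; nothing is lost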

 

Rapid Recovery

 

We understand that the ultimate goal of any backup and recovery solution is to minimize the impact of a data loss or downtime incident. Two key capabilities allow us to help companies meet their recovery point (RPO) and recovery time (RTO) objectives:

 

  • As mentioned above, Imanis Data delivers a one-step restore process via its ability to create “virtualized full” restore points whenever an incremental backup completes. This allows us to start the restore immediately, without the lag associated with creating a restore image that is present in typical recovery processes.

 

  • The Google-like Imanis Data metadata catalog immediately shortens the time to find a particular object and restore point, based on a “timeline” metaphor surfaced in the Imanis Data user interface (sketched below). This eliminates having to know the exact details of the restore point, as is typically required with built-in platform utilities or manual scripts.
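
As a rough illustration of the timeline idea, the lookup amounts to finding the latest restore point at or before a requested moment. This sketch is hypothetical and only meant to show the shape of that query, not the actual catalog implementation.

    from bisect import bisect_right
    from datetime import datetime

    # Hypothetical illustration: sorted restore-point timestamps for an
    # object, queried for the latest point at or before a requested time.
    restore_points = [
        datetime(2017, 9, 1, 2, 0),
        datetime(2017, 9, 2, 2, 0),
        datetime(2017, 9, 3, 2, 0),
    ]

    def latest_before(when):
        """Return the newest restore point not after `when`, or None."""
        i = bisect_right(restore_points, when)
        return restore_points[i - 1] if i else None

    print(latest_before(datetime(2017, 9, 2, 12, 0)))  # 2017-09-02 02:00:00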

 

 

Our white paper describes these capabilities and our architecture in some depth, and I’d also encourage you to watch our product video to see how quickly companies can initiate and execute enterprise-level data management workflows.

Data Protection For An Analytics Data Warehouse

This is a continuation of our series highlighting why and how customers are deploying Imanis Data. Others in this series are here and here.

 

Summary: A leading online travel brand deployed Imanis Data to protect their business-critical consumer data and travel metrics data warehouse.

 

Industry: Online travel. This company simplifies travel planning and trip management, online and via its mobile app, by offering a variety of innovative tools and features such as trip exploration based on traveler budget and price forecasting.

 

Big Data Environment: To enable these innovative capabilities, the company deployed the Vertica data warehouse and the Hadoop platform. The data warehouse captures large, fast-growing volumes of travel data and makes it available to query-intensive applications and users. The Hadoop platform captures transaction data and supports deep data analytics. The customer has deployed two 10-node Vertica production clusters and a large Hadoop cluster in their on-premises data center.

 

Challenges: Naturally, in the consumer-facing online travel business, the uptime and data integrity of a company’s applications are critical. Any data loss or downtime would be unacceptable, resulting in lost revenue, lost opportunities and damage to the brand. Protecting the data stored in their Vertica and Hadoop environments became all the more pressing as they struggled with a homegrown solution assembled from the native tools provided by the Big Data platforms.

 

That solution was inflexible, unreliable, network- and storage-intensive, and consumed a lot of operational resources. Backups were highly topology-dependent: any change to the cluster topology after a backup was taken rendered previous backups non-restorable. They could only restore the entire database, with no ability to do granular table- or partition-level restores. Because their backup methodology involved a weekly full backup plus incrementals during the week, backup storage consumption grew along with the production environment, and network utilization was very high. Finally, backups and restores were manual and onerous for the IT staff, especially given the frequent failures of the backup and restore process.

 

Solution: The customer deployed a five-node Imanis Data cluster in their production environment to address backup and recovery for both Vertica and Hadoop, resolving all of their backup and recovery challenges for the Vertica data warehouse. The customer can implement granular backup and recovery workflows, down to the table or partition level. The Imanis Data software is topology-agnostic and has no limitations when recovering to different-sized Vertica clusters. All Imanis Data backups are incremental-forever, significantly reducing the backup window as well as the storage and network resources consumed. And since the incremental backups are fully materialized (that is, they are virtual full backups), restoring data is a single-step process, making recoveries fast, simple, and reliable. The automation and reliability significantly reduced the operational burden on the IT team.

 

The same Imanis Data cluster also protects the Hadoop NameNode environment. With regular backups of the NameNode database, the customer can ensure rapid recovery of their Hadoop environment in case of NameNode corruption.

 

We have plenty of resources to help you understand how we can help you prevent data loss for Vertica, Hadoop and other big data platforms. Drop us a line with any questions.
