When my co-founders and I first came together in 2013 to discuss what would become Talena, our mission was clear: help companies minimize the impact of data loss in the modern data ecosystem, a big data world built around NoSQL databases, Hadoop, and newer data warehouse technologies. Much has changed since those early days, and we're excited to reveal our new name: Imanis Data. The Latin root, immanis, means immense, huge, or vast. Our new name and our new website better reflect who we are today, the vast modern data challenges we're helping leading enterprises address, and where we're collectively headed as massive amounts of data, and the insights we can glean from them, push us forward.
We're also thrilled to unveil our next-generation data management software platform; you can learn more in our official press release, as well as our blog post. A company's data, and how it's managed, protected, analyzed, and optimized, is a strategic, top-of-mind concern across the C-suite. Companies are building critically important modern data applications at a rapid pace, and minimizing data loss has become a pressing priority.
Like many start-ups before us, we started with a simple idea to address a complex problem. The rise in new business-critical applications exposes companies to greater risk for human and application errors as well as ransomware, and losing critical data would mean weeks of downtime for some, an irreversible loss for others. What’s more, we knew that companies would need infinite scale and unmatched performance to protect these enormous data sets in this new era.
Our success is in large part thanks to the support of our earliest customers and partners, who took a chance on a start-up with a game-changing promise. Would that promise prove too good to be true? We are proud to say no: we delivered, and the market responded.
An industry first when we launched in 2015, the Imanis Data software remains the industry’s fastest data backup and recovery solution with built-in machine intelligence to handle massive data sets and effectively manage and quickly recover data in the event of corruption or loss. Today we protect thousands of production nodes across dozens of Fortune 500 companies. What’s more, we’re minimizing downtime costs that average $750K per incident and saving our customers up to 80% on secondary storage needs.
We’ve seen rapid growth across our customer base in nearly every sector with significant traction in financial services, retail, manufacturing and technology. We’re also proud to have built a robust partner ecosystem that includes more than a dozen of the world’s leading technology, consulting and reseller partners.
We’re committed to constantly enhancing our high-performance data management architecture that gives our customers peace of mind allowing them to put all that business-critical data to good use. We’re excited about introducing many more companies to the unique and critical capabilities of Imanis Data—to helping them prevent data loss and realize the benefits of strategic backup and recovery.
When we launched two years ago, the need to support companies building mission-critical applications on top of modern data platforms like Cassandra, Couchbase, Hadoop, MongoDB and Vertica was growing fast. Today, these applications continue to expand rapidly within every large enterprise. As they grow in popularity, so do the risks of human and application error and the threat of critical data loss. The risk is all too real and increasingly common: 3 out of 4 companies have experienced a data loss in the last year, at an average cost of $900K and weeks of downtime.
That’s why we’re pleased to introduce our latest platform, Imanis Data 3.0.
A game-changer when we first launched, Imanis Data 3.0 remains the fastest backup and recovery platform on the market. Our cloud-ready, elastic architecture backs up, recovers, and replicates terabyte- and petabyte-sized data sets and beyond up to 10 times faster than any other solution on the market. That speed greatly minimizes the impact of data loss, reducing costly days and weeks of downtime to minutes and hours, and cuts secondary storage costs by up to 80%. What's more, our intelligent software enables early detection of ransomware attacks and proactively identifies accidental data loss, all while delivering unmatched scale and performance.
Over the last year, we’ve backed up more than 3 billion documents in a single Couchbase bucket for a leading gaming company; handled over 100 terabytes in a single Cassandra keyspace for a top manufacturing company; and recovered data 10 times faster than the native database utility for one of the world’s leading banks.
New 3.0 features include:
We are proud to be unveiling our new platform today and look forward to supporting our existing and growing number of Fortune 500 businesses in nearly every market. Why are so many successful companies across data-centric industries like retail, manufacturing and banking choosing Imanis Data? Because they understand the value of their data and what a loss will cost their organizations in time, money and reputation. And because Imanis Data is the fastest backup and recovery platform that also delivers extreme scale, rapid recovery, machine intelligence and smart storage optimization. It’s a win-win.
I hope we have the opportunity to introduce you to the power of Imanis Data 3.0 soon.
ONSET Ventures General Partner Shomit Ghose has witnessed and invested in a variety of major technological disruptions ranging from SaaS and cloud, to Big Data and next generation security. In this post, he discusses some of the key trends in Big Data and how enterprises can take advantage of them.
1. A few years ago you said that big data was the only business model tech had left. Do you still feel that way? And why?
Big Data today is even more firmly entrenched as the only business model that remains in tech. By now, hardware, software and bandwidth have been completely commoditized and are ubiquitous, with the consequence that the supply of data being produced is exploding at ever-faster rates. With everything else commoditized, the only sustainable business value is in monetizing that vast supply of Big Data.
2. As a venture capitalist investing in software companies, what big data trends are you most excited about? IoT, machine learning/AI, deep analytics, others?
The Big Data trend that excites me the most is the ability to apply unsupervised machine learning to enormous volumes of seemingly sparse data, and then finding monetizable business signals within that data. This capability is independent of any specific vertical domain, of course. The investing opportunity lies in combining the insights that machine learning can bring, with a disruptive business model, within a large vertical domain.
3. The volume of data continues to skyrocket just as companies are increasingly aware of the tremendous business value their data holds. What are some of the most common mistakes customers make around their big data infrastructure?
The biggest mistake companies make when embarking on a Big Data strategy is the failure to clearly understand their own business. Without a well-defined, antecedent business use case, it’s impossible to know what Big Data is required, how to acquire it, how it should be interpreted, or whether in the end the Big Data initiative has been successful in improving the bottom line. First and foremost, companies need to begin by understanding the types of insights their business needs.
4. Which markets will see the next wave of big data innovation and why? Agriculture, aviation, health care, eCommerce or others?
I think every industry will be deconstructed and redefined by the advent of data-driven business models; it’s only a matter of time till this happens across the board. But today I’m most excited about the new business models afforded by the absolutely vast streams of data that will be produced by the connected car. The connected car is going to revolutionize automobile usage, and this revolution is going to be driven entirely by data. So just as your cell phone is no longer about the physical device itself but about the applications and data flowing through it, your automobile is about to undergo the same transition. The transition from car-as-your-largest-mechanical-device to car-as-your-largest-computing-device is already underway and will open an entire new universe of opportunity for the software industry.
Summary: One of the world’s largest financial services companies deploys Imanis Data to protect their Hadoop data lake and support business-critical analytics capabilities.
Industry: Financial Services. One of the largest financial services companies in the world that provides personal and commercial banking, corporate and investment banking, wealth management services, insurance, and transaction processing services globally. The company has embarked on a Big Data journey to enable next-generation analytics and smarter data management.
Hadoop Environment: The company has standardized on Hadoop as their Big Data platform and created a Data Lake. All customer transactions (~400M per day) are archived to the Data Lake for compliance and retention needs. Machine learning, cognitive computing, and predictive analytics are run against the archived data to drive next-generation analytics for their clients. The company has set up three distinct Hadoop clusters for specific use cases. The first consists of 40 nodes hosting 425 TB of transaction data. The second is a 16-node cluster with 180 TB of data for fraud detection, and the third is an 8-node cluster with 50 TB of data for customer insights.
Challenges: Availability and reliability of the data lake are critical to the company, as any downtime or data loss would be disastrous to their Big Data initiatives and have a negative impact on the business and consumer trust. However, managing this large and complex Hadoop environment has been a challenge. The admin team was under constant pressure handling hardware failures, software updates, security issues, and capacity management. On top of that, they were writing and maintaining many different scripts to back up these Hadoop clusters. The backup operations team was responsible for running the scripts nightly and addressing data recovery requests on an ad hoc basis. The scripts were fragile and failed often, requiring late-night calls to the admin team. Data recovery was also a fire drill, involving multiple steps such as locating the right backups, performing the recovery, and verifying that the right data was recovered. As the Hadoop environment grew in size and complexity, backup and recovery costs spiraled, and the customer was unable to meet the recovery time objectives (RTO) mandated by their business. At that point the customer started looking for commercial alternatives.
Solution: The customer chose and deployed a 10-node Imanis Data cluster in their production environment to address their backup and recovery needs for their Hadoop infrastructure. Imanis Data’s scale-out architecture gives them plenty of room to grow as the size of their primary Hadoop cluster and corresponding data grows or they bring additional Hadoop clusters online in the future. Now that Imanis Data is deployed in their production environment, the admin team is freed from what was becoming an unmanageable situation and backup and recovery is now the sole responsibility of the backup operations team. Using a single UI to manage backup and recovery across their entire Hadoop environment, the backup operations team is now able to focus on providing better and faster service when a business user requests data to be recovered. Backups are completely centralized and automated without relying on fragile scripts and requiring constant monitoring. Their backup storage costs have also gone down significantly due to the Imanis Data storage optimization (de-duplication and compression) and incremental-forever backup methodology.
The customer is also now able to meet and exceed their Recovery Point Objective (RPO) and their Recovery Time Objective (RTO) since Imanis Data allows them to back up their data more aggressively on an hourly schedule and recover data very quickly using the Imanis Data one-step restore process.
Recently I have spent a lot of time working with existing customers to understand their requirements and help them take full advantage of the Imanis software for their modern data management needs. During these conversations I’ve realized that even customers with whom we’ve been working for some time have not always internalized the full power of our product. There are two topics that crop up more often than others, so I want to discuss these in more depth.
Incremental Forever Backup
“No, you never have to create another full backup after your first one”
“No, this does not impact your ability to recover quickly from a data loss incident”
“No, Imanis Data does not have any issues finding the one specific data object across terabytes or petabytes of data”
This is a fairly accurate representation of my interactions with customers when we discuss the Imanis backup architecture and philosophy. I’ve written in the past about why an incremental-forever architecture is an absolute necessity in the big data world.
Let's use a concrete example to highlight why this approach works. Assume you use Imanis Data to back up your Cassandra database daily and keep seven days' worth of restore points. A full backup is taken the very first time the backup workflow runs; all subsequent backups are incremental. On day eight, Imanis Data deletes the very first backup that was created. And this is the important point: even though that first full backup has been deleted, Imanis Data never needs to execute a full backup again. All the relevant data is still available on the Imanis Data storage cluster, because we create a "virtualized full" image for each incremental backup we take. This is in stark contrast to the traditional approach of periodically taking a full backup with incrementals in between, which creates significantly greater overhead and is uneconomical at big data sizes.
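The mechanics behind a "virtualized full" can be sketched in a few lines. This is our own simplification, not the actual Imanis Data implementation: it assumes a content-addressed chunk store where each restore point keeps a full manifest (name-to-chunk mapping) but only new chunks consume storage, so expiring the oldest restore point never forces a new full backup.

```python
import hashlib

# Simplified sketch of incremental-forever backup with virtual-full manifests.
# Not the real Imanis Data architecture; names and structure are illustrative.
class BackupStore:
    def __init__(self):
        self.chunks = {}     # chunk hash -> data, stored once (deduplicated)
        self.manifests = []  # one manifest per restore point: name -> chunk hash

    def backup(self, dataset):
        """Take a restore point. Only changed data is added to the store,
        but the manifest always describes a complete ('virtual full') image."""
        manifest = {}
        for name, data in dataset.items():
            h = hashlib.sha256(data).hexdigest()
            if h not in self.chunks:   # unchanged objects cost nothing extra
                self.chunks[h] = data
            manifest[name] = h
        self.manifests.append(manifest)

    def expire_oldest(self):
        """Drop the oldest restore point, then garbage-collect chunks that
        no surviving manifest references. No re-full is ever required."""
        self.manifests.pop(0)
        live = {h for m in self.manifests for h in m.values()}
        self.chunks = {h: d for h, d in self.chunks.items() if h in live}

    def restore(self, point=-1):
        """Single-step restore: materialize any restore point directly."""
        return {name: self.chunks[h] for name, h in self.manifests[point].items()}
```

Because every manifest is already a complete image, restoring from day seven never requires replaying a chain of incrementals back to a long-deleted full.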
We understand the ultimate goal of any backup and recovery solution is to minimize the impact of a data loss or downtime incident. There are two key capabilities that allow us to help companies meet their recovery point (RPO) and recovery time (RTO) objectives:
Our white paper describes these capabilities and our architecture in quite some depth, and I’d also encourage you to watch our product video so you understand how quickly companies can initiate and execute enterprise level data management workflows.
Summary: A leading online travel brand deployed Imanis Data to protect their business-critical consumer data and travel metrics data warehouse.
Industry: Online travel. This company simplifies travel planning and trip management by offering a variety of innovative tools and features such as exploring trips based on traveler budget and price forecasting online and via its mobile app.
Big Data Environment: To enable these innovative capabilities, the company deployed the Vertica data warehouse and the Hadoop platform. The data warehouse captures large, fast-growing volumes of travel data and makes it available to query-intensive applications and users. The Hadoop platform captures transaction data and also supports deep data analytics. The customer has deployed two 10-node Vertica production clusters and a large Hadoop cluster in their on-premises data center.
Challenges: Naturally, in the consumer-facing online travel business, uptime and data integrity are critical. Any data loss or downtime would be unacceptable, resulting in lost revenue, lost opportunities, and damage to the brand. Protecting the data stored in their Vertica and Hadoop environments became even more urgent as they struggled with the homegrown solution they had assembled from the native tools provided by those Big Data platforms.
Their homegrown solution was inflexible, unreliable, network- and storage-intensive, and consumed significant operational resources. Backups were topology-dependent: any change to the cluster topology after a backup was taken rendered previous backups non-restorable. They could only restore the entire database, with no ability to do granular table- or partition-level restores. Because their backup methodology involved a weekly full backup plus incrementals during the week, backup storage consumption grew along with the production environment, and network utilization was very high. Finally, backups and restores were manual and onerous for the IT staff, especially given frequent failures of the backup and restore process.
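The storage-growth gap between the two methodologies is easy to see with rough arithmetic. The numbers below are invented for illustration (real savings depend on change rates, deduplication, compression, and retention policy):

```python
# Illustrative arithmetic only: sizes and change rates are made up, and
# actual savings depend on dedup, compression, and retention settings.

def weekly_full_storage(full_tb, daily_change_tb, retention_weeks):
    """Weekly full backup plus six daily incrementals, kept for the window."""
    fulls = retention_weeks * full_tb
    incrementals = retention_weeks * 6 * daily_change_tb
    return fulls + incrementals

def incremental_forever_storage(full_tb, daily_change_tb, retention_weeks):
    """One initial full, then only changed data for every later restore point."""
    return full_tb + retention_weeks * 7 * daily_change_tb

# Example: a 100 TB warehouse changing 2 TB/day, with 4-week retention.
print(weekly_full_storage(100, 2, 4))          # 448 TB
print(incremental_forever_storage(100, 2, 4))  # 156 TB
```

Even before deduplication and compression, dropping the repeated weekly fulls cuts the secondary storage footprint by roughly two-thirds in this hypothetical scenario, and the gap widens as the dataset grows.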
Solution: The customer has deployed a five-node Imanis Data cluster in their production environment to address their backup and recovery needs for both Vertica and Hadoop. With Imanis Data the customer addresses all of their backup and recovery challenges for the Vertica data warehouse. The customer can implement granular backup and recovery workflows, down to the table or partition level. The Imanis Data software is topology-agnostic and has no limitations when recovering to differently sized Vertica clusters. All backups are incremental-forever and as a result significantly reduce the backup window as well as the storage and network resources consumed. Since the incremental backups are fully materialized (i.e., they are virtual full backups), restoring data is a single-step process, making recoveries fast, simple, and reliable. The automation and reliability significantly reduced the operational burden on the IT team.
The same Imanis Data cluster is also protecting the Hadoop NameNode environment. With regular backups of the NameNode metadata, the customer can ensure rapid recovery of their Hadoop environment in case of NameNode corruption.
In my previous post I highlighted some of our key accomplishments and the lessons we learned in 2016. In this post, I want to focus on the key trends we anticipate driving our business and big data more generally in 2017.
IT Ops Increasingly Takes Ownership of Big Data Management
We believe a large percentage of our 2017 business will originate with DevOps and Engineering teams as they deploy distributed database applications for their line of business. As these applications emerge into Tier 1 status, however, they will come under the purview of IT Ops. As big data applications mature, we anticipate the need for software-defined data management to only grow within this community – and our focus on cost savings, compliance and business agility will continue to directly support this trend.
Machine Learning Will Drive Automation In Data Management
Machine learning is rapidly impacting every part of a technology stack as a means to overcome some of the limitations associated with human-intensive processes. This trend will affect the data management space as well, and we see plenty of opportunities for machine learning to deliver significant value for companies looking to better protect their data assets against accidental loss and support critical compliance goals.
The Partner Ecosystem as a Catalyst in Data Management Adoption
Over the past several months we have seen a wave of interest from partners whose customers are asking the right types of big data management questions, the types of questions that prompt the partner to contact us. These are no longer just fulfillment questions but strategic questions that impact the success of a company's big data initiatives and, ultimately, the business itself.
Data Management Isn’t Just The Purview of Early Adopters
We've been pleasantly surprised by companies using our software that traditionally aren't considered early technology adopters: insurance, healthcare, and life-sciences companies. We believe the benefits of big data and the relevance of enterprise data management are cutting across all industries, and we expect the diversity of companies using our solution to keep growing.
We’re very excited about 2017 – it’s already off to a great start with add-on purchases from existing clients, purchases from new ones, and new partnerships coming to fruition. I look forward to keeping you abreast of our progress – and perhaps learning more about how we might help you and your organization support and manage these trends.
2016 was an excellent year for Imanis Data. We saw significant enterprise traction and now have over a thousand big data nodes and several petabytes under management. Most importantly, we received real world validation about our value from companies, across all markets, looking to prevent data loss, support compliance initiatives, and deliver applications faster. As I look back on the year, I see three major themes:
Imanis Data has always provided the broadest set of data management capabilities for big data platforms, from test data management to backup & recovery to archiving. What we discovered in 2016 was that the initial driver for Imanis Data adoption was around preventing data loss via our highly scalable backup and granular recovery solution. Why has this use case been the tip of the arrow for us?
Enterprise Sales Momentum & Key Partnerships
We've seen customer adoption from a number of Fortune 500 companies across a variety of vertical markets, and the proof-of-concepts in progress right now validate our software-defined architecture. We've seen particular interest in the financial services, retail/e-commerce, technology, and manufacturing verticals, which is not surprising given the initial wave of big data success stories.
There are two primary groups within an enterprise that adopt our technology, driven primarily by where ownership of the underlying data platform resides. DevOps and engineering teams like our extensibility into their environment via our RESTful API and our ability to easily merge into the application development and deployment processes. IT Ops teams like our emphasis on storage cost efficiency, our ability to easily scale as production data grows, and our flexibility to deploy elastically in the Cloud or on-prem.
We’ve expanded our partner ecosystem significantly in 2016 to include best-of-breed SIs and VARs who have enterprise relationships and we saw a major jump in the number of partner-influenced opportunities and deals in the 2nd half of the year.
Continued Innovation and Data Platform Expansion
On the product side we focused our efforts in 2016 on three areas:
Stay tuned for my next post in which I’ll highlight what we see happening in the world of software-defined data management in 2017.
This is the second in a series on customer deployments with Imanis Data. The first, describing a customer deployment scenario in AWS, is highlighted here.
Summary: A large business and financial management software company turned to Imanis Data to simplify backup and recovery in their large multi-datacenter Cassandra deployment.
Industry: Technology (Computer Software). A Fortune 1000 software company that provides business and financial management solutions for consumers, small businesses, and accounting professionals.
Big Data Environment: This Imanis Data customer has standardized on DataStax Enterprise (DSE) as the underlying NoSQL database. All databases and applications are hosted internally in their own data centers. The customer currently has 32 terabytes of Cassandra data in a 75-node DSE cluster spread across four data centers.
Challenges: The customer was using Cassandra snapshots for backing up their Cassandra database. Their DevOps team had developed some scripts to automate the process of taking snapshots on a periodic basis, a time-intensive process and a drain on storage resources.
Specifically, Cassandra snapshots created a few challenges for this customer because of their large and distributed environment, including:
Solution: The customer has deployed an Imanis Data cluster to back up their entire 75-node DSE cluster. Deployment and configuration of the Imanis Data software was quick and easy, and the customer was backing up production data within an hour. The customer is using our multi-datacenter capabilities to significantly reduce the amount of data that has to be transferred across the wide area network. All backups are de-duplicated, compressed, and stored on direct-attached storage outside the DSE clusters.
Summary: A leading provider of Data as a Service for data-driven enterprise applications turned to Imanis Data to protect its critical customer data assets stored in Cassandra and to enable rapid application iteration.
Industry: Technology (software). An award-winning software company that manages all types of customer data including multi-domain master data, transaction and interaction data, third party, public and social data, across all industries from healthcare and life sciences to retail and entertainment.
Big Data Environment: This Imanis Data customer has standardized on DataStax Enterprise (DSE) as the underlying NoSQL database. All databases and applications are hosted in the Amazon AWS cloud. The customer currently serves its clients using six 6-node DSE clusters storing 36 terabytes of data.
Challenges: The customer was using their engineering resources to write scripts for backing up the various Cassandra databases. The scripts were executed on a nightly basis on each of the DSE clusters and would frequently fail. Engineering would have to be called in to debug and fix these complex scripts so that backups could be done successfully. Also, the scripts were backing up all replicas of data resulting in escalating Amazon storage bills.
Creating test and development environments with production data also involved writing inefficient scripts. Engineers had to wait for days to get a non-production environment to use for development thereby slowing down the application development process. These challenges were an unnecessary drain on valuable engineering resources and taking engineers away from other business critical projects.
Solution: The customer has deployed a single 3-node Imanis Data cluster to back up all 6 DSE clusters. Deploying and configuring the Imanis Data software to back up 6 DSE clusters took less than an hour and the entire configuration was done using our web-based user interface. This greatly simplified the backup and recovery process and freed up valuable engineers from writing and maintaining scripts.
All backups are de-duplicated, encrypted, and stored in Amazon S3. Imanis Data’s content-aware deduplication significantly reduced the backup storage requirements by storing one backup copy versus storing all replicas. By copying backup data to low-cost Amazon S3, the customer was able to further reduce backup storage costs significantly.
The same Imanis Data cluster is also being used to spin up test & development clusters using production data. Using our RESTful API, the customer is able to integrate Imanis Data into their workflows and dashboard. Developers can now create test and development clusters very easily and quickly without writing any scripts.
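As a rough sketch of what that kind of RESTful integration looks like, the snippet below assembles a backup-workflow request a script or CI job might POST. To be clear, the endpoint path, field names, and workflow shape here are hypothetical, invented for illustration; they are not the actual Imanis Data API.

```python
import json

# Hypothetical sketch: field names and endpoint are illustrative only,
# NOT the real Imanis Data REST API.
def build_backup_workflow(cluster, keyspace, schedule="0 * * * *"):
    """Assemble the JSON body for a (hypothetical) backup-workflow call.
    The schedule is cron-style; the default runs hourly."""
    return {
        "source": {"type": "cassandra", "cluster": cluster, "keyspace": keyspace},
        "schedule": schedule,
        "retention_days": 7,
    }

body = json.dumps(build_backup_workflow("prod-dse", "customer_profiles"))
# A deployment script might then send it with an HTTP client, e.g.:
#   requests.post("https://backup.example.internal/api/workflows", data=body)
print(body)
```

Driving workflows this way, rather than through ad hoc shell scripts, is what lets a team fold backup and test-cluster provisioning into the same dashboards and pipelines they already use for deployment.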