Highly Scalable Data Management for Hadoop
Hadoop has enabled companies to deploy large-scale, distributed applications that identify customer shopping patterns, detect potential fraud, and even process human genome data across hundreds of terabytes or even petabytes of data. As these applications evolve from interesting research projects into mission-critical business processes, the risk of downtime or data loss grows substantially and can severely damage revenue and business reputation. Companies now recognize that data management, supporting rapid application iteration and preventing data loss, is crucial for anyone seeking to operationalize a Hadoop infrastructure.
Although Hadoop provides basic data management features (e.g., multiple replicas to protect against hardware failures), these do not cover the full range of capabilities that large-scale distributed applications require. Data management has to span the full lifecycle of use cases: enabling self-service access to production data for sandbox use; sophisticated backup and granular recovery; archiving older data to cost-efficient storage on-premises or in the cloud; and creating reliable disaster-recovery copies. Home-grown solutions cannot handle these requirements either, so companies need a new approach to managing data on their Hadoop platforms: intelligent systems that manage data effectively and can quickly clone or recover it, so applications get back up and running with minimal downtime. More intelligent, rapid data management is a must-have in today's modern data world.
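To make the gap concrete: HDFS replication, Hadoop's basic protection against hardware failure, is typically set cluster-wide in hdfs-site.xml. A minimal sketch (the value 3 is HDFS's default replication factor, shown here for illustration):

```xml
<!-- hdfs-site.xml: baseline block replication for HDFS -->
<property>
  <name>dfs.replication</name>
  <!-- Each HDFS block is stored on this many DataNodes -->
  <value>3</value>
</property>
```

The factor can also be changed per path after the fact with the file system shell, e.g. `hdfs dfs -setrep -w 2 /some/path` (the path here is hypothetical). The limitation is the point of this section: replication guards only against hardware failure. It provides no point-in-time recovery, so an accidental delete or a corrupting application bug propagates to every replica.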