Migrating Hadoop Data to Microsoft Azure Cloud

With the rapid adoption of Microsoft Azure cloud services for Hadoop, one of the challenges organizations face is migrating Hadoop data to a newly provisioned Microsoft Azure cloud infrastructure. Existing methods are expensive, consume a lot of network bandwidth, require application downtime, or cannot address compliance requirements. This is a two-part blog post: part one focuses on the different use cases, and part two will highlight the unique capabilities that allow Imanis Data to support these varied use cases.

 

There are a number of use cases for migrating Hadoop data to the Microsoft Azure cloud.

 

  • On-premises to Cloud Migration. In this scenario, companies are interested in moving their entire Hadoop workload and applications from an on-premises infrastructure to Microsoft Azure. One of the key requirements is to minimize the application downtime associated with the migration process.

 

  • Cloud-to-Cloud Migration. Organizations that have deployed Hadoop in a different cloud infrastructure are moving their applications to Microsoft Azure because of its unique Hadoop capabilities. As with on-premises to cloud migration, minimizing application downtime is a key requirement.

 

  • Region-to-Region Migration. This use case involves sharing data with other internal organizations in a different geographic region. For example, a multinational company has deployed Hadoop in the Microsoft Azure Europe region to collect customer data and wants to migrate some of that data to an analytics Hadoop cluster in the US region that is used by its US counterparts. Data privacy is a key requirement for these use cases.

 

  • Platform Migration or Version Upgrades. Occasionally, companies may need to migrate to a different distribution of Hadoop or upgrade to the latest release of the software. Being able to migrate the right data with the least application downtime is critical to minimizing disruption to the business.

 

  • Test Data Workloads in the Cloud. Many organizations have started using Microsoft Azure to create smaller non-production Hadoop clusters for their development and QA teams. To mimic real-world scenarios during test and development, production data must be copied to non-production environments in Microsoft Azure on a regular basis. Not only do data sets need to be downsampled to fit the smaller clusters, but confidential data also needs to be masked to ensure compliance with privacy regulations.

 

  • Disaster Recovery Site in Cloud. Organizations that have deployed production Hadoop clusters on-premises are increasingly leveraging Microsoft Azure for disaster recovery. By creating Hadoop clusters in Microsoft Azure and replicating data to them from the on-premises production cluster, companies protect their valuable data assets and ensure business continuity in case of a disaster in their primary data center. This type of deployment significantly reduces costs by minimizing compute down to two head nodes and one small data node. Costs can be further reduced by using Windows Azure Storage Blob (WASB) storage instead of Azure Data Lake Storage (ADLS).

 

  • Migrating Hadoop Data Between WASB and ADLS for HDInsight (HDI). Organizations that have deployed HDI using ADLS storage may want to move to WASB to reduce costs or, conversely, move from WASB to ADLS to improve performance. Both of these scenarios require data migration.
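
To make the test data use case above more concrete, here is a minimal Python sketch of downsampling and masking records before they are copied to a smaller non-production cluster. It is only an illustration under stated assumptions, not a description of how Imanis Data or any Azure service does this: the function names, the record layout (plain dictionaries), and the choice of salted SHA-256 hashing for masking are all hypothetical.

```python
import hashlib


def keep_record(key: str, sample_rate: float) -> bool:
    """Deterministically keep roughly `sample_rate` of all keys.

    Hashing the key (rather than drawing a random number) means the same
    records are selected on every run, which keeps repeated copies stable.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # value in [0, 2**64)
    return bucket < int(sample_rate * 2**64)


def mask_value(value: str, salt: str) -> str:
    """Replace a confidential value with a salted, truncated SHA-256 token.

    Hypothetical masking scheme: equal inputs map to equal tokens, so joins
    on the masked column still work in the test environment.
    """
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]


def prepare_test_data(records, key_field, sensitive_fields, sample_rate, salt):
    """Yield a downsampled copy of `records` with sensitive fields masked."""
    for record in records:
        if not keep_record(str(record[key_field]), sample_rate):
            continue
        masked = dict(record)  # copy so the production record is untouched
        for field in sensitive_fields:
            if field in masked:
                masked[field] = mask_value(str(masked[field]), salt)
        yield masked


if __name__ == "__main__":
    # Illustrative input only; real data would be read from the cluster.
    production = [
        {"id": i, "email": f"user{i}@example.com", "country": "DE"}
        for i in range(1000)
    ]
    sample = list(prepare_test_data(production, "id", ["email"], 0.1, "demo-salt"))
    print(len(sample), "of", len(production), "records kept")
```

In practice the salt would be kept secret and the sample rate chosen to fit the capacity of the non-production cluster.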

In part two, we’ll discuss how our underlying architecture and tight integration with Microsoft enable Imanis Data customers to support these use cases.
