How to Plan a Legacy Data Warehouse Migration to the Cloud
Many companies are moving their legacy data warehouses to the cloud for reasons such as cost savings, scalability, improved integration, and the ability to use data analytics for business growth.
Spotify, an audio streaming platform, moved to Google Cloud Platform (GCP) in 2015 to handle its massive data and free up its developers to focus on innovation. With the help of a dedicated Spotify/Google cloud migration team, they migrated successfully and increased scalability.
Etsy, a global e-commerce platform, was looking to improve its UX and site performance. They chose to migrate to GCP in 2018 and used a hybrid environment to execute the migration. It helped them achieve long-term sustainability and scalability.
Data migration to the cloud is a challenging, time-intensive task that requires thorough research and careful planning to avoid unwanted costs and nasty surprises.
Consider these eight steps when planning a data warehouse migration to the cloud:
1. Define goals and business cases
Start the process of planning by outlining your reasons for migrating your data warehouse to the cloud.
You could be looking for any of the following: increased data capacity, faster query performance, cost savings, greater agility, labor savings, or business growth.
Identifying the reason(s) will help you determine your migration path.
2. Evaluate your current data warehouse environment
You need to perform a comprehensive assessment of your current environment.
– Determine your cloud server requirements based on current resource usage. Include any additional enhancements or processing improvements that you plan to integrate.
– Take a complete inventory of the data that needs to be moved. Keep in mind data locations, data types, and data formats in the source systems and in the target system. If you’re migrating data that contains personally identifiable information (PII), ensure that it is secured before, during, and after the migration.
– Also, take stock of which operations and services your data warehouse can and cannot currently perform, and how cloud storage will solve the problem.
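The inventory described above can be kept as structured records rather than a free-form list. The following is a minimal sketch, assuming a hypothetical `DataAsset` record and made-up table names and sizes; none of these come from a specific tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DataAsset:
    """One entry in the migration inventory (all fields are illustrative)."""
    name: str          # table or file name
    location: str      # e.g. "on-premise" or "cloud"
    data_format: str   # e.g. "oracle_table", "csv", "parquet"
    size_gb: float
    contains_pii: bool


def summarize_inventory(assets):
    """Return total size per location and the sorted names of assets holding PII."""
    size_by_location = defaultdict(float)
    pii_assets = []
    for asset in assets:
        size_by_location[asset.location] += asset.size_gb
        if asset.contains_pii:
            pii_assets.append(asset.name)
    return dict(size_by_location), sorted(pii_assets)


# Hypothetical inventory for illustration only.
inventory = [
    DataAsset("customers", "on-premise", "oracle_table", 120.0, True),
    DataAsset("orders", "on-premise", "oracle_table", 850.0, False),
    DataAsset("web_logs", "cloud", "parquet", 2300.0, False),
    DataAsset("support_tickets", "on-premise", "csv", 40.0, True),
]

sizes, pii = summarize_inventory(inventory)
print(sizes)  # total GB per location
print(pii)    # assets that need extra protection during migration
```

Listing the PII-bearing assets separately makes it easier to confirm that each one is secured before, during, and after the move.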
Consider the following factors as well:
– Which components of your data warehouse, if any, are already on the cloud?
– Which parts of your data warehouse do you plan to move to the cloud — for instance, ETL, database, or BI tools?
– Which elements are stored on-premise? Answering this helps you identify dependencies (data gravity) that will need to be re-established on the cloud.
Engage the expertise and know-how of data owners in project planning to help the migration go smoothly.
If your current data warehouse architecture is sufficient for current BI requirements but is not enough for advanced analytics and big data integration, you will need to review and refine data models and processes as you migrate to the cloud.
3. Plan a cost projection
Now that you have all the necessary information, you can create a fairly accurate cost projection for migration to the cloud and compare it to the cost of running your current solution.
Data migration to the cloud involves two kinds of cost: cloud infrastructure costs and human costs. On average, executing the migration costs between $1,000 and $3,000 per server. Complex migrations may cost as much as $15,000 per server.
Legacy data warehouses incur maintenance and upgrading costs, whereas cloud storage has a low, pay-as-you-go pricing model.
Cloud solutions cost between $18 and $84 per terabyte per month, whereas on-premise solutions may cost up to $1,000 per month.
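As a rough illustration, the per-server and per-terabyte figures quoted in this section can be combined into a low and high estimate. This is only a sketch: the function name, the server count, the data volume, and the 12-month horizon are assumptions, and a real projection would also include compute, networking, and staffing costs.

```python
# Figures quoted in this section (USD).
MIGRATION_COST_PER_SERVER = (1_000, 3_000)  # low, high one-off cost per server
CLOUD_COST_PER_TB_MONTH = (18, 84)          # low, high storage cost per TB per month


def project_costs(servers, terabytes, months):
    """Return (low, high) estimates: one-off migration plus cloud storage over `months`."""
    low = (servers * MIGRATION_COST_PER_SERVER[0]
           + terabytes * CLOUD_COST_PER_TB_MONTH[0] * months)
    high = (servers * MIGRATION_COST_PER_SERVER[1]
            + terabytes * CLOUD_COST_PER_TB_MONTH[1] * months)
    return low, high


# Hypothetical environment: 10 servers, 50 TB, first year of operation.
low, high = project_costs(servers=10, terabytes=50, months=12)
print(f"Estimated first-year cost: ${low:,} to ${high:,}")
```

The resulting range can then be set against the yearly cost of running the current on-premise solution.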
4. Define your legacy data warehouse migration strategy
The data migration process has three steps:
– Extraction of data
– Transformation of data
– Loading of data
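The three steps above can be sketched with Python's built-in sqlite3 module. Two in-memory databases stand in for the legacy warehouse and the cloud target, and the `customers` table, its columns, and the cleanup rules are hypothetical.

```python
import sqlite3

# In-memory stand-in for the legacy source system.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER, name TEXT, signup_date TEXT)")
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "  Ada ", "2020-01-05"), (2, "Grace", "2021-07-19")],
)

# In-memory stand-in for the cloud target, with a slightly different schema.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE customers (id INTEGER, name TEXT, signup_year INTEGER)")


def extract(conn):
    """Extraction: read all rows from the source table."""
    return conn.execute(
        "SELECT id, name, signup_date FROM customers ORDER BY id"
    ).fetchall()


def transform(rows):
    """Transformation: trim names and reduce the date to its year."""
    return [(id_, name.strip(), int(date[:4])) for id_, name, date in rows]


def load(conn, rows):
    """Loading: write the transformed rows into the target table."""
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()


load(target, transform(extract(source)))
loaded = target.execute("SELECT * FROM customers ORDER BY id").fetchall()
print(loaded)
```

Keeping each step in its own function makes it possible to test the transformation logic without touching either system.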
There is no one-size-fits-all migration strategy, but there are a few core migration paths to choose from:
1. Big bang approach — It involves migrating all your data simultaneously, which gets the job done quickly but also requires system downtime. Businesses that are looking to rapidly increase their server capacity, have quit a hosting provider, want to remain cloud-agnostic, or need a backup plan to return to on-premise systems use this approach.
2. Trickle approach — It involves migrating the data in phases and running the source system and target system in parallel. Naturally, it takes longer but requires less downtime and offers more opportunities for testing. Businesses that are looking to optimize a certain part of their system and are planning total cloud adoption tend to use this approach.
3. Refactoring approach — It involves re-imagining how the system will run on a cloud platform and then re-designing it to take advantage of cloud-native capabilities. The approach offers the most ROI but is also the most time-intensive and difficult upfront. Businesses that are unable to scale to meet their future needs use this approach.
Many companies choose to shift their legacy data warehouses to the cloud in increments to minimize downtime and focus on key use cases.
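One way to plan an incremental migration is to group tables into phases, starting with the key use cases. The sketch below uses a simple greedy rule that is an assumption of this example, not a standard method: tables are taken in priority order, and a new phase starts once the current one would exceed a size limit. The table names, sizes, and priorities are hypothetical.

```python
def plan_phases(tables, max_gb_per_phase):
    """Group (name, size_gb, priority) tuples into ordered migration phases.

    Lower priority numbers migrate first. A table larger than the limit
    still gets migrated, in a phase of its own.
    """
    phases, current, current_size = [], [], 0.0
    for name, size_gb, _priority in sorted(tables, key=lambda t: t[2]):
        # Close the current phase if adding this table would exceed the limit.
        if current and current_size + size_gb > max_gb_per_phase:
            phases.append(current)
            current, current_size = [], 0.0
        current.append(name)
        current_size += size_gb
    if current:
        phases.append(current)
    return phases


# Hypothetical tables: (name, size in GB, priority).
tables = [
    ("sales_facts", 400, 1),
    ("customers", 100, 1),
    ("inventory", 300, 2),
    ("web_logs", 900, 3),
    ("archive_2010", 200, 4),
]

phases = plan_phases(tables, max_gb_per_phase=600)
for number, phase in enumerate(phases, start=1):
    print(f"Phase {number}: {phase}")
```

The size limit is a stand-in for whatever constrains a single migration window, such as available bandwidth or acceptable downtime.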
5. Choose a cloud platform and data management environment
A) Choose the cloud platform for data migration
– Consider cloud-optimized data platforms such as Snowflake, Azure Databricks, Azure Synapse, Azure SQL Database, or Amazon Redshift.
– Consider the pros and cons of each. For example, you could compare Snowflake vs Databricks to determine what suits your needs best.
– Employ the services of a specialist at this critical stage.
B) Choose your data management environment
– Will you choose to manage the infrastructure yourself (IaaS) or let a cloud provider do it (PaaS)?
– Although these decisions will be business-specific, cloud-based data management tools are more customizable and minimize business disruption during incremental migration.
C) Determine the cloud model you want to adopt
– Whether you choose a public cloud, private cloud, hybrid cloud, or multi-cloud depends on your specific business needs.
– Some companies choose a hybrid cloud model when they want to keep some data on-premises.
6. Review security and privacy policy
Review security policies before transferring data to the cloud to protect user privacy. Determine who currently has access to the data and who should be authorized to access it after the migration.
Plan for data backup and recovery in case of data loss or compatibility problems.
7. Perform a data audit
Perform an audit of data stored on the cloud and on-premise. Identify server groups that can be shifted together to minimize business disruption.
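Server groups that should move together can be found by treating dependencies as links in a graph and collecting the connected components. The following is a minimal sketch under that assumption; the server names and dependency pairs are hypothetical.

```python
from collections import defaultdict


def migration_groups(servers, dependencies):
    """Return groups of servers that are linked by dependencies.

    `dependencies` is a list of (a, b) pairs meaning a and b depend on
    each other in some direction, so they should be shifted together.
    """
    adjacency = defaultdict(set)
    for a, b in dependencies:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, groups = set(), []
    for server in servers:
        if server in seen:
            continue
        # Depth-first walk over everything reachable from this server.
        stack, group = [server], []
        seen.add(server)
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        groups.append(sorted(group))
    return groups


# Hypothetical environment for illustration.
servers = ["etl-01", "dw-01", "bi-01", "archive-01", "staging-01"]
dependencies = [
    ("etl-01", "staging-01"),
    ("staging-01", "dw-01"),
    ("dw-01", "bi-01"),
]

groups = migration_groups(servers, dependencies)
print(groups)
```

A server with no dependencies ends up in a group of its own, which makes it a candidate for moving independently or staying on-premise.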
If you plan to keep some data on-premise, define which data will be retained and for what purpose.
Again, it is advisable to consult an expert at this stage.
8. Migrate and operationalize
Define the test and acceptance criteria before the data migration is underway.
Plan the testing, then execute the migration to shift the schema, data, ETL processing, and metadata.
Once testing is successfully executed, start cloud data warehouse operations and migrate users and applications.
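A simple acceptance criterion is that each migrated table has the same row count and content fingerprint as its source. The sketch below illustrates this with in-memory sqlite3 databases standing in for the two systems. The `orders` table and the helper names are hypothetical, and a real check would need to account for type and formatting differences between platforms.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table):
    """Return (row_count, SHA-256 hex digest) over rows ordered by the first column.

    The table name is interpolated directly, so pass trusted names only.
    """
    rows = conn.execute(f"SELECT * FROM {table} ORDER BY 1").fetchall()
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return len(rows), digest.hexdigest()


def make_db(rows):
    """Build an in-memory database holding a small `orders` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


source = make_db([(1, 9.99), (2, 25.0), (3, 4.5)])
good_target = make_db([(1, 9.99), (2, 25.0), (3, 4.5)])
bad_target = make_db([(1, 9.99), (2, 25.0)])  # one row lost in transit

passed = table_fingerprint(source, "orders") == table_fingerprint(good_target, "orders")
failed = table_fingerprint(source, "orders") == table_fingerprint(bad_target, "orders")
print("complete copy matches:", passed)
print("incomplete copy matches:", failed)
```

Running a check like this per table before switching users and applications over gives a clear pass or fail signal for the acceptance criteria.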
Final Thoughts on Migrating a Legacy Data Warehouse
Migrating a legacy data warehouse to the cloud is not easy, but it is worth the deeper insights you can generate about your customers and their buyer journey. You also gain the agility to respond to changing business needs, along with superior performance.