Friday, August 12, 2022

Migrate a large data warehouse from Greenplum to Amazon Redshift using AWS SCT – Part 1

A data warehouse collects and consolidates data from various sources within your organization. It’s used as a centralized data repository for analytics and business intelligence.

When working with on-premises legacy data warehouses, scaling the size of your data warehouse or improving performance can mean purchasing new hardware or adding more powerful hardware. This is often expensive and time-consuming. Running your own on-premises data warehouse also requires hiring database managers and administrators to deal with outages, upgrades, and data access requests. As companies become more data-driven, reliable access to centralized data is increasingly important. As a result, there’s a strong demand for data warehouses that are fast, accessible, and able to scale elastically with business needs. Cloud data warehouses like Amazon Redshift address these needs while eliminating the cost and risk of purchasing new hardware.

This multi-part series explains how to migrate an on-premises Greenplum data warehouse to Amazon Redshift using AWS Schema Conversion Tool (AWS SCT). In this first post, we describe how to plan, run, and validate a large-scale data warehouse migration. It covers the solution overview, migration assessment, and guidance on technical and business validation. In the second post, we share best practices for choosing the optimal Amazon Redshift cluster, data architecture, converting stored procedures, compatible functions and queries widely used for SQL conversions, and recommendations for optimizing the length of data types for table columns.

Solution overview

Amazon Redshift is an industry-leading cloud data warehouse. Amazon Redshift uses Structured Query Language (SQL) to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and machine learning to deliver the best price-performance at any scale.

AWS SCT makes heterogeneous database migrations predictable by automatically converting the source database schema and most of the database code objects, including SQL scripts, views, stored procedures, and functions, to a format compatible with the target database. AWS SCT helps you modernize your applications at the same time as the database migration. When schema conversion is complete, AWS SCT can help migrate data from various data warehouses to Amazon Redshift using data extraction agents.

The following diagram illustrates our architecture for migrating data from Greenplum to Amazon Redshift using AWS SCT data extraction agents.

Perform a migration assessment

The initial data migration is the first milestone of the project. The main requirements for this phase are to minimize the impact on the data source and transfer the data as fast as possible. To do this, AWS offers several options, depending on the size of the database, network performance (AWS Direct Connect or AWS Snowball), and whether the migration is heterogeneous or not (AWS Database Migration Service (AWS DMS) or AWS SCT).

AWS provides a portfolio of cloud data migration services to provide the right solution for any data migration project. The level of connectivity is a significant factor in data migration, and AWS has options that can address your hybrid cloud storage, online data transfer, and offline data transfer needs.

Additionally, the AWS Snow Family makes it simple to get your data into and out of AWS via offline methods. Based on the size of the data, you can use AWS Snowmobile or AWS Snowball if you have petabytes to exabytes of data. To decide which transfer method is better for your use case, refer to Performance for AWS Snowball.

Perform schema conversion with AWS SCT

To convert your schema using AWS SCT, you must start a new AWS SCT project and connect your databases. Complete the following steps:

  1. Install AWS SCT.
  2. Open and initiate a new project.
  3. For Source database engine, choose Greenplum.
  4. For Target database engine, choose Amazon Redshift.
  5. Choose OK.
  6. Open your project and choose Connect to Greenplum.
  7. Enter the Greenplum database information.
  8. Choose Test connection.
  9. Choose OK after a successful connection test.
  10. Choose OK to complete the connection.
  11. Repeat similar steps to establish a connection to your Amazon Redshift cluster.

    By default, AWS SCT uses AWS Glue as the extract, transform, and load (ETL) solution for the migration. Before you proceed, you must disable this setting.
  12. On the Settings menu, choose Project settings.
  13. Deselect Use AWS Glue.
  14. Choose OK.
  15. In the left pane, choose your schema (right-click) and choose Convert schema.
  16. When asked to replace objects, choose Yes.
  17. When asked to load statistics, choose Continue.

    By the end of this step, all Greenplum objects should be migrated to Amazon Redshift syntax. Some objects may be shown in red, meaning that AWS SCT couldn’t fully migrate these objects. You can view an assessment summary of the migration for more information.
  18. On the View menu, choose Assessment report view.

    In the bottom pane, you can see the Greenplum DDL and Amazon Redshift DDL of the selected objects side by side for comparison.
  19. Choose the schema with a red icon, which indicates that it needs manual conversion. You’re presented with specific actions regarding the tables, constraints, or views that can’t be migrated to Amazon Redshift. You must examine these issues and fix the errors manually with the required changes. For example, AWS SCT automatically converts binary data in BLOB format to the character varying data type, but this may be highlighted as an issue. Additionally, some vendor-supplied procedures and functions can’t be converted, so AWS SCT may error out.

    As a final step, you can validate that the tables exist in Amazon Redshift.
  20. Connect using the Amazon Redshift query editor v2 or another third-party tool or application of your choice and check for all the tables with the following code:
    select t.table_schema, t.table_name, i.tbl_rows
    from svv_tables t left join svv_table_info i
    on t.table_schema = i.schema and t.table_name = i."table"
    where t.table_type = 'BASE TABLE' and t.table_schema = 'SCHEMA NAME'
    order by 1, 2;
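If you script this validation, a small helper can flag tables that are missing or empty after the migration. The sketch below is illustrative only, not part of AWS SCT; the function name and the expected-table list are assumptions, and the rows would come from running the query above through any Redshift client.

```python
# Hypothetical helper: compare the (schema, table, row-count) rows returned by
# the validation query against the tables the migration should have created.
def find_missing_or_empty(expected_tables, query_rows):
    """expected_tables: iterable of table names.
    query_rows: iterable of (table_schema, table_name, tbl_rows) tuples.
    Returns (missing, empty) lists of table names."""
    counts = {table: rows for _schema, table, rows in query_rows}
    missing = [t for t in expected_tables if t not in counts]
    # tbl_rows can be None if svv_table_info has no entry yet; treat as empty
    empty = [t for t in expected_tables if t in counts and counts[t] in (0, None)]
    return missing, empty

# Example with made-up rows, shaped like a Redshift client result set
rows = [("sales", "orders", 1200), ("sales", "customers", 0)]
missing, empty = find_missing_or_empty(["orders", "customers", "lineitem"], rows)
print(missing)  # ['lineitem']
print(empty)    # ['customers']
```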

Migrate the data

To start your data migration using AWS SCT data extraction agents, complete the following steps:

  1. Configure the AWS SCT extractor properties file with the corresponding Greenplum properties:
    driver.jars="C:\DMS Workshop\JDBC\postgresql-42.2.9.jar"
    redshift.driver.jars=C:\DMS Workshop\JDBC\RedshiftJDBC42-no-awssdk-
    ssl.truststore.path=C:/DMS Workshop/sctagent/truststore
    ssl.keystore.path=C:/DMS Workshop/sctagent/keystore

    Now you configure the AWS SCT extractor to perform a one-time data move. You can use multiple extractors when dealing with a large volume of data.

  2. To register the extractor, on the View menu, choose Data migration view.
  3. Choose Register.
  4. Enter the information for your new agent.
  5. Test the connection and choose Register.

    Now you create a task for the extractor to extract data into the tables created on Amazon Redshift.
  6. Under your schema in the left pane, choose Tables (right-click) and choose Create Local task.
  7. For Task name, enter a name.
  8. Test the connection and choose OK.
  9. Choose Create.
  10. Run your task and monitor its progress.

You can choose each task to get a detailed breakdown of its activity. Make sure to examine errors during the extract, upload, and copy process.

You can monitor the status of the tasks, the percentage completed, and the tables that were loaded successfully. You must also verify the count of records loaded into the Amazon Redshift database.
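One way to verify record counts is to pull per-table counts from both systems and diff them. The following is a hedged sketch, not an AWS SCT feature; the dictionaries stand in for counts you would query from Greenplum and Amazon Redshift yourself.

```python
def diff_row_counts(source_counts, target_counts):
    """Compare {table: row_count} mappings from source and target.
    Returns {table: (source_count, target_count)} for every mismatch,
    including tables present on only one side (shown as None)."""
    mismatches = {}
    for table in set(source_counts) | set(target_counts):
        s, t = source_counts.get(table), target_counts.get(table)
        if s != t:
            mismatches[table] = (s, t)
    return mismatches

# Made-up counts for illustration
greenplum = {"orders": 1_000_000, "customers": 52_000, "lineitem": 6_000_000}
redshift = {"orders": 1_000_000, "customers": 51_990}
# Reports 'customers' (count drift) and 'lineitem' (missing on the target)
print(diff_row_counts(greenplum, redshift))
```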

Technical validation

After the initial extracted data is loaded to Amazon Redshift, you must perform data validation tests in parallel. The goal at this stage is to validate production workloads, comparing Greenplum and Amazon Redshift outputs from the same inputs.

Typical activities covered during this phase include the following:

  • Count the number of objects and rows on each table.
  • Compare the same random subset of data in both Greenplum and Amazon Redshift for all migrated tables, validating that the data is exactly the same row by row.
  • Check for incorrect column encodings.
  • Identify skewed table data.
  • Annotate queries not benefiting from sort keys.
  • Identify inappropriate join cardinality.
  • Identify tables with large VARCHAR columns.
  • Confirm that processes don’t crash when connected with the target environment.
  • Validate daily batch jobs (job duration, number of rows processed). To find the right techniques to perform most of these activities, refer to Top 10 Performance Tuning Techniques for Amazon Redshift.
  • Set up Amazon Redshift automated alerts with Amazon Redshift Advisor.
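The row-by-row spot check can be scripted by drawing the same random sample of primary keys on both sides and comparing the fetched rows. The sketch below is an assumption about how you might implement it, with made-up in-memory rows standing in for actual query results keyed by primary key.

```python
import random

def sample_and_compare(source_rows, target_rows, sample_size, seed=42):
    """source_rows / target_rows: {primary_key: row_tuple} fetched from
    Greenplum and Amazon Redshift respectively. Draws a reproducible random
    sample of source keys and returns the keys whose rows differ from (or are
    missing on) the target."""
    rng = random.Random(seed)  # fixed seed so reruns check the same sample
    keys = rng.sample(sorted(source_rows), min(sample_size, len(source_rows)))
    return [k for k in keys if source_rows[k] != target_rows.get(k)]

# Illustrative rows: key 2 drifted during the load
source = {1: ("a", 10), 2: ("b", 20), 3: ("c", 30)}
target = {1: ("a", 10), 2: ("b", 99), 3: ("c", 30)}
print(sample_and_compare(source, target, sample_size=3))  # [2]
```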

Business validation

After you successfully migrate the data and validate the data movement, the last remaining task is to involve the data warehouse users in the validation process. These users from different business units across the company access the data warehouse using various tools and methods: JDBC/ODBC clients, Python scripts, custom applications, and more. It’s central to the migration to make sure that every end-user has verified and adapted this process to work seamlessly with Amazon Redshift before the final cutover.

This phase can consist of several tasks:

  • Adapt business users’ tools, applications, and scripts to connect to Amazon Redshift endpoints.
  • Modify users’ data load and dump procedures, replacing data movement to and from shared storage via ODBC/JDBC with COPY and UNLOAD operations from and to Amazon Simple Storage Service (Amazon S3).
  • Modify any incompatible queries, taking into account any implementation nuances between Amazon Redshift and PostgreSQL.
  • Run business processes against Greenplum and Amazon Redshift, and compare results and runtimes. Make sure to report any issue or unexpected result to the team responsible for the migration, so the case can be analyzed in detail.
  • Tune query performance, taking into account table distribution and sort keys, and make extensive use of the EXPLAIN command in order to understand how Amazon Redshift plans and runs queries. For advanced table design concepts, refer to Amazon Redshift Engineering’s Advanced Table Design Playbook: Preamble, Prerequisites, and Prioritization.
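Running the same business process against both systems and comparing results and runtimes can be wrapped in a simple harness. Everything below is a hypothetical sketch; the callables stand in for code that would actually run a report against Greenplum or Amazon Redshift.

```python
import time

def compare_runs(run_on_greenplum, run_on_redshift):
    """Run the same business process against both systems (passed in as
    callables), then report whether the outputs match and how long each ran."""
    t0 = time.perf_counter()
    gp_result = run_on_greenplum()
    gp_secs = time.perf_counter() - t0

    t0 = time.perf_counter()
    rs_result = run_on_redshift()
    rs_secs = time.perf_counter() - t0

    return {
        "results_match": gp_result == rs_result,
        "greenplum_secs": gp_secs,
        "redshift_secs": rs_secs,
    }

# Stand-in callables simulating a report that returns aggregated rows
report = compare_runs(lambda: [("2022-08", 42)], lambda: [("2022-08", 42)])
print(report["results_match"])  # True
```

A mismatch or a large runtime gap flagged by such a harness is exactly the kind of case to hand back to the migration team for detailed analysis.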

This business validation phase is crucial so all end-users are aligned and ready for the final cutover. Following Amazon Redshift best practices allows end-users to fully take advantage of the capabilities of their new data warehouse. After you perform all the migration validation tasks, and connect and test every ETL job, business process, external system, and user tool against Amazon Redshift, you can disconnect every process from the old data warehouse, which you can now safely power off and decommission.


Conclusion
In this post, we provided detailed steps to migrate from Greenplum to Amazon Redshift using AWS SCT. Although this post describes modernizing and moving to a cloud warehouse, you should be augmenting this transformation process toward a full-fledged modern data architecture. The AWS Cloud lets you be more data-driven by supporting multiple use cases. For a modern data architecture, you should use purpose-built data stores like Amazon S3, Amazon Redshift, Amazon Timestream, and other data stores based on your use case.

Check out the second post in this series, where we cover prescriptive guidance around data types, functions, and stored procedures.

About the Authors

Suresh Patnam is a Principal Solutions Architect at AWS. He is passionate about helping businesses of all sizes transform into fast-moving digital organizations focusing on big data, data lakes, and AI/ML. Suresh holds an MBA degree from Duke University’s Fuqua School of Business and an MS in CIS from Missouri State University. In his spare time, Suresh enjoys playing tennis and spending time with his family.

Arunabha Datta is a Sr. Data Architect at Amazon Web Services (AWS). He collaborates with customers and partners to architect and implement modern data architecture using AWS Analytics services. In his spare time, Arunabha enjoys photography and spending time with his family.


