Migrate from PostgreSQL to Amazon RDS with AWS DMS
TUTORIAL
In this lesson, you migrate a self-managed PostgreSQL database instance to a fully managed database on Amazon Relational Database Service (Amazon RDS). First, you learn why you would want to use Amazon RDS to manage your relational database. Then, you work through the steps to migrate an existing PostgreSQL database to Amazon RDS. At the end of this lesson, you should feel confident in your ability to migrate an existing database to Amazon RDS.
Time to complete: 30–45 minutes
Amazon RDS is a fully managed service for running relational databases with AWS. It has support for six different database engines, including open-source options such as MySQL, PostgreSQL, and MariaDB.
With Amazon RDS, your database operations are managed by AWS, leaving your team free to focus on innovation. Amazon RDS handles instance failover, data backups, and software updates. Rely on the efficiencies of the AWS Cloud to use a faster, cost-efficient, and more reliable database option.
In this lesson, you learn how to migrate a self-managed PostgreSQL database to a fully managed database on Amazon RDS. This lesson has five steps.
-
1. Create a PostgreSQL database instance in Amazon RDS
In this module, you create a PostgreSQL database instance in Amazon RDS. This instance will be used as your primary database after you copy your existing data into it using AWS DMS.
To get started, navigate to the Amazon RDS console. On the main page, choose Create database to create a new database.
(click to zoom)This initiates the database creation wizard. In the Engine options section, choose PostgreSQL as your Engine type. Then, choose the version of PostgreSQL you want to use. Note that AWS DMS supports PostgreSQL versions 9.4 and later (9.x), 10.x, and 11.x.
The database creation wizard includes templates to make it easier to configure the settings for your Amazon RDS database. If you are creating this database to be used in production, you should choose the production template.
(click to zoom)In the Settings section, give your database a name and set the master username and password. Do not autogenerate your password for this lab, and make sure you write down your password. You need the password to connect to your database and create additional users.
(click to zoom)Next, choose the database instance size. You should make this choice based on your estimated capacity. If you are managing your own database on Amazon Elastic Compute Cloud (Amazon EC2), you can compare your current Amazon EC2 instance size to an Amazon RDS instance size.
If you want to increase or decrease your database instance size in the future, Amazon RDS allows you to do that easily. However, you may incur some downtime to change your database size.
(click to zoom)Next, configure the storage options for your Amazon RDS database. There are two storage options in Amazon RDS: general purpose and provisioned IOPS, or I/O operations per second. With general-purpose storage, you receive 3 IOPS per GiB of storage allocated. Thus, 100 GiB of storage would have 300 IOPS. Additionally, you receive burst capacity up to 3,000 IOPS.
With provisioned storage, you provision IOPS separately from your storage capacity. This allows you to tune your storage and operations settings to fit your needs. Additionally, you can enable storage autoscaling. With storage autoscaling, Amazon RDS automatically increases your storage capacity as your database is close to running out of free disk space.
(click to zoom)Next, decide whether to create a standby instance. A standby instance is a replica of your data that is available in the event of failure. It is located in a different Availability Zone in the same AWS Region as your Amazon RDS database to limit the impact of failures within a single data center. If you are running a production database where uptime is essential, a standby instance is recommended.
(click to zoom)The next section in the Amazon RDS database creation wizard is about connectivity. You must specify the Amazon Virtual Private Cloud (Amazon VPC) in which your database resides, as well as the network subnet and security groups for your database instance.
If you are migrating from a self-managed database instance on Amazon EC2, you can use the same Amazon VPC and security groups as your existing database.
If you are migrating from a database that is not hosted on AWS but your application is hosted on AWS, choose the same Amazon VPC that is used for your application. Then create a new security group for your database instance.
(click to zoom)Finally, choose the database authentication methods to allow in your database. PostgreSQL databases traditionally allow for username and password authentication. With PostgreSQL on Amazon RDS, you also can choose to allow for authentication using AWS Identity and Access Management (IAM). This integrates easily with your application compute and removes the need for credential rotation.
We recommend that you allow for both password and IAM database authentication. If you only want to start with password authentication, you may add IAM database authentication later, but it will result in some downtime for your Amazon RDS database instance.
(click to zoom)There are some additional configuration options you can configure, including settings for backups, monitoring, maintenance, and automated upgrades. The default settings work for most situations, but you should review them to ensure they work for your needs.
(click to zoom)The end of the database creation wizard shows you the estimated monthly costs for your database instance. Choose Create database to create your database instance.
(click to zoom)As Amazon RDS is provisioning your infrastructure and initializing your database, the Status of your database is Creating.
When your database is ready to use, its Status is Available.
(click to zoom)(click to zoom)
In this module, you created a fully managed, production-ready PostgreSQL database instance in Amazon RDS. In the next module, you create a replication instance in AWS DMS.
-
2. Create a replication instance in AWS Database Migration Service (AWS DMS)
In this module, you create a replication instance in AWS DMS.
AWS DMS is a service that copies data from an existing database into a fully managed database on AWS. A replication instance is an Amazon EC2 instance that can host replication tasks within AWS DMS. In the next module, you will create endpoints.
Before using AWS DMS to migrate your existing PostgreSQL database to a PostgreSQL database in Amazon RDS, you should review the AWS DMS documentation about using a PostgreSQL database as a source and using a PostgreSQL database as a target. In certain cases, it may make sense to use native tools such as pg_dump and pg_restore to migrate your database.
To create a replication instance, go to the Replication Instances section of the AWS DMS console. Choose Create replication instance to begin the replication instance creation wizard.
(click to zoom)In the Replication instance configuration section, give your replication instance a name and description. Then choose your instance class. The instance class you use depends on the size of your existing database and the amount of data flowing through it.
Then choose an engine version for AWS DMS. Finally, choose the amount of allocated storage for your replication instance.
(click to zoom)As you continue in the Replication instance configuration section, you need to choose a VPC for your replication instance. Choose the same VPC in which you provisioned your Amazon RDS database to ease network access for the replication instance.
You may choose to have a Multi-AZ setup for your replication instance for redundancy. If you will use AWS DMS to keep two databases in sync over a long period of time, you may want to use a Multi-AZ setup. If you are performing a one-time migration of your data from an existing database to a fully managed database in Amazon RDS, you likely don't need a Multi-AZ setup.
Finally, choose whether your replication instance should be publicly accessible. If your existing database is in the same VPC as your new database and your replication instance, you don't need your replication instance to be publicly accessible. If your existing database is not in the same VPC, you should use AWS Direct Connect or a VPN to allow connection from your VPC to your existing data center. You should avoid making your replication instance publicly accessible in most cases to avoid potential security issues.
(click to zoom)Next, open the Advanced security and network configuration section. For the VPC security group(s) configuration, choose the same security group that you attached to your Amazon RDS database. This allows your replication instance to access your Amazon RDS database.
(click to zoom)You also may edit the maintenance and tags settings.
When you're ready, choose Create to create your replication instance in AWS DMS.
(click to zoom)After you choose Create, AWS provisions your replication instance. It shows a Status of the replication instance is Creating while AWS provisions and initializes your instance.
(click to zoom)When your replication instance is ready to go, its Status is Available.
(click to zoom)While you are waiting for your replication instance to be available, go to the Security Groups section in the Amazon EC2 console. You need to add a rule to your security group to allow your replication instance to access your database.
In the Security Groups section, find the security group you attached to your PostgreSQL database instance and your replication instance, and choose it.
(click to zoom)Choose Edit inbound rules for your security group.
(click to zoom)Your security group has an existing rule that allows for access to your PostgreSQL instance from the IP address you used to create the database. Remove the existing IP address and enter the name of the security group used for your Amazon RDS database instance and replication instance.
Your screen should look as follows.
Choose Save rules to save the updated rules for your security group.
When your replication instance is available and you have updated the rules for your security group, you may move on to the next module.
(click to zoom)
In this module, you created a replication instance in AWS DMS. The replication instance is used to host the replication tasks that migrate data from an existing database to a fully managed database in Amazon RDS. You also updated a security group to allow access from your replication instance to your PostgreSQL database instance in Amazon RDS.
In the next module, you will create endpoints for your source and target databases in Amazon RDS.
-
3. Create endpoints in AWS DMS
In this module, you create source and target endpoints for a replication task in AWS DMS.
A replication task is a job to migrate data from one database to another by using AWS DMS. Before creating a replication task, you must register endpoints for your source and target databases. An endpoint describes the connection address, credentials, and other information required to connect to a database.
First, create the endpoint for your target database. This is the database you created in Amazon RDS.
Navigate to the Endpoints section of the AWS DMS console. Choose Create endpoint to create a new endpoint.
(click to zoom)In the endpoint creation wizard, choose to create a Target endpoint. Choose the check box to Select RDS DB instance, and choose your newly created Amazon RDS database in the dropdown.
(click to zoom)This fills in most of the Endpoint configuration details for you. You need to enter your password near the bottom of the section.
(click to zoom)Before you save your endpoint, test the connection to ensure that it was configured correctly. To test it, open the Test endpoint connection section.
Choose the replication instance you want to use, and then choose Run test. After a few seconds, you should see a Status of successful. This indicates that you configured your security group and endpoint correctly. To save your endpoint, choose Create endpoint.
(click to zoom)Follow these same steps again to create an endpoint for your source database. Unlike the target database, you need to fill out the connection endpoint, port, and credentials yourself.
You also need to ensure that your replication instance has network access to your source database. If your source database is hosted on Amazon EC2, allow traffic from your replication instance security group into the source database security group. If your source database is not hosted on Amazon EC2, you need to handle the network settings according to the location of your source database.
Before moving on to the next module, you should have two endpoints configured: one for your source database and one for your target database. Make sure that you have tested both endpoints and can successfully connect to both databases. Then move on to the next module.
In this module, you created your endpoints to connect to your databases. In the next module, you will use those endpoints to create a replication task that copies data from your source database to your target database.
-
4. Create a replication task in AWS DMS
In this module, you create a replication task in AWS DMS.
A replication task is responsible for migrating data from a source database to a target database. In your case, you are moving data from an existing database to your newly created database in Amazon RDS.
To get started, navigate to the Database migration tasks section of the AWS DMS console. Choose Create task to create a new replication task.
(click to zoom)In the Task configuration section, set up the parameters of your replication task. Give your task a name, and choose the replication instance you created in an earlier module. Then choose the source endpoint for your existing database and your target endpoint for your fully managed database in Amazon RDS.
You need to choose a migration type. There are two migration types:
1. Migrate existing data, which performs a one-time process to copy data from your source database to your target database.
2. Replicate ongoing changes, which copies all ongoing operations from your source database to your target database.If you are migrating your application from using a self-managed database to using a fully managed database, you want to use both types. The first type copies all data in your database, and the second type ensures that all additional updates are replicated to your new database until you switch your application to use the new database.
For the migration type, choose Migrate existing data and replicate ongoing changes. Note that this requires you to have logical replication enabled on your source database.
(click to zoom)In the Table mappings section, choose the tables you want to copy over. Enter the name of the schemas and tables you want to copy. You can use % as a wildcard character to copy multiple tables or schemas.
In this walkthrough, you migrate your tables exactly as they exist in the source database. If you have more complex requirements, AWS DMS supports filters and transformations of your source data before writing to your target database.
When you are ready, choose Create task to start your replication task.
(click to zoom)After you create your task, your task is shown in the Database migration tasks section with a Status of Creating.
(click to zoom)After the task is initialized, its Status is Starting.
(click to zoom)After the migration of existing data is complete, it shows a Status of Load complete, replication ongoing. Any updates to your source database at this point are copied to your target database.
(click to zoom)
In this module, you created a replication task in AWS DMS to migrate your existing data and sync ongoing changes from your previous database to your new database in Amazon RDS.
In the next module, you will complete the migration and clean up the resources you created.
-
5. Complete the migration and clean up resources
If you have followed all of the steps in this lesson, you have created a new, fully managed database in Amazon RDS. You also have created a migration task to copy data from your source database to your new database. In this final module, you learn the steps to complete your migration and to clean up your AWS DMS resources.
When your initial migration is complete and all data is synced to your new database, you are ready to use your new database as your primary database.
You have two ways you can handle this:
- If you feel confident about your migration, you can change the database configuration in your application to use your new database. This ensures all reads and writes go to your new database.
- If you want to follow a more cautious approach, you can read from and write to both databases for a period of time. This allows you to compare the results from each database for accuracy while still maintaining the correct data in your existing database. Additionally, you can use the validation feature in AWS DMS to compare the source and target records.
Whichever method you choose, you should thoroughly test your new database before making it your primary database. After you have switched to using your primary database and are confident in the results, you may want to delete your AWS DMS infrastructure.
Though we used AWS DMS in this lesson, there are other options to migrate PostgreSQL databases as well. You can use the pg_dump and pg_restore client applications to create a data export file and then load it into your new database. Additionally, you can use native logical PostgreSQL replication to migrate ongoing changes to a new database.
After you have switched to using your primary database and are confident in the results, you may want to delete your AWS DMS infrastructure.
First, stop and delete the database migration task to replicate your data. Navigate to the Database migration tasks section of the AWS DMS console. Choose the task you want to remove, and then choose Stop.
(click to zoom)It takes a few moments to stop the task. When it is stopped, choose it again, and then choose Delete.
(click to zoom)Next, navigate to the Endpoints section of the AWS DMS console. Choose both your source endpoint and your target endpoint, and then choose Delete.
(click to zoom)Then go to the Replication instances section of the AWS DMS console. If your replication instance is not being used for any other replication tasks, choose it and then choose Delete.
(click to zoom)If you do not want to keep the target Amazon RDS database that you created for this lesson, you should delete that as well.
Navigate to the Amazon RDS console. Choose your database instance, and then choose Delete from the Actions dropdown.
(click to zoom)Confirm that you want to delete your database. You can choose not to create a final snapshot or retain backups because you used the database only for this walkthrough.
(click to zoom)Finally, you may want to terminate your source database because it is no longer being used. If your source database is running on Amazon EC2, you may terminate the Amazon EC2 instance. If your source database is running elsewhere, follow the proper procedures to terminate it.
In this module, you learned how to migrate your application to use your new database. You also learned how to clean up AWS DMS resources when you are done using them.
In this lesson, we migrated an existing, self-managed PostgreSQL database to a fully managed PostgreSQL database in Amazon RDS by using AWS DMS. This migration allows you to free your developers to focus on the innovation that is core to your business. By using AWS DMS, you can automate the delicate task of migrating data to a new database.