Can i use mysql in redshift?

Question

By Roi Avinoam | January 10, 2018 | Updated On: January 30, 2021

Nội dung chính Show

Amazon Redshift to the Rescue
Handling Database Workloads with OLAP vs. OLTP
Using Two Types of Compute Nodes
Step-by-Step: MySQL to Redshift
1. Create a Redshift cluster
2. Export MySQL Data to Amazon S3
3. Upload the load files to Amazon S3
4. Run a COPY command to load the table
5. Verify the Data
Is Redshift compatible with MySQL?
Is Redshift SQL or MySQL?
What SQL language is used in Redshift?
Is Redshift Postgres or MySQL?

Data analytics is a requirement for virtually every business today. However building an analytical data warehouse requires vast computing power to maintain performance and quick retrieval of data and results. In addition, the organization’s IT teams need to acquire new skills and infrastructure know-how to establish and maintain their BI systems.

To fill this infrastructure gap, Amazon Redshift brings a fully managed cloud solution for modern data warehouses. It delivers a massive parallel processing data warehouse architecture and uses a columnar data store that can aggregate and analyze large amounts of data. It enables faster performance - no matter how much data. That’s why Amazon with its Redshift columnar storage has become the solution of choice for companies like: Nokia that moved its high data usage activity to Redshift and now runs queries two times faster. And, VivaKi that loaded 10 terabytes of data in only six hours with Redshift architecture.

So, for real-time data analytics on high volumes of data, Redshift has distinct benefits which obviously MySQL, PostgreSQL, or other common databases cannot handle at scale. However, though we see great success stories like Vivaki, moving data from your database to Redshift isn’t straight-forward. Or what if a data mart is right for you? If you're unsure, comparing a data mart vs. data warehouse is the first step.

In this article, we’ll provide a step-by-step sample procedure for how to migrate your MySQL to Redshift.

Amazon Redshift to the Rescue

Amazon Redshift has provided a very good solution for today’s issues and beyond. As you’ve probably experienced, MySQL only takes you so far. It’s not designed to cope with your data scaling, data consistency, query performance, or analytics on large amounts of data. Redshift is built to handle large scale data analytics. Plus, it’s cost effective, especially compared to other analytics-specific databases like Hive and Impala.

Before jumping into action, let’s understand these Redshift differences from MySQL:

Handling Database Workloads with OLAP vs. OLTP

One of the biggest differences between MySQL and Redshift is how the way each handles database workloads. MySQL uses OLTP system where the purpose is to control and run fundamental business tasks. Its purpose and design is for fast transactional random access, not analytics.

By using OLAP, Redshift gears its system for analytics and helps with planning and problem solving. With the OLAP columnar design, Redshift improves on performance. The data is organized in a way which allows for better compression and easier sequential reads. Therefore, large amounts of data scanning is done quickly.

In addition, as mentioned OLTP only gives output of ongoing business processes, whereas OLAP gives multi-dimensional views of various kinds of business activities. OLAP queries are often very complex and involve aggregations, versus OLTP queries that are often simple and return relatively few records.

Using Two Types of Compute Nodes

There are two types of compute nodes available for RedShift. The first is DW2 which is very a dense compute running on SSD. DW2 are very fast solid state memory drives, that support the database’s I/O performance needs. They cost about $1.50/Tb per hour. The second node type is DW1 or so-called dense storage nodes, which in comparison to DW2 run on traditional storage disks. They cost around $0.50/Tb per hour.

Step-by-Step: MySQL to Redshift

Now that you’re familiar with Amazon Redshift, we’ll show you how to load data to it.

Some preliminaries: There are several ways to perform the load. The COPY command is the most efficient way to load a table, as it can load data in parallel from multiple files and take advantage of the load distribution between nodes in the Redshift cluster. It supports loading data in CSV (or TSV), JSON, character-delimited, and fixed width formats.

After initial data load, you can add, modify, or delete a significant amount of data, follow up by running a VACUUM command to reorganize data and reclaim space after deletes.

Using individual INSERT statements to populate a table might be prohibitively slow. Alternatively, if data already exists in other Amazon Redshift database tables, use SELECT INTO … INSERT or CREATE TABLE AS to improve performance.

We’ll discuss the best ways to use each of these commands by example.

So, let’s start - here are the 5 steps for loading data into Redshift:

Create a Redshift cluster
Export a MySQL database and split it into multiple files
Upload the files to Amazon S3
Run a COPY command to load the table to Redshift
Verify that the data was loaded correctly

1. Create a Redshift cluster

Perform the following steps to create a cluster:

Enter your AWS account Console > Services > Redshift > Launch Cluster to start the RedShift cluster creation wizard.
Enter the required cluster details as shown below:

Configure the Node Type, Cluster Type and the number of Compute Nodes on the Node Configuration page:

Choose the default value for the rest of the options and you are good to go.

The next thing to configure is the Security Groups to allow hosts access:
Go to Redshift > Security > Security Groups > Connection Type > CIDR/IP and add the IP address of host that you want to access.

2. Export MySQL Data to Amazon S3

Let’s assume that we have several tables to export to Redshift.

In following example, we run a statement to select all sales transaction for the past 5 years from tbl_sales and split them over to 1 million rows per file:



Then, call the procedure:

Move the generated CSV files into a directory called s3-redshift:

3. Upload the load files to Amazon S3

Install s3cmd, a command line tool to manage Amazon S3 remotely which is available in epel-testing repository:

Configure S3 credentials:

Create a new bucket for Redshift:

Start syncing the CSV data directory into S3 bucket:



Ensure the data files exist in S3’s bucket as shown in the following list:

In the next steps, we prepare our Redshift:

Redshift runs on PostgreSQL 8.X version as the backend. We can use the standard PostgreSQL client to access the Redshift cluster with provided endpoint and credentials.

To install PostgreSQL command we can execute the following commands:

Use psql program to access the Redshift cluster:

Create a table similar to the one we have in MySQL

4. Run a COPY command to load the table

Next, run the COPY command to access the CSV files in our S3 bucket and parallel load them into the table:

Now the data is available in the Redshift cluster and ready forquery processing.

5. Verify the Data

From the psql client console, verify the table structure:

Column	TYPE	Modifiers
id	bigint	NOT NULL
trx_id	character varying(16)	NOT NULL
p_id	INTEGER	NOT NULL
created	TIMESTAMP without TIME ZONE
quantity	INTEGER	NOT NULL
price	numeric(13,2)	DEFAULT 0
subtotal	numeric(13,2)	DEFAULT 0
discount	double precision	DEFAULT 0
total	numeric(13,2)	DEFAULT 0

To verify that each table loaded correctly, execute the following commands:

All executed queries on the cluster can be seen on the Redshift Queries tab as well:

Click on the Query ID to get in-depth details on the query plan and status:

That’s it. Your data is now in Redshift!

Conclusion

Migrating data to Amazon Redshift is relatively easy when you have access to the right procedure. But, sometimes moving the data is sometimes not all you need to do. Be aware of issues like: Handling the metadata and things like stored procedures, triggers, custom functions that can be problematic and add complexity to the move.

Also, pay attention to issues that may arise while adapting all existing queries and dashboards to the new query language (PostgresSQL instead of MySQL’s SQL). Redshift also helps to solve your issues for adapting all existing queries and dashboards to their new query language by automatically sampling data into the most appropriate compression scheme.

With Amazon Redshift, your data is organized in a better way. Query processing and sequential storage gives your enterprise an edge with improved performance as the data warehouse grows. Therefore, migrating from MySQL to Redshift can be a crucial step to enabling big data analytics in your organization.

Check our our detailed comparisons between MySQL and MongoDB, and MySQL vs MariaDB.

Is Redshift compatible with MySQL?

Amazon Redshift needs database credentials to issue a federated query to a MySQL database. AWS Secrets Manager provides a centralized service to manage secrets and can be used to store your MySQL database credentials.

Is Redshift SQL or MySQL?

Amazon Redshift is built around industry-standard SQL, with added functionality to manage very large datasets and support high-performance analysis and reporting of those data.

What SQL language is used in Redshift?

The SQL language consists of commands that you use to create and manipulate database objects, run queries, load tables, and modify the data in tables. Amazon Redshift is based on PostgreSQL.

Is Redshift Postgres or MySQL?

Amazon Redshift is based on PostgreSQL. Amazon Redshift and PostgreSQL have a number of very important differences that you must be aware of as you design and develop your data warehouse applications.

programming mysql Redshift MySQL Amazon Redshift Redshift to RDS CLOB Redshift