Amazon Redshift Architecture. Understanding the foundations of… | by Atharva Inamdar | Aug, 2020

From 10,000 ft, Redshift looks like any other relational database, with fairly standard SQL and entities such as tables, views, stored procedures, and the usual data types.

Simplistic 10,000ft view

Because Redshift is a distributed, clustered service, it is logical to expect that table data is stored across multiple nodes.

Leader and Compute Nodes
Nodes and Slices with table distribution
  1. Colocate data and compute, minimizing data transfer and increasing join efficiency across nodes.
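
To make the colocation idea concrete, here is a minimal sketch in Python. It is a toy model, not Redshift's actual placement algorithm: the node and slice counts, the CRC32 hash, and the helper names (`slice_for`, `distribute`, `colocated_join`) are all assumptions made for illustration. It shows that when two tables are distributed on the same key, matching rows land on the same slice, so a join can run slice by slice without moving rows between nodes.

```python
import zlib
from collections import defaultdict

# Hypothetical cluster shape, chosen only for this illustration.
NUM_NODES = 2
SLICES_PER_NODE = 2
TOTAL_SLICES = NUM_NODES * SLICES_PER_NODE


def slice_for(key):
    """Map a distribution-key value to a slice using a stable hash (assumed: CRC32)."""
    return zlib.crc32(str(key).encode("utf-8")) % TOTAL_SLICES


def distribute(rows, key_column):
    """Place each row (a dict) on the slice selected by its distribution key."""
    slices = defaultdict(list)
    for row in rows:
        slices[slice_for(row[key_column])].append(row)
    return slices


def colocated_join(left_slices, right_slices, key_column):
    """Join slice by slice; only rows stored on the same slice are compared."""
    joined = []
    for slice_id, left_rows in left_slices.items():
        for left in left_rows:
            for right in right_slices.get(slice_id, []):
                if left[key_column] == right[key_column]:
                    joined.append({**left, **right})
    return joined


customers = [{"customer_id": i, "name": f"customer-{i}"} for i in range(6)]
orders = [{"customer_id": i % 6, "order_id": 100 + i} for i in range(12)]

# Both tables use customer_id as the distribution key.
customer_slices = distribute(customers, "customer_id")
order_slices = distribute(orders, "customer_id")

result = colocated_join(customer_slices, order_slices, "customer_id")
print(len(result))  # every order finds its customer on the same slice
```

Because both tables hash the same key with the same function, each order is guaranteed to sit on the slice that holds its customer. Distributing the two tables on different keys would break that guarantee and force rows to be shipped between slices before joining.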

One key feature of Redshift that influences compute is columnar storage. In addition to the architecture and design for query efficiency, the data itself is stored by column rather than by row. Most analytical queries use only a small number of a table's columns for their aggregations, so a query needs to read only the columns it references. This presents multiple advantages for Redshift.
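
The difference between the two layouts can be sketched in a few lines of Python. This is an in-memory illustration only, under the assumption that "values touched" is a fair stand-in for I/O; the table contents and the `to_columns` helper are hypothetical and say nothing about Redshift's on-disk block format.

```python
# A small hypothetical table, stored row by row.
rows = [
    {"order_id": 1, "region": "EU", "amount": 20.0, "note": "gift"},
    {"order_id": 2, "region": "US", "amount": 35.5, "note": ""},
    {"order_id": 3, "region": "EU", "amount": 10.0, "note": "express"},
    {"order_id": 4, "region": "APAC", "amount": 4.5, "note": ""},
]


def to_columns(table_rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in table_rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: summing one column still walks through every full row.
total_from_rows = sum(row["amount"] for row in rows)
values_touched_row_layout = sum(len(row) for row in rows)

# Column layout: only the "amount" column is read.
total_from_columns = sum(columns["amount"])
values_touched_column_layout = len(columns["amount"])

print(total_from_rows, values_touched_row_layout)
print(total_from_columns, values_touched_column_layout)
```

Both layouts produce the same aggregate, but the column layout reads one value per row instead of every field, and the gap grows with the width of the table.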

So far, the focus has been on the benefits of data storage and management. Now it is time to consider how queries and workloads are managed on Redshift. As a data warehouse, Redshift is expected to be queried concurrently by multiple users as well as by automated processes. Workload Management (WLM) is a way to control how compute resources are allocated to groups of queries or users. Through WLM, it is possible to prioritise certain workloads and ensure the stability of processes.
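
The queueing idea behind WLM can be sketched as a toy model in Python. This is not Redshift's WLM implementation or its configuration format: the `Queue` class, the `route` function, the group names, and the concurrency limits are all hypothetical. It only illustrates how routing queries to separate queues by user group, each with its own concurrency limit, keeps one workload from starving another.

```python
from dataclasses import dataclass, field


@dataclass
class Queue:
    """A hypothetical workload queue with a fixed number of concurrent slots."""
    name: str
    user_groups: set
    concurrency: int
    running: list = field(default_factory=list)
    waiting: list = field(default_factory=list)

    def submit(self, query_id):
        """Run the query if a slot is free, otherwise make it wait."""
        if len(self.running) < self.concurrency:
            self.running.append(query_id)
            return "running"
        self.waiting.append(query_id)
        return "waiting"

    def finish(self, query_id):
        """Free the slot and promote the oldest waiting query, if any."""
        self.running.remove(query_id)
        if self.waiting:
            self.running.append(self.waiting.pop(0))


def route(queues, default_queue, user_group):
    """Pick the first queue that lists the user group, else the default queue."""
    for queue in queues:
        if user_group in queue.user_groups:
            return queue
    return default_queue


etl = Queue(name="etl", user_groups={"etl_users"}, concurrency=1)
dashboards = Queue(name="dashboards", user_groups={"bi_users"}, concurrency=2)
default = Queue(name="default", user_groups=set(), concurrency=1)
queues = [etl, dashboards]

statuses = [
    route(queues, default, "etl_users").submit("load-1"),
    route(queues, default, "etl_users").submit("load-2"),  # ETL slot is taken
    route(queues, default, "bi_users").submit("dash-1"),   # unaffected by ETL
    route(queues, default, "analysts").submit("adhoc-1"),  # falls to default
]
print(statuses)

etl.finish("load-1")
print(etl.running, etl.waiting)
```

In this sketch the second ETL load waits for its own queue's slot while the dashboard query starts immediately, which is the kind of isolation between workloads that the paragraph above describes.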