Redshift
- It's an OLAP (i.e. Online Analytical Processing) data warehouse based on PostgreSQL
- Columnar storage
- MPP, i.e. Massively Parallel Processing of queries
- Pay for the instances provisioned
- For long-term use, consider Reserved Instances
- Has a SQL interface for running queries
- BI tools integrate with it
  - AWS QuickSight
  - Tableau
- Data can be loaded from
  - S3
  - DynamoDB
  - DMS
- A cluster can have from 1 to 128 nodes
- Each node can hold 160GB of data
- Two types of node
  - Leader node: does the query planning and results aggregation
  - Compute node: performs the queries and sends results to the leader
- With Enhanced VPC Routing, COPY and UNLOAD traffic between a Redshift cluster and its data repositories goes through your VPC on the AWS private network
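
The leader/compute split can be pictured with a small toy model. This is only a sketch of the scatter-gather idea, not Redshift's actual execution engine: the data, `NODE_SLICES`, `compute_node` and `leader_node` are all hypothetical names made up for illustration, and threads stand in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy data: each "compute node" holds one slice of a sales table,
# stored column-wise (one list per column) to mirror columnar storage.
NODE_SLICES = [
    {"region": ["eu", "us", "eu"], "amount": [10, 20, 30]},
    {"region": ["us", "us"], "amount": [5, 15]},
    {"region": ["eu"], "amount": [40]},
]


def compute_node(node_slice):
    """Compute node: aggregate the local slice and return a partial result."""
    partial = {}
    for region, amount in zip(node_slice["region"], node_slice["amount"]):
        partial[region] = partial.get(region, 0) + amount
    return partial


def leader_node(slices):
    """Leader node: fan the work out, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(compute_node, slices))
    totals = {}
    for partial in partials:
        for region, subtotal in partial.items():
            totals[region] = totals.get(region, 0) + subtotal
    return totals


# Roughly: SELECT region, SUM(amount) FROM sales GROUP BY region
totals = leader_node(NODE_SLICES)
print(totals)
```

Each node only reads the two columns the query needs and returns a small partial sum, so the leader merges a few rows instead of scanning the whole table.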
Snapshot and DR
- Snapshots are point-in-time backups of a cluster
- Snapshots are stored in S3
- Snapshots are incremental: only changes are saved
- Snapshots can be restored to a new cluster
- Snapshots can be copied to another AWS Region
- Manual snapshots are not deleted automatically
- Automated snapshots are deleted when their retention period ends (configurable, up to 35 days)
- Monitor the performance of a Redshift cluster with CloudWatch and AWS Trusted Advisor
- Cross-region snapshot copy can be enabled for a Redshift cluster
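
To make "incremental, only changes are saved" concrete, here is a toy model. It is a sketch under loud assumptions: the cluster is pretended to be a dict of blocks, `take_snapshot` and `restore` are hypothetical helpers, and nothing here reflects how Redshift actually lays out snapshot data in S3.

```python
def take_snapshot(blocks, previous_state):
    """Keep only the blocks that are new or changed since the previous state."""
    return {
        block_id: data
        for block_id, data in blocks.items()
        if previous_state.get(block_id) != data
    }


def restore(snapshots):
    """Rebuild a new cluster state by replaying the snapshots in order."""
    state = {}
    for snapshot in snapshots:
        state.update(snapshot)
    return state


# First snapshot: everything is new, so every block is saved.
cluster_v1 = {"b1": "alpha", "b2": "beta"}
snap1 = take_snapshot(cluster_v1, {})

# Second snapshot: only the changed block b2 and the new block b3 are saved.
cluster_v2 = {"b1": "alpha", "b2": "BETA", "b3": "gamma"}
snap2 = take_snapshot(cluster_v2, cluster_v1)

restored = restore([snap1, snap2])
print(snap2)
print(restored == cluster_v2)
```

The toy version ignores deleted blocks; it is only meant to show why the second snapshot is smaller than the first while a restore still yields the full state.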
Redshift Spectrum
- Queries data directly in S3 without loading it
- Must have a Redshift cluster available to start the query
- The query is then submitted to thousands of Redshift Spectrum nodes
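
A rough idea of what the setup looks like in SQL, assembled here as Python strings so the pieces are easy to see. Treat it as a sketch: `build_spectrum_ddl` is a hypothetical helper, and the schema, database, table, bucket and IAM role names are placeholders, not values from these notes. Inputs are interpolated as-is, so they must be trusted.

```python
def build_spectrum_ddl(schema, catalog_db, iam_role_arn, table, columns, s3_location):
    """Return (CREATE EXTERNAL SCHEMA, CREATE EXTERNAL TABLE) statements."""
    create_schema = (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        f"FROM DATA CATALOG\n"
        f"DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )
    column_list = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    create_table = (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n  {column_list}\n)\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{s3_location}';"
    )
    return create_schema, create_table


# Placeholder values for illustration only.
schema_sql, table_sql = build_spectrum_ddl(
    schema="spectrum",
    catalog_db="spectrum_db",
    iam_role_arn="arn:aws:iam::123456789012:role/MySpectrumRole",
    table="sales",
    columns=[("region", "VARCHAR(16)"), ("amount", "INTEGER")],
    s3_location="s3://my-bucket/sales/",
)
print(schema_sql)
print(table_sql)
# The files stay in S3; the cluster queries them like a normal table, e.g.
# SELECT region, SUM(amount) FROM spectrum.sales GROUP BY region;
```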
Best Practices
- To load data from S3, use the COPY command
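
A minimal sketch of what such a COPY statement looks like, built as a Python string. `build_copy_statement` is a hypothetical helper, and the table name, bucket and IAM role ARN are placeholders. It assumes CSV files and IAM role authorization; other formats may need extra options. Inputs are interpolated as-is, so they must be trusted.

```python
def build_copy_statement(table, s3_uri, iam_role_arn, file_format="CSV"):
    """Build a Redshift COPY statement that loads `table` from an S3 prefix."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS {file_format};"
    )


# Placeholder values for illustration only.
statement = build_copy_statement(
    table="sales",
    s3_uri="s3://my-bucket/sales/",
    iam_role_arn="arn:aws:iam::123456789012:role/MyRedshiftRole",
)
print(statement)
```

Pointing COPY at a prefix holding several files, rather than running many single-row INSERTs, lets the compute nodes load the files in parallel.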