Redshift
- It is an OLAP (Online Analytical Processing) database based on PostgreSQL
- Columnar storage
- MPP (Massively Parallel Processing) query execution
- Pay per provisioned instance
  - For long-term use, consider Reserved Instances
- Has a SQL interface for running queries
- BI tools integrate with it
  - AWS QuickSight
  - Tableau
- Data can be loaded from
  - S3
  - DynamoDB
  - DMS
- A cluster can have from 1 to 128 nodes
- Each node can hold 160 GB of data (capacity depends on the node type)
- Two types of node
  - Leader Node: plans queries and aggregates results
  - Compute Node: performs queries and sends results to the leader
- With Enhanced VPC Routing, traffic between a Redshift cluster and its data repositories (COPY and UNLOAD) goes through the VPC, i.e. the AWS private network, instead of the public internet
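
The split between leader and compute nodes can be pictured with a small toy model. This is only an illustrative sketch in plain Python, not how Redshift is implemented: the function names, the round-robin slicing, and the sample data are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node(rows):
    """Hypothetical compute node: process its own slice, return a partial result."""
    return sum(rows), len(rows)


def leader_node(table, node_count):
    """Hypothetical leader node: plan, fan out, then aggregate partial results."""
    # Planning: split the rows into one slice per compute node (round-robin).
    slices = [table[i::node_count] for i in range(node_count)]
    # Execution: each compute node works on its slice in parallel.
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(compute_node, slices))
    # Aggregation: combine partial sums and counts into the final average.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Made-up sample data standing in for a column of a table.
sales = [120, 80, 200, 40, 60, 100]
average = leader_node(sales, node_count=3)
print(average)
```

Each compute node only ever sees its own slice, and only small partial results travel back to the leader, which is the idea behind MPP.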
Snapshot and DR
- Snapshots are
  - Point-in-time backups of a cluster
  - Stored in S3
  - Incremental: only changes are saved
- Snapshots can be restored to a new cluster
- Snapshots can be copied to another AWS Region
- Manual snapshots are not deleted automatically
- Automated snapshots have a retention period (1 day by default, configurable up to 35 days)
- Monitor the performance of a Redshift cluster with CloudWatch and AWS Trusted Advisor
- Cross-region snapshot copy can be enabled for a Redshift cluster
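
The snapshot operations above could be scripted roughly as follows. This is a hedged sketch: it assumes the boto3 Redshift client and its `create_cluster_snapshot`, `modify_cluster`, and `enable_snapshot_copy` operations, and the helper names, cluster identifier, snapshot identifier, and region are hypothetical. The code only builds the request parameters, so it runs without AWS access.

```python
def snapshot_requests(cluster_id, snapshot_id, dest_region, retention_days=7):
    """Build keyword arguments for three Redshift snapshot calls (hypothetical helper)."""
    # This sketch accepts the 1 to 35 day range for automated snapshot retention.
    if not 1 <= retention_days <= 35:
        raise ValueError("retention_days must be between 1 and 35")
    return {
        # Manual snapshot: kept until it is deleted explicitly.
        "create_cluster_snapshot": {
            "SnapshotIdentifier": snapshot_id,
            "ClusterIdentifier": cluster_id,
        },
        # Retention period for automated snapshots.
        "modify_cluster": {
            "ClusterIdentifier": cluster_id,
            "AutomatedSnapshotRetentionPeriod": retention_days,
        },
        # Cross-region copy of snapshots for disaster recovery.
        "enable_snapshot_copy": {
            "ClusterIdentifier": cluster_id,
            "DestinationRegion": dest_region,
            "RetentionPeriod": retention_days,
        },
    }


def apply_requests(client, planned):
    """Call each planned operation on a client object, in insertion order."""
    for method, kwargs in planned.items():
        getattr(client, method)(**kwargs)


# Placeholder identifiers, not real resources.
planned = snapshot_requests("demo-cluster", "demo-snapshot", "eu-west-1")

# With boto3 installed and credentials configured, this would be run as:
#   import boto3
#   apply_requests(boto3.client("redshift"), planned)
```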
Redshift Spectrum
- Queries data in S3 directly, without loading it
- Must have a cluster available to start the query
- The query is submitted to thousands of Redshift Spectrum nodes
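
Querying S3 through Spectrum typically means defining an external schema and an external table first, then selecting from it. The sketch below only assembles the SQL text in Python; the schema name, catalog database, IAM role ARN, bucket, and columns are placeholders invented for the example, and a comma-delimited text layout is assumed.

```python
def external_schema_sql(schema, catalog_db, iam_role):
    """SQL that registers an external schema backed by the data catalog."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def external_table_sql(schema, table, columns, s3_location):
    """SQL that maps comma-delimited text files in S3 to an external table."""
    cols = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n{cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{s3_location}';"
    )


# All identifiers below are hypothetical placeholders.
statements = [
    external_schema_sql(
        "spectrum_demo",
        "demo_catalog",
        "arn:aws:iam::123456789012:role/DemoSpectrumRole",
    ),
    external_table_sql(
        "spectrum_demo",
        "sales",
        [("sale_id", "INTEGER"), ("amount", "DECIMAL(8,2)")],
        "s3://example-bucket/sales/",
    ),
    # The data stays in S3; the cluster only starts the query.
    "SELECT COUNT(*) FROM spectrum_demo.sales;",
]

for sql in statements:
    print(sql, end="\n\n")
```

The statements would be run through the cluster's SQL interface; the names are interpolated directly, so this sketch assumes trusted input.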
Best Practices
- To load data from S3, use the COPY command
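
A COPY statement for an S3 load might be assembled like this. It is a minimal sketch, assuming CSV input and IAM role authorization; the helper name, table name, bucket, and role ARN are hypothetical, and the resulting SQL would be run through the cluster's SQL interface.

```python
def build_copy_sql(table, s3_uri, iam_role, ignore_header=0):
    """Build a Redshift COPY statement for CSV files in S3 (hypothetical helper)."""
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    lines = [
        f"COPY {table}",
        f"FROM '{s3_uri}'",
        f"IAM_ROLE '{iam_role}'",
        "FORMAT AS CSV",
    ]
    # Skip header rows only when asked to.
    if ignore_header:
        lines.append(f"IGNOREHEADER {ignore_header}")
    return "\n".join(lines) + ";"


# Placeholder values, not real resources.
copy_sql = build_copy_sql(
    "sales",
    "s3://example-bucket/sales/",
    "arn:aws:iam::123456789012:role/DemoCopyRole",
    ignore_header=1,
)
print(copy_sql)
```

The values are interpolated as-is, so the sketch assumes trusted table names and URIs.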