DynamoDB Overview¶
Traditional Architecture¶
- Traditional applications leverage RDBMS databases
- These databases have SQL query language
- Strong requirements about how the data should be modeled
- Ability to do query joins, aggregations, complex computations
- Vertical scaling (getting more powerful CPU / RAM / IO)
- Horizontal scaling (increasing reading capacity by adding EC2 / RDS Read Replicas)
NoSQL databases¶
- NoSQL databases are non-relational databases and are distributed
- NoSQL databases include MongoDB, DynamoDB, ...
- NoSQL databases do not support query joins (or just limited support)
- All the data that is needed for a query is present in one row
- NoSQL databases don't perform aggregations such as SUM, AVG, ...
-
NoSQL databases scale horizontally
-
There's no "right or wrong" for NoSQL vs SQL, they just require to model the data differently and think about user queries differently.
Amazon DynamoDB¶
- Fully managed, highly available with replication across multiple AZs
- NoSQL database - not a relational database
- Scales to massive workloads, distributed database
- Millions of requests per seconds, trillions of rows, 100s of TB of storage
- Fast and consistent in performance (low latency on retrieval)
- Integrated with IAM for security, authorization and administration
- Enables event driven programming with DynamoDB Streams
- Low cost and auto-scaling capabilities
- Standart & Infrequent Access (IA) Table Class
DynamoDB - Basics¶
- DynamoDB is made of Tables
- Each table has a Primary Key (must be decided at creation time)
- Each table can have an infinite number of items (rows)
- Each item has attributes (can be added over time - can be null)
- Maximum size of an ittem - 400KB
- Data types supported are:
- Scalar Types - String, Number, Binary, Boolean, Null
- Document Types - List, Map
- Set Types - String Set, Number Set, Binary Set
DynamoDB - Primary Keys¶
- Option 1: Partition Key (HASH)
- Partition key must be unique for each item
- Partition key must be diverse so that the data is distributed
- Example: user_id for a users table
- Option 2: Partition Key + Sort Key (HASH + Range)
- The combination must be unique for each item
- Data is grouped by partition key
- Example: users-games table, user_id for partition key and game_id for sort key.
Partition Keys - Excercise¶
- We're building a movie database
-
What is the best partition key to maximize data distribution?
- Movie_id
- Producer_name
- Leader_actor_name
- Movie_language
-
Movie_id has the highest cardinality so it's a good candidate
- Movie language doesn't take many values and may be skewed towards English so it's not a great choice for the partition key.