Big Data Ingestion Pipeline
- We want the ingestion pipeline to be fully Serverless
- We want to collect data in real time
- We want to transform the data
- We want to query the transformed data using [[SQL]]
- The reports created by those queries should be stored in AWS S3
- We want to load that data into a [[Data warehouse]] and create [[dashboards]]
- [[IoT core]] allows you to harvest data from [[IoT device]]s
- Kinesis is great for [[real-time data collection]]
- Kinesis Firehose helps with data delivery to AWS S3 in near real-time (1 minute)
- Lambda can help Kinesis Firehose with [[data transformations]]
- AWS S3 can trigger notifications to AWS SQS
- AWS Lambda can subscribe to AWS SQS (we could also have connected AWS S3 to AWS Lambda directly)
- AWS Athena is a Serverless [[SQL]] service, and its query results are stored in AWS S3
- The reporting AWS S3 Bucket contains analysed data, which can be used by a reporting tool such as [[AWS QuickSight]] or loaded into a data warehouse such as Redshift
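
As an illustration of the point above that Lambda can help Kinesis Firehose with data transformations, here is a minimal sketch of a transformation function in Python. It follows the Firehose contract: each incoming record carries a `recordId` and base64-encoded `data`, and each returned record carries the same `recordId`, a `result` (`Ok`, `Dropped` or `ProcessingFailed`) and re-encoded `data`. The JSON payload shape, the field names `temperature_c` and `temperature_f`, and the helper `transform_payload` are assumptions made for this example only.

```python
import base64
import json


def transform_payload(payload: dict) -> dict:
    """Hypothetical transformation: add a Fahrenheit reading to the payload."""
    # "temperature_c" and "temperature_f" are assumed field names.
    return {**payload, "temperature_f": payload["temperature_c"] * 9 / 5 + 32}


def lambda_handler(event, context):
    """Transform each Firehose record and return it under the same recordId."""
    output = []
    for record in event["records"]:
        try:
            # Firehose delivers the record body base64-encoded.
            payload = json.loads(base64.b64decode(record["data"]))
            transformed = transform_payload(payload)
            # A trailing newline keeps one JSON object per line in the S3 object.
            line = json.dumps(transformed) + "\n"
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(line.encode("utf-8")).decode("utf-8"),
            })
        except (ValueError, KeyError, TypeError):
            # Invalid base64 or JSON, or a missing field: report the failure
            # and hand the original data back unchanged.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Writing one JSON object per line is a design choice for this sketch: it keeps the delivered S3 objects easy to query line by line later in the pipeline.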