This project demonstrates a simple ETL (Extract, Transform, Load) pipeline using AWS services. The pipeline involves downloading a sample CSV file, uploading it to an S3 bucket, and processing it to store the data in a DynamoDB table.
Download a sample file in CSV format.
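A minimal sample file might look like the following; the column names here are an assumption based on the DynamoDB keys used later (Name and Age):

```csv
Name,Age
Alice,30
Bob,25
```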
- Create an AWS account if you don't already have one.
- Create an S3 bucket (a boto3 sketch for creating it and uploading the CSV follows this list):
  - Bucket name: etlprocess-files
- boto3, the AWS SDK for Python (as of July 2024)
- Python 3.x
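If you prefer to create the bucket and upload the sample file from code rather than the console, a minimal boto3 sketch could look like this (the bucket name matches the one above; the local file name `sample.csv` is an assumption):

```python
import boto3

BUCKET = "etlprocess-files"

s3 = boto3.client("s3")

# Create the bucket (in us-east-1 no LocationConstraint is needed;
# other regions require a CreateBucketConfiguration argument).
s3.create_bucket(Bucket=BUCKET)

# Upload the sample CSV to the bucket.
s3.upload_file("sample.csv", BUCKET, "sample.csv")
```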
```bash
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
```
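The requirements.txt file is not reproduced in this section; at a minimum it would likely just list boto3:

```
boto3
```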
Create IAM Group (if it does not exist):
- Go to the IAM console.
- Create a group with the desired name.
- Attach the following policies:
- AmazonS3FullAccess
- AmazonDynamoDBFullAccess
- Click the "Create Group" button.
Create IAM User:
- Go to the IAM console.
- Create a user with the desired username.
- Add the user to the previously created group.
- Open the user details and navigate to the "Security credentials" tab.
- Under "Access keys", create a new access key.
- Choose "Local code" for key usage.
- Optionally, write a description tag.
- Store the access key ID and secret access key securely. Note: the secret key is only viewable once, so save it immediately.
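The scripts read credentials from the standard AWS credential chain. One common way to configure them locally is a `~/.aws/credentials` file (placeholders shown; a default region can also be set in `~/.aws/config` or via the `AWS_DEFAULT_REGION` environment variable):

```ini
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```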
- Go to the DynamoDB console.
- Click the "Create Table" button.
- Enter the table name: users.
- Define the partition key: Name (type: String).
- Optionally, define a sort key: Age (type: Number).
- Click the "Create Table" button.
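The table can also be created programmatically. A boto3 sketch under the same schema assumptions (Name as a String partition key, Age as a Number sort key, on-demand billing) might look like:

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="users",
    AttributeDefinitions=[
        {"AttributeName": "Name", "AttributeType": "S"},
        {"AttributeName": "Age", "AttributeType": "N"},
    ],
    KeySchema=[
        {"AttributeName": "Name", "KeyType": "HASH"},   # partition key
        {"AttributeName": "Age", "KeyType": "RANGE"},   # sort key
    ],
    BillingMode="PAY_PER_REQUEST",
)

# Wait until the table is ready before writing to it.
dynamodb.get_waiter("table_exists").wait(TableName="users")
```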
The main.py script retrieves data from the CSV stored in S3 and pushes the items to DynamoDB:
```bash
python3 main.py
```
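main.py itself is not reproduced in this section; a minimal sketch of the kind of logic it runs, assuming the bucket above, an object key of `sample.csv`, and CSV columns Name and Age, might look like:

```python
import csv
import boto3

BUCKET = "etlprocess-files"
KEY = "sample.csv"  # object key is an assumption

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("users")

# Extract: read the CSV object from S3.
body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read().decode("utf-8")

# Transform + Load: parse the rows and batch-write them to DynamoDB.
with table.batch_writer() as batch:
    for row in csv.DictReader(body.splitlines()):
        batch.put_item(Item={"Name": row["Name"], "Age": int(row["Age"])})
```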
The view_table.py script retrieves the data that was inserted into the users table in DynamoDB:
```bash
python3 view_table.py
```
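view_table.py is likewise not shown here; a sketch of an equivalent read, assuming the script performs a full table scan, might be:

```python
import boto3

table = boto3.resource("dynamodb").Table("users")

# Scan the table, following pagination if there is more than one page of results.
response = table.scan()
items = response["Items"]
while "LastEvaluatedKey" in response:
    response = table.scan(ExclusiveStartKey=response["LastEvaluatedKey"])
    items.extend(response["Items"])

for item in items:
    print(item)
```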
Notes
- Ensure that your AWS credentials are configured correctly in your environment.
- The scripts assume that the necessary AWS services (S3 and DynamoDB) and IAM permissions are correctly set up.