This project demonstrates a simple ETL (Extract, Transform, Load) pipeline using AWS services. The pipeline involves downloading a sample CSV file, uploading it to an S3 bucket, and processing it to store the data in a DynamoDB table.
Download a sample file in CSV format.
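A minimal sample file might look like the following; the column names here are an assumption based on the DynamoDB keys used later (Name and Age):

```csv
Name,Age
Alice,30
Bob,25
```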
- Create an AWS account if you don't already have one.
- Create an S3 bucket (a boto3 sketch for creating it and uploading the CSV follows this list):
  - Bucket name: etlprocess-files
- boto3, the AWS SDK for Python (as of July 2024)
- Python 3.x
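If you prefer to create the bucket and upload the sample file from code rather than the console, a minimal boto3 sketch could look like this (the bucket name matches the one above; the local file name `sample.csv` is an assumption):

```python
import boto3

BUCKET = "etlprocess-files"

s3 = boto3.client("s3")

# Create the bucket (in us-east-1 no LocationConstraint is needed;
# other regions require a CreateBucketConfiguration argument).
s3.create_bucket(Bucket=BUCKET)

# Upload the sample CSV to the bucket.
s3.upload_file("sample.csv", BUCKET, "sample.csv")
```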
```bash
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
```
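The requirements.txt file is not reproduced in this section; at a minimum it would likely just list boto3:

```
boto3
```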
Create IAM Group (if it does not exist):
- Go to the IAM console.
- Create a group with the desired name.
- Attach the following policies:
- AmazonS3FullAccess
- AmazonDynamoDBFullAccess
- Click the "Create Group" button.
Create IAM User:
- Go to the IAM console.
- Create a user with the desired username.
- Add the user to the previously created group.
- Open the user details and navigate to the "Security credentials" tab.
- Under "Access keys", create a new access key.
- Choose "Local code" for key usage.
- Optionally, write a description tag.
- Store the access key ID and secret access key securely. Note: the secret key is only viewable once, so save it immediately.
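The scripts read credentials from the standard AWS credential chain. One common way to configure them locally is a `~/.aws/credentials` file (placeholders shown; a default region can also be set in `~/.aws/config` or via the `AWS_DEFAULT_REGION` environment variable):

```ini
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```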
- Go to the DynamoDB console.
- Click the "Create Table" button.
- Enter the table name: users.
- Define the partition key: Name (type: String).
- Optionally, define a sort key: Age (type: Number).
- Click the "Create Table" button.
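The table can also be created programmatically. A boto3 sketch under the same schema assumptions (Name as a String partition key, Age as a Number sort key, on-demand billing) might look like:

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="users",
    AttributeDefinitions=[
        {"AttributeName": "Name", "AttributeType": "S"},
        {"AttributeName": "Age", "AttributeType": "N"},
    ],
    KeySchema=[
        {"AttributeName": "Name", "KeyType": "HASH"},   # partition key
        {"AttributeName": "Age", "KeyType": "RANGE"},   # sort key
    ],
    BillingMode="PAY_PER_REQUEST",
)

# Wait until the table is ready before writing to it.
dynamodb.get_waiter("table_exists").wait(TableName="users")
```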
The main.py script retrieves data from the CSV stored in S3 and pushes the items to DynamoDB:
```bash
python3 main.py
```
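main.py itself is not reproduced in this section; a minimal sketch of the kind of logic it runs, assuming the bucket above, an object key of `sample.csv`, and CSV columns Name and Age, might look like:

```python
import csv
import boto3

BUCKET = "etlprocess-files"
KEY = "sample.csv"  # object key is an assumption

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("users")

# Extract: read the CSV object from S3.
body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read().decode("utf-8")

# Transform + Load: parse the rows and batch-write them to DynamoDB.
with table.batch_writer() as batch:
    for row in csv.DictReader(body.splitlines()):
        batch.put_item(Item={"Name": row["Name"], "Age": int(row["Age"])})
```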
The view_table.py script retrieves the data that was inserted into the users table in DynamoDB:
```bash
python3 view_table.py
```
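view_table.py is likewise not shown here; a sketch of an equivalent read, assuming the script performs a full table scan, might be:

```python
import boto3

table = boto3.resource("dynamodb").Table("users")

# Scan the table, following pagination if there is more than one page of results.
response = table.scan()
items = response["Items"]
while "LastEvaluatedKey" in response:
    response = table.scan(ExclusiveStartKey=response["LastEvaluatedKey"])
    items.extend(response["Items"])

for item in items:
    print(item)
```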
Notes
- Ensure that your AWS credentials are configured correctly in your environment.
- The scripts assume that the necessary AWS services (S3 and DynamoDB) and IAM permissions are correctly set up.