A Prefect pipeline that ingests data from two CSV sources, persists the raw data to a database, and then transforms the data into a normalized form.
This pipeline is built with the following tools:
- Prefect for workflow management
- Pandas for data manipulation
- SQLAlchemy for database interaction
- Pydantic for data validation
- PostgreSQL as the database
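
As a rough illustration of the architecture (not the actual code in this repository: the task names, file paths, and the `Record` model are assumptions, and Prefect 2.x / Pydantic v2 are assumed), the flow could be structured like this:

```python
# Illustrative sketch only: names, paths, and the Record model are assumptions.
# Assumes Prefect 2.x, Pydantic v2, and a reachable Postgres instance.
import pandas as pd
from prefect import flow, task
from pydantic import BaseModel
from sqlalchemy import create_engine


class Record(BaseModel):
    # Hypothetical validation model for a single normalized row.
    id: int
    value: float


@task
def ingest_csv(path: str) -> pd.DataFrame:
    # Read one raw CSV source into a DataFrame.
    return pd.read_csv(path)


@task
def persist(df: pd.DataFrame, table: str, dsn: str) -> None:
    # Write a DataFrame to the database as-is.
    engine = create_engine(dsn)
    df.to_sql(table, engine, if_exists="replace", index=False)


@task
def transform(dfs: list[pd.DataFrame]) -> pd.DataFrame:
    # Combine the sources, validate rows with Pydantic, and return normalized data.
    combined = pd.concat(dfs, ignore_index=True)
    validated = [Record(**row).model_dump() for row in combined.to_dict("records")]
    return pd.DataFrame(validated)


@flow
def pipeline(dsn: str) -> None:
    sources = {"source_a": "data/a.csv", "source_b": "data/b.csv"}  # hypothetical paths
    raw = {name: ingest_csv(path) for name, path in sources.items()}
    for name, df in raw.items():
        persist(df, table=f"raw_{name}", dsn=dsn)
    persist(transform(list(raw.values())), table="normalized", dsn=dsn)
```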
To initialize the project, run `make init`. To run the pipeline, run `make local_run`. This will start the database in Docker and run the Prefect flow in the local environment.

Access the Prefect UI at http://localhost:4200/runs
Copy `.env.example` to `.env` and fill in the required environment variables. Update `DATABASE_DSN` in the `.env` file with your database connection string.
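
With Postgres, the connection string is typically a SQLAlchemy-style URL such as `postgresql+psycopg2://user:password@localhost:5432/dbname` (the exact driver prefix depends on the project's setup). As a hedged sketch of how such a value can be read from `.env`, assuming Pydantic v2 with `pydantic-settings` (the repository's actual settings code may differ):

```python
# Hedged sketch: read DATABASE_DSN from .env with pydantic-settings (Pydantic v2).
# The actual settings module in this repository may look different.
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    # e.g. DATABASE_DSN=postgresql+psycopg2://user:password@localhost:5432/pipeline
    database_dsn: str


settings = Settings()  # raises a validation error if DATABASE_DSN is missing
```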
Example queries can be found in the `analytics.sql` file.
- Run flows in a Docker container
- Add unit tests for the pipeline tasks
- Add integration tests for the pipeline
- Optimize the pipeline for large datasets (the current implementation works for relatively small datasets; raw data should be processed in chunks, as shown in the chunked-ingestion sketch after this list)
- Depending on the data source, consider using an incremental load strategy
- If needed, historical data can be stored and calculated in the `staging` schema
- To efficiently handle large datasets, consider using columnar storage such as Parquet or Redshift
- Add an `analytics` schema to create customized views and tables for analytics purposes:
  - Create a materialized view for the most frequently used queries
- Assuming the pipeline will run on a regular basis, add a `cron` schedule to the flow and download only data for the last period from the source (see the scheduling sketch after this list)
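
A minimal sketch of the chunked ingestion mentioned above, using the pandas `chunksize` option (the path, table name, and DSN are illustrative):

```python
# Minimal chunked-ingestion sketch; the path, table name, and DSN are illustrative.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/pipeline")

# Read the raw CSV in fixed-size chunks instead of loading it all into memory.
for chunk in pd.read_csv("data/source_a.csv", chunksize=50_000):
    chunk.to_sql("raw_source_a", engine, if_exists="append", index=False)
```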
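
And a sketch of the scheduling idea, assuming a recent Prefect 2.x release where `Flow.serve` accepts a cron schedule (the flow body, parameter, and cron expression are assumptions, not the repo's actual setup):

```python
# Illustrative scheduling sketch; assumes Prefect 2.x with Flow.serve available.
# The flow body, parameter, and cron expression are assumptions.
from datetime import date, timedelta

from prefect import flow


@flow
def pipeline(days_back: int = 1) -> None:
    # Compute the load window at run time and download only that period from the source.
    end = date.today()
    start = end - timedelta(days=days_back)
    ...  # ingest, persist, and transform data for [start, end)


if __name__ == "__main__":
    # Serve the flow locally with a daily cron schedule (06:00 every day).
    pipeline.serve(
        name="daily-pipeline",
        cron="0 6 * * *",
        parameters={"days_back": 1},
    )
```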