A Rust implementation for converting TPC-H data into Apache Iceberg tables, with support for S3 storage and DataFusion query execution.
Rice (Rust Iceberg Converter) is a tool that demonstrates how to work with Apache Iceberg in Rust, specifically focusing on:
- Converting TPC-H Parquet files to Iceberg table format
- Managing table metadata and manifest files
- Storing data in S3-compatible object storage
- Executing TPC-H queries using DataFusion
Features:
- 📊 Full TPC-H schema support for all 8 tables
- 🗄️ S3 integration with automatic bucket creation
- 📝 Iceberg metadata and manifest management
- 🔄 Parquet to Iceberg conversion
- 🔍 Query execution using DataFusion
- 🏗️ Built with Rust for performance and safety
Prerequisites:
- Rust 1.70+
- AWS credentials configured
- TPC-H Parquet files
- Environment variables:
  - BUCKET_NAME: S3 bucket name
  - AWS_REGION: AWS region
  - AWS credentials (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
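As a minimal sketch of how the variables above might be validated at startup, the following reads them through a lookup function and reports every missing name at once. The `Settings` type and its methods are hypothetical names chosen for illustration, not part of the project's code.

```rust
use std::env;

/// Settings the prerequisites list as required (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Settings {
    pub bucket_name: String,
    pub aws_region: String,
    pub access_key_id: String,
    pub secret_access_key: String,
}

/// Variable names exactly as given in the prerequisites.
const REQUIRED: [&str; 4] = [
    "BUCKET_NAME",
    "AWS_REGION",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
];

impl Settings {
    /// Build settings from any lookup function. Collects all missing or
    /// blank variables so the user can fix them in one pass.
    pub fn from_lookup<F>(lookup: F) -> Result<Self, String>
    where
        F: Fn(&str) -> Option<String>,
    {
        let missing: Vec<&str> = REQUIRED
            .iter()
            .copied()
            .filter(|name| lookup(*name).map_or(true, |v| v.trim().is_empty()))
            .collect();
        if !missing.is_empty() {
            return Err(format!(
                "missing environment variables: {}",
                missing.join(", ")
            ));
        }
        // Every name was checked above, so these lookups cannot fail.
        let get = |name: &str| lookup(name).unwrap();
        Ok(Settings {
            bucket_name: get("BUCKET_NAME"),
            aws_region: get("AWS_REGION"),
            access_key_id: get("AWS_ACCESS_KEY_ID"),
            secret_access_key: get("AWS_SECRET_ACCESS_KEY"),
        })
    }

    /// Read the settings from the process environment.
    pub fn from_env() -> Result<Self, String> {
        Self::from_lookup(|name| env::var(name).ok())
    }
}
```

Calling `Settings::from_env()` early in `main` would surface a single error naming all unset variables. Taking the lookup as a parameter keeps the check testable without touching the real environment.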
Usage:
- Set up your environment variables
- Place your TPC-H Parquet files in the project directory
- Run the converter:
  cargo run
Technical details:
- Uses the Apache Iceberg Rust SDK for table management
- Leverages DataFusion for query processing
- Implements S3 storage integration
- Handles manifest and metadata file generation
- Supports Iceberg table operations and queries
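To make the S3 and metadata points above concrete, here is a small sketch of the directory layout Iceberg writers conventionally use: a table location with a `data/` directory for Parquet files and a `metadata/` directory for table metadata and manifests. The `TableLayout` type, the `table_layout` function, and the warehouse prefix are assumptions for illustration; the project may lay out its bucket differently.

```rust
/// Conventional object-store locations for one Iceberg table
/// (hypothetical helper type).
#[derive(Debug, Clone, PartialEq)]
pub struct TableLayout {
    /// Root location of the table.
    pub table_root: String,
    /// Where Parquet data files are conventionally written.
    pub data_dir: String,
    /// Where metadata JSON, manifest lists, and manifests conventionally live.
    pub metadata_dir: String,
}

/// Compute the layout for `table` under `s3://<bucket>/<warehouse_prefix>/`.
/// Leading and trailing slashes on the prefix are ignored.
pub fn table_layout(bucket: &str, warehouse_prefix: &str, table: &str) -> TableLayout {
    let prefix = warehouse_prefix.trim_matches('/');
    let table_root = if prefix.is_empty() {
        format!("s3://{bucket}/{table}")
    } else {
        format!("s3://{bucket}/{prefix}/{table}")
    };
    TableLayout {
        data_dir: format!("{table_root}/data"),
        metadata_dir: format!("{table_root}/metadata"),
        table_root,
    }
}
```

For example, `table_layout("my-bucket", "warehouse/tpch", "lineitem")` yields a table root of `s3://my-bucket/warehouse/tpch/lineitem` with `data` and `metadata` beneath it.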
Includes implementation of TPC-H Query 1 (Pricing Summary Report) using DataFusion and Iceberg tables.
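For readers unfamiliar with Query 1, the sketch below shows what it computes, written in plain Rust over in-memory rows rather than with DataFusion. It filters `lineitem` rows shipped on or before the cutoff date, groups them by return flag and line status, and accumulates the pricing sums. The struct and function names are hypothetical and only a subset of the query's output columns is shown; the project's actual implementation runs the query through DataFusion against Iceberg tables.

```rust
use std::collections::BTreeMap;

/// The lineitem columns that Query 1 reads (illustrative struct).
#[derive(Debug, Clone)]
pub struct LineItem {
    pub return_flag: char,
    pub line_status: char,
    pub quantity: f64,
    pub extended_price: f64,
    pub discount: f64,
    pub tax: f64,
    /// ISO-8601 date (YYYY-MM-DD), so string order equals date order.
    pub ship_date: String,
}

/// Running aggregates for one (return_flag, line_status) group.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Q1Group {
    pub sum_qty: f64,
    pub sum_base_price: f64,
    pub sum_disc_price: f64,
    pub sum_charge: f64,
    pub count_order: u64,
}

impl Q1Group {
    /// Average quantity per line item in the group.
    pub fn avg_qty(&self) -> f64 {
        self.sum_qty / self.count_order as f64
    }
}

/// DATE '1998-12-01' minus 90 days, the standard Q1 cutoff.
pub const Q1_CUTOFF: &str = "1998-09-02";

/// Aggregate rows shipped on or before `cutoff`. The BTreeMap keeps groups
/// ordered by (return_flag, line_status), matching the query's ORDER BY.
pub fn query1(rows: &[LineItem], cutoff: &str) -> BTreeMap<(char, char), Q1Group> {
    let mut groups: BTreeMap<(char, char), Q1Group> = BTreeMap::new();
    for row in rows.iter().filter(|r| r.ship_date.as_str() <= cutoff) {
        let g = groups
            .entry((row.return_flag, row.line_status))
            .or_default();
        let disc_price = row.extended_price * (1.0 - row.discount);
        g.sum_qty += row.quantity;
        g.sum_base_price += row.extended_price;
        g.sum_disc_price += disc_price;
        g.sum_charge += disc_price * (1.0 + row.tax);
        g.count_order += 1;
    }
    groups
}
```

A real engine would use decimal types for the price columns; `f64` is used here only to keep the sketch short.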
This is a demonstration project showing how to integrate Apache Iceberg with Rust, focusing on TPC-H data conversion and querying.
- To create the input data, generate it with DuckDB's tpch extension and export the tables to Parquet
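One way to produce that data is to hand DuckDB a short SQL script. The sketch below builds such a script in Rust using DuckDB's `tpch` extension (`CALL dbgen`) and `COPY ... (FORMAT PARQUET)`. The function name, the output directory, and the `<table>.parquet` file naming are assumptions; check which file names the converter actually expects.

```rust
/// The eight tables defined by the TPC-H schema.
pub const TPCH_TABLES: [&str; 8] = [
    "customer", "lineitem", "nation", "orders",
    "part", "partsupp", "region", "supplier",
];

/// Build a DuckDB SQL script that generates TPC-H data at `scale_factor`
/// and writes one Parquet file per table into `out_dir`.
/// The `<table>.parquet` naming is an assumption for illustration.
pub fn dbgen_script(scale_factor: f64, out_dir: &str) -> String {
    let trimmed = out_dir.trim_end_matches('/');
    // Fall back to the current directory if no directory was given.
    let dir = if trimmed.is_empty() { "." } else { trimmed };

    let mut sql = String::from("INSTALL tpch;\nLOAD tpch;\n");
    sql.push_str(&format!("CALL dbgen(sf = {scale_factor});\n"));
    for table in TPCH_TABLES.iter() {
        sql.push_str(&format!(
            "COPY {table} TO '{dir}/{table}.parquet' (FORMAT PARQUET);\n"
        ));
    }
    sql
}
```

The resulting text can be saved to a file and fed to the DuckDB CLI on standard input, or pasted into an interactive DuckDB session.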