An example of using Rust, Iceberg, and DataFusion to run the TPC-H benchmark

definite-app/rice

Rice: TPC-H to Apache Iceberg Converter in Rust

A Rust implementation for converting TPC-H data into Apache Iceberg tables, with support for S3 storage and DataFusion query execution.

Overview

Rice (Rust Iceberg Converter) is a tool that demonstrates how to work with Apache Iceberg in Rust, specifically focusing on:

  • Converting TPC-H Parquet files to Iceberg table format
  • Managing table metadata and manifest files
  • Storing data in S3-compatible object storage
  • Executing TPC-H queries using DataFusion

Features

  • 📊 Full TPC-H schema support for all 8 tables
  • 🗄️ S3 integration with automatic bucket creation
  • 📝 Iceberg metadata and manifest management
  • 🔄 Parquet to Iceberg conversion
  • 🔍 Query execution using DataFusion
  • 🏗️ Built with Rust for performance and safety

Requirements

  • Rust 1.70+
  • AWS credentials configured
  • TPC-H Parquet files
  • Environment variables:
    • BUCKET_NAME: S3 bucket name
    • AWS_REGION: AWS region
    • AWS credentials (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
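
The environment variables listed above could be validated at startup along these lines. This is a minimal sketch, not the project's actual code: the `Settings` and `MissingVar` types and the `from_lookup` helper are hypothetical names introduced for illustration. Only `BUCKET_NAME` and `AWS_REGION` are read here, on the assumption that the AWS SDK picks up the credential variables itself.

```rust
use std::env;
use std::fmt;

/// Hypothetical container for the two non-credential variables the README lists.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Settings {
    pub bucket_name: String,
    pub aws_region: String,
}

/// Error naming the first required variable that was unset or blank.
#[derive(Debug, PartialEq, Eq)]
pub struct MissingVar(pub &'static str);

impl fmt::Display for MissingVar {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "required environment variable {} is not set", self.0)
    }
}

impl std::error::Error for MissingVar {}

impl Settings {
    /// Builds settings from any key lookup, which keeps the logic testable
    /// without mutating the real process environment.
    pub fn from_lookup<F>(lookup: F) -> Result<Self, MissingVar>
    where
        F: Fn(&str) -> Option<String>,
    {
        // Treat blank values the same as unset ones.
        let require = |name: &'static str| {
            lookup(name)
                .filter(|value| !value.trim().is_empty())
                .ok_or(MissingVar(name))
        };
        Ok(Settings {
            bucket_name: require("BUCKET_NAME")?,
            aws_region: require("AWS_REGION")?,
        })
    }

    /// Reads the same variables from the process environment.
    pub fn from_env() -> Result<Self, MissingVar> {
        Self::from_lookup(|name| env::var(name).ok())
    }
}
```

Calling `Settings::from_env()` early in `main` would surface a missing variable before any S3 request is attempted.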

Getting Started

  1. Set up your environment variables
  2. Place your TPC-H Parquet files in the project directory
  3. Run the converter:
     cargo run
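
Before running the converter, it can help to check that a Parquet file is present for each of the eight TPC-H tables. The sketch below assumes one file per table named `<table>.parquet` in the project directory. That naming scheme is an assumption, not something the README specifies, and `missing_parquet_files` is a hypothetical helper.

```rust
use std::path::{Path, PathBuf};

/// The eight tables defined by the TPC-H schema.
pub const TPCH_TABLES: [&str; 8] = [
    "customer", "lineitem", "nation", "orders",
    "part", "partsupp", "region", "supplier",
];

/// Assumed file name for a table's Parquet file: `<dir>/<table>.parquet`.
pub fn expected_parquet_path(dir: &Path, table: &str) -> PathBuf {
    dir.join(format!("{table}.parquet"))
}

/// Returns the expected Parquet paths that do not exist as regular files.
pub fn missing_parquet_files(dir: &Path) -> Vec<PathBuf> {
    TPCH_TABLES
        .iter()
        .map(|table| expected_parquet_path(dir, table))
        .filter(|path| !path.is_file())
        .collect()
}
```

An empty result from `missing_parquet_files(Path::new("."))` would mean every expected file is in place.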

Architecture

  • Uses Apache Iceberg Rust SDK for table management
  • Leverages DataFusion for query processing
  • Implements S3 storage integration
  • Handles manifest and metadata file generation
  • Supports Iceberg table operations and queries
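
To make the metadata and manifest handling more concrete, the sketch below builds the object-store paths an Iceberg table conventionally uses: a `data/` prefix for data files and a `metadata/` prefix for metadata, manifest list, and manifest files. The `TableLayout` type, the `s3://<bucket>/<namespace>/<table>` location scheme, and the `tpch` namespace in the usage note are assumptions for illustration. The zero-padded `<version>-<uuid>.metadata.json` name follows the common catalog-managed convention, and the paths this project actually writes may differ.

```rust
/// Hypothetical helper describing where one Iceberg table lives in S3.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TableLayout {
    /// Root location of the table, e.g. `s3://bucket/namespace/table`.
    pub location: String,
}

impl TableLayout {
    /// Assumed location scheme: one prefix per namespace and table.
    pub fn new(bucket: &str, namespace: &str, table: &str) -> Self {
        Self {
            location: format!("s3://{bucket}/{namespace}/{table}"),
        }
    }

    /// Prefix conventionally used for Parquet data files.
    pub fn data_dir(&self) -> String {
        format!("{}/data", self.location)
    }

    /// Prefix conventionally used for metadata JSON, manifest lists, and manifests.
    pub fn metadata_dir(&self) -> String {
        format!("{}/metadata", self.location)
    }

    /// Metadata file name in the `<version>-<uuid>.metadata.json` style,
    /// with the version zero-padded to five digits. The caller supplies the UUID.
    pub fn metadata_file(&self, version: u32, uuid: &str) -> String {
        format!("{}/{:05}-{}.metadata.json", self.metadata_dir(), version, uuid)
    }
}
```

For example, `TableLayout::new("my-bucket", "tpch", "lineitem").data_dir()` yields `s3://my-bucket/tpch/lineitem/data`.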

Example Query

The project includes an implementation of TPC-H Query 1 (Pricing Summary Report), executed with DataFusion against the Iceberg tables.
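
For reference, the text of TPC-H Query 1 can be generated as below. The SQL follows the form given in the TPC-H specification, where the `DELTA` substitution parameter is a number of days between 60 and 120 (90 in the validation run). The `tpch_q1_sql` function is a hypothetical helper, and the exact query text used in this repository may differ. The resulting string is the kind of input one would pass to DataFusion's `SessionContext::sql` once a `lineitem` table is registered.

```rust
/// Returns the TPC-H Q1 (Pricing Summary Report) SQL text for a given DELTA.
///
/// With `delta_days = 90` the cutoff evaluates to 1998-09-02. Some engines may
/// need that date written as a literal instead of the interval arithmetic.
pub fn tpch_q1_sql(delta_days: u32) -> String {
    format!(
        r#"select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    lineitem
where
    l_shipdate <= date '1998-12-01' - interval '{delta_days}' day
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus"#
    )
}
```

The query scans only `lineitem` and groups by two low-cardinality columns, which makes it a useful first check that the converted table is readable end to end.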

Status

This is a demonstration project showing how to integrate Apache Iceberg with Rust, focusing on TPC-H data conversion and querying.

TODO

  • Generate the TPC-H data with DuckDB's tpch extension
