Skip to content

Mooncake-Labs/pg_mooncake

Repository files navigation

pg_mooncake 🥮

Postgres extension for 1000x faster analytics

License Slack Twitter Docs

Overview

pg_mooncake is a Postgres extension that adds columnar storage and vectorized execution (DuckDB) for fast analytics within Postgres.

Columnstore tables are stored as Iceberg or Delta Lake tables in local file system or cloud storage.

The extension is maintained by Mooncake Labs and is available on Neon Postgres.

Option 1: Docker

Get started quickly with our Docker image:

docker pull mooncakelabs/pg_mooncake

# server
docker run --name mooncake-demo -e POSTGRES_HOST_AUTH_METHOD=trust -d mooncakelabs/pg_mooncake

# client
docker run -it --rm --link mooncake-demo:postgres mooncakelabs/pg_mooncake psql -h postgres -U postgres

Option 2: From Source

Build for Postgres versions 14–17:

git clone --recurse-submodules https://github.com/Mooncake-Labs/pg_mooncake.git
cd pg_mooncake
make release -j$(nproc)
make install

Option 3: On Neon Postgres

  1. Create a Neon project
  2. Enable beta extensions:
SET neon.allow_unstable_extensions='true';
  1. Enable the extension
CREATE EXTENSION pg_mooncake;
  1. Create a columnstore table:
CREATE TABLE user_activity(
  user_id BIGINT,
  activity_type TEXT,
  activity_timestamp TIMESTAMP,
  duration INT
) USING columnstore;
  1. Insert data:
INSERT INTO user_activity VALUES
  (1, 'login', '2024-01-01 08:00:00', 120),
  (2, 'page_view', '2024-01-01 08:05:00', 30),
  (3, 'logout', '2024-01-01 08:30:00', 60),
  (4, 'error', '2024-01-01 08:13:00', 60);

SELECT * from user_activity;

Columnstore tables behave just like regular Postgres heap tables, supporting transactions, updates, deletes, joins, and more.

Columnstore tables are stored in the local file system by default. You can configure mooncake.default_bucket to store data in S3 or R2 buckets instead.

Note: On Neon, only cloud storage is supported. Neon users must bring their own S3 or R2 buckets or get a free S3 bucket by signing up at s3.pgmooncake.com. For cloud storage configuration instructions, see Cloud Storage. We are working to improve this experience.

pg_mooncake supports loading data from:

  • Postgres heap tables
  • Parquet, CSV, JSON files
  • Iceberg, Delta Lake tables
  • Hugging Face datasets

Columnstore Tables as Iceberg or Delta Lake Tables

Find your columnstore table location:

SELECT * FROM mooncake.columnstore_tables;

The directory contains a Delta Lake (and soon Iceberg) table that can be queried directly using Pandas, DuckDB, Polars, or Spark.

Roadmap

  • Transactional INSERT, SELECT, UPDATE, DELETE, and COPY
  • JOIN with regular Postgres heap tables
  • Load Parquet, CSV, and JSON files into columnstore tables
  • Read existing Iceberg and Delta Lake tables
  • File statistics and skipping
  • Write Delta Lake tables
  • Write Iceberg tables
  • Batched small writes and compaction
  • Secondary indexes and constraints
  • Partitioned tables ^

^ File statistics and skipping should cover most use cases of partitioned tables in Postgres, including time series.