Skip to content

A cryptography powered privacy-preserving discovery service design for mapping distinct identifiers to IDs without revealing any information to the server!

License

Notifications You must be signed in to change notification settings

harsh-ps-2003/rumi

Repository files navigation

RUMI

A privacy-preserving discovery service design for mapping distinct identifiers to IDs without revealing any information to the server!

Protocol Flow

sequenceDiagram
    participant C as Client
    participant S as Server
    participant O as ORAM

    Note over C,S: Setup Phase
    C->>S: Request public set of identifiers
    S->>C: Return sorted public set

    Note over C,S: Query Phase
    C->>C: Generate random secret scalar
    C->>C: Hash identifier to curve point
    C->>C: Blind point with secret scalar
    C->>C: Generate ZKSM proof
    C->>C: Compute N-bit prefix

    C->>S: Send (prefix, blinded point, ZKSM proof)
    
    Note over S: Server Processing
    S->>S: Verify ZKSM proof
    S->>S: Blind client point with server scalar
    S->>O: Read path based on prefix
    
    loop Fixed Number of Times
        S->>O: Perform dummy accesses
    end

    S->>C: Return (double-blinded point, matching entries)

    Note over C: Client Processing
    C->>C: Unblind received point
    C->>C: Search for match in entries
    C->>C: Decode matching UUID if found
Loading

The privacy guarantees are:

  1. Server learns nothing about:
  • Which identifier is being queried
  • Whether a query resulted in a match
  • Access patterns of queries
  • The actual identifier-to-ID mappings (they're stored in blinded form)
  1. Client learns nothing about:
  • Other identifiers in the system
  • Other user IDs in the system
  • Size of the database (due to fixed access patterns)
  1. Network observer learns nothing about:
  • Query contents (due to blinding)
  • Whether queries succeeded (due to fixed-size responses)
  • Database contents or size

Demo

To start the server, run cargo run --bin server in one terminal. The server will start with 100 demo identifiers (1000000000 through 1000000099). Run cargo run --bin client lookup 1000000042 in another terminal to start the client and lookup for a valid identifier. Run cargo run --bin client lookup 9999999999 to lookup a invalid identifier.

How it works?

Cryptography used :

  1. Elliptic Curve Cryptography (ECC): Used for secure double-side blinding (SMPC) operations and point encoding. Uses P256 for curve operations.
  2. Path ORAM (Oblivious RAM): Implemented to obscure access patterns and enhance privacy.
  3. Zero-Knowledge Set Membership (ZKSM) Proofs: Ensures that clients can prove they're querying valid identifiers without revealing which ones.

Client generates blinded requests, server manages the ORAM structure and processes client requests without learning the actual identifiers.

The client has a list of identifiers corresponding to their IDs. Both client and server have their own secret keys. The client stores its data in an Path ORAM as a privacy layer. The server wants to look up a ID for specific identifier, so it blinds the identifier using its secret key and also creates a proof to show that its asking for a valid identifier without revealing which one. The client recieves the blinded identifier and proof and ensures the request is valid. It further blinds the identifier with its own secret key, then searches ORAM for matching data and sends the double-blinded data with set of potential matches. The server removes its blinding, and searches for client blinded identifier in set of matches.

  1. Client Preparation:

    • Client generates a random secret scalar for blinding operations.
    • Client receives the public set of identifiers from the server.
  2. Client Request Generation:

    • Client hashes the identifier to a point on the elliptic curve.
    • Client blinds this point using their secret scalar (client-blinded identifier point).
    • Client generates a ZKSM proof to prove the identifier is in the public set without revealing which one.
    • Client computes the N-bit prefix of the hashed identifier.
  3. Client Request Transmission:

    • Client sends to the Server: a) The N-bit prefix of the hashed identifier b) The client-blinded identifier point c) The ZKSM proof
  4. Server Processing:

    • Server verifies the ZKSM proof to ensure the identifier is in the public set.
    • If verification fails, the server rejects the request.
    • If verification succeeds, the server proceeds with the following steps: a) Server blinds the client-blinded identifier point with its own secret scalar, creating a double-blinded identifier point. b) Server uses the N-bit prefix to locate the corresponding ORAM path. c) Server reads the ORAM path, retrieving all blocks along this path. d) Server performs a fixed number of additional random ORAM accesses to obscure the actual number of matches.
  5. Server Response:

    • Server returns to the client: a) The double-blinded identifier point b) A map of server-blinded identifier points to corresponding blinded user ID points from the retrieved ORAM blocks
  6. Client Processing:

    • Client unblinds the double-blinded identifier point using the inverse of their secret scalar.
    • Client searches for the unblinded point in the returned map.
    • If a match is found, the client retrieves the corresponding blinded user ID point.
    • Client unblinds the user ID point using the inverse of the hashed identifier scalar.
  7. Final Result:

    • If a match was found and successfully unblinded, the client now has the unblinded user ID point.
    • This point can be decoded to retrieve the actual UUID.
  8. ORAM Updates:

    • After each access, the server updates the ORAM structure:

      a) Generates a new random path for the accessed block. b) Writes back all retrieved blocks along this new path.
      c) Updates its position map to reflect the new location of the block.

The benefit of this approach is that the server only learns about the association between the identifiers when the client explicitly requests it, and even then, the server doesn't learn which specific identifier was requested. The ORAM structure and fixed-size operations obscure access patterns, while the blinding process ensures that the identifiers are obscured before transmission to the server. The ZKSM proof adds an extra layer of security by proving set membership without revealing the specific element.

This enhances the privacy and confidentiality of the client's data and prevents the server from having unrestricted access to sensitive information. The client can only refer to an identifier when they know it, preventing offline guessing attacks. While the server retains some information (N-bit prefixes and blinded points), it cannot directly link this data to the original identifiers without the client's secret key.

Architecture

RUMI consists of two main stacks:

1. Application Stack (/src)

The core application is built with Rust and consists of:

  • Server (src/bin/server.rs):

    • gRPC server handling client requests
    • Implements Path ORAM for secure data access
    • Integrates with Prometheus for metrics
    • Features:
      • Zero-Knowledge Set Membership proofs
      • Elliptic Curve Cryptography (P-256)
      • Blinded identifier handling
  • Client (src/bin/client.rs):

    • Command-line interface for interacting with server
    • Handles identifier blinding and proof generation
    • Supports lookup operations
  • Core Library (src/lib.rs, src/oram.rs):

    • Path ORAM implementation
    • Cryptographic primitives
    • Protocol implementation

2. Monitoring Stack (/monitoring)

The monitoring infrastructure uses Docker Compose and includes:

  • Prometheus (monitoring/prometheus/):

    • Metrics collection and storage
    • Configuration in prometheus.yml
    • Scrapes metrics from Pushgateway
    • Default retention: 15 days
    • Port: 9090
  • Pushgateway:

    • Accepts metrics from the RUMI server
    • Bridges non-persistent metrics to Prometheus
    • Port: 9091
  • Grafana (monitoring/grafana/):

    • Metrics visualization
    • Pre-configured dashboards
    • Port: 3001 (admin/admin)

Key metrics collected:

  • Request counters by endpoint
  • Request duration histograms
  • Memory usage
  • ORAM operation timings

Deployment

Application Stack

  1. Build and start the server:
docker compose up -d server
  1. Run client commands:
docker compose run client lookup <identifier>

Server endpoints:

  • gRPC: localhost:50051
  • Metrics: localhost:9091/metrics

Monitoring Stack

  1. Start the monitoring services:
cd monitoring
docker compose up -d
  1. Access monitoring interfaces:

Docker Network Architecture

The setup uses two Docker networks:

  • rumi-net: Internal network for server-client communication
  • monitoring_monitoring: Shared network with monitoring stack

Building from Source

# Build all components
docker compose build

# Build specific components
docker compose build server
docker compose build client

# View logs
docker compose logs -f server

Development Commands

# Run server with debug logging
RUST_LOG=debug cargo run --bin server

# Run client
cargo run --bin client -- lookup 1000000001

References

I was inspired by How Signal uses ORAMs!

The best blog to learn ECC that I could find out was this! The idea of Zero Knowledge Set Membership comes from this paper paired with the simple Path ORAM coming from this paper.

Path forward

Current ORAM Usage prevents access pattern leakage, hides which identifiers are being queried, provides O(log n) access complexity and ensures fixed-size responses.

  1. Add support for Private Information Retrieval (PIR) to further reduce server knowledge. Think about IT-PIR (if distributed servers are involved), cPIR (using Homomorphic encryptions but huge computational overhead). Try tinkering with Hybrid PIR-ORAM scheme referenced in this paper for further security-performance tradeoff. Use of PIR has significant performance tradeoffs! Distributed PIR bestows us stronger security and horizontal scalability. Though PIR may leak DB size.
  2. Tinker with a Cuckoo hashing scheme proposed by this paper to improve efficiency and reduce the need for large buckets and store blocks more efficiently in the Simple Path ORAM! Though not sure about its security bottlenecks!
  3. Implement a secure update mechanism using ZKPs.
  4. Explore Ring Signatures as its more efficient than ZKSM for larger sets (proof size grows linearly with set size though) while providing stronger anonymity!
  5. Goals:
    • Write: <15μs (17% improvement)
    • Read: <50μs (50% improvement)
  6. Tackle side channel attacks!

Analysis

Run sudo -E cargo flamegraph --root --bin server and then lookup any identifier. Then close the server to get the flamegraph in the root of the project.

The benchmark results were :

oram_write: [17.631 µs 17.893 µs 18.167 µs] |min__|avg__|max__|

oram_read: [100.96 µs 101.34 µs 101.73 µs] |min_|avg_|max_|

The read is significantly slower than writes :(

Run cargo bench for detailed benchmarks.

Run RUST_LOG=debug cargo run --bin server and RUST_LOG=debug cargo run --bin client -- lookup 1000000001 to run the debug traces.

Also, tokio-console can be used for more performance monitoring.

Disclaimer

The Identifiers-UUID are hardcoded as it's a toy implementation. Not designed to be used in production!

This is a personal project to deepen the understanding of Cryptography after undertaking the course CS670 from IIT Kanpur.

About

A cryptography powered privacy-preserving discovery service design for mapping distinct identifiers to IDs without revealing any information to the server!

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages