Update partitioning.md (#4811)

## Summary by CodeRabbit

- **Documentation**
  - Updated terminology from "consistent hashing" to "deterministic hashing" in S3 object partitioning documentation
  - Added guidance on date format verification for timezone scenarios in best practices section

wdbaruni authored Jan 15, 2025
1 parent 478e3f9 commit ac61806
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions pkg/s3/partitioning.md
@@ -4,7 +4,7 @@ This documentation describes the partitioning system for S3 objects, which enabl

## Overview

-The partitioning system allows you to split a collection of S3 objects across multiple processors using consistent hashing. This ensures:
+The partitioning system allows you to split a collection of S3 objects across multiple processors using deterministic hashing. This ensures:
- Even distribution of objects
- Deterministic assignment of objects to partitions
- Support for various partitioning strategies
@@ -19,7 +19,7 @@ The partitioning system allows you to split a collection of S3 objects across mu

### 2. Object (`PartitionKeyTypeObject`)
- Partitions based on the complete object key
-- Uses consistent hashing of the entire object path
+- Uses deterministic hashing of the entire object path
- Best for random or unpredictable key patterns
- Ensures even distribution across partitions

@@ -96,7 +96,7 @@ type PartitionConfig struct {
3. Special handling for single partition case:
- If total partitions = 1, returns all objects without processing

-### Consistent Hashing
+### deterministic Hashing
- Uses FNV-1a hash algorithm
- Ensures deterministic distribution
- Formula: `partition_index = hash(key) % total_partitions`
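The formula above can be sketched in Go as follows. This is a minimal illustration, not the repository's implementation: the helper name `partitionIndex` is hypothetical, and the 32-bit FNV-1a variant from the standard library's `hash/fnv` package is an assumption, since the documentation does not state the hash width. The single-partition shortcut mirrors the special case described earlier.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionIndex is a hypothetical helper that maps a key to a partition
// using FNV-1a and a modulo, as in: hash(key) % total_partitions.
// Assumption: 32-bit FNV-1a; the documented system may use another width.
func partitionIndex(key string, totalPartitions int) int {
	// With zero or one partition there is nothing to distribute.
	if totalPartitions <= 1 {
		return 0
	}
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(totalPartitions))
}

func main() {
	// Example keys are illustrative only.
	keys := []string{"data/a.csv", "data/b.csv", "data/c.csv"}
	for _, key := range keys {
		fmt.Printf("%s -> partition %d\n", key, partitionIndex(key, 4))
	}
}
```

Because the index depends only on the key and the partition count, every processor computes the same assignment without coordination. Changing `total_partitions` reassigns most keys, which is one reason "deterministic hashing" describes this scheme more accurately than "consistent hashing".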
@@ -255,4 +255,4 @@ The prefix trimming affects how each partition type processes keys:
4. Testing recommendations:
- Validate partition distribution with sample data
- Test edge cases and fallback scenarios
-- Verify date formats with different timezone scenarios
+- Verify date formats with different timezone scenarios
