From ac61806125d6da2fb63d63906f0f583ffe096ec7 Mon Sep 17 00:00:00 2001
From: Walid Baruni
Date: Wed, 15 Jan 2025 12:53:26 +0200
Subject: [PATCH] Update partitioning.md (#4811)

## Summary by CodeRabbit

- **Documentation**
  - Updated terminology from "consistent hashing" to "deterministic hashing" in S3 object partitioning documentation
  - Added guidance on date format verification for timezone scenarios in best practices section
---
 pkg/s3/partitioning.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/pkg/s3/partitioning.md b/pkg/s3/partitioning.md
index 32ba31b45d..a2f5fb25d5 100644
--- a/pkg/s3/partitioning.md
+++ b/pkg/s3/partitioning.md
@@ -4,7 +4,7 @@ This documentation describes the partitioning system for S3 objects, which enabl
 
 ## Overview
 
-The partitioning system allows you to split a collection of S3 objects across multiple processors using consistent hashing. This ensures:
+The partitioning system allows you to split a collection of S3 objects across multiple processors using deterministic hashing. This ensures:
 - Even distribution of objects
 - Deterministic assignment of objects to partitions
 - Support for various partitioning strategies
@@ -19,7 +19,7 @@ The partitioning system allows you to split a collection of S3 objects across mu
 
 ### 2. Object (`PartitionKeyTypeObject`)
 - Partitions based on the complete object key
-- Uses consistent hashing of the entire object path
+- Uses deterministic hashing of the entire object path
 - Best for random or unpredictable key patterns
 - Ensures even distribution across partitions
 
@@ -96,7 +96,7 @@ type PartitionConfig struct {
 3. Special handling for single partition case:
    - If total partitions = 1, returns all objects without processing
 
-### Consistent Hashing
+### deterministic Hashing
 - Uses FNV-1a hash algorithm
 - Ensures deterministic distribution
 - Formula: `partition_index = hash(key) % total_partitions`
@@ -255,4 +255,4 @@ The prefix trimming affects how each partition type processes keys:
 4. Testing recommendations:
    - Validate partition distribution with sample data
    - Test edge cases and fallback scenarios
-   - Verify date formats with different timezone scenarios
\ No newline at end of file
+   - Verify date formats with different timezone scenarios
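
Reviewer note: the documented formula `partition_index = hash(key) % total_partitions` with FNV-1a can be sketched in Go roughly as follows. This is a minimal illustration, not the code in `pkg/s3`. The helper name `partitionIndex` is hypothetical, and the 32-bit FNV-1a variant is an assumption, since the documentation does not state the hash width. The single-partition shortcut mirrors the special case described in the patched document.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionIndex is a hypothetical helper (not taken from pkg/s3) that maps
// an object key to a partition using FNV-1a and a modulo, as in the
// documented formula. The 32-bit hash width is an assumption.
func partitionIndex(key string, totalPartitions int) int {
	// With one partition (or an invalid count) every key maps to index 0.
	if totalPartitions <= 1 {
		return 0
	}
	h := fnv.New32a()
	_, _ = h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(totalPartitions))
}

func main() {
	// Illustrative keys only; the same key always lands in the same partition.
	keys := []string{"data/2024-01-01/a.csv", "data/2024-01-02/b.csv"}
	for _, key := range keys {
		fmt.Printf("%s -> partition %d\n", key, partitionIndex(key, 4))
	}
}
```

The assignment is deterministic for a fixed partition count, but changing `totalPartitions` remaps most keys. That behavior is what distinguishes plain modulo hashing from consistent hashing, and it fits the terminology change in this patch.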