- Implementation of extendible hashing with dynamic bucket allocation.
- Supports CRUD operations for managing records efficiently.
- Provides user-friendly interaction to update and delete records.
- Demonstrates practical use cases for hash tables in data management.
- Extendible hashing implementation
- Dynamic hashing algorithm
- Hash table with CRUD operations
- Extendible Hashing with OOP implementation
- Dynamic Hashing with OOP implementation
- Data structures and hashing
- Bucket splitting in hashing
- Hash table CRUD functionality
- Extendible hash table Python
- Efficient data storage with hashing
In a large-scale warehouse handling thousands of products across multiple categories, ensuring efficient storage and retrieval of product data is crucial. The challenge lies in maintaining optimal storage utilization while allowing for rapid search, update, and insertion of records. The system must be scalable to accommodate significant increases in data volume over time.
A traditional hash table approach fails in this scenario due to fixed-size buckets, leading to frequent overflow and inefficient data handling. Dynamic resizing techniques like extendible hashing provide a better alternative.
- Insertion: Efficiently add new product records.
- Search: Quickly retrieve product details using the ProductID.
- Update: Modify existing product records.
- Delete: Remove product records from the system.
- Load Factor Management: Maintain a balanced load factor to optimize memory usage.
- Scalability: Dynamically adjust to accommodate growing data.
- Performance: Ensure low latency for operations.
- Data Integrity: Prevent loss of records during operations like bucket splitting.
- Ease of Use: Provide clear methods to interact with the hash table.
- Fixed bucket capacity.
- Limited initial memory allocation to simulate a real-world constraint.
Extendible Hashing dynamically adjusts the hash table size using a directory and local/global depth mechanisms. This approach minimizes collisions and reduces memory wastage by splitting buckets only when needed.
- Scalability: Only affected buckets are split, reducing unnecessary memory usage.
- Dynamic Resizing: The directory doubles in size only when global depth increases, unlike traditional approaches that require complete rehashing.
- Efficient Load Management: Load factor is automatically balanced by redistributing records during bucket splits.
- Define a directory with a global depth of 1 and buckets with a fixed capacity.
- Hash the ProductID to compute an index.
- Add the record to the corresponding bucket.
- If the bucket is full, split it and redistribute the records.
- Increment the local depth of the bucket.
- Create a new bucket and update the directory pointers.
- Redistribute the records based on updated hashes.
- If a bucket’s local depth exceeds the global depth, double the directory size.
- Copy existing pointers to the new directory slots.
- Compute the hash of the ProductID and locate the bucket using the directory.
- Retrieve the record from the bucket.
- Calculate the ratio of total records to total buckets.
- Encapsulation: The
Bucket
andExtendibleHashingOptimized
classes encapsulate data and methods, ensuring modularity. - Inheritance: Extend the base classes to introduce specialized behaviors for different scenarios.
- Polymorphism: Use method overriding to customize bucket behaviors if needed.
- Dynamic Directory: The directory is implemented as a list, allowing efficient doubling and indexing.
- Buckets: Use lists to manage records, ensuring O(1) insertion and retrieval.
- Hashing: Optimize the hash function for uniform distribution of keys.
Here I will explain only ExtendibleHashingOptimized Class as it is the main part of this solution
The ExtendibleHashingOptimized
class implements an efficient dynamic hashing algorithm called Extendible Hashing. This algorithm is designed to handle data growth by dynamically splitting buckets and expanding the directory as needed, ensuring efficient storage and retrieval of records.
This implementation is particularly useful for database systems or applications with unpredictable data size and ensures minimal collisions and optimal performance.
- Dynamic Growth: Automatically grows the hash table by splitting buckets and expanding the directory when full.
- Efficient Data Access: Uses hash values to map records to specific buckets, allowing fast insertion and search operations.
- Load Balancing: Redistributes records during bucket splits to maintain even data distribution, preventing performance issues from overfull buckets.
Initializes the hash table with the following:
global_depth
: The number of bits used for hashing, starting at 1.bucket_capacity
: The maximum number of records a bucket can hold.directory
: An initial list of buckets, with the size determined by2^global_depth
.
Generates a binary hash value for a given key:
- Uses Python’s
hash()
function and the currentglobal_depth
to determine the number of bits in the hash. - Ensures that hash values are compact and suitable for dynamic growth.
Converts the binary hash value into a directory index by extracting the first global_depth
bits. This index identifies the bucket where the record should be placed.
Handles the insertion of new records into the hash table.
- Computes the hash value for the
ProductID
in the record. - Finds the directory index corresponding to the hash value.
- If the target bucket has space, the record is added directly.
- If the bucket is full:
- Calls the
split_bucket()
method to create a new bucket and redistribute records. - Reinserts the current record into the updated structure.
- Calls the
This method ensures that no record is lost during bucket splits.
The split_bucket
method is the most critical part of the dynamic hashing mechanism. It handles bucket overflows by splitting full buckets and redistributing records.
-
Identify the Full Bucket:
- Locate the bucket at the given
index
in the directory. - Store its current
local_depth
(the number of bits this bucket uses for hashing).
- Locate the bucket at the given
-
Check Depths:
- If the bucket’s
local_depth
equals theglobal_depth
, the directory must be expanded by callingdouble_directory()
.
- If the bucket’s
-
Increase Local Depth:
- Increment the
local_depth
of the full bucket to indicate it now uses one more bit for hashing.
- Increment the
-
Create a New Bucket:
- A new bucket is created with the same capacity as the full bucket.
- This bucket will hold records that differ from the full bucket in the bit corresponding to the new
local_depth
.
-
Update Directory:
- Identify which directory indices should point to the new bucket based on the updated hash values.
- Update the directory to reflect this split.
-
Redistribute Records:
- Move all records from the full bucket into their correct buckets (either the original or the new bucket) by recalculating their hash values and indices.
- Clear the full bucket before redistributing to avoid duplicate records.
When a bucket is split, the additional bit in the local_depth
determines which bucket each record should go into. Redistributing records ensures that the hash table maintains its structural integrity and efficiency.
- Assume a bucket has
local_depth = 2
and holds records that hash to00
and10
. - After splitting,
local_depth
becomes3
. - Records with hash
000
and001
stay in the original bucket, while records with010
and011
move to the new bucket.
This method ensures that the hash table dynamically adjusts to growing data without requiring manual resizing.
When all buckets in the directory are full, the directory size needs to be doubled to accommodate new buckets.
- The directory size is doubled by copying the existing structure.
- For each bucket in the original directory, add a duplicate entry to the new slots.
- Increment the
global_depth
to reflect the increased number of bits used for hashing.
This method ensures that the hash table can handle more buckets and supports future splits.
Computes the load factor, which is the ratio of total records to the total number of buckets. A high load factor indicates efficient bucket usage, while a low value suggests underutilization.
Searches for a product by ProductID
and provides options to Update or Delete the product. This makes search operations interactive and intuitive.
- Calculates the hash value and identifies the corresponding bucket.
- Scans the bucket for a matching
ProductID
. - Prints the record if found; and then shows the menu options for update and delete. helper methods:
update_record(record)
anddelete_record(bucket, record)
do the rest of the work. the user has exiting option from the menu.
Updates the details of an existing product. Allows users to modify specific fields while retaining existing values if no new input is provided.
Deletes a product record from its corresponding bucket. Ensures the record is correctly removed without affecting the rest of the data.
Prints the current structure of the hash table, including:
- Global Depth
- Each bucket’s Local Depth and stored records.
This is useful for debugging and understanding the internal state of the hash table.
Extendible hashing ensures:
- Scalability: The hash table grows dynamically as data increases.
- Collision Handling: Splits buckets instead of resizing the entire table, minimizing disruptions.
- Resource Efficiency: Uses memory only when needed, unlike traditional hash tables with fixed sizes.
- Bucket Overflow: Triggers the
split_bucket()
method, which may also triggerdouble_directory()
if the global depth needs to be increased. - Dynamic Adjustment: Ensures the hash table remains efficient even with unpredictable data sizes.
ExtendibleHashingProject/
│
├── extendible_hashing/
│ ├── __init__.py # Package initializer
│ ├── bucket.py # Bucket class
│ ├── hashing.py # Extendible Hashing class
│
├── data/
│ └── demo_input.json # Input data in JSON format
│
├── main.py # Entry point to run the program
├── requirements.txt # Dependencies file
├── README.md # Project documentation
-
Clone the Repository:
git clone https://github.com/MehediHasan-ds/Extendible-hashing-implementation.git cd repository-name
-
Create a Virtual Environment:
python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate
-
Install Dependencies:
pip install -r requirements.txt
-
Input Data:
- Place the product records in the
data/products.json
file in the following format:[ {"ProductID": 101, "ProductName": "Steel Rod", "Category": "Construction", "Quantity": 500}, {"ProductID": 202, "ProductName": "Copper Wire", "Category": "Electronics", "Quantity": 300} ]
- Place the product records in the
-
Navigate to the Main Directory:
cd src
-
Run the Application:
python main.py
-
Test the Application:
pytest ../tests
Feel free to open an issue or submit a pull request for improvements or bug fixes.
This project is licensed under the MIT License. See the LICENSE file for details.