A fixed-size hashmap using only primitive Python data types.
An implementation of fixed-size hashmap based on Python primitive types, built-in hash function1, and separate chaining -- heavily inspired by MIT opencourseware2 and Java's Hashmap implementation3.
The internal data structure carrying the information of the Hashmap is an array varialbe called table. Hashmap prehashes argument key given by user with Python's built-in hash function1 into an numeric value. It then hashes this value using a simple hash function (see below) to get the idx of the slot to be inserted into table. table has length equals to capacity.
The capacity of the hashmap is determined using the user-passed-in argument size. More specifically, capacity is the smallest power of 2 that is bigger than or equal to size.
Such implementation avoids initiating unnecessarily large table. Meantime it controls the collision rate to make search faster.
Prehashing is done using Python's built-in hash function1 to convert string key to integer. Hashmap then convert this integer to the final hash value idx by integer_value mod capacity
. idx is used later-on to insert the (key, value) pair to table.
We use separate chaining to solve collision. When key foo and "foo3* collide with each other, we append foo3 to the list where foo was inserted to. table maintains capcity number of such lists.
To build, use:
make hashmap.pyc
to compile the pyc file for external usage.
You may want to run:
make clean
beforehand to remove the pyc file from previous compilation.
Below is a brief example of the basic usage of hashmap:
from hashmap import Hashmap
my_hash_table = Hashmap(15)
# set
my_hash_table.set("foo", "bar")
my_hash_table.set("foo2", 2")
# get
bar = my_hash_table.get("foo") # "bar"
# overwrite
my_hash_table.set("foo", "hoe")
# delete
hoe = my_hash_table.delete("foo") # "hoe"
# load
load_factor = my_hash_table.load() # 0.0625
To test, use
make check
, or simply type make
to run the test script.
Depending on the system, the collision of keys may vary. For example, on 64-bit system, when size is 3, "foo" and "foo3" collides. As a potential improvement, the prehashing step described in the "idx" function of hashmap.py may change to other hashing mechanism.
This implementation is done with fixed-size hashmap in mind. In the situation where fixed-size is no longer a hard requirement, we can further improve the search performance of this Hashmap by using open addressing. However, user needs to specify a load_factor argument to prevent clustering.
For example:
my_hash_table = Hashmap(30, 0.75)
, or a default load_factor would be given.