Skip to content

A fixed-size hashmap that works with primitive Python data types using separate chaining.

Notifications You must be signed in to change notification settings

Ornithologist/hashmap_with_chaining

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

hashmap_with_chaining

A fixed-size hashmap using only primitive Python data types.

Basic

An implementation of fixed-size hashmap based on Python primitive types, built-in hash function1, and separate chaining -- heavily inspired by MIT opencourseware2 and Java's Hashmap implementation3.

The internal data structure carrying the information of the Hashmap is an array varialbe called table. Hashmap prehashes argument key given by user with Python's built-in hash function1 into an numeric value. It then hashes this value using a simple hash function (see below) to get the idx of the slot to be inserted into table. table has length equals to capacity.

Sizing

The capacity of the hashmap is determined using the user-passed-in argument size. More specifically, capacity is the smallest power of 2 that is bigger than or equal to size.

Such implementation avoids initiating unnecessarily large table. Meantime it controls the collision rate to make search faster.

Hashing and Chaining

Prehashing is done using Python's built-in hash function1 to convert string key to integer. Hashmap then convert this integer to the final hash value idx by integer_value mod capacity. idx is used later-on to insert the (key, value) pair to table.

We use separate chaining to solve collision. When key foo and "foo3* collide with each other, we append foo3 to the list where foo was inserted to. table maintains capcity number of such lists.

How to Use

Build

To build, use:

make hashmap.pyc

to compile the pyc file for external usage.

You may want to run:

make clean

beforehand to remove the pyc file from previous compilation.

Use

Below is a brief example of the basic usage of hashmap:

from hashmap import Hashmap

my_hash_table = Hashmap(15)

# set
my_hash_table.set("foo", "bar")
my_hash_table.set("foo2", 2")

# get
bar = my_hash_table.get("foo")      # "bar"

# overwrite
my_hash_table.set("foo", "hoe")

# delete
hoe = my_hash_table.delete("foo")   # "hoe"

# load
load_factor = my_hash_table.load()  # 0.0625

Test

To test, use

make check

, or simply type make to run the test script.

Limitation

Depending on the system, the collision of keys may vary. For example, on 64-bit system, when size is 3, "foo" and "foo3" collides. As a potential improvement, the prehashing step described in the "idx" function of hashmap.py may change to other hashing mechanism.

This implementation is done with fixed-size hashmap in mind. In the situation where fixed-size is no longer a hard requirement, we can further improve the search performance of this Hashmap by using open addressing. However, user needs to specify a load_factor argument to prevent clustering.

For example:

my_hash_table = Hashmap(30, 0.75)

, or a default load_factor would be given.

References

About

A fixed-size hashmap that works with primitive Python data types using separate chaining.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published