r-salas/oshash

oshash

OpenSubtitles Hash implementation.

This algorithm is built for speed: unlike most hashing algorithms, OSHash doesn't read the whole file. It only looks at the file size and the first and last 64 KB, which makes it well suited to hashing large files.

Installation

The latest stable release can be installed from PyPI:

$ pip install oshash

API usage

Import oshash and call the oshash function with your file path.

import oshash

file_hash = oshash.oshash("/path/to/file")

Command usage

You can compute OSHash directly from the terminal.

$ oshash <file_path>

For example:

$ oshash /path/to/video.mp4
OSHash (/path/to/video.mp4) = d315edebf53a4af3

Comparison

The graphs below compare the hashing time (in seconds) of OSHash with that of other algorithms for two different files:

[Graphs: 320p video (61.7 MB); 1080p video (339.4 MB)]

You can create a comparison for any file with the following command:

$ python3 scripts/compare_algorithms.py <file_path>

If you want to view the graphs, make sure you have matplotlib installed.
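
The contents of scripts/compare_algorithms.py are not shown here, so the following is only a rough sketch of how the full-file side of such a comparison could be timed. The function name time_full_file_hashes and the choice of md5, sha1 and sha256 are assumptions made for illustration, not the script's actual behaviour.

```python
import hashlib
import time


def time_full_file_hashes(path, algorithms=("md5", "sha1", "sha256")):
    """Return {algorithm: seconds} for hashing the entire file with hashlib.

    Hypothetical helper: the algorithm list is an assumption, not taken
    from the repository's comparison script.
    """
    timings = {}
    for name in algorithms:
        digest = hashlib.new(name)
        start = time.perf_counter()
        with open(path, "rb") as f:
            # Read in 1 MB blocks so large files are not loaded into memory at once.
            for block in iter(lambda: f.read(1024 * 1024), b""):
                digest.update(block)
        digest.hexdigest()
        timings[name] = time.perf_counter() - start
    return timings
```

Each of these algorithms has to read every byte of the file, so the time grows with file size. OSHash reads a fixed amount of data regardless of size, which is where its advantage on large files comes from.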

How It Works

In pseudo-code, the hash is computed in the following way:

file_buffer = open("/path/to/file")

head_checksum = checksum(file_buffer.head(64 * 1024))  # first 64 KB
tail_checksum = checksum(file_buffer.tail(64 * 1024))  # last 64 KB

# The sum is kept to 64 bits and shown as 16 hex digits
file_hash = file_buffer.size + head_checksum + tail_checksum

You can read more in the OpenSubtitles.org Wiki.
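
As a minimal sketch, the pseudo-code above could be written in plain Python as follows. It assumes the checksum is the sum of each chunk read as little-endian unsigned 64-bit integers, wrapping at 2**64, as the OpenSubtitles hash is commonly described. The names chunk_checksum and oshash_sketch are made up for this example and are not part of this package's API, and rejecting files smaller than 64 KB is an assumption about how short files should be handled.

```python
import os
import struct

CHUNK_SIZE = 64 * 1024         # 64 KB read from each end of the file
MASK_64 = 0xFFFFFFFFFFFFFFFF   # keeps sums within unsigned 64 bits


def chunk_checksum(data: bytes) -> int:
    """Sum the chunk as little-endian unsigned 64-bit integers, wrapping at 2**64."""
    count = len(data) // 8
    words = struct.unpack(f"<{count}Q", data[: count * 8])
    return sum(words) & MASK_64


def oshash_sketch(path: str) -> str:
    """Hypothetical re-implementation: size + head checksum + tail checksum."""
    size = os.path.getsize(path)
    if size < CHUNK_SIZE:
        # Assumption: files shorter than one chunk are rejected.
        raise ValueError("file is smaller than 64 KB")
    with open(path, "rb") as f:
        head = f.read(CHUNK_SIZE)
        f.seek(-CHUNK_SIZE, os.SEEK_END)
        tail = f.read(CHUNK_SIZE)
    total = (size + chunk_checksum(head) + chunk_checksum(tail)) & MASK_64
    return f"{total:016x}"  # 16 hex digits
```

Only 128 KB of data is read no matter how large the file is, so the running time is essentially independent of file size.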

Acknowledgements

Thanks to the OpenSubtitles.org team for this algorithm.