-
Notifications
You must be signed in to change notification settings - Fork 85
Hashing Algorithms
The average hash works on the Y(Luma) component of the YCbCr color model. First the image is rescaled and the luma value calculated.
Y = R * 0.299 + G + 0.587 + B * 0.114
. if the luma of the current pixel is higher or smaller than the average luminosity the bit of the hash is set.
This is a quick and easy operation but might get you in trouble once the luminosity of the image shifts.
The Average kernel hash simply performs an additional filter pass with an arbitrary kernel. By default a box filter is applied.
The Average Color Hash computes the grayscale value for each pixel V = (R + G + B) /3
. The same approach as the AverageHash in version 1.x.x. Relying on the actual color values makes this hash vulnerable against color changes abd it usually performs worse than the luminosity based average hash.
The MedianHash compares the luminosity value against the median value. This guarantees each has to be 50% consistent out of 0 and 1 bits but eliminates outliers which may or may not be usable to differentiate images.
Algo / Resolution | 2^6 = 64 bit | 2^8 = 256 bit | 2^12 = 4096 bit | 2^18 = 262144 bit |
AverageHash | ||||
AverageKernelHash | ||||
MedianHash | ||||
AverageColorHash |
The difference hash calculates the gradient in the image. As gradients are depended on the direction of the scanning it's a good idea to take additional dimensions into account.
- DoublePrecision will double the resulting hashSize but additionally accounts for top to bottom gradients
- TripplePrecision will triple the resulting hashSize but additionally accounts for diagonal gradients
While additional precision will increase the amount of information a 64 bit simple precision usually performs better than a 32 bit double precision hash.
Algo / Resolution | 2^6 = 64 bit | 2^8 = 256 bit | 2^12 = 4096 bit | 2^18 = 262144 bit |
Top/Bottom | ||||
Left/Right | ||||
Diagonal* |
- The images are still of version 1.X.X. The diagonal hash got altered to correctly handle the offset at the corner of the image.
The image is rescaled as usual and the discrete cosine transformation of the lum values are calculated. Bits are assigned depending on if the coefficient of the dct is greater or smaller than the average. Due to the fact that the first few values are outlier and the lower half of the dct matrix represents features which are not visible to the human perception (the same reason why jpg compression works) the hash is calculated only on a subset of the data.
Due to the cosine structured values a great part of the hash is very likely to be similar in the vast majority of the images. The usual range of normalized hamming distances for pHash is smaller than than [0-1]
and usually occupies values from [0-.5]
(no hard evidence just observations). While the bits are not optimally used the differentiation between different types of images is outstanding and pHash represents a very good hashing algorithm.
Due to computing the dct this algorithm is slower than aHash or dHash.
Algo / Resolution | 2^6 = 64 bit | 2^8 = 256 bit | 2^12 = 4096 bit | 2^18 = 262144 bit |
Perceptive Hash |
Similar to the perceptive hash this algorithm uses a discrete cosine transformation to extract the frequency of the image. Additionally the following steps are used:
- Precomputation -> resize and extract Y luma component
- Ring parition. Rotate pixels around the center and fit it into a bucket
- Sorting the lum values of each bucket increasingly and compute a 1d dct transformation. Compute hash by comparing the values to the mean of each bucket. (Multiple bits per bucket)
Again only a subset of the dct matrix is used due to the same constraints as the original pHash. RotPHash shows the same behaviour as it's brother. The hamming distance usually is on the lower range.The more pixels are present in a bucket the more bits can be computed and extracted from said collection of pixels.
The sorting step eliminates the order of the pixels and therefore gets robust against rotated images. The gray pixels at the outside currently get discarded but it could be possible to assume missing information or simply fill it with a default value to include all information.
Works identical RotPHash. The values are computed in buckets but now the average of each bucket is computed and compared to the next bucket. This allows for a quick and good computation for small bit resolutions but requires 1 bucket per bit, meaning this hash will scale badly (performance wise) for higher bit resolutions. Additionally outer buckets contain more pixel than their inner counterparts but still only contribute to 1 pixel meaning that the rot average hash suffers from pixels further away from the center being not weighted as much as central pixels.
The HOG (Histogram of gradients) is a feature descriptor traditionally used in machine learning to identify shapes. The concept can be found in the paper Histograms of Oriented Gradients for Human Detection by Navneet Dalal and Bill Triggs.
The hog works similar to dHash by calculating gradients and mapping these to buckets related to the angle of the gradients. The following image shows the features of normalized hog values calculated for the test image.
The hog hash still is experimental, while it does allow to differentiate between different images, the computation cost does to justify it's poor performance. The original hog descriptor has 4k+ byte features which are much much more information than our usual 64 bit! hash. I'll have to see how the hog features can be encoded into a significant hash value. This is a work in progress hash.