Don't fail on paths that can't be processed for some reason (#1)
mk-fg committed Jan 16, 2013
1 parent d11858b commit aa7a272
Showing 2 changed files with 16 additions and 10 deletions.
12 changes: 7 additions & 5 deletions README.md
@@ -26,13 +26,15 @@ Warning
 --------------------
 
 As illustrated in
-[#1](https://github.com/mk-fg/image-deduplication-tool/issues/1) and the "Known
-Issues" section, libpHash and/or CImg (which it's based on) can do quite strange
-and unexpected things when presented with non-images up to the point of
-executing malicious code from the filename in the shell.
+[#1](https://github.com/mk-fg/image-deduplication-tool/issues/1) and
+[CImg#49](https://sourceforge.net/p/cimg/bugs/49/), libpHash/CImg will fall back
+to using "sh -c" commands for non-image file formats and might not get
+filename-escaping correct there (especially with CImg versions up to 1.5.3).
 
 Simple safeguard for that would be only to run the tool on image paths, not
-paths that contain mixed-type files.
+paths that contain mixed-type files, or at least make sure there's no funky
+stuff in the filenames - checking that in the script is wrong, as proper image
+files work with any filename.
 
 One other precaution is that with --feh option, script will run "feh" program,
 and --feh-args parameter may contain options (e.g. --info) that will be executed
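Note on the README warning above: the following is a minimal sketch, not code from libpHash or CImg, of why a filename pasted unquoted into an "sh -c" command line is a problem and how quoting keeps it a single argument. It assumes Python 3; the `build_command` helper and the `convert ... out.png` command line are hypothetical and exist only for illustration.

```python
import shlex


def build_command(path, quote=True):
    """Build a shell command string for a hypothetical external converter."""
    arg = shlex.quote(path) if quote else path
    return 'convert {} out.png'.format(arg)


# A hostile filename that tries to smuggle in a second command.
hostile = 'x.pdf; touch pwned'

unsafe = build_command(hostile, quote=False)
safe = build_command(hostile)

# shlex.split applies POSIX-shell-like quoting rules. It does not interpret
# ";" as a command separator, but it does show whether the filename survives
# as one argument or gets broken into several words.
print(shlex.split(unsafe))
print(shlex.split(safe))
```

With quoting, the whole filename stays one argument; without it, a real shell would additionally treat the `;` as the end of the first command.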
14 changes: 9 additions & 5 deletions image_matcher.py
@@ -4,7 +4,7 @@
 
 
 import itertools as it, operator as op, functools as ft
-import os, sys, ctypes, pickle, select, signal, struct
+import os, sys, ctypes, errno, pickle, select, signal, struct
 
 
 class pHash(object):
@@ -16,7 +16,8 @@ def dct_imagehash_async(self, path):
         r, w = os.pipe()
         pid = os.fork()
         if not pid:
-            pickle.dump(self.dct_imagehash(path), os.fdopen(w, 'wb'))
+            with os.fdopen(w, 'wb') as dst:
+                pickle.dump(self.dct_imagehash(path), dst)
             os._exit(0) # so "finally" clauses won't get triggered
         else:
             os.close(w)
@@ -26,7 +27,10 @@ def dct_imagehash(self, path):
         phash = ctypes.c_uint64()
         if self._lib.ph_dct_imagehash(path, ctypes.pointer(phash)):
             errno_ = ctypes.get_errno()
-            raise OSError(errno_, os.strerror(errno_))
+            print( 'Failed to get image hash ({}): {}'\
+                .format(errno.errorcode[errno_], os.strerror(errno_)),
+                file=sys.stderr )
+            return None
         return phash.value
 
     def hamming_distance(self, hash1, hash2):
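Note on the two hunks above: taken together they change the child process to close its pipe end explicitly and to report a failed hash as `None` instead of raising. A minimal standalone sketch of that pattern follows, assuming Python 3 on a POSIX system. `compute_hash` is a hypothetical stand-in for the libpHash call (it just sums file bytes), and `compute_hash_in_child` is an illustrative name, not a function from `image_matcher.py`.

```python
import os
import pickle
import sys


def compute_hash(path):
    """Hypothetical stand-in for the real hash call; returns None on failure."""
    try:
        with open(path, 'rb') as src:
            return sum(src.read()) & 0xFFFFFFFFFFFFFFFF
    except OSError as err:
        print('Failed to get image hash: {}'.format(err), file=sys.stderr)
        return None


def compute_hash_in_child(path):
    """Run compute_hash in a forked child and read the pickled result back."""
    r, w = os.pipe()
    pid = os.fork()
    if not pid:
        # Child: write the result (possibly None) and leave without running
        # the parent's cleanup handlers.
        os.close(r)
        with os.fdopen(w, 'wb') as dst:
            pickle.dump(compute_hash(path), dst)
        sys.stderr.flush()
        os._exit(0)
    # Parent: close the unused write end so EOF is seen when the child exits.
    os.close(w)
    with os.fdopen(r, 'rb') as src:
        try:
            result = pickle.load(src)
        except EOFError:
            # Child died before writing anything; treat it as "no hash".
            result = None
    os.waitpid(pid, 0)
    return result
```

The `with` block in the child guarantees the pickled data is flushed and the descriptor closed before `os._exit(0)`, which skips normal interpreter shutdown and would not flush it otherwise.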
@@ -71,9 +75,9 @@ def sort_by_similarity(dcts):
     log.debug('Calculating/sorting Hamming distances')
     for img1, img2 in it.combinations(dcts.viewitems(), 2):
         for path, h in img1, img2:
-            if h == 0:
+            if h == 0 or h is None:
                 if path not in paths_skipped:
-                    log.debug('Skipping 0-hash path: {}'.format(path))
+                    log.debug('Skipping no-hash path: {}'.format(path))
                     paths_skipped.add(path)
                 break
         else:
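Note on the hunk above: paths whose hash is `0` or `None` are now skipped during the pairwise comparison. The sketch below shows the same idea in a simplified form, assuming Python 3. Unlike the commit, it filters unusable hashes once up front instead of inside the loop, and both function bodies here are illustrative rather than copied from `image_matcher.py`.

```python
import itertools


def hamming_distance(hash1, hash2):
    """Number of differing bits between two integer hashes."""
    return bin(hash1 ^ hash2).count('1')


def sort_by_similarity(hashes):
    """Return (distance, path1, path2) tuples, most similar pairs first.

    Paths whose hash is 0 or None are dropped before any comparison.
    """
    usable = {path: h for path, h in hashes.items() if h}
    pairs = [
        (hamming_distance(h1, h2), p1, p2)
        for (p1, h1), (p2, h2) in itertools.combinations(sorted(usable.items()), 2)
    ]
    return sorted(pairs)


example = {'a.png': 0b1010, 'b.png': 0b1000, 'broken.pdf': None, 'blank.png': 0}
print(sort_by_similarity(example))
```

Filtering first avoids re-checking the same bad path for every pair it would appear in, at the cost of not logging each skipped path the way the commit does.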