ruby-seeklib

FFI bindings that let you use seeklib from Ruby. seeklib is basically an alternative to pHash/pHashion. See here for background:

http://hackerlabs.github.io/blog/2012/07/30/organizing-photos-with-duplicate-and-similarity-checking/

Usage

API

SeekLib.path_sig: takes a path to an image and produces an imgseek fingerprint.

SeekLib.calc_diff: takes two imgseek fingerprints and returns the difference score.

Allowed image types are any that CImg handles (IIRC jpg, gif, png, bmp, and maybe a couple others), or any that ImageMagick handles (basically all of them) if you have the ImageMagick library installed (CImg will call out to ImageMagick if it can't read an image itself).

It's also possible to add support for producing a fingerprint from a blob of memory containing image data. I can add this if anyone actually wants it, but I've more or less abandoned this project for the moment.

Caveats:

  1. Note that the difference score is not as straightforward as the one generated by pHash. The score is a negative number. The more negative, the more similar the images are. Personally, I find this annoying to think about, so in the example below I add 50 to the result.

  2. The score resulting from comparing two identical images is undefined. If this is possible in your application, consider supplementing your comparison with a good old-fashioned SHA-1 digest or something.

  3. It's not clear what the max difference score is. If you really need to know, you should probably look at the project page below and read the authors' original paper.

http://grail.cs.washington.edu/projects/query/
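
The first two caveats can be wrapped in a small helper. This is only a sketch under assumptions: SCORE_OFFSET, readable_score, identical_files? and compare_images are hypothetical names, not part of seeklib, and the block passed to compare_images stands in for whatever computes the raw score (for example SeekLib.calc_diff on two SeekLib.path_sig results).

```ruby
require 'digest'

# Hypothetical constant: the offset this README adds to make scores easier to read.
SCORE_OFFSET = 50

# Shift a raw (negative) difference score; lower still means more similar.
def readable_score(raw_diff)
  raw_diff + SCORE_OFFSET
end

# Byte-for-byte identity check via SHA-1, since diff(a, a) is undefined.
def identical_files?(path_a, path_b)
  Digest::SHA1.file(path_a).hexdigest == Digest::SHA1.file(path_b).hexdigest
end

# Returns :identical for byte-identical files, otherwise the shifted score.
# The block receives both paths and must return the raw difference score.
def compare_images(path_a, path_b)
  return :identical if identical_files?(path_a, path_b)
  readable_score(yield(path_a, path_b))
end
```

Note that the SHA-1 check only catches files that are identical byte for byte; two different encodings of the same picture still go through the fingerprint comparison.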

Example

require 'seeklib'
bss = SeekLib.path_sig 'squares/blue-square.jpg'    # what it sounds like
gss = SeekLib.path_sig 'squares/green-square.jpg'   # pretty much same
rss = SeekLib.path_sig 'squares/red-square.jpg'     # a bit different

SeekLib.calc_diff(bss, gss) + 50           # => 18.641717640766657
SeekLib.calc_diff(gss, bss) + 50           # => 18.641717640766657 (thank god)

SeekLib.calc_diff(bss, rss) + 50           # => 46.336210565558794 (whoa!)
SeekLib.calc_diff(gss, rss) + 50           # => 46.169410374857364 (similar)

SeekLib.calc_diff(bss, bss) + 50           # => 17.969999238848686
SeekLib.calc_diff(gss, gss) + 50           # => 17.969999238848686 (wait, diff(a, a) undefined?)
SeekLib.calc_diff(rss, rss) + 50           # => 9.8899996727705 (yep, unfortunately)
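
Because a more negative score means a closer match, finding the nearest neighbour of a fingerprint comes down to taking the minimum raw score. A minimal sketch, assuming the fingerprints are kept in a hash keyed by name; closest_match is a hypothetical helper, not part of seeklib, and the block stands in for SeekLib.calc_diff:

```ruby
# Returns [name, raw_score] for the candidate most similar to target_sig,
# or nil when there are no candidates.
# candidates: Hash of name => fingerprint.
# The block takes two fingerprints and returns the raw difference score
# (more negative means more similar).
def closest_match(target_sig, candidates)
  scored = candidates.map { |name, sig| [name, yield(target_sig, sig)] }
  scored.min_by { |_name, score| score }
end
```

With the fingerprints from the example, calling closest_match(bss, 'green' => gss, 'red' => rss) { |a, b| SeekLib.calc_diff(a, b) } should pick 'green', since its score against the blue square is the more negative of the two.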

Getting

I've finally published it (well, the hacked-up 'installable' branch) so you can do gem install seeklib. It may even work!

Background discussion

TODO: Add more. Briefly, I was working on a project where I really, really needed a good, fast perceptual hash. pHash, though excellent, was nowhere near adequate in terms of speed. The fact that the imgseek fingerprint is ~100x bigger didn't really matter to me (pHash fingerprints are only 8 bytes). The main thing preventing other people from using the imgseek fingerprint is that it was embedded in a mass of ancient code that I doubt anyone wanted to touch. The original is here:

https://github.com/ricardocabral/iskdaemon/blob/master/src/imgSeekLib/imgdb.cpp

No offense intended to Ricardo, who I'm sure has better things to do than clean it up.

It worked beautifully for me, and I made a number of improvements to its performance (which was critical for my project), particularly switching from ImageMagick to CImg. The project is currently stalled, so I figured I'd release it for the next guy who needs a fingerprint library (girls may use it too; I'm not particular).