Find better image comparison algorithm #24

kkaefer · 2015-05-19T14:24:28Z

We're currently using ImageMagick and the MAE (mean absolute error) metric to compare images. However, we often get false positives, as well as false negatives from images that appear visually identical, but aren't actually equal. It's tricky to find a good balance since we have seemingly contradictory requirements:

Fail when a small dot is missing in an otherwise blank image (= size of surrounding image shouldn't matter)
Fail when a fill color is off by one (even if it's not apparent to the human eye)
Don't fail when antialiasing is slightly different, but still appears antialiased
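
To illustrate why the first requirement is hard to meet with a mean-based metric, here is a minimal sketch. It assumes raw RGBA byte arrays and a simplified MAE normalized to 0..1; it is not ImageMagick's implementation, and `blankImage` is a hypothetical helper made up for the example.

```javascript
// Mean absolute error over two equally sized RGBA buffers, normalized to 0..1.
// Simplified stand-in for the metric, not ImageMagick's actual code.
function meanAbsoluteError(a, b) {
    if (a.length !== b.length) throw new Error('Image sizes do not match.');
    let sum = 0;
    for (let i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
    return sum / (a.length * 255);
}

// Hypothetical helper: a white square image with an optional 2x2 black dot.
function blankImage(size, withDot) {
    const data = new Uint8Array(size * size * 4).fill(255);
    if (withDot) {
        for (let y = 0; y < 2; y++) {
            for (let x = 0; x < 2; x++) {
                const pos = (y * size + x) * 4;
                data[pos] = data[pos + 1] = data[pos + 2] = 0; // alpha stays 255
            }
        }
    }
    return data;
}

// The same missing dot, once in a 16x16 image and once in a 512x512 image.
const small = meanAbsoluteError(blankImage(16, true), blankImage(16, false));
const large = meanAbsoluteError(blankImage(512, true), blankImage(512, false));
console.log(small, large);
```

The same missing dot yields a much smaller error in the larger image, so any fixed MAE threshold that catches it there would be very strict for small images.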

kkaefer · 2015-05-19T14:39:17Z

Here are a few examples:

Outcome	Expected	Actual
false positive
false negative
false negative
false positive
true positive
false positive

mourner · 2015-10-08T16:29:56Z

@jfirebaugh I just tried the https://huddle.github.io/Resemble.js/ image comparison demo on the sample images above and the results look quite good! There can be some false negatives if we turn on the "ignore antialiasing" option (e.g. the third example), but otherwise this could be a much better metric than what we currently use.

jfirebaugh · 2015-10-08T19:31:19Z

Looks promising but it appears to be a browser-only module (requires document.createElement('canvas')).

mourner · 2015-10-08T21:03:02Z

@jfirebaugh but a PR to use node-canvas, or even to simply work on a raw image data array, should be pretty straightforward, right?

mourner · 2015-10-13T09:04:18Z

Using a stripped version of Resemble.js might work, but I'm really interested in checking out http://pdiff.sourceforge.net/, which sounds pretty amazing, and the source code doesn't look too complicated so it can be ported to JS.

During regression testing of a renderer, hundreds of images are generated from an older version of the renderer and are compared with a newer version of the renderer. This program drastically reduces the number of false positives (failures that are not actually failures) caused by differences in random number generation, OS or machine architecture differences.

mourner · 2015-10-13T09:19:51Z

Another tool to check out which seems better maintained and more feature-rich than Resemble.js: https://github.com/yahoo/blink-diff

mourner · 2015-10-13T11:10:28Z

The more I look into this, the more I get convinced that we should write our own simple JS tool for image comparison.

ImageMagick isn't well suited to rendering comparisons like this (e.g. no way to account for antialiasing), and it's impossible to customize for our needs beyond its command-line parameters (unlike JS libraries).
Resemble.js is very simple but DOM-oriented, and the quality of the code is relatively poor.
Yahoo blink-diff seems well thought-out but is the worst example of Java developers writing JavaScript code: the project structure is terrible, there are abstractions on top of abstractions, and the code is extremely verbose.

Our own tool would probably be under 200 lines of code, extremely simple and focused (just a diff between two image data arrays without much fluff), and give us the freedom to experiment with the right metrics that suit our contradicting needs. I want to make a proof of concept today.
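
As a rough idea of what "a diff between two image data arrays" could look like, here is a minimal sketch. The function name, the max-channel-delta rule and the `tolerance` parameter are assumptions for illustration, not the code that ended up in any library.

```javascript
// Count pixels whose largest per-channel delta exceeds `tolerance` (0..255).
// `img1` and `img2` are RGBA byte arrays with the same dimensions.
function countMismatchedPixels(img1, img2, width, height, tolerance = 0) {
    if (img1.length !== width * height * 4 || img2.length !== img1.length) {
        throw new Error('Image data does not match the given dimensions.');
    }
    let mismatched = 0;
    for (let pos = 0; pos < img1.length; pos += 4) {
        let maxDelta = 0;
        for (let c = 0; c < 4; c++) {
            maxDelta = Math.max(maxDelta, Math.abs(img1[pos + c] - img2[pos + c]));
        }
        if (maxDelta > tolerance) mismatched++;
    }
    return mismatched;
}

// 4x4 white image; the "actual" copy differs in two pixels.
const expected = new Uint8Array(4 * 4 * 4).fill(255);
const actual = expected.slice();
actual[0] = 254; // fill color off by one in the first pixel
actual[20] = 0;  // clearly different red channel in the sixth pixel

const strict = countMismatchedPixels(expected, actual, 4, 4);     // tolerance 0
const lenient = countMismatchedPixels(expected, actual, 4, 4, 8); // tolerance 8
console.log(strict, lenient);
```

With a tolerance of 0 the off-by-one pixel counts as a mismatch; with a tolerance of 8 only the clearly different pixel does. Because the result is a pixel count rather than a mean, the size of the surrounding image doesn't dilute it.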

mourner · 2015-10-13T16:00:37Z

OK, hacked around a few hours and wrote a new library: https://github.com/mapbox/pixelmatch
It's the same algorithm as Blink-diff, including ignoring antialiased pixels, but about 60 (!!!) lines of code and much faster and simpler. The output is pretty good (red is mismatch, yellow is ignored antialiasing).

I think this will already be a big improvement over the current ImageMagick approach. But to make tests much more precise and reduce false positives/negatives, I believe we should handpick matching tolerance values for each individual test (or at least for tests that are false positives/negatives).
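
Handpicked tolerances per test could be sketched roughly like this. The test names, the values and the "share of mismatched pixels" pass rule are all hypothetical and only show the shape of the idea, not how the test harness is actually configured.

```javascript
// Hypothetical defaults and per-test overrides; names and values are made up.
const defaultTolerance = 0.001;
const toleranceOverrides = {
    'text-antialiasing': 0.01, // allow some antialiasing noise
    'fill-color': 0            // must match exactly
};

function toleranceFor(testName) {
    return Object.prototype.hasOwnProperty.call(toleranceOverrides, testName)
        ? toleranceOverrides[testName]
        : defaultTolerance;
}

// A test passes when the share of mismatched pixels stays within its tolerance.
function passes(testName, mismatchedPixels, totalPixels) {
    return mismatchedPixels / totalPixels <= toleranceFor(testName);
}

console.log(passes('fill-color', 1, 10000));         // strict test, one bad pixel
console.log(passes('text-antialiasing', 50, 10000)); // lenient test, 0.5% mismatch
```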

jfirebaugh · 2015-10-13T23:55:51Z

This looks very promising. Can you make a PR that integrates pixelmatch into the test harness?

hastebrot · 2015-10-19T10:01:02Z

Thanks @kkaefer for the false positives/negatives examples and thanks @mourner for pixelmatch. This is very useful and insightful.

I simply use the difference blend mode to test my map renderer in JavaFX. This repository and the test suite of the amazing Dolphin emulator [1] were an inspiration to use images to test the renderer. A few months ago I found a blog post [2] that describes some of the problems with distance metrics in image diff'ing and provides examples. This blog post was a good start for a basic understanding of the problems with image diff'ing. But to make it more robust I really need something better than just using blend modes; it needs a distance metric based on the YUV color space/system (or maybe CIELAB or CIELUV) and support to differentiate anti-aliased pixels.

[1] https://dolphin-emu.org/blog/2015/01/25/making-developers-more-productive-dolphin-development-infrastructure/
[2] http://jeffkreeftmeijer.com/2011/comparing-images-and-creating-image-diffs/

mourner · 2015-10-19T10:44:01Z

@hastebrot pixelmatch now calculates difference using YUV distance, and has a pretty good anti-aliasing detection algorithm based on a paper — try it out! The code is short and well-commented so should be easy to port to any language.
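
For anyone porting the idea, here is a minimal sketch of a YUV-based color distance. It assumes the standard BT.601 RGB to YUV conversion and a plain Euclidean distance; this is not necessarily the exact formula or weighting pixelmatch uses, and the function names are made up for the example.

```javascript
// Convert 8-bit RGB to YUV using the BT.601 coefficients.
function rgbToYuv(r, g, b) {
    const y = 0.299 * r + 0.587 * g + 0.114 * b;
    return [y, 0.492 * (b - y), 0.877 * (r - y)];
}

// Euclidean distance between two [r, g, b] colors in YUV space.
function yuvDistance(rgb1, rgb2) {
    const [y1, u1, v1] = rgbToYuv(...rgb1);
    const [y2, u2, v2] = rgbToYuv(...rgb2);
    const dy = y1 - y2;
    const du = u1 - u2;
    const dv = v1 - v2;
    return Math.sqrt(dy * dy + du * du + dv * dv);
}

// The same 10-step shift applied to green and to blue, starting from mid gray.
const greenShift = yuvDistance([128, 128, 128], [128, 138, 128]);
const blueShift = yuvDistance([128, 128, 128], [128, 128, 138]);
console.log(greenShift, blueShift);
```

In this sketch a shift in green produces a larger distance than the same shift in blue, since green contributes most to luma. That is the main difference from comparing raw RGB deltas, where both shifts would count the same.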

mikemorris mentioned this issue May 19, 2015

Use node-mapnik for test image fixture comparision cutting-room-floor/node-mapbox-gl-native#100

Closed

kkaefer mentioned this issue Jun 8, 2015

Use multiple reference images #25

Closed

mourner added the enhancement label Oct 13, 2015

mourner added a commit that referenced this issue Oct 16, 2015

Use pixelmatch for comparisons, close #24

5323bfa

mourner added a commit that referenced this issue Oct 16, 2015

Use pixelmatch for comparisons, close #24

ee76213

mourner added a commit that referenced this issue Oct 16, 2015

Use pixelmatch for comparisons, close #24

ce801ba

mourner mentioned this issue Oct 16, 2015

New awesome image comparison using pixelmatch #50

Merged

2 tasks

jfirebaugh closed this as completed in #50 Oct 16, 2015
