Fix Difference of Gaussians prefilter #21
Merged
The panic in the original blockhash `gauss_preproc()` routine was patched in 2d203d6 to not panic, copying the behavior of the original code when run in release mode (i.e. with overflow checks disabled). Unfortunately, both the original code and the patched code are incorrect, and they result in a very destructive preprocessing pass that hurts rather than helps the blockhash algorithm (or any other image algorithm). The code documents what is supposed to happen with a link to the Wikipedia article on the Difference of Gaussians, which describes the purpose of this filter: specifically, when using Gaussian blur with a kernel ratio K2:K1 approximately equal to 1.6, you get a fast approximation of the Laplacian of Gaussian transform, used for object detection (in plain English: it gives you edge/outline detection).
The Wikipedia article has an example of what the DoG filter is supposed to look like; here it is reproduced below with the starting image and the image after the DoG filter has been applied:
Original:

Difference of Gaussians Reference:

The idea is that you blur the image twice, the second blur being stronger than the first. Each blur is effectively a low-pass filter that lets low-frequency components (smooth areas of the image) through, with the cutoff set by the Gaussian kernel size. Taking the difference of two low-pass filters gives you a band-pass filter that lets through the components that survive the weaker blur but are removed by the stronger one, and if the kernel values are chosen correctly for the input image, it isolates detail in a particular frequency band (what a Laplacian transform would do).
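In code, the whole prefilter looks roughly like this; a minimal sketch using the `image` crate's `imageops::blur`, with illustrative names and sigmas (this is not this project's actual `gauss_preproc()`):

```rust
use image::{imageops, GrayImage, Luma};

/// Difference of Gaussians sketch: blur twice, then take the
/// per-pixel saturating difference of the two blurred images.
fn dog_prefilter(img: &GrayImage, sigma1: f32, sigma2: f32) -> GrayImage {
    let weak = imageops::blur(img, sigma1); // weaker blur, e.g. 2.0
    let strong = imageops::blur(img, sigma2); // stronger blur, e.g. 3.2
    GrayImage::from_fn(img.width(), img.height(), |x, y| {
        let lhs = weak.get_pixel(x, y)[0];
        let rhs = strong.get_pixel(x, y)[0];
        // Saturating difference, inverted so the common area stays white;
        // both the direction and the inversion are discussed below.
        Luma([255 - rhs.saturating_sub(lhs)])
    })
}
```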
But all the math aside, if you think about the actual pixel components of the image, you have 0 being black and 255 being white (for a 1-channel greyscale image). If you want the difference between two images, what you really want is to selectively capture something found in one image but not the other. If you perform a wrapping subtraction, you destroy that information: when something isn't found in one image, you wrap around past zero (which would indicate "not found") all the way to 255 (indicating "strongly found"). So it stands to reason that you can only use an absolute difference or a saturating subtraction here, never a wrapping one.
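To make that concrete with plain `u8` arithmetic (the values here are arbitrary):

```rust
let lhs: u8 = 10; // pixel is dark in one blurred image...
let rhs: u8 = 30; // ...and lighter in the other

assert_eq!(lhs.wrapping_sub(rhs), 236); // wraps past 0: reads as "strongly found"
assert_eq!(lhs.saturating_sub(rhs), 0); // clamps at 0: "not found"
assert_eq!(lhs.abs_diff(rhs), 20);      // magnitude of the difference only
```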
Back to the math: the Wikipedia image isn't annotated with the Gaussian kernel values that were used, but as an approximation I have run some tests with the different possible implementations of `diff_inplace()`, using kernel sizes K1 = 2 and K2 = 3.2, to demonstrate why a wrapping subtraction does not make sense here and to show that the correct choice is a saturating subtraction operation.

Wrapping Subtraction (current behavior):

The image above shows the current behavior, normalized to values between 0 and 255, using the following logic:
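That logic, sketched per-pixel (the variable names are mine; `lhs` comes from the weaker blur, `rhs` from the stronger one):

```rust
// The current (broken) behavior: a wrapping per-pixel difference,
// inverted for display as described further down.
let shown = 255 - lhs.wrapping_sub(rhs);
```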
Absolute Difference:

This image shows the inverse of the absolute difference, normalized to values between 0 and 255:
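As a sketch, with the same variable names as above:

```rust
// Absolute difference, inverted for display:
let shown = 255 - lhs.abs_diff(rhs);
```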
(Since this is an absolute difference, it does not matter whether you subtract rhs minus lhs or lhs minus rhs.)
Saturating Subtraction (LHS minus RHS):
There are two possible options here. The original code subtracted rhs from lhs (`lhs.wrapping_sub(rhs)`), so here's what it looks like with the same order but using saturating subtraction instead (again, pixel values normalized from 0 to 255 to darken the result). The code for this is:
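Roughly (a sketch based on the description, not the literal diff):

```rust
// Saturating subtraction in the original lhs-minus-rhs order,
// inverted for display:
let shown = 255 - lhs.saturating_sub(rhs);
```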
Saturating Subtraction (RHS minus LHS):
The second option, matching the logic above, subtracts the image produced with the lower Gaussian kernel size from the one produced with the higher kernel size, i.e. lhs from rhs; again normalized to 0-255:
The code for this is what's in the PR:
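In sketch form (the actual change is in the diff of this PR):

```rust
// Saturating subtraction in the opposite, rhs-minus-lhs order,
// inverted so the common area stays white:
let out = 255 - rhs.saturating_sub(lhs);
```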
You'll notice that I am always posting the inverse of the operation (255 minus the expression) instead of just the expression. This is because 0 is black and 255 is white, so when you subtract normally you end up with the common area black, but by convention the DoG or Laplacian transform keeps the common area white and the difference in black. Whether you invert the result of the subtraction or not, the subsequent blockhash will work, but if you don't invert, then the blockhash (or whatever image algorithm, really) of a black image will match the DoG-preprocessed hash of a white image. It seemed to me that sanity should prevail over performance, and a `u8` inversion is such a cheap operation compared to the actual image hash algorithm that follows (or any resizing or blurring that takes place beforehand) that I believe it would be stupid not to invert the result.

I hope it is clear from the images above that the original code never handled the Difference of Gaussians preproc filter correctly. As you can see just by looking at the results, the wrapping difference doesn't make any sense to use here, even if you ignore the reference Wikipedia image. (Also, I looked for prior art: OpenCV performs a saturating subtraction over the pixel values in general for the operation `img1 - img2`, regardless of the algorithm or filter being applied.)

This is obviously a breaking change, but I think it is fair, and I wouldn't even have to think too hard about releasing this in a semver-compatible minor update since the old code was so broken you would never actually get a sane blockhash out of the result (but releasing it as 2.1 probably makes the most sense!). I'm also going to open a separate PR to change the default Gaussian kernel sigma values, as they don't provide the correct ratio to mimic a Laplacian of Gaussian transform.
One final note: as I mentioned, the images I posted have been normalized so that their pixel values span the full 0-255 range. If you don't do this, some inputs will have lighter overall outlines than others, and in general the result will be very faint. This does not necessarily affect the subsequent image hash operation, because many hashes work on relative values (grey is still darker than the white background), but it is something to consider. The code for normalizing a `Greyscale` image follows, in case you want to consider applying this normalization as a second step after the Difference of Gaussians:
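As a sketch (a simple min-max stretch, written against the `image` crate's `GrayImage` rather than the project's `Greyscale` type):

```rust
use image::GrayImage;

/// Linearly stretch the pixel values of `img` to cover the full 0-255 range.
fn normalize(img: &mut GrayImage) {
    let (min, max) = img
        .pixels()
        .fold((u8::MAX, u8::MIN), |(lo, hi), p| (lo.min(p[0]), hi.max(p[0])));
    if min == max {
        return; // flat image, nothing to stretch
    }
    let range = (max - min) as u16;
    for p in img.pixels_mut() {
        // (p - min) * 255 fits in u16 because p - min <= 255.
        p[0] = ((p[0] - min) as u16 * 255 / range) as u8;
    }
}
```

For reference and posterity, here's what the correct algorithm produces when the output isn't normalized: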
