Centering in normalized cross correlation for template matching

Question

Context

I'm following Lewis's (1995) exposition of normalized cross correlation for template matching (Section 2).

The cross-correlation of the image and the feature at $u,v$ is denoted by $c(u,v)$ and defined as $$ c(u,v) = \sum_{x,y} f(x,y)t(x-u,y-v) $$ where $f$ is the image and the sum is over $x$, $y$ under the window containing the feature $t$ positioned at $u$, $v$. There are several disadvantages to using $c(u,v)$ for template matching:

If the image energy $\sum_{x,y}f^{2}(x,y)$ varies with position, matching can fail. For example, the correlation between the feature and an exactly matching region in the image may be less than the correlation between the feature and a bright spot.
The range of $c(u, v)$ is dependent on the size of the feature.
$c(u,v)$ is not invariant to changes in image amplitude such as those caused by changing lighting conditions across the image sequence.

The normalized cross correlation $\gamma(u,v)$ overcomes these difficulties by normalizing the image and feature vectors to unit length, yielding a cosine-like correlation coefficient. $$\gamma(u,v) = \frac{\sum_{x,y}[f(x,y)-\bar{f}_{u,v}][t(x-u,y-v)-\bar{t}]}{\{\sum_{x,y}[f(x,y)-\bar{f}_{u,v}]^{2}\sum_{x,y}[t(x-u,y-v)-\bar{t}]^{2}\}^{0.5}}$$ where $\bar{t}$ is the mean of the feature and $\bar{f}_{u,v}$ is the mean of $f(x,y)$ in the region under the feature.

The actual question(s)

I understand why working with normalized feature and image vectors is useful and yields a well-behaved cosine-like measure of similarity. But I'm having a hard time understanding why $\gamma(\cdot)$ normalizes the demeaned feature and image vectors instead of the feature and image vectors themselves. Would $\gamma'(u,v)$ as defined below have desirable template matching properties? Does the centering operation change the angle between the feature and image vectors? $$\gamma'(u,v) = \frac{\sum_{x,y}[f(x,y)][t(x-u,y-v)]}{\{\sum_{x,y}[f(x,y)]^{2}\sum_{x,y}[t(x-u,y-v)]^{2}\}^{0.5}}$$

References

Lewis, J. P. (1995). Fast Normalized Cross-Correlation. http://scribblethink.org/Work/nvisionInterface/nip.pdf

score 3 · Accepted Answer · answered Jul 26 '22 at 18:15

This is similar to the difference between Pearson correlation and cosine similarity.

The Pearson correlation is the cosine similarity between two demeaned vectors. So the normalized cross-correlation that you show is a Pearson correlation between the image window and the feature, while your proposal is the plain cosine similarity of the uncentered vectors.
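
To make the link explicit, here is a short derivation in generic notation (the symbols $x$, $y$, $n$, $S_{xy}$, $S_{xx}$, $S_{yy}$ are introduced here for illustration and do not appear in Lewis). For two vectors $x, y \in \mathbb{R}^n$ with means $\bar{x}$, $\bar{y}$, write $S_{xy} = \sum_i (x_i-\bar{x})(y_i-\bar{y})$, and similarly $S_{xx}$, $S_{yy}$. Expanding the sums gives

$$\sum_i x_i y_i = S_{xy} + n\bar{x}\bar{y}, \qquad \sum_i x_i^2 = S_{xx} + n\bar{x}^2, \qquad \sum_i y_i^2 = S_{yy} + n\bar{y}^2,$$

so the two measures are

$$r(x,y) = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}, \qquad \cos(x,y) = \frac{S_{xy} + n\bar{x}\bar{y}}{\sqrt{(S_{xx} + n\bar{x}^2)(S_{yy} + n\bar{y}^2)}}.$$

The two coincide when either mean is zero. When both means are large relative to the spread of the values, the $n\bar{x}\bar{y}$, $n\bar{x}^2$ and $n\bar{y}^2$ terms dominate and $\cos(x,y)$ approaches $\pm 1$ (the sign of $\bar{x}\bar{y}$) whatever the value of $S_{xy}$. So centering does in general change the angle between the two vectors: it removes the component of each vector along the constant vector $(1,\dots,1)$ before the angle is measured.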

The advantage of demeaning is that it removes the influence of overall levels, that is, of a constant offset added to every element. To illustrate with a simple example, generate two (ideally) uncorrelated vectors from a standard normal distribution (mean 0, standard deviation 1) in R.

set.seed(101)
f0 <- rnorm(100)
t0 <- rnorm(100)

Define a function to do the cosine similarity (no demeaning), and compare against the Pearson correlation (cor() function):

# Cosine similarity: dot product divided by the product of the vector norms
cossim <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))
cor(f0,t0)
# [1] 0.1078112
cossim(f0,t0)
# [1] 0.1093093

These aren't exactly 0, due to random sampling.
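
The two values are close because the sample means of `f0` and `t0` are near zero. As a quick check of the claim that the Pearson correlation is the cosine similarity of demeaned vectors, here is a minimal sketch that repeats the setup above (the name `centered` is introduced only for this illustration):

```r
set.seed(101)
f0 <- rnorm(100)
t0 <- rnorm(100)

# Cosine similarity without demeaning, as defined above
cossim <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# Cosine similarity after subtracting each vector's own mean
centered <- cossim(f0 - mean(f0), t0 - mean(t0))

# Matches cor() up to floating-point tolerance
all.equal(centered, cor(f0, t0))
# [1] TRUE
```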

Now add 4 units to both of these vectors, which are poorly correlated by either measure, and see what happens.

f4 <- f0 + 4
t4 <- t0 + 4
cor(f4,t4)
# [1] 0.1078112
cossim(f4,t4)
# [1] 0.9499962

Demeaning keeps the Pearson correlation at its original value despite the shifts in overall levels, but you now find almost perfect cosine similarity without demeaning.
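
The same effect shows up in a template-matching setting. The following is a toy 1-D sketch, not Lewis's algorithm: the "image" `img`, the template `tmpl`, and the helper names are all made up for illustration. The image starts with a bright window whose pattern is the inverse of the template and ends with an exact copy of the template.

```r
# Cosine similarity without demeaning, as defined above
cossim <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# Hypothetical data: template, and an image holding a bright inverted
# pattern (positions 1-4) followed by an exact copy (positions 5-8)
tmpl <- c(10, 12, 10, 12)
img  <- c(50, 48, 50, 48, 10, 12, 10, 12)

# Slide the template over every valid offset u and score each window
offsets <- 0:(length(img) - length(tmpl))
scores <- t(sapply(offsets, function(u) {
  window <- img[u + seq_along(tmpl)]
  c(offset     = u,
    centered   = cor(window, tmpl),     # demeaned, like gamma
    uncentered = cossim(window, tmpl))  # not demeaned, like the proposal
}))

print(round(scores, 3))
```

Both measures peak at the exact copy (offset 4), but the uncentered score for the bright inverted window at offset 0 stays above 0.99, while the centered score there is -1. Without demeaning, the score mostly reflects that both windows have large positive means, so it separates a true match from a bright mismatch only by a thin margin.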
