
I was recently asked to help characterize some surfaces (raised ink dots on paper) that were scanned with a profilometer, yielding height data across a grid of x and y coordinates. The critical question was the average dot height. The data were rather noisy because the paper was not perfectly flat, so I tried various approaches to removing this noise and came upon one that appears to do very well (to the eye at least). See below for pseudocode and R code. The logic is that every row and every column is known to contain an area with no dots on it, so ideally the minimum of each row and column should be around zero. However, one obtains different surfaces depending on whether the row minima or the column minima are shifted to zero. To resolve this ambiguity, I simply do both and average the results. I then noticed that repeating this process many times appeared to converge on a very nicely de-noised final result. Is there a name for this approach or something similar?

#Pseudocode ("data" is a RxC matrix of height values)
repeat the following 1000 times:
    create a copy of data, call it "copy1"
    for every row of height data in copy1:
        subtract the minimum of that row from each value in the row (i.e. shift the row's data so that the minimum is zero)
    create a second copy of data, call it "copy2"
    for every column of height data in copy2:
        subtract the minimum of that column from each value in the column (i.e. shift the column's data so that the minimum is zero)
    replace data with the cell-wise average of copy1 and copy2


#R-code (data is in "a")
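for(iter in 1:1e3){
    c1 = a
    for(j in 1:ncol(c1)){
        c1[,j] = c1[,j]-min(c1[,j])  # shift column j so its minimum is zero
    }
    c2 = a
    for(i in 1:nrow(c2)){
        c2[i,] = c2[i,]-min(c2[i,])  # shift row i so its minimum is zero
    }
    a = (c1+c2)/2  # cell-wise average of the two shifted copies
}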
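
The same procedure can also be written without the inner loops. The following is a minimal sketch in base R, not a tested replacement: the function names `min_polish_step` and `min_polish`, the tolerance `tol`, and the synthetic test surface are all assumptions added for illustration. It stops early once a full pass changes no cell by more than `tol`, rather than always running 1000 passes.

```r
# Hypothetical helper names; only base R is used.
min_polish_step <- function(a) {
  c1 <- sweep(a, 2, apply(a, 2, min))  # subtract each column's minimum
  c2 <- sweep(a, 1, apply(a, 1, min))  # subtract each row's minimum
  (c1 + c2) / 2                        # cell-wise average of the two copies
}

min_polish <- function(a, max_iter = 1000, tol = 1e-10) {
  for (iter in seq_len(max_iter)) {
    nxt <- min_polish_step(a)
    delta <- max(abs(nxt - a))  # largest change made by this pass
    a <- nxt
    if (delta < tol) break      # stop early once a pass changes almost nothing
  }
  a
}

# Synthetic check (assumed data): one dot of height 1 on a tilted sheet whose
# distortion is purely additive in row and column.
dots <- matrix(0, nrow = 20, ncol = 30)
dots[5:7, 10:12] <- 1
tilt <- outer(0.1 * seq_len(20), 0.05 * seq_len(30), "+")
res <- min_polish(dots + tilt)
max(abs(res - dots))  # should be close to zero for this additive case
```

In this synthetic case the distortion is an additive row offset plus column offset, and each pass halves what remains of it, which is consistent with the convergence described above. Real paper undulations need not be additive in this way.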
Mike Lawrence
  • Any chance of "before" and "after" visualizations? I'm really not sure (even after going as far as trying your algorithm on some artificial data) what you mean by "denoising" here. – AVB Mar 18 '11 at 23:41
  • This procedure is part of the "median polish" and "mean polish" families; you might dub it "minimum polish." However, I share babelproofreader's concerns. Most people would consider undulations in paper not "noise" but rather a secular underlying trend. Its identification and removal require a completely different treatment. – whuber May 04 '12 at 14:15
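
For comparison with the "median polish" mentioned in the comment above, base R ships Tukey's median polish as `stats::medpolish`. The sketch below is only an illustration on assumed synthetic data (one dot on an additively tilted sheet). It shows the decomposition that median polish returns, without claiming it is the right tool for the profilometer data.

```r
# Assumed synthetic data: a dot of height 1 on a sheet with additive tilt.
dots <- matrix(0, nrow = 20, ncol = 30)
dots[5:7, 10:12] <- 1
a <- dots + outer(0.1 * seq_len(20), 0.05 * seq_len(30), "+")

fit <- medpolish(a, trace.iter = FALSE)  # stats::medpolish, Tukey's median polish

# The fit decomposes the matrix as overall + row effect + column effect + residual.
rebuilt <- fit$overall + outer(fit$row, fit$col, "+") + fit$residuals
all.equal(rebuilt, a)

# The residuals play the role of the de-trended surface; they are centered on
# row/column medians rather than shifted so that the minima are zero.
range(fit$residuals)
```

The difference from the minimum-based procedure is the reference level: median polish centers each row and column on its median, while the procedure in the question pins each row and column minimum to zero.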

1 Answer


I think that there might be a conceptual problem with this approach. If your piece of paper is not flat, a "kink" in the paper at a point with no ink dot might be higher than surrounding areas that do have ink dots. The proposed algorithm might inadvertently average away the very points of interest. Also, "repeating this process many times" might introduce spurious structure due to the Slutsky-Yule effect. Might I suggest that you use more specialised approaches, e.g. consider your data to be an image and use the relevant tools from a package such as this?
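
The first concern can be made concrete with a small sketch in base R. The surface is invented for illustration: flat paper, one ink dot of height 1, and one kink of height 2 in an area with no ink. Because every row and column already has a minimum of zero, the row/column shift has nothing to remove, and the kink survives looking like an unusually tall dot.

```r
# Assumed synthetic surface: flat paper, one ink dot, and one "kink" with no ink.
surf <- matrix(0, nrow = 20, ncol = 30)
surf[5:7, 10:12] <- 1     # ink dot, height 1
surf[15:16, 20:25] <- 2   # kink in the paper, taller than the dot

a <- surf
for (iter in 1:50) {
  c1 <- sweep(a, 2, apply(a, 2, min))  # columns shifted to minimum zero
  c2 <- sweep(a, 1, apply(a, 1, min))  # rows shifted to minimum zero
  a <- (c1 + c2) / 2
}

isTRUE(all.equal(a, surf))  # the surface passes through unchanged
max(a)                      # the largest "height" comes from the kink, not the dot
```

Any average dot height computed from `a` would therefore include the kink unless it is identified and excluded by some other means.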