Noise leakage problem with least square estimation in the frequency-distance domain

Question

I have data $d$ recorded from an antenna of sensors. These data are composed of a Gaussian noise $n$ and a signal $s$ which I try to estimate. This signal propagates on the antenna with frequency dependent attenuation and velocity. Below is an illustration of such simulated data. We can observe that the signal comes from a source on the left and that the high frequencies are more attenuated.

Assuming that the propagation characteristics of the signal are perfectly known and noting $P$ the vector that propagates the frequency $f$ of the signal on the sensors. It is possible to obtain, frequency by frequency, the least squares estimate of the signal: $\hat{s}(f)=P(P^HP)^{-1}P^Hd(f)$ and then return to the time domain. The result of the estimation applied to the previous data is the following.

The result is quite good except on the first sensor where the noise is estimated. That is to say that there is a leakage of noise in the estimation of the signal. The origin of the phenomenon comes from the strong attenuation of high frequencies. Indeed, the leakage of noise on the first sensor minimizes the residual and, because of the attenuation, does not degrade the one of the following sensors.

My question is how to modify the signal estimation so that I don't have this leakage problem anymore. I have tried to modify the norm and/or regularize the solution but without success.

EDIT: Mathematical model : $d(f)=s(f)+n(f)=s_0(f).P(f)+n(f)$

$d(f)$: Fourier transform of the data recorded by the $N$ sensors at frequency $f$ (vector of size $N$x$1$)
$n(f)$: Fourier transform of the Gaussian noise on the $N$ sensors at frequency $f$ (vector of size $N$x$1$)
$s(f)$: Fourier transform of the signal on the $N$ sensors at frequency $f$ (vector of size $N$x$1$)
$s_0(f)$: signal generated by the source at the frequency f (complex scalar of size $1$x$1$)
$P(f)$: Propagator from the source to the position of the sensors at frequency f (vector of size $N$x$1$), s.a. $s(f)=s_0(f).P(f)$

The goal is knowing $d(f)$ and $P(f)$ is it possible to estimate $s(f)$ (without leakage of $n(f)$ in the estimation)?

Below is the MATLAB code used:

N = 10 ; % Number of sensors
M = 1000*5 ; % Number of samples
Delta_x = 10 ; % Distance between sensors
Delta_t = 0.001 ; % Sampling rate
x = (0:N-1).'.*Delta_x ; % positions of sensors
% frequency
Fs = 1/Delta_t ;
dF = Fs/M ;
M_pos = ceil((M+1)/2) ; % Number of "positive" frequencies
f_pos = (0 : M_pos-1)./(Delta_t*M) ;
% Velocity model
vp_f = 25.*(1+sqrt(1+(f_pos./100).^2)) ; % velocity model
% Attenuation model = Gain between sensors at each frequency
%G_f = 0.9.*ones(1,M_pos) ; % Constant
G_f = exp(-sqrt(0.01+(f_pos/100).^2)) ; % High frequencies are more attenuated
% Wavelet generation:
[b,a] = butter(1,10/(Fs/2)) ;
s0 = filter(b,a,randn(M,1)) ;
% Data generation in the frequency-distance domain
FT_s0 = fft(s0) ;
FT_s = zeros(M,N) ;
for ii = 1 : M_pos
    P_f = (G_f(ii).^(0:N-1).').exp(-1i2pi()f_pos(ii)/vp_f(ii)).^x ;
    FT_s(ii,:) = FT_s0(ii).*P_f ;
end
s = ifft(FT_s,'symmetric') ; % back to time domain
% Add noise
n = 0.05.*randn(M,N) ;
d = s+n ;
scale = Delta_x./(2*max(s(:))) ;
figure() ;
subplot(1,3,1) ; w_plot(d, Delta_t, Delta_x, scale) ; title('d: data (d=s+n)','Fontsize',32) ;
subplot(1,3,2) ; w_plot(s, Delta_t, Delta_x, scale) ; title('s: signal','Fontsize',32) ;
subplot(1,3,3) ; w_plot(n, Delta_t, Delta_x, scale) ; title('n: gaussian noise','Fontsize',32) ;
% Estimation
FT_d = fft(d) ;
FT_s_est = zeros(M,N) ;
for ii = 1 : M_pos
    P_f = (G_f(ii).^(0:N-1).').exp(-1i2pi()f_pos(ii)/vp_f(ii)).^x ;
    FT_s_est(ii,:) = (P_f((P_f'P_f)\P_f'))*FT_d(ii,:).' ;
end
s_est = ifft(FT_s_est,'symmetric') ;
figure() ;
subplot(1,3,1) ; w_plot(s, Delta_t, Delta_x, scale) ; title('s: signal','Fontsize',32) ;
subplot(1,3,2) ; w_plot(s_est, Delta_t, Delta_x, scale) ; title('signal estimation','Fontsize',32) ;
subplot(1,3,3) ; w_plot(s - s_est, Delta_t, Delta_x, scale) ; title('difference','Fontsize',32) ;
function w_plot(s, Delta_t, Delta_x, scale)
[M,N] = size(s) ;
x = (0:N-1).Delta_x ;
t = (0:M-1).Delta_t ;
FontSize = 32 ;
plot(scale*s+x,t,'k')
set(gca,'Ydir','reverse');
ylabel('t [s]','FontSize',FontSize) ; xlabel('x [m]','FontSize',FontSize) ;
ax = gca; ax.XAxis.FontSize = FontSize ; ax.YAxis.FontSize = FontSize ;
xlim([x(1)-Delta_x/2 x(end)+Delta_x/2])
end

I might be saying something stupid but in the least squares I know the formula would be $\hat{s}(f) = (P^{H}P)^{-1}P^{H}d(f)$ instead of $\hat{s}(f) = P(P^{H}P)^{-1}P^{H}d(f)$ — NokiYola, Mar 22 '23 at 13:18
@NokiYola $\hat{s}(f)=(P^HP)^{-1}P^Hd(f)$ represents the wavelet generated by the source it is a complex scalar ($P$ and $d(f)$ are of size [Nx1]). The signal recorded by the sensors corresponds to this wavelet propagated on the antenna, i.e. $\hat{s}(f)=P(P^HP)^{-1}P^Hd(f)$. In other words, $\hat{s}(f)=P.s_0, s. a. s_0=min||d(f)-s_0.P||_2$ — User327201, Mar 22 '23 at 14:04
Could you try to explain the Mathematical model? I don't get it. — Royi, Mar 25 '23 at 12:04
@Royi I have edited my question. I hope the model is clearer. — User327201, Mar 25 '23 at 15:52
What's the dot operator you use .? I am pretty sure your problem could be solved if the formulation were clearer. — Royi, Mar 28 '23 at 09:20
I'm not even close to an expert on the field but I'm wondering whether using frequency dependent regularisation would be of any help here. Maybe weighted least squares — ZaellixA, Mar 28 '23 at 10:13

Royi · Answer 1 · 2023-03-31T07:34:10.167

The model can be rewritten as:

$$ D = P S + N $$

Where $ D, P, N \in \mathcal{C}^{N \times F} $ where $ N $ is the number of sensors and $ F $ is the number of frequencies. The matrix $ S $ is basically $ \operatorname{Diag}\left( \boldsymbol{s}_{0} \right) $ where $ \operatorname{Diag} $ yields a diagonal matrix from a vector.

The least squares solution for $ S $ is given by:

$$ \hat{S} = {\left( {P}^{H} P \right)}^{-1} {P}^{H} D $$

Yet, this won't generate a diagonal matrix.
The optimization problem we're after is:

$$ \begin{align} \arg \min_{S} \quad & {\left\| P S - D \right\|}_{F}^{2} \\ \text{subject to} \quad & \begin{aligned} S & \in \mathcal{D}^{F} = \left\{ D \in \mathbb{C}^{F \times S} \mid {D}_{ij} = \begin{cases} \boldsymbol{d}_{i} & \text{ if } i = j \\ 0 & \text{ if } i \neq j \end{cases} \right\} \end{aligned} \end{align} $$

Namely, given that $ S $ is a diagonal matrix.
Since the set $ \mathcal{D}^{F} $ is convex this problem has a unique minimum value.

The closed form solution is:

$$ \boldsymbol{s}_{0} = {\left( \left( {P}^{H} P \right) \circ I \right)}^{-1} \operatorname{diag} \left( {P}^{H} D \right) $$

Where $ \circ $ is the Hadamard Product and $ \operatorname{diag} $ extracts the diagonal of a matrix and generates a vector.

Since $ \left( \left( {P}^{H} P \right) \circ I \right) $ is a diagonal matrix by itself you can basically solve this is $ F $ scalar divisions:

$$ {\boldsymbol{s}_{0}}_{i} = \frac{ {\operatorname{diag} \left( {P}^{H} D \right)}_{i} }{ {\operatorname{diag} \left( {P}^{H} P \right)}_{i} } $$

Deriving the Closed Form Solution

First, we can derive the gradient of the function:

Define $ S = \operatorname{Diag} \left( \boldsymbol{s} \right) $.
Define $ A = P S - D $.
Define $ A : A = \operatorname{Tr} \left( {A}^{T} A \right) $.
Namely the matrix trace inner product.
Implies $ f \left( A \right) = {\left\| A \right\|}_{F}^{2} = A : A $.

Now all needed is to build the gradient using the inner product:

$$ \begin{aligned} d f \left( A \right) & = 2A : A && \text{Differential of a linear operator} \\ & = 2A : P ds && \text{Differential of $P S - Y$ with respect to $S$} \\ & = 2 {P}^{H} A : dS && \text{Adjoint of the inner product} \\ & = 2 {P}^{H} A : \operatorname{Diag} \left( d \boldsymbol{s} \right) && \text{By definition} \\ & = 2 \operatorname{diag} \left( {P}^{H} A \right) : d \boldsymbol{s} && \text{Adjoint of the inner product} \end{aligned} $$

Hence the gradient with respect to $ \boldsymbol{s} $ is given by $ \frac{d f}{d \boldsymbol{s}} = 2 \operatorname{diag} \left( {P}^{H} A \right) = 2 \operatorname{diag} \left( {P}^{H} \left( P S - D \right) \right) $.

To find the optimal solution of this convex problem we find when the gradient zeros:

$$ \begin{aligned} \operatorname{diag} \left( {P}^{H} D \right) & = \operatorname{diag} \left( {P}^{H} P S \right) \\ & = \operatorname{diag} \left( {P}^{H} P \operatorname{Diag} \left( \boldsymbol{s} \right) \right) \\ & = \left( \left( {P}^{H} P \right) \circ I \right) \boldsymbol{s} && \text{Property of Hadamard Product} \end{aligned} $$

Hence $ \boldsymbol{s} = {\left( \left( {P}^{H} P \right) \circ I \right)}^{-1} \operatorname{diag} \left( {P}^{H} D \right) $.

Remark: In this section we used $ \boldsymbol{s} $ to represent $ \boldsymbol{s}_{0} $.

Adding Regularization

In case we have a prior on the values of $ \boldsymbol{s}_{0} $ we should incorporate them into the objective function.
For instance, assume we have a prior $ \boldsymbol{s}_{0} \sim \mathcal{N} \left( \boldsymbol{\mu}, \Sigma \right) $ then the objective function becomes:

$$ \begin{align} \arg \min_{S} \quad & {\left\| P S - D \right\|}_{F}^{2} + {\left( \boldsymbol{s}_{0} - \boldsymbol{\mu} \right)}^{H} {\Sigma}^{-1} \left( \boldsymbol{s}_{0} - \boldsymbol{\mu} \right) \\ \text{subject to} \quad & \begin{aligned} S & \in \mathcal{D}^{F} = \left\{ D \in \mathbb{C}^{F \times S} \mid {D}_{ij} = \begin{cases} \boldsymbol{d}_{i} & \text{ if } i = j \\ 0 & \text{ if } i \neq j \end{cases} \right\} \\ S & = \operatorname{Diag} \left( \boldsymbol{s}_{0} \right) \end{aligned} \end{align} $$

By setting $ \boldsymbol{\mu} = \boldsymbol{0} $ one basically damps the values of $ \boldsymbol{s}_{0} $ towards zero.

Thank you for your answer. The model for S is not correct, the elements of the diagonal are not identical. Instead we have: $S=diag(s_{0,1} ... s_{0,F})$ with $s_{0,i=1,F}$ the $F$ complex unknowns to determine. — User327201, Mar 28 '23 at 10:33
this is the same expression I gave in my question (although not so elegantly stated). My question was more about how to modify that expression. Because, as I show in my question, I have an overestimation of $s_{0,i}$ at high frequencies because of the low Signal to Noise Ratio. — User327201, Mar 28 '23 at 13:46
@User327201, Do you have some prior or model for the value of $ {S}_{0} \left( f \right) $ for the high frequencies? What's missing in the model is the function which works along the frequency. — Royi, Mar 28 '23 at 14:05
It is $P$ which is a function of the frequency. Concerning $S_0(f)$ I can simply estimate the signal to noise ratio. — User327201, Mar 28 '23 at 14:33
The general damping prior would be adding $ \lambda {\left| \boldsymbol{s}{0} \right|}{2}^{2} $ to the objective function. This means the prior is a Gaussian distribution. If you have a better model for $ \boldsymbol{s}_{0} $ we can incorporate it. For example, should it be monotonic? Maybe smooth? Have a specific distribution? — Royi, Mar 28 '23 at 14:37
@User327201, I added a section about regularization. Yet without some prior from your side about the actual model I had to add a generic regularization. — Royi, Mar 29 '23 at 21:40

Noise leakage problem with least square estimation in the frequency-distance domain

1 Answers1

Deriving the Closed Form Solution

Adding Regularization