I have been trying to optimise some code that takes an input signal in the form of a 2D array and convolves it with a Gaussian of a certain FWHM. The size of this 2D signal array is (671, 2001), and I am convolving along the second axis, the time axis, where
time = 0:0.5:1000
nt = length(time) # nt = 2001
I am convolving with a Gaussian of FWHM 150, corresponding to a standard deviation of about 63.7. I then pad the initial signal by 3 times the FWHM, setting the new values before the original start index to 0 and the new values after the original end index to the signal at the last time point.
Hence, nt = 3801 after padding.
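As a quick check on those numbers (a minimal sketch in Julia, using the time grid described above):
fwhm  = 150.0
sigma = fwhm / (2sqrt(2log(2)))   # ≈ 63.7, the standard deviation quoted above
dt    = 0.5
nt    = length(0:dt:1000)         # 2001 time points
npad  = Int(3fwhm / dt)           # 3*FWHM = 450 of padding = 900 samples per side
ntc   = nt + 2npad                # 3801 points after padding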
My MATLAB code calculates this convolution in 117.5043 seconds. This is done through a nested for loop over each of the axes, which should be slow in MATLAB. The main part of the MATLAB code is as follows,
for it = 1:nt
    % scan convolution mask across time
    kernal = preb*exp(-b*(tc-tc(it)).^2); % convolution mask - Gaussian
    N(it) = sum(kernal)*dt; % normalise
    kernal = kernal/N(it); % ensure convolution integrates to 1
    % calculate convolution for each row
    for iq = 1:Nq
        WWc(iq,it) = dt*sum((WWc0(iq,1:nt).').*kernal(1:nt));
    end
end
I have rewritten this in Julia, except using circulant matrices to construct a Gaussian kernel; this effectively removes the for loop over time, and the kernel to be applied over all time is now fully defined by one matrix. I then either loop over the rows and multiply this kernel by the signal of each row, or use the mapslices function. The main bit of code in Julia is as follows,
K = Circulant(G.Fx) # construct circulant matrix of gaussian function G.Fx
conv = zeros(nr, nt) # init convolved signal
#for i = 1:nr # If using a loop instead of mapslices
#    conv[i, :] = transpose(sC[i, :]) * K
#end
conv = mapslices(x -> K*x, sC; dims=2)
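As a toy illustration of the circulant idea (made-up numbers, nothing to do with the real signal): successive columns (and rows) of the kernel matrix are cyclic shifts of each other, so the one matrix holds the mask recentred at every time point.
using SpecialMatrices
g = [0.6, 0.3, 0.1, 0.0, 0.0]   # stand-in for a normalised Gaussian mask
K = Matrix(Circulant(g))        # 5x5 kernel; successive columns are cyclic shifts of g
x = rand(5)                     # one row of the signal
y = K * x                       # applies the recentred mask at every position in one go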
In Julia this convolution takes 368 seconds (almost 3 times as slow as MATLAB), despite using circulant matrices to skip a for loop and reducing the work for each row to multiplying two arrays of size (1, 3801) and (3801, 3801).
output of @time:
368.047741 seconds (19.37 G allocations: 288.664 GiB, 14.30% gc time, 0.03% compilation time)
Both are run on the same computer; should Julia not outperform MATLAB here?
UPDATE:
Playing around with the Julia code some more, I have rewritten the matrix multiplication in terms of loops, but it is still slow. The convolution is performed within a function, which calls one of two external constructors - Gaussian or Lorentzian - depending on an input flag. I also load SpecialMatrices.jl for the Circulant constructor. I am new to Julia and not sure whether any of that would make it slow. When using the @time macro on independent parts of the function, it is apparent that every part is really fast until it hits the loop at the end that actually calculates the convolution. This loop is incredibly slow whichever way I write it, be that matrix multiplication with the circulant kernel, a double nested loop combined with the sum() function, or a triple nested loop that does the sum itself.
using MAT
using SpecialMatrices
wd = " "
inputfile = matopen(wd*"/Signal.mat")
signal = read(inputfile, "pdW")
close(inputfile)
inputfile = matopen(wd*"/Vectors.mat")
qAng = read(inputfile, "qAng")
tvec = read(inputfile, "tt")
struct Gaussian
    Fx:: AbstractVector{Float64}
    sigma:: Float64
    height:: Float64
    fwhm:: Float64
    function Gaussian(fwhm:: Float64, x:: StepRangeLen{Float64}, x0:: Float64)
        sigma = fwhm/(2sqrt(2log(2)))
        height = 1/(sigma*sqrt(2pi)) # peak of a unit-area Gaussian
        Fx = height .* exp.( (-1 .* (x .- x0).^2)./ (2*sigma^2))
        new(Fx, sigma, height, fwhm)
    end
end
struct Lorentzian
    Fx:: AbstractVector{Float64}
    gamma:: Float64
    fwhm:: Float64
    function Lorentzian(fwhm:: Float64, x:: StepRangeLen{Float64}, x0:: Float64)
        gamma = fwhm/2
        Fx = 1 ./ ((pi * gamma) .* (1 .+ ((x .- x0)./ gamma).^2 ))
        new(Fx, gamma, fwhm)
    end
end
function convolve(S:: Array{Float64}, fwhm:: Float64, time:: Array{Float64}, type:: Int)
    nr:: Int64, nc:: Int64 = size(S)
    nt:: Int64 = length(time)
    dt:: Float64 = sum(diff(time, dims=2))/(length(time)-1)
    duration:: Float64 = fwhm*3 # must pad by three times the FWHM to deal with edges
    if dt != diff(time, dims=2)[1] ; error("Non-linear time range."); end
    if nr != nt && nc != nt; error("Inconsistent dimensions."); end
    if nc != nt; S = transpose(S); nr, nc = nc, nr; end # column always time axis
    tmin:: Float64 = minimum(time); tmax:: Float64 = maximum(time)
    tconv_min:: Float64 = tmin - duration ; tconv_max:: Float64 = tmax + duration
    tconv = tconv_min:dt:tconv_max # extended convolution time vector
    ntc:: Int64 = length(tconv)
    padend:: Int64 = (duration/dt) # index at which 0 padding ends and signal starts
    padstart:: Int64 = (padend + tmax / dt) # index where signal ends and padding starts
    Kv = type == 0 ? Gaussian(fwhm, tconv, tconv[1]) : Lorentzian(fwhm, tconv, tconv[1])
    println("Convolving signal with a $(typeof(Kv)) function with FWHM: $(Kv.fwhm) fs.")
    K:: Array{Float64} = Circulant(Kv.Fx) # construct kernel from circulant matrix
    sC = zeros(Float64, nr, ntc)
    sC[1:nr, 1:padend] .= S[1:nr, 1] # extend and pad signal
    sC[1:nr, padend:padstart] .= S
    sC[1:nr, padstart:end] .= S[1:nr, end]
    println("""
    Signal padded by 3*FWHM ( $(duration) fs ) forwards and backwards.
    Original time length: $(nt), Extended time: $(ntc), Diff: $(ntc-nt) steps.
    Padded signal size: $(size(sC)).
    Kernel size: $(size(K[1, :])).
    """)
    conv = zeros(Float64, nr, ntc)
    for t in 1:ntc
        for q in 1:nr
            #conv[q, t] = sum(sC[q, 1:ntc] .* K[t, :])*dt
            conv[q, t] = 0.0
            for j in 1:ntc
                conv[q, t] += sC[q, j] * K[t, j] * dt
            end
        end
    end
    return conv, tconv
end
fwhm = 100.0
conv, tconv = convolve(signal, fwhm, tvec, 0)
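For completeness, a more controlled way to time the full call is BenchmarkTools, interpolating the non-constant globals with $ so that looking them up is not part of the measurement (just a sketch, not part of the script above):
using BenchmarkTools
@btime convolve($signal, $fwhm, $tvec, 0);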
UPDATE 2: Working fast code, with a wrapper for convolving a random signal.
using LinearAlgebra
using SpecialMatrices
function wrapper()
    signal = rand(670, 2001)
    tvec = 0:0.5:1000
    fwhm = 150.0
    Flag = 0
    conv = convolve(signal, tvec, fwhm, Flag)
end
function Gaussian(x:: StepRangeLen{Float64}, x0:: Float64, fwhm:: T) where {T<:Union{Float64, Int64}}
    sigma = fwhm/(2sqrt(2log(2)))
    height = 1/(sigma*sqrt(2pi)) # peak of a unit-area Gaussian
    Fx = height .* exp.( (-1 .* (x .- x0).^2)./ (2*sigma^2))
    Fx = Fx./sum(Fx) # normalise so the discrete mask sums to 1
end
function Lorentzian(x:: StepRangeLen{Float64}, x0:: Float64, fwhm:: T) where {T<:Union{Float64, Int64}}
    gamma = fwhm/2
    Fx = 1 ./ ((pi * gamma) .* (1 .+ ((x .- x0)./ gamma).^2 ))
    Fx = Fx./sum(Fx)
end
function generate_kernel(tconv:: StepRangeLen{Float64}, fwhm:: Float64, centre:: Float64, ctype:: Int64)
    if ctype == 0
        K = Gaussian(tconv, centre, fwhm)
    elseif ctype == 1
        K = Lorentzian(tconv, centre, fwhm)
    else
        error("Only Gaussian and Lorentzian functions currently available.")
    end
    K = Matrix(Circulant(K))
    return K
end
function convolve(S:: Matrix{Float64}, time:: StepRangeLen{Float64}, fwhm:: Float64, ctype:: Int64)
    nr:: Int64, nc:: Int64 = size(S)
    dt:: Float64 = sum(diff(time, dims=1))/(length(time)-1)
    nt:: Int64 = length(time)
    duration:: Float64 = fwhm*3 # must pad by three times the FWHM to deal with edges
    tmin:: Float64 = minimum(time); tmax:: Float64 = maximum(time)
    if dt != diff(time, dims=1)[1] ; error("Non-linear time range."); end
    if nr != nt && nc != nt; error("Inconsistent dimensions."); end
    if nc != nt; S = copy(transpose(S)); nr, nc = nc, nr; end # column always time axis
    tconv_min:: Float64 = tmin - duration ; tconv_max:: Float64 = tmax + duration
    tconv = tconv_min:dt:tconv_max # extended convolution time vector
    ntc = length(tconv)
    padend:: Int64 = (duration/dt) # index at which 0 padding ends and signal starts
    padstart:: Int64 = (padend + tmax / dt) # index where signal ends and padding starts
    K = generate_kernel(tconv, fwhm, tconv[padend], ctype)
    sC = zeros(Float64, nr, ntc)
    sC[1:nr, 1:padend] .= @view S[1:nr, 1] # extend and pad signal
    sC[1:nr, padend:padstart] .= S
    sC[1:nr, padstart:end] .= @view S[1:nr, end]
    conv = zeros(Float64, nr, ntc)
    conv = convolution_integral(sC, K, dt)
    S .= conv[:, 1:nt] # return convolved signal with the same original size
    return S
end
function convolution_integral(signal:: Matrix{Float64}, kernel:: Matrix{Float64}, dt:: Float64)
    LinearAlgebra.BLAS.set_num_threads(Sys.CPU_THREADS)
    conv = signal * kernel
    conv .*= dt
    return conv
end
Using BenchmarkTools:
julia> @btime wrapper();
128.438 ms (11125 allocations: 300.24 MiB)
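As a small sanity check that the single matrix product in convolution_integral computes the same sums as an explicit loop (illustrative sizes, random stand-ins for the padded signal and kernel; assumes the functions above are already defined):
using SpecialMatrices
n  = 50
g  = rand(n); g ./= sum(g)    # stand-in for a normalised kernel vector
K  = Matrix(Circulant(g))
sC = rand(4, n)               # stand-in for a padded signal
dt = 0.5
fast = convolution_integral(sC, K, dt)
slow = [dt * sum(sC[q, j] * K[j, t] for j in 1:n) for q in 1:4, t in 1:n]
@assert maximum(abs.(fast .- slow)) < 1e-10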
There is some strange slicing behaviour with the padded signal, in the sense that the padding at early times flips to the end of the time extension after convolution. Something to do with the centering of the Gaussian kernel, I think. For now I just slice the array back to its original dimensions, without the padding, and get the result I expect.