
For my evaluation, I used gnuplot to plot data from two separate CSV files with different numbers of rows (available at https://drive.google.com/open?id=0B2Iv8dfU4fTUZGV6X1Bvb3c4TWs), which generates the following graph.

enter image description here

The two CSV files seem to share no timestamps (the first column), and yet gnuplot plots both series on a common axis as shown above.

Here is the gnuplot script that I use to generate my plot.

# ###### GNU Plot

set style data lines
set terminal postscript eps enhanced color "Times" 20

set output "output.eps"

set title "Actual vs. Estimated Comparison"

set style line 99 linetype 1 linecolor rgb "#999999" lw 2
#set border 1 back ls 11
set key right top
set key box linestyle 50
set key width -2
set xrange [0:10]
set key spacing 1.2
#set nokey

set grid xtics ytics mytics
#set size 2
#set size ratio 0.4

#show timestamp
set xlabel "Time [Seconds]"
set ylabel "Segments"

set style line 1 lc rgb "#ff0000" lt 1 pi 0 pt 4 lw 4 ps 0

plot  "estimated.csv" using ($1):2 with lines title "Estimated", "actual.csv" using ($1):2 with lines title "Actual";

I wanted to interpolate my green line into the grid where my pink line is defined, then compare the two. Here is my initial approach:

#!/usr/bin/env python
import sys

import numpy as np
from shapely.geometry import LineString
#-------------------------------------------------------------------------------
def load_data(fname):
    return LineString(np.genfromtxt(fname, delimiter = ','))
#-------------------------------------------------------------------------------
lines = list(map(load_data, sys.argv[1:]))

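# The intersection can be a single geometry or a collection of geometries.
# Shapely 2.x only allows iterating multi-part results through .geoms.
crossings = lines[0].intersection(lines[1])
for g in getattr(crossings, 'geoms', [crossings]):
    if g.geom_type != 'Point':
        continue
    print('%f,%f' % (g.x, g.y))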
Then in Gnuplot, one can invoke it directly:

set terminal pngcairo
set output 'fig.png'

set datafile separator comma
set yr [0:700]
set xr [0:10]

set xtics 0,2,10
set ytics 0,100,700

set grid

set xlabel "Time [seconds]"
set ylabel "Segments"

plot \
    'estimated.csv' w l lc rgb 'dark-blue' t 'Estimated', \
    'actual.csv' w l lc rgb 'green' t 'Actual', \
    '<python filter.py estimated.csv actual.csv' w p lc rgb 'red' ps 0.5 pt 7 t ''

which gives us the following plot

enter image description here

I wrote the filtered points from this script to another file (filtered_points.csv, available at https://drive.google.com/open?id=0B2Iv8dfU4fTUSHVOMzYySjVzZWc). However, the filtered points amount to less than 10% of the actual dataset (which is the ground truth).

Is there a way, using Python, to interpolate the two lines while ignoring the high pink peaks above the green plot? Gnuplot doesn't seem to be the best tool for this. If the pink line doesn't touch the green line (i.e. if it is well below the green line), I want to take the value of the closest point on the green line so that there is a one-to-one correspondence (or very close) with the actual dataset. I want to return the interpolated values for the green line on the pink line's grid so that the two lines can be compared, since they would then have the same array size.

Desta Haileselassie Hagos
  • I don't think I understand what you really want to do. What does "I wanted to interpolate my green line into the grid where my pink line is defined, then compare the two." mean? As I understand it, you would like to: 1. Fit the green curve. 2. Make sure all your pink data is below the green one. 3. Compare the data and, by this means, look for intersections. 4. Return this intersection data. Is that right? Isn't the green curve already fulfilling what you're looking for? – Franz Jun 07 '17 at 11:49
  • What kind of interpolation? Linear? Splines? Other? – Stop harming Monica Jun 07 '17 at 12:00
  • @Franz, EXACTLY!!! But what I ultimately want is a one-to-one data size for the green and pink lines. If you have seen the .csv files at this link: https://drive.google.com/drive/folders/0B2Iv8dfU4fTUZGV6X1Bvb3c4TWs - we have more data points in `estimated.csv` than in `actual.csv` (ground truth). In this case, I want to smooth the estimate so that it fits the ground truth. If there is a gap (as you can see from the plot, some points are below the green line), we will take the current value (data point) of the green line. Hope this explains it. – Desta Haileselassie Hagos Jun 07 '17 at 12:25
  • @Goyo, I think `Splines` would be fine. – Desta Haileselassie Hagos Jun 07 '17 at 12:27

1 Answer


Resampling one dataset onto the other's time grid, so that both have the same size, is straightforward with numpy.interp(). This code works for me:

import numpy as np
import matplotlib.pyplot as plt

names = ['actual.csv','estimated.csv']
#-------------------------------------------------------------------------------
def load_data(fname):
    return np.genfromtxt(fname, delimiter = ',')
#-------------------------------------------------------------------------------

data = [load_data(name) for name in names]
actual_data = data[0]
estimated_data = data[1]
interpolated_estimation = np.interp(estimated_data[:,0],actual_data[:,0],actual_data[:,1])

plt.figure()
plt.plot(actual_data[:,0],actual_data[:,1], label='actual')
plt.plot(estimated_data[:,0],estimated_data[:,1], label='estimated')
plt.plot(estimated_data[:,0],interpolated_estimation, label='interpolated')
plt.legend()
plt.show(block=True)

After this interpolation, interpolated_estimation has the same size as the x axis of estimated_data, as the plot suggests: despite its name, it holds the actual values linearly interpolated at the estimated timestamps. The slicing is a bit confusing, but I tried to reuse your function and keep the plot calls as clear as possible.
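
As a minimal sketch of how the array sizes work out, the snippet below uses made-up numbers (not the real CSV data) and illustrative variable names. It shows that np.interp(x, xp, fp) returns one value per entry of its first argument, so the result follows the grid it is evaluated on rather than the grid the data came from.

```python
import numpy as np

# Made-up sample data standing in for the two CSV files (assumption, not the real datasets).
actual_t = np.array([0.0, 1.0, 2.0, 3.0])      # coarse time grid, like actual.csv
actual_y = np.array([0.0, 10.0, 20.0, 30.0])
estimated_t = np.linspace(0.0, 3.0, 7)         # denser time grid, like estimated.csv

# np.interp(x, xp, fp): evaluate the piecewise-linear curve (xp, fp) at the positions x.
on_estimated_grid = np.interp(estimated_t, actual_t, actual_y)

# The output has the same length as the first argument, not as the source data.
print(on_estimated_grid.shape == estimated_t.shape)
print(on_estimated_grid.shape == actual_t.shape)
```

Swapping the roles of the two datasets in the call would therefore give an array with as many entries as the coarse grid.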

enter image description here

To save the result to a file and plot it as suggested, I changed the code to:

import numpy as np
import matplotlib.pyplot as plt

names = ['actual.csv','estimated.csv']
#-------------------------------------------------------------------------------
def load_data(fname):
    return np.genfromtxt(fname, delimiter = ',')
#-------------------------------------------------------------------------------

data = [load_data(name) for name in names]
actual_data = data[0]
estimated_data = data[1]
interpolated_estimation = np.interp(estimated_data[:,0],actual_data[:,0],actual_data[:,1])

plt.figure()
plt.plot(actual_data[:,0],actual_data[:,1], label='actual')
#plt.plot(estimated_data[:,0],estimated_data[:,1], label='estimated')
plt.plot(estimated_data[:,0],interpolated_estimation, label='interpolated')
np.savetxt('interpolated.csv',
       np.vstack((estimated_data[:,0],interpolated_estimation)).T,
       delimiter=',', fmt='%10.5f')  # save time and interpolated value as two columns
plt.legend()
plt.title('Actual vs. Interpolated')
plt.xlim(0,10)
plt.ylim(0,500)
plt.xlabel('Time [Seconds]')
plt.ylabel('Segments')
plt.grid()
plt.show(block=True)

This produces the following output: enter image description here
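
If the goal is instead a result with the same number of rows as actual.csv, one option is to run the interpolation in the other direction and evaluate the estimated series at the actual timestamps. The following is only a sketch under stated assumptions: the arrays are made-up stand-ins for the CSV contents, resample_onto is a hypothetical helper name, and capping the estimate at the actual value with np.minimum is one possible reading of "ignoring the peaks above the green plot", not a method confirmed anywhere in this thread.

```python
import numpy as np

def resample_onto(target_t, source_t, source_y):
    """Linearly interpolate (source_t, source_y) at the times in target_t."""
    order = np.argsort(source_t)  # np.interp expects increasing x-coordinates
    return np.interp(target_t, source_t[order], source_y[order])

# Made-up two-column data (time, value); assumptions standing in for the CSV files.
actual = np.array([[0.0, 0.0], [1.0, 10.0], [2.0, 20.0]])
estimated = np.array([[0.0, 1.0], [0.5, 30.0], [1.0, 8.0], [1.5, 12.0], [2.0, 25.0]])

# One estimated value per actual timestamp, so both series have the same length.
estimated_on_actual = resample_onto(actual[:, 0], estimated[:, 0], estimated[:, 1])

# Optional step (assumption): cap the estimate wherever it rises above the ground truth.
capped = np.minimum(estimated_on_actual, actual[:, 1])

# Two columns (time, value), with exactly as many rows as the actual data.
two_columns = np.column_stack((actual[:, 0], capped))
print(two_columns.shape)
```

The two_columns array could then be written out with np.savetxt('interpolated.csv', two_columns, delimiter=',', fmt='%.5f'). Note that evaluating a dense series on a coarser grid only samples it at those timestamps, so short peaks between samples are dropped rather than averaged.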

Franz
  • Thank you, Franz. Would it be possible to write the data points of `interpolated` to another file so that it is easy to see whether there is a one-to-one correspondence? In the end, I want to generate a graph like the one at the following link: https://drive.google.com/open?id=0B2Iv8dfU4fTUSHVOMzYySjVzZWc – Desta Haileselassie Hagos Jun 07 '17 at 16:19
  • Hi Desta, I added the changed source. – Franz Jun 07 '17 at 16:36
  • Awesome. I have accepted your answer, Franz. But if you have seen the files `actual.csv` (with 15179 rows) and `estimated.csv` (with 258267 rows), each has two columns of data separated by a comma. In the new `interpolated.csv` file, however, there are about 516534 rows of single-column data. Even the new data (for example the first row: 2.648999999999999879e-03) is hard to interpret. Is it possible to have two columns as in the original files and make it a one-to-one correspondence with `actual.csv`? I mean, to have the same number of rows as `actual.csv`? – Desta Haileselassie Hagos Jun 07 '17 at 17:12
  • Very good, and thank you. Now the new file `interpolated.csv` has exactly the same number of rows as the original `estimated.csv`. Does that mean we can't make it a one-to-one correspondence with `actual.csv` (that is, `actual.csv` and `interpolated.csv` having the same number of rows)? – Desta Haileselassie Hagos Jun 07 '17 at 19:37
  • Since the number of rows is taken from `estimated.csv` and the data is taken from `actual.csv`, the `interpolated` data represents your `actual` data. This means that actual and estimated have the same number of rows if you take `interpolated` as `actual`. – Franz Jun 08 '17 at 07:23
  • Hi Franz. I would appreciate it if you could have a look at this. In my previous post, I shouldn't have used `actual.csv` for the interpolation, as it is my ground truth. Is there any way to find a pattern in the `estimated.csv` file (signal) alone? You can answer either here or in my new question when you get a chance. Thanks. https://stackoverflow.com/questions/44458859/python-finding-pattern-in-a-plot – Desta Haileselassie Hagos Jun 09 '17 at 13:19