
I have the following data over time:

[img: CPU usage (min/max/avg) over time]

That is, data collected for a single variable (CPU usage) as its minimum, maximum, and average values, sampled every 5 minutes (data granularity = 5 min), as in the following data frame:

|    | timestamp           |   min cpu |     max cpu |     avg cpu |
|---:|:--------------------|----------:|------------:|------------:|
|  0 | 2017-01-01 00:00:00 |    715147 | 2.2233e+06  | 1.22957e+06 |
|  1 | 2017-01-01 00:05:00 |    700474 | 2.21239e+06 | 1.21132e+06 |
|  2 | 2017-01-01 00:10:00 |    705954 | 2.21306e+06 | 1.20663e+06 |
|  3 | 2017-01-01 00:15:00 |    688383 | 2.18757e+06 | 1.19037e+06 |
|  4 | 2017-01-01 00:20:00 |    688277 | 2.18368e+06 | 1.18099e+06 |

I sliced the data frame and worked on it as a univariate time-series problem (a minimal slicing sketch follows the table below):

|    | timestamp           |     avg cpu |
|---:|:--------------------|------------:|
|  0 | 2017-01-01 00:00:00 | 1.22957e+06 |
|  1 | 2017-01-01 00:05:00 | 1.21132e+06 |
|  2 | 2017-01-01 00:10:00 | 1.20663e+06 |
|  3 | 2017-01-01 00:15:00 | 1.19037e+06 |
|  4 | 2017-01-01 00:20:00 | 1.18099e+06 |
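
A minimal sketch of that slicing and the chronological train/test split, assuming the full data frame is called `df` and has the columns shown above (the name `df`, the cutoff date, the `5min` frequency, and `steps = len(data_test)` are assumptions based on the tables in this post):

```python
import pandas as pd

# 'df' is assumed to be the full frame with the columns shown above:
# 'timestamp', 'min cpu', 'max cpu', 'avg cpu'
data = df[['timestamp', 'avg cpu']].copy()

# parse timestamps, use them as the index, and declare the 5-minute frequency
data['timestamp'] = pd.to_datetime(data['timestamp'])
data = data.set_index('timestamp').asfreq('5min')

# chronological split; the cutoff matches the first predicted timestamp shown below
data_train = data.loc[:'2017-01-24 23:55:00']
data_test  = data.loc['2017-01-25 00:00:00':]
steps = len(data_test)  # forecast horizon used later in predict_interval()
```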

I split the data into train and test sets and computed prediction intervals (PIs) with a regression-based forecaster:

|                     |        pred |   lower_bound |   upper_bound |
|:--------------------|------------:|--------------:|--------------:|
| 2017-01-25 00:00:00 | 1.15232e+06 |   1.12482e+06 |   1.1874e+06  |
| 2017-01-25 00:05:00 | 1.14453e+06 |   1.10052e+06 |   1.18994e+06 |
| 2017-01-25 00:10:00 | 1.14033e+06 |   1.08739e+06 |   1.20795e+06 |
| 2017-01-25 00:15:00 | 1.13669e+06 |   1.0843e+06  |   1.20252e+06 |
| 2017-01-25 00:20:00 | 1.1271e+06  |   1.06837e+06 |   1.19865e+06 |

[img: test data plotted against the forecasts and their prediction intervals]


We know:

"Coherence: It is used for measuring the correlation between two signals. ... Coherence is the normalized cross-spectral density:" $$C x y=\frac{|P x y|^2}{P x x-P y y}$$ ref.

Question:

Does measuring the coherence between predictions['upper_bound'] (or predictions['pred']) and the actual test data data_test['avg cpu'] evaluate anything potentially meaningful?

[img: coherence plot of the two signals]


Code:

```python
#!pip install skforecast

# Libraries
# ==============================================================================
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.linear_model import Ridge  # , Lasso, LinearRegression
from skforecast.ForecasterAutoreg import ForecasterAutoreg

# Create and train forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
    regressor = Ridge(alpha=0.1, random_state=765),
    lags      = 288   # one day of lags at 5-minute granularity
)
forecaster.fit(y=data_train['avg cpu'])

# Prediction intervals
# ==============================================================================
# 'steps' is the forecast horizon (e.g. steps = len(data_test), set during the train/test split)
predictions = forecaster.predict_interval(
    steps    = steps,
    interval = [1, 99],
    n_boot   = 500
)

# Prediction error
# ==============================================================================
error_mse2 = mean_squared_error(
    y_true = data_test['avg cpu'],
    y_pred = predictions['upper_bound']
)
print(f"Test error (MSE): {error_mse2}")

# Plot forecasts with prediction intervals and coherence of signals
# ==============================================================================
fig, ax = plt.subplots(figsize=(6, 3))
plt.ylabel('cpu', fontsize=15)
plt.ticklabel_format(style='plain')
plt.xlabel('timestamp', fontsize=15, color='darkred')
cossignal1 = data_test['avg cpu'].plot(ax=ax, label='Test-set', color='orange', linestyle='-.', marker="p")
cossignal2 = predictions['upper_bound'].plot(ax=ax, label="predictions['upper_bound']", color='darkred')
predictions['pred'].plot(ax=ax, label="predictions['pred']")
plt.title("Signals")
# place legend in top right corner
plt.legend(bbox_to_anchor=(1.6, .9), loc="upper right")
plt.show()

# Store the coherence values in a variable, say 'cor'
# (plt.cohere returns the per-frequency coherence and the frequency vector)
fig, ax = plt.subplots(figsize=(6, 3))
cor = plt.cohere(data_test['avg cpu'], predictions['upper_bound'], c='g')
plt.title("Coherence of Signals: predictions['upper_bound'] and data_test['avg cpu']")
# plot the coherence graph
ax.legend(['Coherence'])
plt.show()

# Repeat for the point forecasts
fig, ax = plt.subplots(figsize=(6, 3))
cor = plt.cohere(data_test['avg cpu'], predictions['pred'], c='g')
plt.title("Coherence of Signals: predictions['pred'] and data_test['avg cpu']")
# plot the coherence graph
ax.legend(['Coherence'])
plt.show()
```

  • Judging from your third-to-last plot, your predictions seem to be biased: they are systematically too low. In such a situation, I would be careful about any correlation-based measure, because you can have high correlation and high bias at the same time. Prediction intervals are typically assessed using the interval or Winkler score [a sketch of this score follows these comments]. – Stephan Kolassa Mar 12 '24 at 10:35
  • @StephanKolassa thanks for your input. You're right; for context, I created a minimal example using the skforecast Python package for time-series analytics. Within this package, regardless of which regressor one chooses inside the ForecasterAutoreg() class (in my case Ridge), the prediction interval (PI) results are returned by predict_interval(). Maybe using an inappropriate regressor like Ridge produces these biased, low predictions, or maybe the PI approach inside the package causes it. I included the Python code for better understanding. – Mario Mar 12 '24 at 11:16
  • The PI results are then computed by a bootstrapping process which, even after reading the documentation, I don't understand how it works; I will create a separate post to understand it. Maybe you can assist me in understanding it. – Mario Mar 12 '24 at 11:17
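
Following up on the first comment above about assessing prediction intervals: a minimal sketch of the mean interval (Winkler) score, assuming the interval = [1, 99] setting above yields a 98% central interval (so alpha = 0.02); the function name interval_score is mine, not from skforecast:

```python
import numpy as np

def interval_score(y_true, lower, upper, alpha=0.02):
    """Mean interval (Winkler) score for central (1 - alpha) prediction intervals.

    Score per point = interval width, plus 2/alpha times the distance by which
    the observation falls outside the interval. Lower is better.
    """
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    width = upper - lower
    below = (2 / alpha) * np.clip(lower - y_true, 0, None)  # penalty when y < lower_bound
    above = (2 / alpha) * np.clip(y_true - upper, 0, None)  # penalty when y > upper_bound
    return np.mean(width + below + above)

score = interval_score(
    y_true=data_test['avg cpu'],
    lower=predictions['lower_bound'],
    upper=predictions['upper_bound'],
    alpha=0.02,  # assumption: interval=[1, 99] gives a 98% central interval
)
print(f"Mean interval (Winkler) score: {score:,.2f}")
```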
