
I have the following data over time:

[img: CPU usage (min/max/avg) over time]

That is, data collected for a single variable (CPU usage) as its minimum, maximum, and average values, sampled every 5 minutes (data granularity = 5 min), as in the following data frame:

|    | timestamp           |   min cpu |     max cpu |     avg cpu |
|---:|:--------------------|----------:|------------:|------------:|
|  0 | 2017-01-01 00:00:00 |    715147 | 2.2233e+06  | 1.22957e+06 |
|  1 | 2017-01-01 00:05:00 |    700474 | 2.21239e+06 | 1.21132e+06 |
|  2 | 2017-01-01 00:10:00 |    705954 | 2.21306e+06 | 1.20663e+06 |
|  3 | 2017-01-01 00:15:00 |    688383 | 2.18757e+06 | 1.19037e+06 |
|  4 | 2017-01-01 00:20:00 |    688277 | 2.18368e+06 | 1.18099e+06 |

I sliced the data frame and worked on it as a univariate time-series problem (a minimal slicing sketch follows the table below):

|    | timestamp           |     avg cpu |
|---:|:--------------------|------------:|
|  0 | 2017-01-01 00:00:00 | 1.22957e+06 |
|  1 | 2017-01-01 00:05:00 | 1.21132e+06 |
|  2 | 2017-01-01 00:10:00 | 1.20663e+06 |
|  3 | 2017-01-01 00:15:00 | 1.19037e+06 |
|  4 | 2017-01-01 00:20:00 | 1.18099e+06 |
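
A minimal sketch of that slicing and the chronological train/test split, assuming the full data frame is called `df` and has the columns shown above (the name `df`, the cutoff date, the `5min` frequency, and `steps = len(data_test)` are assumptions based on the tables in this post):

```python
import pandas as pd

# 'df' is assumed to be the full frame with the columns shown above:
# 'timestamp', 'min cpu', 'max cpu', 'avg cpu'
data = df[['timestamp', 'avg cpu']].copy()

# parse timestamps, use them as the index, and declare the 5-minute frequency
data['timestamp'] = pd.to_datetime(data['timestamp'])
data = data.set_index('timestamp').asfreq('5min')

# chronological split; the cutoff matches the first predicted timestamp shown below
data_train = data.loc[:'2017-01-24 23:55:00']
data_test  = data.loc['2017-01-25 00:00:00':]
steps = len(data_test)  # forecast horizon used later in predict_interval()
```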

I split the data into train and test sets and computed prediction intervals (PIs) with a regression-based forecaster:

|                     |        pred |   lower_bound |   upper_bound |
|:--------------------|------------:|--------------:|--------------:|
| 2017-01-25 00:00:00 | 1.15232e+06 |   1.12482e+06 |   1.1874e+06  |
| 2017-01-25 00:05:00 | 1.14453e+06 |   1.10052e+06 |   1.18994e+06 |
| 2017-01-25 00:10:00 | 1.14033e+06 |   1.08739e+06 |   1.20795e+06 |
| 2017-01-25 00:15:00 | 1.13669e+06 |   1.0843e+06  |   1.20252e+06 |
| 2017-01-25 00:20:00 | 1.1271e+06  |   1.06837e+06 |   1.19865e+06 |

[img: test data plotted against the forecasts and their prediction intervals]


We know:

"Coherence: It is used for measuring the correlation between two signals. ... Coherence is the normalized cross-spectral density:" $$C x y=\frac{|P x y|^2}{P x x-P y y}$$ ref.

Question:

Does measuring the coherence between predictions['upper_bound'] (or predictions['pred']) and the actual test data data_test['avg cpu'] evaluate anything potentially meaningful?

[img: coherence plot of the two signals]


Code:

```python
#!pip install skforecast

# Libraries
# ==============================================================================
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.linear_model import Ridge  # , Lasso, LinearRegression
from skforecast.ForecasterAutoreg import ForecasterAutoreg

# Create and train forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
    regressor = Ridge(alpha=0.1, random_state=765),
    lags      = 288   # one day of lags at 5-minute granularity
)
forecaster.fit(y=data_train['avg cpu'])

# Prediction intervals
# ==============================================================================
# 'steps' is the forecast horizon (e.g. steps = len(data_test), set during the train/test split)
predictions = forecaster.predict_interval(
    steps    = steps,
    interval = [1, 99],
    n_boot   = 500
)

# Prediction error
# ==============================================================================
error_mse2 = mean_squared_error(
    y_true = data_test['avg cpu'],
    y_pred = predictions['upper_bound']
)
print(f"Test error (MSE): {error_mse2}")

# Plot forecasts with prediction intervals and coherence of signals
# ==============================================================================
fig, ax = plt.subplots(figsize=(6, 3))
plt.ylabel('cpu', fontsize=15)
plt.ticklabel_format(style='plain')
plt.xlabel('timestamp', fontsize=15, color='darkred')
cossignal1 = data_test['avg cpu'].plot(ax=ax, label='Test-set', color='orange', linestyle='-.', marker="p")
cossignal2 = predictions['upper_bound'].plot(ax=ax, label="predictions['upper_bound']", color='darkred')
predictions['pred'].plot(ax=ax, label="predictions['pred']")
plt.title("Signals")
# place legend in top right corner
plt.legend(bbox_to_anchor=(1.6, .9), loc="upper right")
plt.show()

# Store the coherence values in a variable, say 'cor'
# (plt.cohere returns the per-frequency coherence and the frequency vector)
fig, ax = plt.subplots(figsize=(6, 3))
cor = plt.cohere(data_test['avg cpu'], predictions['upper_bound'], c='g')
plt.title("Coherence of Signals: predictions['upper_bound'] and data_test['avg cpu']")
# plot the coherence graph
ax.legend(['Coherence'])
plt.show()

# Repeat for the point forecasts
fig, ax = plt.subplots(figsize=(6, 3))
cor = plt.cohere(data_test['avg cpu'], predictions['pred'], c='g')
plt.title("Coherence of Signals: predictions['pred'] and data_test['avg cpu']")
# plot the coherence graph
ax.legend(['Coherence'])
plt.show()
```

  • Judging from your third-to-last plot, your predictions seem to be biased: they are systematically too low. In such a situation, I would be careful about any correlation-based measure, because you can have high correlation and high bias at the same time. Prediction intervals are typically assessed using the interval or Winkler score [a sketch of this score follows these comments]. – Stephan Kolassa Mar 12 '24 at 10:35
  • @StephanKolassa thanks for your input. You're right; for context, I created a minimal example using the skforecast Python package for time-series analytics. Within this package, regardless of which regressor one chooses inside the ForecasterAutoreg() class (in my case Ridge), the prediction interval (PI) results are returned by predict_interval(). Maybe using an inappropriate regressor like Ridge produces these biased, low predictions, or maybe the PI approach inside the package causes it. I included the Python code for better understanding. – Mario Mar 12 '24 at 11:16
  • The PI results are then computed by a bootstrapping process which, even after reading the documentation, I don't understand how it works; I will create a separate post to understand it. Maybe you can assist me in understanding it. – Mario Mar 12 '24 at 11:17
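
Following up on the first comment above about assessing prediction intervals: a minimal sketch of the mean interval (Winkler) score, assuming the interval = [1, 99] setting above yields a 98% central interval (so alpha = 0.02); the function name interval_score is mine, not from skforecast:

```python
import numpy as np

def interval_score(y_true, lower, upper, alpha=0.02):
    """Mean interval (Winkler) score for central (1 - alpha) prediction intervals.

    Score per point = interval width, plus 2/alpha times the distance by which
    the observation falls outside the interval. Lower is better.
    """
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    width = upper - lower
    below = (2 / alpha) * np.clip(lower - y_true, 0, None)  # penalty when y < lower_bound
    above = (2 / alpha) * np.clip(y_true - upper, 0, None)  # penalty when y > upper_bound
    return np.mean(width + below + above)

score = interval_score(
    y_true=data_test['avg cpu'],
    lower=predictions['lower_bound'],
    upper=predictions['upper_bound'],
    alpha=0.02,  # assumption: interval=[1, 99] gives a 98% central interval
)
print(f"Mean interval (Winkler) score: {score:,.2f}")
```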
