
I am working on my bachelor's thesis with time series data. The idea is to predict the expected battery life based on voltage data from sensors.

During my research I came across SARIMAX. At first this ML algorithm sounded very plausible to me. Unfortunately, I was only able to generate constant predictions. Since I was not sure whether this was due to the underlying, possibly incomplete, data set, I calculated a data set with charge and discharge curves myself.

So the data set my questions refer to looks like this:

[Image: the self-generated data set of charge and discharge curves]

Before passing the data to the algorithm for training, I log-transformed the data, took the first difference, and tried to remove the seasonality from the differenced series. When I create a prediction with SARIMAX, I get only a constant, like here:

[Image: SARIMAX forecast that is a constant]
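The preprocessing described above (log transform, first difference, seasonal difference) can be sketched in NumPy as below, together with the inverse steps that map a forecast on the transformed scale back to volts. This is a minimal illustration, not the code used in the question: the helper names `log_diff` and `undo_log_diff` and the seasonal period `m` are hypothetical, and `np.log` requires strictly positive voltages, which noisy readings near 0 V can violate.

```python
import numpy as np


def log_diff(y, m):
    """Log-transform, first-difference, then seasonally difference with period m."""
    z = np.log(y)           # requires y > 0
    dz = np.diff(z)         # first difference: z[t] - z[t-1]
    sdz = dz[m:] - dz[:-m]  # seasonal difference: dz[t] - dz[t-m]
    return z, dz, sdz


def undo_log_diff(sdz_future, z, dz, m):
    """Map forecasts on the seasonally differenced scale back to the original scale."""
    dz_hist = list(dz)
    z_last = z[-1]
    out = []
    for s in sdz_future:
        d = s + dz_hist[-m]        # undo the seasonal difference
        dz_hist.append(d)
        z_last = z_last + d        # undo the first difference
        out.append(np.exp(z_last))  # undo the log
    return np.array(out)
```

Feeding zeros into `undo_log_diff` (a flat forecast on the seasonally differenced scale) repeats the increments of the last cycle, whereas a flat forecast after only a first difference gives a constant voltage.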

My goal is to continue the curve into the future, something like this:

[Image: desired forecast that continues the charge/discharge curve]

I have read in some examples that this is not an error of the SARIMAX model: since the prediction only depends on the previous value, only a constant can be predicted.
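The effect described above can be reproduced without SARIMAX. The NumPy sketch below assumes an ARIMA(1,1,0) model without a constant (an illustrative choice, fitted by least squares rather than by maximum likelihood as SARIMAX does; the function name is hypothetical). Each forecast increment is the previous one multiplied by `phi`, so with |phi| < 1 the increments die out and the forecast settles on a constant level. With phi = 0 (a random walk) the forecast equals the last observation from the first step on.

```python
import numpy as np


def arima110_forecast(y, steps):
    """ARIMA(1,1,0) without constant: fit d[t] = phi * d[t-1] on first differences."""
    d = np.diff(y)
    # Least-squares estimate of phi (no intercept, i.e. no drift term)
    phi = np.dot(d[:-1], d[1:]) / np.dot(d[:-1], d[:-1])
    level, last_d = float(y[-1]), float(d[-1])
    out = np.empty(steps)
    for h in range(steps):
        last_d *= phi   # expected next increment shrinks by a factor phi
        level += last_d
        out[h] = level
    return out, phi
```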

Now, of course, I'm wondering whether I'm on the wrong track with SARIMAX, or whether I have simply trained the model incorrectly and can continue to work with SARIMAX. Is there perhaps another ML algorithm you would prefer for this task?

Maybe someone with experience in forecasting time series data reads this post and can put me back on the right track.

I appreciate any kind of feedback, thank you in advance.

Edit:

The sensors transmit the original data every 15 minutes; I resampled it to 1-hour averages. The data of one sensor look, for example, like this:

[Image: example voltage data from one sensor]
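A minimal pandas sketch of the 15-minute-to-hourly averaging, assuming gap-free readings at exact 15-minute spacing (real sensor data would instead be indexed by their actual timestamps before calling `resample`). The helper name and the start time are hypothetical.

```python
import numpy as np
import pandas as pd


def to_hourly_mean(values, start="2023-01-01 00:00"):
    """Put 15-minute readings on a DatetimeIndex and average them per hour."""
    index = pd.date_range(start=start, periods=len(values), freq="15min")
    series = pd.Series(values, index=index, name="voltage")
    # Each hourly bin [HH:00, HH+1:00) averages its four 15-minute readings
    return series.resample("h").mean()
```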

Since the original data have different periods and I'm not sure how to deal with that, I first created a data set with uniform periods for simplicity. The self-generated data are recorded hourly, and one charge-discharge cycle lasts 2000 hours.

Below is the class I use to create the data; maybe this is the easiest way to explain myself.

```python
import numpy as np


class CapacitorCurve:
    def __init__(self, C=1, V0=3.6, R=1):
        self.C = C  # capacitance of the capacitor in farads
        self.V0 = V0  # initial voltage of the capacitor in volts
        self.R = R  # resistance of the circuit in ohms
        self.tau = R * C  # time constant in seconds
        self.t = np.linspace(0, 5 * self.tau, 1000)  # 1000 samples per charge or discharge

    # Function for computing the charge curve
    def capacitor_charge(self, t, tau, V0):
        return V0 * (1 - np.exp(-t / tau))

    # Function for computing the discharge curve
    def capacitor_discharge(self, t, tau, max_capacity):
        return max_capacity * np.exp(-t / tau)

    def multiple_charging_cycles(self, cycles):
        charging_cycles = np.empty(0)
        for i in range(cycles):
            if i == 0:
                charge = self.capacitor_charge(self.t, self.tau, self.V0)
                max_capacity = charge.max()
            else:
                # Resume charging at the time on the charge curve that matches the residual voltage
                t_start = -self.tau * np.log(1 - min_capacity / self.V0)
                t_charge = np.linspace(t_start, 5 * self.tau, 1000)
                charge = self.capacitor_charge(t_charge, self.tau, self.V0)
            discharge = self.capacitor_discharge(self.t, self.tau, max_capacity)
            charging_cycles = np.concatenate((charging_cycles, charge, discharge))
            min_capacity = discharge.min()  # voltage left at the end of the discharge
        return charging_cycles
```

Converting the curves to a DataFrame:

```python
import numpy as np
import pandas as pd

cc = CapacitorCurve()
spannungskurven = cc.multiple_charging_cycles(5)  # "Spannungskurven" = voltage curves

# Noisy copy of the curves: Gaussian noise with standard deviation 0.03 V
noise = 0.03 * np.random.normal(size=spannungskurven.shape)
df = pd.DataFrame({
    'spannungskurven_noise': spannungskurven + noise,
    'spannungskurven': spannungskurven,
})

# Hourly timestamps ending on 2023-01-01 ('h' replaces the deprecated alias 'H')
df.index = pd.date_range(end='2023-01-01', periods=len(df), freq='h', name='DateTime')

# Remove inf/NaN rows; if any rows are dropped, the index is no longer strictly hourly
df = df.replace([np.inf, -np.inf], np.nan).dropna()
```

  • Could you provide some more info on at least the nature of your data? It may be helpful for answerers. – Shawn Hemelstrand Feb 12 '23 at 14:27
  • There are indeed a few people here with a little experience with time series forecasting (cough). First off, you should not need to logarithmize and take differences yourself, any self-respecting ARIMA tool should do that automatically. I suspect the issue is that you are not specifying the length of the periodicity. How long (in your underlying time granularity, i.e., hours) is a cycle in your application? – Stephan Kolassa Feb 12 '23 at 14:40
  • Also, your question title mentions "multiple seasonality". That typically means multiple periodicities of different length superimposed on each other (see the multiple-seasonalities tag). For instance, hourly data very often exhibits intra-daily and intra-weekly patterns (i.e., the hourly pattern differs by day of week). I do not see anything like this in your data. Am I missing something, or do you actually not have any multiple seasonality? – Stephan Kolassa Feb 12 '23 at 14:42
  • Thank you for the feedback. I was happy to edit the question again to provide a little more insight into the data. I was indeed mistaken about the 'multiple seasonality' in the title. I also tried training the SARIMAX model without first taking logarithms and differencing. I also suspect that I am not specifying the length of the periodicity, or specifying it incorrectly. Do you perhaps have a reference for me? – Maximiliami Feb 12 '23 at 16:03
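Regarding how long a cycle is: besides reading it off the simulation settings, the cycle length can be estimated from the data via the autocorrelation function, which for a periodic series peaks again at a lag of one period. The sketch below (hypothetical helper `estimate_period`, using an FFT-based sample autocovariance) skips the initial decay by searching only after the autocorrelation first turns negative. On the hourly synthetic series one would expect a lag close to 2000, though noise and edge effects can shift the peak slightly.

```python
import numpy as np


def estimate_period(y, max_lag=None):
    """Estimate the dominant cycle length (in samples) from the autocorrelation."""
    x = np.asarray(y, dtype=float) - np.mean(y)
    n = len(x)
    max_lag = max_lag or n // 2
    # Sample autocovariance via FFT, zero-padded to 2n to avoid circular wrap-around
    spec = np.fft.rfft(x, 2 * n)
    acov = np.fft.irfft(spec * np.conj(spec), 2 * n)[:n]
    acf = acov / acov[0]
    below_zero = acf[: max_lag + 1] < 0
    if not below_zero.any():
        raise ValueError("autocorrelation never turns negative; no clear cycle found")
    # Search for the highest autocorrelation only after the first negative value
    start = int(np.argmax(below_zero))
    return start + int(np.argmax(acf[start : max_lag + 1]))
```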

1 Answer


Here is a tutorial on pmdarima.auto_arima. If you have quarter-hour time buckets, and each cycle lasts 2000 hours, you would need to specify the parameter m=4*2000 and seasonal=True.
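The seasonal period m is simply the number of observations per cycle. A small standard-library sketch (hypothetical helper `seasonal_period`) makes the arithmetic explicit. For the hourly resampled data in the question it gives m = 2000 instead of 4*2000.

```python
from datetime import timedelta


def seasonal_period(cycle_length, sampling_interval):
    """Seasonal period m = number of observations in one cycle."""
    if cycle_length % sampling_interval:
        raise ValueError("cycle length is not a whole number of sampling intervals")
    return cycle_length // sampling_interval
```

With pmdarima, this value would be passed as `pm.auto_arima(y, seasonal=True, m=m)`.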

This will probably break your Python kernel, because ARIMA simply was not built for such long seasonal patterns. (In addition, external regressors can be passed to pmdarima's auto_arima through its X argument; alternatively, you could run your own regression on any predictors, then run the residuals through auto_arima.)
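One way to see why long seasonal periods are a problem: a seasonal AR term multiplies the nonseasonal lag polynomial (1 - phi*B) by (1 - Phi*B^m), so even the simple ARIMA(1,0,0)(1,0,0)[m] model has a lag polynomial of degree m + 1. The NumPy sketch below only computes that polynomial (the helper name is hypothetical; it fits nothing). A state-space implementation has to carry a state vector of roughly that length, with a covariance matrix of roughly (m + 1)^2 entries, which for m = 8000 is already about 64 million numbers.

```python
import numpy as np


def sarima_ar_polynomial(phi, Phi, m):
    """Coefficients, in powers of the backshift B, of (1 - phi*B)(1 - Phi*B^m)."""
    nonseasonal = np.array([1.0, -phi])
    seasonal = np.zeros(m + 1)
    seasonal[0], seasonal[m] = 1.0, -Phi
    # Polynomial multiplication is a convolution of the coefficient arrays
    return np.convolve(nonseasonal, seasonal)
```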

I very much recommend this blog post by Rob Hyndman on how to deal with long seasonalities. It's in R, not Python, but the forecasting functionalities in R are better, anyway. Since you seem to have external predictors (it's not completely clear to me from your question), you could put these into a design matrix, then add columns for the Fourier terms as per the blog post, and feed all of this into the xreg parameter of forecast::auto.arima().
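The Fourier-term idea can be sketched in Python with NumPy alone: build sine/cosine columns with the long cycle length as their period and regress on them, which needs only 2K regressors instead of a seasonal ARIMA with m = 2000. This sketch fits ordinary least squares only, so it omits the ARIMA error model that auto.arima() adds; in Python, the same design matrix could be passed as `exog` to statsmodels' `SARIMAX` to get regression with ARIMA errors (future values of the terms are then needed at forecast time). The function names and the number of harmonics K are illustrative assumptions; K is a tuning choice.

```python
import numpy as np


def fourier_terms(t, period, K):
    """Columns sin(2*pi*k*t/period) and cos(2*pi*k*t/period) for k = 1..K."""
    t = np.asarray(t, dtype=float)[:, None]
    k = np.arange(1, K + 1)[None, :]
    angle = 2 * np.pi * k * t / period
    return np.hstack([np.sin(angle), np.cos(angle)])


def harmonic_regression_forecast(y, period, K, steps):
    """OLS on an intercept plus K Fourier pairs, extrapolated `steps` samples ahead."""
    n = len(y)
    X = np.column_stack([np.ones(n), fourier_terms(np.arange(n), period, K)])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    # Fourier terms are known for any future time, so the fitted cycle simply continues
    t_future = np.arange(n, n + steps)
    X_future = np.column_stack([np.ones(steps), fourier_terms(t_future, period, K)])
    return X_future @ beta
```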

You may be interested in this thread on references for forecasting.

Stephan Kolassa
  • Thank you for your time and expertise! It has helped me a lot to understand my problem. – Maximiliami Feb 13 '23 at 16:06
  • What could be done to improve the forecasting functionalities in Python? – Galen Mar 01 '24 at 02:20
  • @Galen: that sounds very much like a question by itself... which would probably be closed as "too broad". Is that why you are asking in a comment? – Stephan Kolassa Mar 01 '24 at 07:10
  • @StephanKolassa Yes, in retrospect my question could open up a large and open-ended topic. Feel free to send grumbles about Python's time series tools to me in a chat. Yes, it was in the comments just to prompt you for a short response like "Here are three things I don't like [...]". – Galen Mar 05 '24 at 04:21
  • @Galen: to be honest, I'm still an inveterate R user, so Python is one of the few things even I don't feel competent to complain about... – Stephan Kolassa Mar 05 '24 at 06:58