
Good Morning,

I'm new to Keras and to neural networks in general. I'm working through Deep Learning In R With Keras with a 'capstone' project in mind, but I'm struggling to understand how to approach it properly. For reference, my background is in sport science, and I am self-taught in analytics.

Project Overview

I have 20 separate days of .csv files from my own soccer sessions. Each file contains 45 to 90 minutes of velocity (vel) and heart-rate (hr) data sampled at 10 Hz (0.1-second timesteps). The goal is to predict heart-rate values from velocity data, then compare the predictions to the actuals. My common-sense interpretation: if predicted hr > actual hr, that could mean I'm more recovered; if predicted hr < actual hr, I could be in a state of fatigue.

These are my current assumptions and questions.

Sample Size

With each file averaging around 36,000 data points (45–90 min × 60 s/min × 10 samples/s), I assume that is too long to train on as a single sequence on my CPU, due to memory usage, computation time, and vanishing gradients. So I would need to slice each file into multiple samples of N-length subsequences. Is this the best approach? Are there any rules of thumb for the length of each sample?
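For concreteness, here is the slicing I have in mind as a NumPy sketch. The window length and stride are guesses on my part (600 steps = 60 s at 10 Hz), not values from any reference:

```python
import numpy as np

def make_windows(series, window_len=600, stride=600):
    """Slice a 1-D series into fixed-length windows.

    window_len=600 is 60 s at 10 Hz; windows are non-overlapping
    when stride == window_len. The trailing partial window is
    simply dropped here.
    """
    n_windows = (len(series) - window_len) // stride + 1
    return np.stack([series[i * stride : i * stride + window_len]
                     for i in range(n_windows)])

day = np.random.rand(36_000)   # one ~60-minute session at 10 Hz
windows = make_windows(day)
print(windows.shape)           # (60, 600)
```

With a smaller stride the windows overlap, which multiplies the number of training samples from the same 20 days.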

Sample Padding

The last slice of each file will generally not be a full window. Would it be better to pad that sample with zeros, or just drop it?
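If padding turns out to be the better option, this is how I imagine doing it: zero-pad the final partial window and keep a boolean mask marking the real timesteps (the kind of mask that Keras-style masking layers can consume). This is my own sketch, not code from the book:

```python
import numpy as np

def window_with_padding(series, window_len=600):
    """Split into windows, zero-padding the final partial window.

    Returns the windows plus a boolean mask that is True on real
    timesteps and False on padding.
    """
    remainder = len(series) % window_len
    pad = (window_len - remainder) % window_len
    padded = np.concatenate([series, np.zeros(pad)])
    windows = padded.reshape(-1, window_len)
    mask = np.ones(len(padded), dtype=bool)
    if pad:
        mask[-pad:] = False
    return windows, mask.reshape(-1, window_len)

series = np.arange(1250, dtype=float)   # 2 full windows + 50 extra steps
w, m = window_with_padding(series)
print(w.shape, int(m[-1].sum()))        # (3, 600) 50
```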

Cardiac Drift and Information Loss per Sample

I need information from the entire day to capture the effect of cardiac drift (essentially, hr drifts higher the more fatigued you become) and other physiological processes. Secondly, the start of each sample needs to reference the state at the end of the previous sample. I could engineer a feature to estimate that (perhaps cumulative distance covered), but I was hoping to estimate cardiac drift with the LSTM itself. I'm not sure which approach makes sense here. Any suggestions?
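The feature-engineering fallback I mentioned would look something like this: attach to each window the cumulative distance covered and the time elapsed since the session start, as rough proxies for accumulated load. These features are my own guess at a drift proxy, not an established protocol:

```python
import numpy as np

DT = 0.1  # seconds per sample at 10 Hz

def add_context_features(vel, window_len=600):
    """Per-window context carrying day-level state into each window.

    For each window start, compute cumulative distance covered so far
    (metres, if vel is in m/s) and seconds elapsed in the session.
    """
    dist_so_far = np.cumsum(vel) * DT
    starts = np.arange(0, len(vel) - window_len + 1, window_len)
    cum_dist = dist_so_far[starts]   # distance at each window start
    elapsed = starts * DT            # seconds into the session
    return np.column_stack([cum_dist, elapsed])

vel = np.full(1800, 3.0)             # constant 3 m/s, three 60 s windows
ctx = add_context_features(vel)
print(ctx.shape)                     # (3, 2)
```

The alternative I was hoping for, letting the LSTM track drift itself, would mean keeping state across consecutive windows of the same day (a "stateful" RNN setup) rather than shuffling windows independently.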

Multi-input architecture

Weather and other external variables will affect heart-rate response, so I would need to add other independent variables to the model. A little Googling points to a multi-input architecture, but I'm afraid this idea is a bit beyond my current scope. Is this the right door to knock on?
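One simpler alternative I can see (an assumption on my part, not something from the book): rather than a true multi-input model, tile the per-day scalars (e.g. temperature, humidity — illustrative names) across the time axis and stack them with velocity as extra input channels, so a single-input LSTM can consume them:

```python
import numpy as np

def combine_inputs(vel_windows, day_scalars):
    """Tile per-day scalars across timesteps and stack as channels.

    vel_windows: (n_windows, window_len) velocity sequences
    day_scalars: (n_features,), e.g. [temperature_c, humidity_pct]
    Returns (n_windows, window_len, 1 + n_features), the
    (batch, timesteps, features) shape an LSTM layer expects.
    """
    n, t = vel_windows.shape
    tiled = np.broadcast_to(day_scalars, (n, t, len(day_scalars)))
    return np.concatenate([vel_windows[..., None], tiled], axis=-1)

vel_windows = np.random.rand(60, 600)
x = combine_inputs(vel_windows, np.array([24.0, 65.0]))  # temp, humidity
print(x.shape)   # (60, 600, 3)
```

A proper multi-input model (separate sequence and scalar branches merged before the output) could come later, once the simple version works.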

Estimating a prediction interval

This is the area I feel least sure of. The model will return a point prediction, and I'm comparing that prediction to the actual values. Ideally I would want something like a confidence/prediction interval for each point, so I could quantify how unlikely a particular actual value is given the prediction. Is this even possible? If not, could I calculate something from the loss function? I assume I would use MSE, and my intuition says I could use information from it to understand the magnitude of a typical difference.
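One model-agnostic idea I've considered (again an assumption, not something I've validated): collect residuals on held-out data and use their empirical quantiles to band each new point prediction; actuals outside the band would then count as "surprising" at that coverage level. This assumes new residuals behave like validation residuals, which autocorrelated hr data only approximates:

```python
import numpy as np

def residual_interval(y_val, y_val_pred, y_new_pred, coverage=0.9):
    """Empirical prediction interval from validation residuals.

    Treats validation residuals as representative of future errors
    and shifts the new point predictions by the residual quantiles.
    """
    resid = y_val - y_val_pred
    lo, hi = np.quantile(resid, [(1 - coverage) / 2, (1 + coverage) / 2])
    return y_new_pred + lo, y_new_pred + hi

rng = np.random.default_rng(0)
y_val = 140 + rng.normal(0, 5, 1000)   # simulated actual hr (bpm)
y_val_pred = np.full(1000, 140.0)      # simulated point predictions
low, high = residual_interval(y_val, y_val_pred, np.array([150.0]))
print(low, high)                       # ~90% band around the prediction
```

Training with a quantile (pinball) loss to predict the band directly would be another route, if that's more idiomatic in Keras.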

Thank you for any suggestions as I try to learn what I need to learn.
