
Background/problem I am trying to solve: I have vehicle trajectory data (position, velocity and acceleration) for all vehicles on a section of a motorway over ~30 min, with adjacent data points spaced ~0.1 s apart in time. I am trying to construct a regression neural network that predicts one-dimensional vehicle accelerations, i.e. vehicles remain in one lane and lane changes are omitted. For each vehicle in the dataset I am working with, I extract the following feature vector:

[Vehicle speed, Vehicle acceleration, Distance to vehicle ahead, Speed of vehicle ahead, Acc of vehicle ahead].

In the case where there is no preceding vehicle, I set the final three features (those relating to the vehicle ahead) to an arbitrarily large value of 9999.99. This essentially encodes a vehicle that is very far away and travelling very fast, so it can be ignored and the acceleration output should depend only on our own dynamics (the first two features).
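
For concreteness, here is a simplified sketch of how one such feature vector can be assembled (the function and variable names are illustrative, not my exact code):

import numpy as np

NO_LEADER = 9999.99  # sentinel used when there is no preceding vehicle

def make_features(ego_speed, ego_acc, leader=None):
    # leader: optional (distance, speed, acc) tuple for the vehicle ahead
    if leader is None:
        gap, lead_speed, lead_acc = NO_LEADER, NO_LEADER, NO_LEADER
    else:
        gap, lead_speed, lead_acc = leader
    return np.array([ego_speed, ego_acc, gap, lead_speed, lead_acc],
                    dtype=np.float32)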

Intuitively, this feels like more than enough information to produce reasonable acceleration outputs; thinking about when I am driving myself, these are pretty much the only things I consider when deciding whether to brake or accelerate. I am sure there is some sort of non-linear relationship between this feature vector and the acceleration output, but let me know if you disagree that these features are sufficient for this regression problem. On that basis, I have decided to use a neural network to learn the non-linear mapping between my feature input and the acceleration output.

Issues I am facing: No matter what I try I can't seem to get my neural network to produce "reasonable" outputs. I am expecting something in the form of:

  • Negative acceleration outputs when distance to vehicle ahead is low and vehicle speed > speed of vehicle ahead
  • Positive acceleration when there is no vehicle ahead i.e. final 3 elements of feature vector are 9999.99

I will attach an example of the kind of outputs I am getting below.

Data preparation: In terms of the data I use for training, I have around 800 000 - 900 000 data points corresponding to around 1 000 different vehicles over the 30 min period the data was collected. For the inputs to the NN I extract the 5 features described above; for the outputs I look around 2.5 s into the future from the time at which a feature vector was extracted and average the vehicle acceleration across this window. I chose 2.5 s because it takes roughly 1-3 s for a driver's action to manifest in response to a given state (this parameter is not fixed and I can modify it as required). I also smooth the entire dataset slightly with a moving-average function, since there tend to be fluctuations in the logged velocities and accelerations.
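
A simplified sketch of this labelling step (the window length follows from the 0.1 s sample spacing; the smoothing kernel width is illustrative):

import numpy as np

DT = 0.1                  # sample spacing in seconds
HORIZON = 2.5             # look-ahead window for the target
W = round(HORIZON / DT)   # 25 samples

def smooth(x, k=5):
    # light moving-average smoothing of a logged signal
    return np.convolve(x, np.ones(k) / k, mode='same')

def make_targets(acc):
    # acc: 1-D array of one vehicle's logged accelerations
    # returns the mean acceleration over the next 2.5 s for each step
    # that still has a full future window
    acc = smooth(acc)
    return np.array([acc[t + 1:t + 1 + W].mean()
                     for t in range(len(acc) - W)])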

Neural network architecture I am using: I have attached the code for the neural network architecture below. I have kept the network relatively small; I assumed the mapping between my inputs and outputs should be relatively straightforward to learn, since there should be a strong correlation between the two. I am using Python and PyTorch for all of my neural network code.

import torch.nn as nn
import torch.nn.functional as F

class NN(nn.Module):
    def __init__(self, input_size, output_size):
        super(NN, self).__init__()  # initialises the constructor in nn.Module
        self.fc1 = nn.Linear(input_size, 50)
        self.fc2 = nn.Linear(50, 50)
        self.fc3 = nn.Linear(50, 50)
        self.fc4 = nn.Linear(50, output_size)

    def forward(self, x):  # x is the input feature batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        return x

Below is the code I use to train the model:

import time

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

import neuralnet  # my module containing the NN class above

# Hyperparameters
learning_rate = 0.0001
num_epochs = 10
batch_size = 1024

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load inputs and outputs
input_size, output_size = 5, 1

X = np.loadtxt('NN_input_data', dtype=float)
labels = np.transpose([np.loadtxt('NN_output_data', dtype=float)])

X = torch.from_numpy(X).float()
labels = torch.from_numpy(labels).float()

train_data = torch.utils.data.TensorDataset(X, labels)
train_loader = torch.utils.data.DataLoader(dataset=train_data,
                                           batch_size=batch_size,
                                           shuffle=True)

# Build network
model = neuralnet.NN(input_size=input_size, output_size=output_size).to(device)

# Loss and optimiser
Loss_Criterion = nn.MSELoss()
Optimiser = optim.Adam(model.parameters(), lr=learning_rate)

# Train network
print('Training Network:')
num_batches = len(train_loader)

for epoch in range(num_epochs):
    t1 = time.time()
    for batch_idx, (data, targets) in enumerate(train_loader):
        data = data.to(device=device)
        targets = targets.to(device=device)

        # Forward pass
        scores = model(data)
        loss = Loss_Criterion(scores, targets)

        # Backwards pass
        Optimiser.zero_grad()  # zero gradients from the previous step
        loss.backward()

        # Gradient descent step
        Optimiser.step()  # updates the weights

    # Print statistics (loss of the last batch in the epoch)
    print('Epoch:', epoch, 'Loss:', round(loss.item(), 3),
          'Time Taken:', round((time.time() - t1) / 60, 3), 'min')

# Save neural network
torch.save(model.state_dict(), 'NNRegModel_a')
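
As a quick sanity check on the two expected behaviours listed at the start, a probe along these lines can be run after training (the feature values here are hypothetical; model and device are those defined above):

# Hypothetical probe states: [speed, acc, dist_ahead, speed_ahead, acc_ahead]
close_slow_leader = torch.tensor([[13.4, 0.0, 10.0, 5.0, 0.0]]).to(device)
no_leader = torch.tensor([[13.4, 0.0, 9999.99, 9999.99, 9999.99]]).to(device)

model.eval()
with torch.no_grad():
    print('close, slower leader ->', model(close_slow_leader).item())  # expect < 0
    print('no leader            ->', model(no_leader).item())          # expect > 0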

The training loss I obtain from this code is shown below:

[Figure: training loss after each epoch]

Finally, I will also attach below the outputs this neural network produces in simulation in response to a given feature vector. The initial speeds of the vehicle under consideration and the vehicle ahead are 13.4 m/s and 5.069 m/s, respectively. The simulation ran for 2.1 s in total and terminated in a collision, because the neural network outputs were not sufficient to reduce the vehicle's speed enough to avoid it. The simulation logs show that even though the distance to the vehicle ahead is decreasing, the acceleration output is not growing in magnitude (i.e. to reduce the vehicle speed faster), so the neural network outputs are not accurate enough.

Given the starting speeds of the two vehicles and the distance between them, I believe it is not unreasonable to expect the model to prevent a collision; such situations (or worse) arise often in real-life driving, and drivers avoid collisions just fine.
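
Roughly, the closed-loop rollout works like this (a heavily simplified sketch: the initial 30 m gap and the constant-speed leader are placeholders, and my actual simulator differs in detail):

dt = 0.1
ego_pos, ego_v, ego_a = 0.0, 13.4, 0.0        # vehicle under consideration
lead_pos, lead_v, lead_a = 30.0, 5.069, 0.0   # vehicle ahead (placeholder gap)

model.eval()
for step in range(100):
    gap = lead_pos - ego_pos
    if gap <= 0.0:
        print('collision at t =', round(step * dt, 1), 's')
        break
    x = torch.tensor([[ego_v, ego_a, gap, lead_v, lead_a]]).to(device)
    with torch.no_grad():
        ego_a = model(x).item()           # NN acceleration output
    ego_v = max(ego_v + ego_a * dt, 0.0)  # forward-Euler update, no reversing
    ego_pos += ego_v * dt
    lead_pos += lead_v * dt               # leader held at constant speed here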

[Figure: simulation logs]

Things I have tried:

  • I have tried playing around with the learning rate, increasing the number of epochs and changing the batch size, but I get a similar (or worse) response.

  • I also tried leaving all the data points containing a 9999.99 value out of the training set and training the neural network without these extreme points; however, the network then produces extremely large acceleration outputs when it sees such a point in simulation.

  • I have attempted to take the sigmoid of the inputs in order to scale the extreme values of 9999.99 down to a more reasonable range (the sigmoids I use are adjusted for the mean and variance of the corresponding feature). This does not produce a better result than the one shown above. A plain centre-and-scale alternative is sketched after this list.

  • Finally, I have also tried a couple of different activation functions (apart from ReLU, I have tried sigmoid and tanh), which also did not produce an improved result.
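
For reference, the plain centre-and-scale alternative (which I have not tried in this exact form) would look like this on the tensors from the training script above:

# z-score standardisation, fitted on the training inputs only
mean = X.mean(dim=0, keepdim=True)
std = X.std(dim=0, keepdim=True)
X_scaled = (X - mean) / std
# note: the 9999.99 sentinel will dominate mean/std, so it would likely
# need separate handling (e.g. a binary "leader present" flag) first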

Any comments on my data processing, neural network, code or the problem as a whole?

  • You say that you expect "Positive acceleration when there is no vehicle ahead i.e. final 3 elements of feature vector are 9999.99" but this seems to ignore some pretty important context for drivers, i.e. speed limits, confusing signage, traffic lights, stop signs, speed bumps, curves, bad pavement, debris, pedestrians, bicycles, and other elements of the roadway which aren't captured by your model. – Sycorax Sep 15 '22 at 18:09
  • The dataset I am using is for a section of a motorway (the exact dataset is NGSIM - Lankershim Highway) so I have assumed that the miscellaneous features you mentioned have a negligible effect. In terms of traffic lights - I am modelling these as background vehicles with zero velocity which block the path of the vehicle under consideration, hence, allowing the model to detect these cases also. – hokage007 Sep 15 '22 at 18:30
  • I don’t know if this is a common benchmark dataset or not, but I’d start by reading about the other models and preprocessing steps that previous researchers have used. It’s plausible that either or both could contribute to a weak model over all. If nothing else, knowledge of previous research will tell you whether you’re in the same ballpark as other results. – Sycorax Sep 15 '22 at 18:49
  • Why aren't you scaling the data before you send it to the model? – Sycorax Sep 15 '22 at 20:37
  • In the things I've tried section I mentioned trying to scale the inputs between 0 and 1 using various different sigmoids which are adjusted for using the variance and means of the features in the form 1/(1 + exp(-a(x - mean))), but this didn't improve my results. – hokage007 Sep 16 '22 at 09:58
  • Do you know where I can find details about previous research in this domain? I.e. neural networks applied to spatio-temporal timeline data – hokage007 Sep 16 '22 at 10:00
  • Sigmoids will tend to saturate for large or small values, suppressing the variability of your data. Additionally, the mean might not be too close to 0. Both effects can increase the training time for the model dramatically. See: https://stats.stackexchange.com/questions/437840 I mean centering and scaling the data by subtracting the mean and dividing by the standard deviation, or similar. You can find literature by doing a Google scholar search or using a library – Sycorax Sep 16 '22 at 13:04
