I'm learning how to use Keras, and I've had reasonable success with my labelled dataset using the examples in Chollet's Deep Learning with Python. The dataset is ~1000 time series, each of length 3125, with 3 possible classes.
I'd like to go beyond the basic Dense layers, which give me about 70% prediction accuracy, and the book goes on to discuss LSTM and RNN layers.
All the examples seem to use datasets with multiple features for each time series, and as a result I'm struggling to work out how to apply them to my data.
If, for example, I have a 1000x3125 matrix of time series, how do I feed that into something like a SimpleRNN or LSTM layer? Am I missing some fundamental knowledge of what these layers do?
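For concreteness, the examples in the book all build models along these lines, where each sample has several features per timestep (the dimensions below are made up purely for illustration):

from keras.models import Sequential
from keras.layers import LSTM, Dense

# Made-up example dimensions: each sample has 128 timesteps, 6 features each
timesteps, features = 128, 6
model = Sequential()
model.add(LSTM(32, input_shape=(timesteps, features)))
model.add(Dense(3, activation='softmax'))

My data doesn't obviously fit that mould, since each of my series is just a single sequence of 3125 values.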
Current code:
import pandas as pd
import numpy as np
import os
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM, Dropout, SimpleRNN, Embedding, Reshape
from keras.utils import to_categorical
from keras import regularizers
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
def readData():
    # Get labels from the labels.txt file
    labels = pd.read_csv('labels.txt', header=None)
    labels = labels.values
    labels = labels - 1  # shift labels from 1..3 to 0..2 for one-hot encoding
    print('One Hot Encoding Data...')
    labels = to_categorical(labels)
    data = pd.read_csv('ts.txt', header=None)
    return data, labels
print('Reading data...')
data, labels = readData()
print('Splitting Data')
data_train, data_test, labels_train, labels_test = train_test_split(data, labels)
print('Building Model...')
#Create model
model = Sequential()
# LSTM / RNN goes here (this is the part I can't work out)
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print('Training NN...')
history = model.fit(data_train, labels_train, epochs=1000, batch_size=50,
                    validation_split=0.25, verbose=2)
results = model.evaluate(data_test, labels_test)
predictions = model.predict(data_test)
print(predictions[0].shape)
print(np.sum(predictions[0]))
print(np.argmax(predictions[0]))
print(results)
acc = history.history['acc']
val_acc = history.history['val_acc']
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
I managed to get my code to at least run by reshaping my data from a 1000x3125 matrix into a 3D array using data = np.reshape(data, (1000, 1, 3125)). This let me run the LSTM with input_shape=(1, 3125), but again, I'm not really sure what I'm doing.
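For reference, here's roughly what that workaround looks like (the 32 units are just a placeholder I chose, not a considered choice):

data = np.reshape(data, (1000, 1, 3125))    # 1000 samples, 1 timestep, 3125 "features"

model = Sequential()
model.add(LSTM(32, input_shape=(1, 3125)))  # 32 units is an arbitrary placeholder
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

As far as I can tell, this treats each whole series as one timestep with 3125 features, so the recurrence never actually unrolls over time, which may be why it runs but doesn't feel right.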
Again, thank you very much for the reply. I'll have a look at the links you provided and study your answer some more. – user1147964 Feb 07 '18 at 10:29
You don't need batch_size when defining the model; it will be taken from the same argument within model.fit(). You should be reshaping to get (3025, 100, 1000), which means 3025 batches, each of 100 (rows) timesteps and 1000 (columns) variables. Using np.reshape will sadly not work for this (you'll get an error), due to the fact that you will have data overlaps... the final shape has more data than the input: 3025x100x1000 > 3125x1000, and np.reshape doesn't like that as it's ambiguous. I suggest simply looping over the dataset, 1 loop = 1 sample. – n1k31t4 Feb 07 '18 at 11:15
Thanks for the response again. Sorry... Your reshaping seems to generate many more values... – user1147964 Feb 07 '18 at 12:22
Thanks again for the reply. – user1147964 Feb 07 '18 at 14:17
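A minimal sketch of the looping approach n1k31t4 suggests, assuming the 1000 series are treated as 1000 parallel variables of one long multivariate sequence, and using the 100-timestep window from the comment (the random stand-in data and variable names here are mine, not from the thread):

import numpy as np

# Stand-in for the real data, transposed so rows are timesteps: shape (3125, 1000)
values = np.random.rand(1000, 3125).T

window = 100                                   # timesteps per sample, per the comment
samples = []
for start in range(values.shape[0] - window):  # one loop iteration = one sample
    samples.append(values[start:start + window, :])
windowed = np.array(samples)
print(windowed.shape)                          # (3025, 100, 1000)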