I am attempting to train an LSTM NN on a time-series dataset that contains daily observations for over a thousand distinct devices, in order to predict when the devices will fail. I have handled my data cleaning and setup appropriately, and can successfully train a NN on the data.
Now I would like to run through many training iterations of the NN, each with different hyperparameters, to find the best set of hyperparameters for the data. In doing so, I am encountering a very strange error: I can train exactly 13 NNs, and then my program gets killed. When running in the terminal, I simply get a message that says `Killed`; running in the PyCharm console gives me a "process completed with exit code 1" message. I don't think the issue is running out of memory, as I keep the heap size monitor up when running in the console and never come near the maximum. It also seems like more than coincidence that it fails on the 13th run every time, given that I pick hyperparameters randomly (so I am not running the same 13 NNs each time I run the program).
Does anyone have advice on where I might start troubleshooting this?
Thank you.
Function I loop over to train subsequent NNs:
```python
# Imports assumed from the rest of my script (not shown in the original snippet);
# adjust keras vs. tensorflow.keras to match your environment.
import random

import pandas as pd
from sklearn.metrics import roc_auc_score
from tensorflow.keras import metrics, optimizers, regularizers
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import LSTM, BatchNormalization, Dense, Dropout
from tensorflow.keras.models import Sequential


# The def line was omitted from my original snippet; this signature is inferred
# from the variables used in the body.
def train_one_nn(X_train_input, y_train_input2, X_val_input, y_val_input2,
                 hidden_layers, activation_func, ind):
    # Randomly sample hyperparameters for this run
    lrate = round(10 ** (-4 * random.uniform(0, 1)), 3)  # 1e-4 to 1
    decay_rate = round(10 ** (-4 * random.uniform(0, 1)), 3)
    epochs_ = random.randint(25, 250)
    batches = random.randint(8, 64)
    l2_rate = round(10 ** (-4 * random.uniform(0, 1)), 3)
    dropout_rate = round(random.uniform(0.1, 0.5), 3)

    record_params = {'hidden_layers': hidden_layers, 'lrate': lrate, 'decay_rate': decay_rate,
                     'epochs': epochs_, 'batches': batches, 'l2_rate': l2_rate,
                     'dropout_rate': dropout_rate}

    # Build the model
    model = Sequential()
    model_input_shape = X_train_input.shape[1:]
    model.add(LSTM(hidden_layers, input_shape=model_input_shape, return_sequences=True,
                   kernel_regularizer=regularizers.l2(l2_rate),
                   activation=activation_func))
    model.add(Dropout(dropout_rate))
    model.add(BatchNormalization())
    model.add(LSTM(hidden_layers, kernel_regularizer=regularizers.l2(l2_rate),
                   activation=activation_func))
    model.add(Dropout(dropout_rate))
    model.add(BatchNormalization())
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='binary_crossentropy',
                  optimizer=optimizers.Adam(lr=lrate, decay=decay_rate),
                  metrics=[metrics.AUC(), metrics.FalseNegatives(), metrics.FalsePositives()])

    # Train with early stopping on validation loss
    model.fit(X_train_input, y_train_input2, batch_size=batches, epochs=epochs_,
              validation_data=(X_val_input, y_val_input2), verbose=1,
              shuffle=False, callbacks=[EarlyStopping(monitor='val_loss', patience=15)])

    # Predict on train and validation sets
    yhat = model.predict(X_train_input, batch_size=batches)
    yhat_val = model.predict(X_val_input, batch_size=batches)

    # Record ROC AUC for this hyperparameter set
    record_params['train_roc'] = round(roc_auc_score(y_train_input2, yhat.reshape(yhat.shape[0])), 3)
    record_params['val_roc'] = round(roc_auc_score(y_val_input2, yhat_val.reshape(yhat_val.shape[0]),
                                                   average='weighted'), 3)
    return pd.DataFrame(record_params, index=[ind])
```
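For context, the outer loop that calls this function looks roughly like the sketch below (a simplified sketch; `train_one_nn`, `num_trials`, and `results_df` are illustrative names, not my exact code):

```python
# Simplified sketch of the hyperparameter search loop (illustrative names)
results_df = pd.DataFrame()
num_trials = 50  # I never get past the 13th trial before the process is killed

for ind in range(num_trials):
    row = train_one_nn(X_train_input, y_train_input2, X_val_input, y_val_input2,
                       hidden_layers, activation_func, ind)
    results_df = pd.concat([results_df, row])
    print(results_df.tail(1))
```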
[Memory usage does not appear to be the issue; this screenshot was taken during the 5th NN run.][1]
[1]: https://i.stack.imgur.com/Rb9GL.png
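In case it helps, this is roughly how I could log the Python process's memory from inside the loop to double-check the heap-size readings (a minimal sketch; it assumes the `psutil` package, which my current script does not use):

```python
import os
import psutil  # assumed to be installed; not part of my current script

_process = psutil.Process(os.getpid())

def log_memory(iteration):
    # Resident set size (RSS) of this Python process, in MB
    rss_mb = _process.memory_info().rss / (1024 ** 2)
    print(f"Iteration {iteration}: RSS = {rss_mb:.1f} MB")
```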