4

Posting here is my last resort because I can't find anything like it online. I trained a model to classify embeddings into categories (a simple three-layer Dense neural network).

I'm using pandas for the network's input, and the model performs quite well on the test set. The thing is, if I predict on the whole test set and read, say, the prediction for one element, and then select only that row of the test set and pass it through the network, the prediction comes out with different values. What could be happening? Here is the code, in case my explanation was confusing:

i = 522
y_pred = model.predict(X_test)              # predict the whole test set
y_pred_2 = model.predict(X_test.iloc[[i]])  # predict only row i

print(f'{np.argmax(y_pred[i])} {np.argmax(y_pred_2)}')

output: 8 5

It's as if my model behaves differently when it processes the whole test set in a single run than when it processes a single row at a time.

The output shapes of y_pred and y_pred_2 are (603, 10) and (1, 10) respectively, where 10 is the number of classes I have.

Some example values for both predictions, with an arbitrary i:

y_pred[i]: array([1.3353945e-02, 2.8374636e-09, 1.4435661e-08, 3.4135045e-18,
   7.7986561e-02, 3.7737598e-03, 2.0284578e-10, 2.7154891e-03,
   9.0203673e-01, 1.3346069e-04], dtype=float32)

y_pred_2: array([[1.1702824e-16, 1.6781385e-37, 2.5281618e-33, 0.0000000e+00,
   2.3075200e-09, 1.0000000e+00, 9.9125501e-35, 6.2606384e-22,
   5.8689110e-14, 2.3486194e-24]], dtype=float32)
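For a fully deterministic model, row i of the batch prediction should match the single-row prediction exactly; a disagreement like the one above points at either a stochastic layer or preprocessing that depends on batch shape. A minimal NumPy sketch of that invariant (the linear-softmax "model", shapes, and seed are stand-ins, not the real network):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 10))    # stand-in weights: 16 features -> 10 classes
X = rng.normal(size=(603, 16))   # stand-in test set, same row count as mine

def predict(batch):
    """Deterministic row-wise 'model': linear layer + softmax."""
    z = batch @ W
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

i = 522
y_pred = predict(X)          # whole set in one run
y_pred_2 = predict(X[[i]])   # single row, batch dimension kept
print(np.allclose(y_pred[i], y_pred_2[0]))  # True: batch and single-row agree
```

Because each row is processed independently, the batch result and the single-row result coincide to floating-point precision; my real model fails this check.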

Ramiro
  • 93
  • It's not clear why you expect the same results for these 2 different inputs. Can you elaborate on that? – Sycorax Nov 05 '21 at 20:15
  • 1
    Hi @Sycorax, the inputs are the same, the only difference is that in the first case I'm processing the whole table and reading only one of the predictions, while in the second case I'm processing only one row of the table – Ramiro Nov 05 '21 at 20:49
  • What is the model? Does it include dropout, batch norm, or similar layers? – Sycorax Nov 05 '21 at 21:50
  • Model has three Dense layers and one Dropout layer after the first one. Do you think it could be that? – Ramiro Nov 05 '21 at 22:05
  • Update: it was the dropout layers haha, I thought those only worked while training and didn't affect the result while predicting stuff. – Ramiro Nov 05 '21 at 22:47
  • Strange, because that is what the documentation says: "Note that the Dropout layer only applies when training is set to True such that no values are dropped during inference. When using model.fit, training will be appropriately set to True automatically" https://keras.io/api/layers/regularization_layers/dropout/ I am having the same problem though. So weird. – Dr_Zaszuś Jun 10 '22 at 07:55
  • 1
    Hey @Dr_Zaszuś, I kept digging after I posted this question and ultimately found that my actual problem wasn't the dropout layer but that I wasn't reshaping the inputs correctly, so the input data was scrambled when I predicted the whole batch. My call to model.predict first runs a function of mine that converts the dataframe to numpy, and that step scrambled the data. So if you do any additional preprocessing before feeding the network, check that you aren't reshaping incorrectly as I was! – Ramiro Jun 11 '22 at 14:11
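The reshaping pitfall described in the last comment is easy to reproduce (the shapes below are illustrative, not from the original code): NumPy's reshape never raises an error when the total element count matches, it just silently regroups the buffer, so rows no longer line up with the rows you started from:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # two rows: [[0, 1, 2], [3, 4, 5]]
b = a.reshape(3, 2)              # same buffer, silently regrouped:
                                 # [[0, 1], [2, 3], [4, 5]]
c = a.T                          # the transpose you may have wanted:
                                 # [[0, 3], [1, 4], [2, 5]]
print(np.array_equal(b, c))      # False: reshape is not transpose
```

A cheap guard in any DataFrame-to-array conversion step is to assert that row i of the converted batch equals the conversion of row i alone before predicting.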

1 Answer

5

In comments, OP writes that the culprit was the dropout layer. Dropout randomly sets a fraction of the network's activations to zero, so repeated forward passes over the same input can produce different outputs. The correct practice is to apply dropout only during training; when it's time to make predictions, make sure dropout is disabled.

Sycorax
  • 90,934