I am trying to build a neural network that learns to evaluate chess positions. I'm following this paper and trying to reproduce its results. The general idea is to have the network approximate Stockfish's evaluation function from the board state alone. I represent the input as a flattened (8, 8, 12) tensor, i.e. a vector of length 768: each square gets a vector of length 12, one slot per piece type per side (6 unique pieces × 2 sides). A slot is 1 if that piece belongs to the player to move, -1 if it belongs to the opposing player, and 0 otherwise. The output is squashed into [0, 1], with 0 representing a win for Black and 1 representing a win for White.
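For reference, here is roughly how I build that 768-value input from a position. This is only a sketch of my encoding using python-chess; the encode_position helper and the piece-to-channel ordering are my own choices, not something specified in the paper.

import chess
import numpy as np

def encode_position(board: chess.Board) -> np.ndarray:
    # Encode a board as a flat vector of 768 values: 64 squares x 12 slots
    # (6 piece types x 2 colours). A slot is +1 if the piece belongs to the
    # side to move, -1 if it belongs to the opponent, and 0 otherwise.
    planes = np.zeros((8, 8, 12), dtype=np.float32)
    for square, piece in board.piece_map().items():
        rank = chess.square_rank(square)
        file = chess.square_file(square)
        # piece_type runs from 1 (pawn) to 6 (king); black pieces use channels 6-11
        channel = (piece.piece_type - 1) + (0 if piece.color == chess.WHITE else 6)
        planes[rank, file, channel] = 1.0 if piece.color == board.turn else -1.0
    return planes.reshape(768)

vec = encode_position(chess.Board())  # starting position
print(vec.shape)  # (768,)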
The problem is that my loss never gets anywhere near the loss the authors of the paper report. They achieved an MSE of roughly 0.002, whereas my network doesn't drop below an MSE of 0.06. The only difference I can think of is that I have less training data than they did (1 million positions vs. 3 million). Here is my training code:
import os
import random

import numpy as np
import pandas as pd
import tensorflow as tf

train_directory = '...'
test_directory = '...'

def generate_batches(file_list, batch_size, file_directory):
    # Cycle through the CSV files forever, yielding (positions, evaluations) batches
    cnt = 0
    while True:
        file = file_list[cnt]
        last_filename = file
        cnt = (cnt + 1) % len(file_list)
        data = pd.read_csv(file_directory + file)
        x = np.array(data['Positions'])
        Y = np.array(data['Evaluations'])
        # Turn each position string into a list of -1 / 0 / 1 values
        X = []
        for pos in x:
            curr = []
            for num in pos:
                if num == '-1':
                    curr.append(-1)
                elif num == '1':
                    curr.append(1)
                elif num == '0':
                    curr.append(0)
            X.append(curr)
        X = np.array(X)
        # Slice the file's positions into batches of batch_size
        for idx in range(0, X.shape[0], batch_size):
            X_loc = X[idx:(idx + batch_size)]
            Y_loc = Y[idx:(idx + batch_size)]
            yield X_loc, Y_loc
train_filenames = []
for file in os.listdir(train_directory):
    if file.endswith('.csv'):
        train_filenames.append(file)

test_filenames = []
for file in os.listdir(test_directory):
    if file.endswith('.csv'):
        test_filenames.append(file)

random.shuffle(train_filenames)
random.shuffle(test_filenames)

train_generator = generate_batches(train_filenames, 10000, train_directory)
test_generator = generate_batches(test_filenames, 10000, test_directory)
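# Debugging snippet (my own check, not part of the training run): pull one
# batch and print its shape to confirm the positions come out 768 values wide.
X_check, Y_check = next(generate_batches(train_filenames, 10000, train_directory))
print(X_check.shape, Y_check.shape)  # expecting (10000, 768) and (10000,)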
# Network: three hidden layers of 2048 ELU units with Batch Normalization
# between them, and a sigmoid output in [0, 1]
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(2048, input_shape=(768,), activation='elu'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(2048, activation='elu'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(2048, activation='elu'))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

opt = tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.7, nesterov=True)
stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=100)
save = tf.keras.callbacks.ModelCheckpoint(filepath='weights.h5', save_weights_only=True, save_best_only=True)

model.compile(optimizer=opt, loss='mse')

# Load the weights from the last run's checkpoint and save the full model before training
model.load_weights('weights.h5')
model.save("my_model")

model.fit(x=train_generator, steps_per_epoch=len(train_filenames),
          validation_data=test_generator, validation_steps=len(test_filenames),
          epochs=100000, batch_size=256, max_queue_size=32, workers=1,
          callbacks=[stop, save])
This is the paper's description of their neural network:
"A three hidden layer deep perceptron with 2048 hidden units per layer has been used. Each layer is activated by the Elu activation function and the SGD training parameters have been initialized as follows: η = 0.001; ε = 1e − 0.8 in combination with a Nesterov Momentum of 0.7. In addition to that Batch Normalization between all the hidden layers and Minibatches of 256 samples have been used"
I'm not really sure what else to try, so I would appreciate any guidance.