I am attempting to create a neural network that can learn to evaluate chess positions. I'm following along with this paper and trying to recreate its results. The general idea is to have the NN replicate Stockfish's evaluation function based on the board state. I represent the input to the network as a vector of shape (8, 8, 12), flattened to length 768. Each square on the board gets a sub-vector of length 12, one slot per piece type and side (6 unique pieces × 2 sides). A slot is 1 if that piece belongs to the player to move, -1 if it belongs to the opposing player, and 0 otherwise. The output is compressed into [0, 1], with 0 representing a win for Black and 1 representing a win for White.
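
For concreteness, here is a minimal sketch of that encoding, assuming positions are built with the python-chess library (the question doesn't say how the CSVs were generated, and the sigmoid scaling factor below is a guess on my part):

import chess
import numpy as np

def encode_board(board):
    # Flat 8*8*12 = 768 vector: one slot per (piece type, side) for each square.
    planes = np.zeros((64, 12), dtype=np.float32)
    for square in chess.SQUARES:
        piece = board.piece_at(square)
        if piece is None:
            continue  # empty squares stay all zeros
        # Slots 0-5 are white pawn..king, slots 6-11 are black pawn..king.
        slot = (piece.piece_type - 1) + (0 if piece.color == chess.WHITE else 6)
        # +1 if the piece belongs to the side to move, -1 otherwise.
        planes[square, slot] = 1.0 if piece.color == board.turn else -1.0
    return planes.reshape(768)

def compress_eval(centipawns):
    # Squash a Stockfish centipawn score into [0, 1]; the /100 scale is an
    # assumption -- the question only says a sigmoid is applied to the output.
    return 1.0 / (1.0 + np.exp(-centipawns / 100.0))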

The issue I'm having is that my loss doesn't come close to the loss the paper's authors achieved. They report an MSE of roughly 0.002, whereas my network doesn't drop below an MSE of 0.06. The only cause I can think of is that I have less training data than they did (1 million positions vs. 3 million). Here is my training code:

import os
import random

import numpy as np
import pandas as pd
import tensorflow as tf

train_directory = '...'
test_directory = '...'

def generate_batches(file_list, batch_size, file_directory):
    # Cycle through the CSV files forever, yielding (X, Y) batches.
    cnt = 0
    while True:
        file = file_list[cnt]
        last_filename = file
        cnt = (cnt + 1) % len(file_list)
        data = pd.read_csv(file_directory + file)

        x = np.array(data['Positions'])
        Y = np.array(data['Evaluations'])

        X = []

        # Rebuild each position (stored as a string in the CSV)
        # into a list of -1/0/1 ints.
        for pos in x:
            curr = []
            for num in pos:
                if num == '-1':
                    curr.append(-1)
                elif num == '1':
                    curr.append(1)
                elif num == '0':
                    curr.append(0)
            X.append(curr)

        X = np.array(X)

        for idx in range(0, X.shape[0], batch_size):
            X_loc = X[idx:(idx + batch_size)]
            Y_loc = Y[idx:(idx + batch_size)]

            yield X_loc, Y_loc




train_filenames = []
for file in os.listdir(train_directory):
    if file.endswith('.csv'):
        train_filenames.append(file)

test_filenames = []
for file in os.listdir(test_directory):
    if file.endswith('.csv'):
        test_filenames.append(file)

random.shuffle(train_filenames)
random.shuffle(test_filenames)

train_generator = generate_batches(train_filenames, 10000, train_directory)
test_generator = generate_batches(test_filenames, 10000, test_directory)

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(2048, input_shape=(768,), activation='elu'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(2048, activation='elu'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(2048, activation='elu'))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

opt = tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.7, nesterov=True)
stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=100)
save = tf.keras.callbacks.ModelCheckpoint(filepath='weights.h5',
                                          save_weights_only=True,
                                          save_best_only=True)
model.compile(optimizer=opt, loss='mse')
model.load_weights('weights.h5')
model.save("my_model")
model.fit(x=train_generator,
          steps_per_epoch=len(train_filenames),
          validation_data=test_generator,
          validation_steps=len(test_filenames),
          epochs=100000,
          batch_size=256,
          workers=1,
          max_queue_size=32,
          callbacks=[stop, save])

This is the paper's description of their neural network:

"A three hidden layer deep perceptron with 2048 hidden units per layer has been used. Each layer is activated by the Elu activation function and the SGD training parameters have been initialized as follows: η = 0.001; ε = 1e − 0.8 in combination with a Nesterov Momentum of 0.7. In addition to that Batch Normalization between all the hidden layers and Minibatches of 256 samples have been used"

Not really sure what else to try, so I would appreciate any guidance.

  • what are the targets like? – lejlot Jul 11 '22 at 20:54
  • @lejlot Sorry, I don't fully understand your question. What do you mean by targets? – achandra03 Jul 12 '22 at 02:33
  • "The only issue I can think of is that I have less training data than them (1 million positions vs 3 million)." Well, yeah, that seems like a pretty plausible explanation for the discrepancy! This post has a number of suggestions for achieving a lower loss on training data https://stats.stackexchange.com/questions/352036/what-should-i-do-when-my-neural-network-doesnt-learn/352037#352037 but none of them will be able to overcome the data deficiency. – Sycorax Jul 12 '22 at 14:20
  • targets = y; what do the ys look like? – lejlot Jul 12 '22 at 19:10
  • @lejlot A single scalar value - just a sigmoid applied to the output of Stockfish. – achandra03 Jul 12 '22 at 19:44

0 Answers