I am trying to build a neural network that learns to evaluate chess positions. I'm following this paper and trying to reproduce its results. The general idea is to have the network approximate Stockfish's evaluation function from the board state alone. I represent the input as a flattened (8, 8, 12) tensor, i.e. a vector of length 768: each square gets a vector of length 12, one slot per piece type per side (6 unique pieces × 2 sides). A slot is 1 if that piece belongs to the player to move, -1 if it belongs to the opposing player, and 0 otherwise. The output is squashed into [0, 1], with 0 representing a win for Black and 1 representing a win for White.
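For reference, here is roughly how I build that 768-value input from a position. This is only a sketch of my encoding using python-chess; the encode_position helper and the piece-to-channel ordering are my own choices, not something specified in the paper.

import chess
import numpy as np

def encode_position(board: chess.Board) -> np.ndarray:
    # Encode a board as a flat vector of 768 values: 64 squares x 12 slots
    # (6 piece types x 2 colours). A slot is +1 if the piece belongs to the
    # side to move, -1 if it belongs to the opponent, and 0 otherwise.
    planes = np.zeros((8, 8, 12), dtype=np.float32)
    for square, piece in board.piece_map().items():
        rank = chess.square_rank(square)
        file = chess.square_file(square)
        # piece_type runs from 1 (pawn) to 6 (king); black pieces use channels 6-11
        channel = (piece.piece_type - 1) + (0 if piece.color == chess.WHITE else 6)
        planes[rank, file, channel] = 1.0 if piece.color == board.turn else -1.0
    return planes.reshape(768)

vec = encode_position(chess.Board())  # starting position
print(vec.shape)  # (768,)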
The problem is that my loss never gets anywhere near the loss the authors of the paper report. They achieved an MSE of roughly 0.002, whereas my network doesn't drop below an MSE of 0.06. The only difference I can think of is that I have less training data than they did (1 million positions vs. 3 million). Here is my training code:
import os
import random

import numpy as np
import pandas as pd
import tensorflow as tf

train_directory = '...'
test_directory = '...'

def generate_batches(file_list, batch_size, file_directory):
    # Cycle through the CSV files forever, yielding (positions, evaluations) batches
    cnt = 0
    while True:
        file = file_list[cnt]
        last_filename = file
        cnt = (cnt + 1) % len(file_list)
        data = pd.read_csv(file_directory + file)
        x = np.array(data['Positions'])
        Y = np.array(data['Evaluations'])
        # Turn each position string into a list of -1 / 0 / 1 values
        X = []
        for pos in x:
            curr = []
            for num in pos:
                if num == '-1':
                    curr.append(-1)
                elif num == '1':
                    curr.append(1)
                elif num == '0':
                    curr.append(0)
            X.append(curr)
        X = np.array(X)
        # Slice the file's positions into batches of batch_size
        for idx in range(0, X.shape[0], batch_size):
            X_loc = X[idx:(idx + batch_size)]
            Y_loc = Y[idx:(idx + batch_size)]
            yield X_loc, Y_loc
train_filenames = []
for file in os.listdir(train_directory):
    if file.endswith('.csv'):
        train_filenames.append(file)

test_filenames = []
for file in os.listdir(test_directory):
    if file.endswith('.csv'):
        test_filenames.append(file)

random.shuffle(train_filenames)
random.shuffle(test_filenames)

train_generator = generate_batches(train_filenames, 10000, train_directory)
test_generator = generate_batches(test_filenames, 10000, test_directory)
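# Debugging snippet (my own check, not part of the training run): pull one
# batch and print its shape to confirm the positions come out 768 values wide.
X_check, Y_check = next(generate_batches(train_filenames, 10000, train_directory))
print(X_check.shape, Y_check.shape)  # expecting (10000, 768) and (10000,)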
# Network: three hidden layers of 2048 ELU units with Batch Normalization
# between them, and a sigmoid output in [0, 1]
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(2048, input_shape=(768,), activation='elu'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(2048, activation='elu'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(2048, activation='elu'))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

opt = tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.7, nesterov=True)
stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=100)
save = tf.keras.callbacks.ModelCheckpoint(filepath='weights.h5', save_weights_only=True, save_best_only=True)

model.compile(optimizer=opt, loss='mse')

# Load the weights from the last run's checkpoint and save the full model before training
model.load_weights('weights.h5')
model.save("my_model")

model.fit(x=train_generator, steps_per_epoch=len(train_filenames),
          validation_data=test_generator, validation_steps=len(test_filenames),
          epochs=100000, batch_size=256, max_queue_size=32, workers=1,
          callbacks=[stop, save])
This is the paper's description of their neural network:
"A three hidden layer deep perceptron with 2048 hidden units per layer has been used. Each layer is activated by the Elu activation function and the SGD training parameters have been initialized as follows: η = 0.001; ε = 1e − 0.8 in combination with a Nesterov Momentum of 0.7. In addition to that Batch Normalization between all the hidden layers and Minibatches of 256 samples have been used"
I'm not really sure what else to try, so I would appreciate any guidance.