
I am trying to find a way to calculate perplexity of a language model of multiple 3-word examples from my test set, or perplexity of the corpus of the test set. As the test set, I have a paragraph which I've split into 3-word examples like this: if the corpus is "Hello my name is Jack.", my 3-word examples would be "Hello my name", "my name is" and "name is Jack". The first 2 words are fed into the language model and the third is the correct label.
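For clarity, here is roughly how I build these examples (a minimal sketch; my real tokenization is a bit more involved):

corpus = "Hello my name is Jack."
tokens = corpus.replace(".", "").split()      # ['Hello', 'my', 'name', 'is', 'Jack']

examples = []
for i in range(len(tokens) - 2):
    context = tokens[i:i + 2]                 # first two words, fed into the model
    label = tokens[i + 2]                     # third word, the correct label
    examples.append((context, label))

# examples == [(['Hello', 'my'], 'name'), (['my', 'name'], 'is'), (['name', 'is'], 'Jack')]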

I've tried to write a function that calculates perplexity, based on the following link: https://stackoverflow.com/questions/44697318/how-to-implement-perplexity-in-keras

  • From there, I've chosen this function to calculate the perplexity of individual examples:
from keras import backend as K
def perplexity(y_true, y_pred):
    """
    The perplexity metric. Why isn't this part of Keras yet?!
    https://stackoverflow.com/questions/41881308/how-to-calculate-perplexity-of-rnn-in-tensorflow
    https://github.com/keras-team/keras/issues/8267
    """
    #cross_entropy = K.sparse_categorical_crossentropy(y_true, y_pred)
    cross_entropy = K.categorical_crossentropy(y_true, y_pred)
    perplexity = K.exp(cross_entropy)
    return perplexity
  • My labels are one-hot, and I've read that for one-hot labels categorical_crossentropy should be used instead of sparse_categorical_crossentropy (which expects integer labels); a quick check of this is shown below. This function seems to work fine for individual examples (weird examples get higher perplexity, while normal examples get lower).
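Here is that quick check, with made-up predictions over a 3-word vocabulary (not my actual model output):

from keras import backend as K

probs = K.constant([[0.1, 0.7, 0.2],
                    [0.3, 0.3, 0.4]])                 # predicted probabilities for 2 examples
one_hot = K.constant([[0., 1., 0.],
                      [0., 0., 1.]])                  # one-hot labels (true words at indices 1 and 2)
ints = K.constant([1, 2], dtype='int64')              # the same labels as integer indices

print(K.eval(K.categorical_crossentropy(one_hot, probs)))      # ~[0.357, 0.916]
print(K.eval(K.sparse_categorical_crossentropy(ints, probs)))  # same values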

Following this question, I've tried to write a function for the final perplexity, which would give me a single number instead of one number per example: How to find the perplexity of a corpus

This is the code I've come up with:

def total_perplexity(perplexities, N):
    # Perplexities is tf.Tensor
    # N is vocab size
    log_perp = K.log(perplexities)
    sum_perp = K.sum(log_perp)
    divided_perp = sum_perp / N
    return np.exp(-1 * sum_perp)
  • Here perplexities is the output of the perplexity(y_true, y_pred) function above. However, for different examples (some of which make sense and some of which are total gibberish), the final perplexity tends towards 1 for smaller texts and towards 0 as the size of the corpus grows. The formula I'm trying to implement is written out below for reference.
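For reference, the corpus-level formula from the linked question, as I understand it (with $N$ being the number of predicted words in the test corpus), is:

$$ PP(W) = P(w_1 w_2 \ldots w_N)^{-\frac{1}{N}} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log P(w_i \mid w_{i-2}, w_{i-1})\right) $$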

What am I doing wrong?

Alternatively, do you recommend any other metrics for evaluating my language model?

EDIT: I've also tried a different approach: taking the true label's predicted probability (from model.predict(...)) as the word's probability and plugging it into the corpus-perplexity formula. Here is the new function:

def final_perplexity(y_true, y_pred, vocab_size):
    # y_true: one-hot labels (numpy array)
    # y_pred: predicted probabilities from model.predict(...)
    # vocab_size: size of the vocabulary
    one_hot_indices = [np.where(r == 1)[0][0] for r in y_true]   # index of the true word in each example
    one_hot_probabilities = y_pred[range(len(one_hot_indices)), one_hot_indices]  # probability assigned to the true word
    log_perp = K.log(one_hot_probabilities)
    divided_perp = log_perp / vocab_size
    sum_perp = K.sum(divided_perp)
    return np.exp(-1 * sum_perp)

Now the perplexity goes to infinity for larger corpora. How do I solve this?

1 Answer


The logarithm you take is probably in the wrong base. The log-exp trick relies on the logarithm and the exponentiation cancelling each other out exactly, i.e. 'changing nothing' in the mathematical sense. For example:

$$ \log_{10}(10^4) = 4 \quad\Leftrightarrow\quad 10^{\log_{10}(10^4)} = 10^4 = 10000 $$

But this only works with the right base: $$a^{\log_a(b)} = b$$

If you take 2 to the power of something, your logarithm should be taken with respect to base 2. However, the log function of Keras (K.log) takes the natural logarithm (base e, Euler's number) instead.

To get the base-2 logarithm you could do something like this:

from keras import backend as K

def log2(x):
    return K.log(x) / K.log(2.0)   # note the float: K.log expects floating-point input
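A quick numerical check of the change of base, using 8 as an example (whose base-2 logarithm is 3):

import numpy as np
from keras import backend as K

x = K.constant(8.0)
print(K.eval(K.log(x) / K.log(2.0)))   # 3.0
print(np.log2(8.0))                    # 3.0, same result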
thsis
  • Thank you, I've taken it into account. It helps a little, but I'm not sure it entirely fixes the problem :) – Cavarica2 Nov 12 '20 at 11:12