In the formula for the perplexity of a corpus, why is the probability normalized by the total number of words?
Why shouldn't it be normalized by the number of sentences instead? If the number of sentences is used for normalization, is the computation still valid?
$$\text{Perplexity}(C) = \sqrt[N]{\frac{1}{P(S_1, S_2, \ldots, S_n)}}$$

where $N$ is the number of words in the corpus and $S_1, \ldots, S_n$ are its sentences.
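
To make the two normalizations concrete, here is a minimal sketch that computes both for a toy corpus. Everything in it is an assumption for illustration: the function name `perplexity`, the two-sentence corpus, the imaginary model that assigns probability 0.1 to every word, and the assumption that sentences are independent, so that log P(S1, ..., Sn) is the sum of the per-sentence log-probabilities.

```python
import math


def perplexity(sentence_log_probs, normalizer):
    """Return the `normalizer`-th root of 1 / P(S_1, ..., S_n).

    Computed in log space as exp(-sum(log P(S_i)) / normalizer),
    assuming the sentences are independent.
    """
    total_log_prob = sum(sentence_log_probs)
    return math.exp(-total_log_prob / normalizer)


# Hypothetical toy corpus: (word count, log P(sentence)) per sentence.
# An imaginary model gives every word probability 0.1.
corpus = [
    (5, 5 * math.log(0.1)),    # short sentence, 5 words
    (20, 20 * math.log(0.1)),  # long sentence, 20 words
]

num_words = sum(length for length, _ in corpus)  # N = 25
num_sentences = len(corpus)                      # n = 2
log_probs = [log_prob for _, log_prob in corpus]

# Normalizing by the number of words gives a per-word quantity.
per_word = perplexity(log_probs, num_words)

# Normalizing by the number of sentences gives a per-sentence quantity.
per_sentence = perplexity(log_probs, num_sentences)

print(per_word, per_sentence)
```

In this toy setup the word-normalized value recovers 10, the inverse of the per-word probability, while the sentence-normalized value equals the word-normalized one raised to the power N/n (here 25/2), so it changes with the average sentence length of the corpus.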
---- reference: