Testing the goodness of fit of a Zipf distribution

Question

I have several rank distributions and, for each one, would like to fit a [Zipf distribution][1] and estimate the goodness of fit relative to some standard benchmark.

With the MATLAB code below, I tried a sanity check to see whether a "textbook" Zipf rank distribution passes the statistical test. Clearly something is wrong, as it does not. If that does not pass, nothing will!

Using the Kolmogorov-Smirnov test or the Anderson-Darling test (with a custom-built, non-normal distribution) in place of the chi-squared test does not change this.
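
For reference, here is a sketch of the quantities a chi-squared check against a finite Zipf law usually involves. It uses the same symbols as the code ($k$ for rank, $\alpha$ for the shape parameter, $N$ for the total number of observations). The symbols $n$ (number of ranks), $O_k$ (observed count) and $E_k$ (expected count) are notation introduced here, not taken from the code:

$$
p(k) = \frac{k^{-\alpha}}{\sum_{j=1}^{n} j^{-\alpha}}, \qquad E_k = N\,p(k), \qquad \chi^2 = \sum_{k=1}^{n} \frac{(O_k - E_k)^2}{E_k}
$$

The probabilities $p(k)$ sum to 1, so the expected counts $E_k$ sum to $N$. With $\alpha$ fixed in advance, the statistic is usually compared against a chi-squared distribution with $n - 1$ degrees of freedom, or $n - 2$ if $\alpha$ is estimated from the same counts.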

% Define some empirical frequency distribution
x = 1:10;
freq = round(1000 ./ x.^1.5); % illustrative textbook Zipf counts (exponent 1.5)
% Define the Zipf distribution
alpha = 1.5; % Shape parameter, 1.5 is apparently a good all-round value to start with
N = sum(freq); % Total number of observations
k = 1:length(x); % Rank of each observation
zipf_dist = N ./ (k.^alpha); % Compute the Zipf distribution
% Plot our empirical frequency distribution alongside the Zipf distribution
figure;
bar(x, freq); % or freq/N for relative frequencies
hold on;
plot(x, zipf_dist, 'r--');
xlabel('Rank');
ylabel('Frequency');
legend('Observed', 'Zipf');
% Compute the goodness of fit using the chi-squared test
expected_freq = zipf_dist .* N;
chi_squared = sum((freq - expected_freq).^2 ./ expected_freq);
dof = length(freq) - 1;
p_value = 1 - chi2cdf(chi_squared, dof);
% Display the results
fprintf('Chi-squared statistic = %.4f\n', chi_squared);
fprintf('p-value = %.4f\n', p_value);
if p_value < 0.05
    fprintf('Conclusion: The Zipf hypothesis is rejected at the 0.05 level.\n');
else
    fprintf('Conclusion: The data is consistent with a Zipf distribution (not rejected at the 0.05 level).\n');
end
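
A minimal sketch of how the sanity check could be set up so that exact Zipf counts are not rejected. It assumes three things: the Zipf weights are normalised to probabilities, the expected counts are scaled by N exactly once, and alpha is fixed in advance rather than fitted. The sample size of 10000 and the generated counts are hypothetical. `gammainc` is used as a toolbox-free stand-in for `1 - chi2cdf(...)`.

```matlab
% Sanity check: counts that follow a Zipf law exactly should not be rejected.
alpha = 1.5;                      % assumed shape parameter, fixed in advance
k = 1:10;                         % ranks
p = k.^(-alpha);                  % unnormalised Zipf weights
p = p / sum(p);                   % normalise so the probabilities sum to 1

N = 10000;                        % hypothetical sample size
freq = round(N * p);              % hypothetical "textbook" Zipf counts
N = sum(freq);                    % recompute the total after rounding

expected_freq = N * p;            % expected counts: scale by N exactly once
chi_squared = sum((freq - expected_freq).^2 ./ expected_freq);
dof = numel(freq) - 1;            % use numel(freq) - 2 if alpha is estimated from freq

% Upper tail of the chi-squared distribution via the regularised incomplete
% gamma function; this equals 1 - chi2cdf(chi_squared, dof).
p_value = gammainc(chi_squared / 2, dof / 2, 'upper');

fprintf('Chi-squared statistic = %.4f\n', chi_squared);
fprintf('p-value = %.4f\n', p_value);
```

With real data, alpha would normally be estimated first (for example by maximum likelihood), which costs one further degree of freedom. The chi-squared statistic also needs raw counts rather than relative frequencies, and it becomes unreliable when expected counts in the tail ranks are very small.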
