
I have several ranking distributions and would, for each one, like to fit a [Zipf distribution][1], and estimate the goodness of fit relative to some standard benchmark.

With the MATLAB code below, I tried a sanity check to see whether a "textbook" Zipf rank distribution passes the statistical test. Clearly something is wrong, as it does not. If that doesn't pass, nothing will!

Using the Kolmogorov-Smirnov test, or the Anderson-Darling test with a custom-built (non-normal) distribution, in place of the chi-squared test does not change this.

% Define some empirical frequency distribution
x = 1:10;
freq = randn(1,10); % textbook zipf!

% Define the Zipf distribution
alpha = 1.5; % Shape parameter, 1.5 is apparently a good all-round value to start with
N = sum(freq); % Total number of observations
k = 1:length(x); % Rank of each observation
zipf_dist = N ./ (k.^alpha); % Compute the Zipf distribution

% Plot our empirical frequency distribution alongside the Zipf distribution
figure;
bar(x, freq); % or freq\N
hold on;
plot(x, zipf_dist, 'r--');
xlabel('Rank');
ylabel('Frequency');
legend('Observed', 'Zipf');

% Compute the goodness of fit using the chi-squared test
expected_freq = zipf_dist .* N;
chi_squared = sum((freq - expected_freq).^2 ./ expected_freq);
dof = length(freq) - 1;
p_value = 1 - chi2cdf(chi_squared, dof);

% Display the results
fprintf('Chi-squared statistic = %.4f\n', chi_squared);
fprintf('p-value = %.4f\n', p_value);
if p_value < 0.05
    fprintf('Conclusion: The data is not from a Zipf distribution.\n');
else
    fprintf('Conclusion: The data is from a Zipf distribution.\n');
end
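
For comparison, here is a minimal sketch of the same chi-squared check with the Zipf weights normalized into probabilities, so that the expected counts sum to the number of observations. In the code above, `zipf_dist` is already scaled by `N` and `expected_freq` scales it by `N` a second time, and `freq` is drawn from `randn`, which gives standard normal values and not rank frequencies. The values `N = 1000` and the rounded observed counts below are illustrative assumptions, not data from the question, and this may not be the only issue.

```matlab
% Sketch only: N and the observed counts are illustrative assumptions.
alpha = 1.5;              % shape parameter, fixed in advance (not fitted)
N = 1000;                 % assumed total number of observations
k = 1:10;                 % ranks

% Zipf probabilities over a finite set of ranks, normalized to sum to 1
p = k.^(-alpha);
p = p ./ sum(p);

% Expected counts under the Zipf model: N times probability (sums to N)
expected_freq = N .* p;

% Illustrative "textbook" observations: the expected counts rounded to
% integers (rounding may shift the total by a count or so)
freq = round(expected_freq);

% Pearson chi-squared statistic and p-value, as in the original code
chi_squared = sum((freq - expected_freq).^2 ./ expected_freq);
dof = length(freq) - 1;   % alpha is fixed, so no extra parameter is subtracted
p_value = 1 - chi2cdf(chi_squared, dof);

fprintf('Chi-squared statistic = %.4f\n', chi_squared);
fprintf('p-value = %.4f\n', p_value);
```

With observed counts this close to the expected ones, the statistic should be small and the p-value well above 0.05. If `alpha` were estimated from the same data, the degrees of freedom would usually be reduced by one more.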

z8080