Questions tagged [gini]

The Gini coefficient is used to measure income inequality and the discriminatory power of a classifier. If everybody has the same income, the Gini coefficient is 0. If one person has all the income, the Gini coefficient is 1. All other values lie somewhere in between.

For income distributions, the Gini index is best explained using the Lorenz curve, which plots the cumulative proportion of the population, ordered by income, on the x axis against the cumulative proportion of total income these people receive on the y axis. For example, a point (0.3, 0.05) on the Lorenz curve means that the poorest 30% of people receive 5% of the total income in the economy. See below for a Lorenz curve from Wikipedia.

[Figure: Lorenz curve, cumulative share of people vs. cumulative share of income]

The Gini coefficient, or Gini index, is twice the grey area between the line of equality and the actual Lorenz curve. If everybody has the same income, the Gini coefficient is 0. If one person has all the income, the Gini coefficient is 1 (exactly so in the limit of a large population). All other values lie somewhere in between.
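As a rough sketch of the area definition above, assuming a finite list of non-negative incomes and the trapezoid rule for the area under the Lorenz curve (the function names `lorenz_curve` and `gini_from_lorenz` are illustrative, not taken from any library):

```python
def lorenz_curve(incomes):
    """Return Lorenz curve points as (cumulative population share, cumulative income share)."""
    ys = sorted(incomes)          # order people from poorest to richest
    n = len(ys)
    total = sum(ys)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, y in enumerate(ys, start=1):
        running += y
        points.append((i / n, running / total))
    return points


def gini_from_lorenz(incomes):
    """Twice the area between the line of equality and the Lorenz curve."""
    pts = lorenz_curve(incomes)
    area_under = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area_under += (x1 - x0) * (y0 + y1) / 2   # trapezoid between adjacent points
    return 1.0 - 2.0 * area_under                 # equals 2 * (0.5 - area_under)


print(gini_from_lorenz([10, 10, 10, 10]))  # everybody has the same income
print(gini_from_lorenz([0, 0, 0, 40]))     # one person has all the income
```

For a finite population of n people, the area-based value in the "one person has everything" case is 1 - 1/n (0.75 for the four-person example), which approaches 1 as n grows.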

There are several expressions for the Gini coefficient based on i.i.d. data, which agree up to finite-sample normalization:

\begin{align} G &= \frac1{2 \bar y n(n-1)} \sum_{i\neq j} |y_i - y_j| \\[10pt] &\approx \frac{1}{n}\left ( n+1 - 2 \left ( \frac{\sum\limits_{i=1}^n \; (n+1-i)y_{(i)}}{\sum\limits_{i=1}^n y_i} \right ) \right ) \\[10pt] &\approx \frac 2{\bar y} {\rm Cov}(F_Y(Y), Y) \end{align}

where $\bar y$ is the mean income, $y_{(1)} \le \dots \le y_{(n)}$ are the incomes sorted in ascending order, and $F_Y$ is the distribution function of income. The first expression uses the $n(n-1)$ normalization and therefore exceeds the second by a factor of $n/(n-1)$, which vanishes as $n$ grows. The first expression shows that the Gini coefficient is half the average absolute difference in incomes across the population, relative to the mean income (if you were taken out and thrown back into this population at a random position, by how much would your income change?), and it provides the kernel of a second-order $U$-statistic. The last expression relates the Gini coefficient to moments of the distribution, which allows generalization to non-i.i.d. data (such as complex survey data).
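A small numerical check of these expressions may help. The sketch below assumes the usual convention that the Gini coefficient is half the relative mean absolute difference, and it estimates $F_Y$ by mid-ranks $(i - 0.5)/n$ in the covariance form; both choices, and all function names, are assumptions made for illustration.

```python
def gini_pairwise(y):
    """Pairwise form: sum over i != j of |y_i - y_j|, divided by 2 * mean * n * (n - 1)."""
    n = len(y)
    mean = sum(y) / n
    total = sum(abs(a - b) for i, a in enumerate(y) for j, b in enumerate(y) if i != j)
    return total / (2 * mean * n * (n - 1))


def gini_sorted(y):
    """Rank-weighted form, applied to incomes sorted in ascending order."""
    ys = sorted(y)
    n = len(ys)
    weighted = sum((n + 1 - i) * v for i, v in enumerate(ys, start=1))
    return (n + 1 - 2 * weighted / sum(ys)) / n


def gini_cov(y):
    """Covariance form 2 * Cov(F(Y), Y) / mean, with F estimated by (i - 0.5) / n."""
    ys = sorted(y)
    n = len(ys)
    mean = sum(ys) / n
    f = [(i - 0.5) / n for i in range(1, n + 1)]
    f_mean = sum(f) / n
    cov = sum((fi - f_mean) * (yi - mean) for fi, yi in zip(f, ys)) / n
    return 2 * cov / mean


incomes = [1, 2, 3, 4]
n = len(incomes)
print(gini_pairwise(incomes))                # n(n-1) normalization
print(gini_sorted(incomes))                  # n^2 normalization
print(gini_cov(incomes))                     # matches the sorted form
print(gini_pairwise(incomes) * (n - 1) / n)  # rescaled pairwise form
```

With these conventions the sorted and covariance forms coincide, and the pairwise form differs from them only by the factor $n/(n-1)$.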

In classification applications, the Gini coefficient is a linear rescaling of the area under the ROC curve (AUC), with $$ AUC = (G+1)/2 $$ or, equivalently, $G = 2\,AUC - 1$.
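The relation can be checked on a toy example. This sketch assumes the classification Gini is computed as the fraction of concordant minus discordant (positive, negative) score pairs, and that AUC is the probability that a random positive outscores a random negative, with ties counted as one half. The scores, labels, and function names are made up for illustration.

```python
def auc_pairwise(scores, labels):
    """Share of (positive, negative) pairs ranked correctly; ties count as 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def gini_classifier(scores, labels):
    """(concordant pairs - discordant pairs) / all (positive, negative) pairs."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    concordant = sum(1 for p in pos for q in neg if p > q)
    discordant = sum(1 for p in pos for q in neg if p < q)
    return (concordant - discordant) / (len(pos) * len(neg))


scores = [0.1, 0.4, 0.35, 0.8]   # hypothetical classifier outputs
labels = [0, 0, 1, 1]            # hypothetical true classes

auc = auc_pairwise(scores, labels)
g = gini_classifier(scores, labels)
print(auc, g, (g + 1) / 2)
```

A classifier that ranks at random has AUC = 0.5 and G = 0, while a perfect ranking gives AUC = 1 and G = 1.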

See also:

Wikipedia article

A. Sen. On economic inequality.

119 questions
11 votes, 4 answers

Trying to compute Gini index on StackOverflow reputation distribution?

I'm trying to compute the Gini index on the SO reputation distribution using SO Data Explorer. The equation I'm trying to implement is this:…
yossale • 213

10 votes, 4 answers

Basic Gini impurity derivation

From Wikipedia: https://en.wikipedia.org/wiki/Decision_tree_learning I am unable to get my head around two of the steps: The first equation: $f_i(1 - f_i)$. This does not immediately become apparent as the "probability of being chosen times…
2 votes, 1 answer

Gini Index of Vector with Negative Values

I would like to use the Gini Index to measure the sparsity in a signal. From my research so far it seems that the Gini Index is defined for a vector of positive values. My vector however also contains some (slightly) negative values. Is there a…
thebear • 21

2 votes, 0 answers

How to compare two Gini results?

I'm comparing two Gini results from two distributions of income, in which one income is a source of the other. Let's say that Gini_1 = 0.770 and Gini_2 = 0.373. Clearly, Gini_2 shows a less unequal income distribution than Gini_1. However, what more…
1 vote, 1 answer

Gini index/criterion/impurity/coefficient according to Breiman, Sen, Bishop, and Duda

Once again a student (me) is lost in the sea of Gini... I am currently trying to figure out where the Gini-based formula for feature selection proposed by Cehovin and Bosnic [1] comes from: $Gini(A)=\sum_{j}p(j)\sum_{k}p(k|j)^2 - \sum_{k}p(k)^2…
Boern • 173

0 votes, 1 answer

Gini values are not corresponding with Lorenz Curve area

I'm using the Gini coefficient and Lorenz curve plots to show the accumulation of beneficiaries in ecosystem services (ES) supply points, in R. I classify ES into three categories and calculate the Gini coefficient and Lorenz curve for each category. However, the…