
I want to calculate group fairness metrics using AIF360. This is a sample dataset and model, in which gender is the protected attribute and income is the target.

import pandas as pd
from sklearn.svm import SVC
from aif360.sklearn import metrics

df = pd.DataFrame({'gender': [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
                  'experience': [0, 0.1, 0.2, 0.4, 0.5, 0.6, 0, 0.1, 0.2, 0.4, 0.5, 0.6],
                  'income': [0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1]})

clf = SVC(random_state=0).fit(df[['gender', 'experience']], df['income'])

y_pred = clf.predict(df[['gender', 'experience']])

metrics.statistical_parity_difference(y_true=df['income'], y_pred=y_pred, prot_attr='gender', priv_group=1, pos_label=1)

It throws:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-609692e52b2a> in <module>
     11 y_pred = clf.predict(X)
     12 
---> 13 metrics.statistical_parity_difference(y_true=df['income'], y_pred=y_pred, prot_attr='gender', priv_group=1, pos_label=1)

TypeError: statistical_parity_difference() got an unexpected keyword argument 'y_true'

I get a similar error for disparate_impact_ratio. It seems the data needs to be passed differently, but I have not been able to figure out how.

Reveille

2 Answers


Remove the y_true= and y_pred= keywords from the function call and retry. As one can see in the documentation, *y in the function signature collects an arbitrary number of positional arguments (see this post), so this is the most logical guess.

In other words, y_true and y_pred are NOT keyword arguments, so they cannot be passed by name; arbitrary keyword arguments would instead appear as **kwargs in the signature.
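For concreteness, here is a minimal sketch of the corrected call: it is the original call from the question, only with the two y arrays passed positionally.

# Same call as in the question, but y_true/y_pred are passed positionally,
# since they are collected by *y in the function signature.
metrics.statistical_parity_difference(df['income'], y_pred,
                                      prot_attr='gender',
                                      priv_group=1, pos_label=1)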

Bill Huang
  • Thanks. It resolved the current error, but now it throws `ValueError: Some of the attributes provided are not present in the dataset`, which makes sense given that `df["gender"]` is not provided to the function. – Reveille Oct 23 '20 at 21:03
  • 1
    I'd bet the problem is now in the data, as the error message is now a ValueError on the dataset property. It is now unrelated to the function call itself. – Bill Huang Oct 23 '20 at 22:49
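Regarding the ValueError in the comments above: the sklearn-compatible metrics in aif360 appear to look up prot_attr in the index of y_true rather than in a separate column. A minimal, untested sketch under that assumption would move 'gender' into the index before calling the metric:

# Hedged sketch: move 'gender' into the index of y_true so that
# prot_attr='gender' can be resolved by the sklearn-compatible metric.
y_true = df.set_index('gender')['income']

metrics.statistical_parity_difference(y_true, y_pred,
                                      prot_attr='gender',
                                      priv_group=1, pos_label=1)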

This can be done by converting the DataFrame to a StandardDataset and then calling the fair_metrics function defined below:

from aif360.datasets import StandardDataset
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric

# Wrap the DataFrame in a StandardDataset, marking 'income' as the label
# (favorable class 1) and 'gender' as the protected attribute (privileged class 1).
dataset = StandardDataset(df,
                          label_name='income',
                          favorable_classes=[1],
                          protected_attribute_names=['gender'],
                          privileged_classes=[[1]])

def fair_metrics(dataset, y_pred):
    # Copy the dataset and overwrite its labels with the model's predictions.
    dataset_pred = dataset.copy()
    dataset_pred.labels = y_pred

    # Build the privileged/unprivileged group definitions from the
    # protected attribute stored in the dataset.
    attr = dataset_pred.protected_attribute_names[0]
    idx = dataset_pred.protected_attribute_names.index(attr)
    privileged_groups = [{attr: dataset_pred.privileged_protected_attributes[idx][0]}]
    unprivileged_groups = [{attr: dataset_pred.unprivileged_protected_attributes[idx][0]}]

    # ClassificationMetric compares the true labels against the predictions;
    # BinaryLabelDatasetMetric looks only at the predicted labels.
    classified_metric = ClassificationMetric(dataset, dataset_pred,
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

    metric_pred = BinaryLabelDatasetMetric(dataset_pred,
                                           unprivileged_groups=unprivileged_groups,
                                           privileged_groups=privileged_groups)

    result = {'statistical_parity_difference': metric_pred.statistical_parity_difference(),
              'disparate_impact': metric_pred.disparate_impact(),
              'equal_opportunity_difference': classified_metric.equal_opportunity_difference()}

    return result


fair_metrics(dataset, y_pred)

which returns the correct results:

{'statistical_parity_difference': -0.6666666666666667,
 'disparate_impact': 0.3333333333333333,
 'equal_opportunity_difference': 0.0}
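As a quick sanity check of these numbers, the two group rates can also be computed directly from the predictions. This is a hedged sketch that reuses df and y_pred from the snippet in the question:

# Hedged sanity check: statistical parity difference is the positive-prediction
# rate of the unprivileged group (gender=0) minus that of the privileged group
# (gender=1); disparate impact is the ratio of the same two rates.
unpriv_rate = y_pred[(df['gender'] == 0).to_numpy()].mean()
priv_rate = y_pred[(df['gender'] == 1).to_numpy()].mean()

print(unpriv_rate - priv_rate)  # should match 'statistical_parity_difference'
print(unpriv_rate / priv_rate)  # should match 'disparate_impact'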


Reveille