In the simple code below, I'd like to understand how pd.DataFrame decides whether a column is present in the DataFrame. How can I make all three print statements print the data [1, 2, 3]?

class ColumnName:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name

    def __str__(self):
        return self.name

    def __hash__(self):
        return hash(self.name)

label1 = ColumnName('foo')
label2 = ColumnName('foo')

import pandas as pd
df = pd.DataFrame({
    label1: [1,2,3]
})

print(df[label1])  # OK
print(df[label2])  # KeyError
print(df[ColumnName('foo')])  # KeyError
s5s
Try adding an `__eq__()` method to `ColumnName` that compares the `.name` values, and you should get the results you're looking for. Generally, when you define `__hash__()` you also need to define `__eq__()` (see this [recent question](https://stackoverflow.com/questions/69779929/cache-object-instances-with-lru-cache-and-hash) about a similar issue). – sj95126 Nov 09 '21 at 14:14
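
A minimal sketch of what that comment suggests, reusing the class from the question. The `isinstance` check and the `NotImplemented` fallback are my own additions, not something the comment spells out. Without `__eq__()`, two `ColumnName('foo')` objects hash the same but still compare by identity, so a hash-based lookup treats them as different keys.

```python
import pandas as pd


class ColumnName:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name

    def __eq__(self, other):
        # Two labels are equal when their names match (assumed semantics).
        if not isinstance(other, ColumnName):
            return NotImplemented
        return self.name == other.name

    def __hash__(self):
        # Must stay consistent with __eq__: equal objects, equal hashes.
        return hash(self.name)


label1 = ColumnName('foo')
label2 = ColumnName('foo')

df = pd.DataFrame({label1: [1, 2, 3]})

# All three lookups should now resolve to the same column.
print(df[label1])
print(df[label2])
print(df[ColumnName('foo')])
```

Note that defining `__eq__()` on a class sets its `__hash__` to `None` unless `__hash__()` is defined as well, so the two methods need to be kept together.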

0 Answers