
I am trying to rename columns in multiple dataframes and convert those columns to integers. This is the code I have:

def clean_col(df,col_name):
    df.reset_index(inplace=True)
    df.rename(columns={df.columns[0]:'Date', df.columns[1]: col_name},inplace=True)
    df[col_name]=df[col_name].apply(lambda x: int(x))

I have a dictionary of the dataframe names and the new name of the columns:

d = {
    all_df: "all",
    coal_df: "coal",
    liquids_df: "liquids",
    coke_df: "coke",
    natural_gas_df: "natural_gas",
    nuclear_df: "nuclear",
    hydro_electricity_df: "hydro",
    wind_df: "wind",
    utility_solar_df: "utility_solar",
    geothermal_df: "geo_thermal",
    wood_biomass_df: "biomass_wood",
    biomass_other_df: "biomass_other",
    other_df: "other",
    solar_all_df: "all_solar",
}
for i, (key, value) in enumerate(d.items()):
    clean_col(key, value)

And this is the error I am getting:

TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed

Any help would be appreciated.

a11
Shawn Jamal

3 Answers


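DataFrames are not hashable, so they cannot be dictionary keys, but their variable names (as strings) can. Key the dictionary by name and look each DataFrame up with globals() (or locals(), depending on where the frames are defined).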

import pandas as pd
import io

data1 = '''id,name
1,A
2,B
3,C
4,D
'''
data2 = '''id,name
1,W
2,X
3,Y
4,Z
'''

df1 = pd.read_csv(io.StringIO(data1))
df2 = pd.read_csv(io.StringIO(data2))


def clean_function(dfname, col_name):
    df = globals()[dfname]   # also see locals()
    df.rename(columns={df.columns[0]:'NewID', df.columns[1]: col_name},inplace=True)
    return df

mydict = { 'df1': 'NewName', 'df2': 'AnotherName'}

for k,v in mydict.items():
    df = clean_function(k,v)
    print(df)

Output:

   NewID NewName
0      1       A
1      2       B
2      3       C
3      4       D
   NewID AnotherName
0      1           W
1      2           X
2      3           Y
3      4           Z
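
A variant that avoids looking names up in globals(): since the TypeError comes from using DataFrames as dictionary keys, the mapping can be inverted so that the hashable strings are the keys and the DataFrames are the values. This is only a minimal sketch; the toy frames df_a and df_b and the helper name rename_first_two are illustrative assumptions, not part of the original code.

```python
import pandas as pd

# Toy frames standing in for the real data (hypothetical names).
df_a = pd.DataFrame({"id": [1, 2], "name": ["A", "B"]})
df_b = pd.DataFrame({"id": [1, 2], "name": ["W", "X"]})


def rename_first_two(df, col_name):
    # Rename the first two columns in place, mirroring clean_function above.
    df.rename(
        columns={df.columns[0]: "NewID", df.columns[1]: col_name},
        inplace=True,
    )


# Strings are hashable, so they are the keys; the DataFrames are the values.
frames = {"NewName": df_a, "AnotherName": df_b}

for col_name, df in frames.items():
    rename_first_two(df, col_name)

print(df_a.columns.tolist())
print(df_b.columns.tolist())
```

Because the dictionary holds references to the DataFrame objects themselves, this works the same way inside a function or a notebook cell, where globals() may not see the frames.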
S2L

You are on the right track using a dictionary to link the old and new column names. Loop through a list of your dataframes, and for each one loop through the dictionary of new column names:

import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df2 = pd.DataFrame({"A": [1, 2, 3], "D": [4, 5, 6], "F": [4, 5, 6]})
all_dfs = [df1, df2]

# display() is provided by Jupyter/IPython; use print() in a plain script
display(df1)
display(df2)


d = {
    "A": "aaaaa",
    "D": "ddddd",
}
for df in all_dfs:
    for col in d:
        if col in df.columns:
            df.rename(columns={col: d.get(col)}, inplace=True)

display(df1)
display(df2)
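
The inner loop and the membership check can likely be dropped: when DataFrame.rename is given a mapping, labels that are not present in the frame are skipped by default (errors="ignore"). A minimal sketch under that assumption, reusing the same toy frames:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df2 = pd.DataFrame({"A": [1, 2, 3], "D": [4, 5, 6], "F": [4, 5, 6]})

d = {"A": "aaaaa", "D": "ddddd"}

for df in (df1, df2):
    # Keys of d that are not columns of df are ignored,
    # so no "if col in df.columns" guard is needed.
    df.rename(columns=d, inplace=True)

print(df1.columns.tolist())
print(df2.columns.tolist())
```

Passing errors="raise" instead would make rename fail on a missing label, which may be preferable if every frame is expected to contain every column.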


a11
  • @ShawnJamal great, glad it worked. I just cleaned up the loop a bit so it is a little cleaner, if that is of interest – a11 Aug 15 '21 at 03:18

I ended up creating two parallel lists, one of the dataframes and one of the new column names, and iterating over both together with zip:

def clean_col(df, col_name):
    # Move the index into the first column, then rename the first two columns
    df.reset_index(inplace=True)
    df.rename(columns={df.columns[0]: 'Date', df.columns[1]: col_name}, inplace=True)
    df[col_name] = df[col_name].apply(lambda x: int(x))

# The two lists must be in the same order and of the same length
list_df = [all_df, coal_df, liquids_df, coke_df, natural_gas_df, nuclear_df, hydro_electricity_df, wind_df, utility_solar_df, geothermal_df, wood_biomass_df, biomass_other_df, other_df, solar_all_df]
list_col = ['total', 'coal', 'liquids', 'coke', 'natural_gas', 'nuclear', 'hydro', 'wind', 'utility_solar', 'geo_thermal', 'biomass_wood', 'biomass_other', 'other', 'all_solar']

for a, b in zip(list_df, list_col):
    clean_col(a, b)
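
For anyone who wants to try the zip approach without the original data, here is a small self-contained sketch. The two toy frames and their values are made up for illustration, and astype(int) is swapped in for apply(lambda x: int(x)) on the assumption that the column is cleanly numeric.

```python
import pandas as pd


def clean_col(df, col_name):
    # Move the index into the first column, then rename the first two columns.
    df.reset_index(inplace=True)
    df.rename(
        columns={df.columns[0]: "Date", df.columns[1]: col_name},
        inplace=True,
    )
    # Assumption: values are numeric with no missing entries.
    df[col_name] = df[col_name].astype(int)


# Hypothetical stand-ins for the real frames: a date-like index, one value column.
coal_df = pd.DataFrame({"value": [10.0, 20.0]}, index=["2021-01", "2021-02"])
wind_df = pd.DataFrame({"value": [3.0, 4.0]}, index=["2021-01", "2021-02"])

frames = [coal_df, wind_df]
names = ["coal", "wind"]

# zip stops at the shorter list without complaint, so check the lengths first.
assert len(frames) == len(names)

for df, name in zip(frames, names):
    clean_col(df, name)

print(coal_df)
print(wind_df)
```

Note that clean_col modifies each frame in place, so running the loop a second time on the same frames would call reset_index again and rename the wrong columns.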
Shawn Jamal