101

I am doing some geocoding work that I used selenium to screen scrape the x-y coordinate I need for address of a location, I imported an xls file to panda dataframe and want to use explicit loop to update the rows which do not have the x-y coordinate, like below:

for index, row in rche_df.iterrows():
    if isinstance(row.wgs1984_latitude, float):
        row = row.copy()
        target = row.address_chi        
        dict_temp = geocoding(target)
        row.wgs1984_latitude = dict_temp['lat']
        row.wgs1984_longitude = dict_temp['long']

I have read Why doesn't this function "take" after I iterrows over a pandas DataFrame? and am fully aware that iterrow only gives us a view rather than a copy for editing, but what if I really to update the value row by row? Is lambda feasible?

Community
  • 1
  • 1
lokheart
  • 22,255
  • 36
  • 92
  • 166
  • 2
    I *think* you can do `rche_df.loc[index, 'wgs1984_latitude'] = dict_temp['lat']`, i.e. use the index to get at the right section of the original dataframe. Let me know if that doesn't work and I'll try work up a proper answer. – Marius Aug 25 '14 at 03:39
  • @Marius looks like it is working, thanks, another alternative is to convert the dataframe into dict and use ordinary for-loop to do the modification – lokheart Aug 25 '14 at 03:43
  • This answer did not work for me (why on Earth not...), but this did: https://stackoverflow.com/questions/23330654/update-a-dataframe-in-pandas-while-iterating-row-by-row/29262040?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa – Pablo May 25 '18 at 14:57

2 Answers2

182

The rows you get back from iterrows are copies that are no longer connected to the original data frame, so edits don't change your dataframe. Thankfully, because each item you get back from iterrows contains the current index, you can use that to access and edit the relevant row of the dataframe:

for index, row in rche_df.iterrows():
    if isinstance(row.wgs1984_latitude, float):
        row = row.copy()
        target = row.address_chi        
        dict_temp = geocoding(target)
        rche_df.loc[index, 'wgs1984_latitude'] = dict_temp['lat']
        rche_df.loc[index, 'wgs1984_longitude'] = dict_temp['long']

In my experience, this approach seems slower than using an approach like apply or map, but as always, it's up to you to decide how to make the performance/ease of coding tradeoff.

Marius
  • 54,802
  • 15
  • 100
  • 97
  • 2
    This is not strictly true, they may not be copies. Specifically if the dtype is the same for all cols – Andy Hayden Aug 25 '14 at 07:51
  • 2
    This gave a copy warning for me. Ended up using: https://stackoverflow.com/questions/33518124/how-to-apply-a-function-on-every-row-on-a-dataframe – Peter Ehrlich Dec 26 '17 at 18:53
  • Don't you get the index back anyway? See @jpp's answer to [Pandas for loop over dataframe gives too many values to unpack](https://stackoverflow.com/a/50243204/575530). The error I get from the code in this answer is `ValueError: too many values to unpack (expected 2)` – dumbledad Feb 12 '19 at 14:24
  • 1
    I found that I had to run `df = df.reset_index()` to get this working without an index error because I had chopped and sliced my dataframe. – Gamma032 Dec 24 '21 at 06:54
2

Another way based on this question:

for index, row in rche_df.iterrows():
    if isinstance(row.wgs1984_latitude, float):
        row = row.copy()
        target = row.address_chi        
        dict_temp = geocoding(target)
        
        rche_df.at[index, 'wgs1984_latitude'] = dict_temp['lat']
        rche_df.at[index, 'wgs1984_longitude'] = dict_temp['long']

This link describe difference between .loc and .at. Shortly, .at faster than .loc.

Alireza Mazochi
  • 633
  • 1
  • 6
  • 18