
What is the recommended way to iterate over the rows of a pandas DataFrame, the way you would iterate over the lines of a file? For example:

LIMIT = 100
with open('file') as f:
    for row_num, row in enumerate(f):
        print(row)
        if row_num == LIMIT:
            break

I was thinking of doing something like:

for n in range(LIMIT):
    print(df.loc[n].tolist())  # assumes the default 0..n-1 integer index

Is there a built-in way to do this in pandas, though?

David542

6 Answers


Hasn't anyone suggested the simple solution?

for row in df.head(5).itertuples():
    print(row)  # do something with each row here

Take a peek at this post.
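A minimal sketch of the same idea on a small made-up DataFrame (the column names `a` and `b` are just for illustration). `head(n)` returns the first n rows by position, and `itertuples()` yields namedtuples, so each field is read as an attribute and the index label sits in the `Index` field.

```python
import pandas as pd

# Made-up example data.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [5.0, 4.0, 3.0, 2.0]})

LIMIT = 2
rows = []
for row in df.head(LIMIT).itertuples():
    # Each row is a namedtuple: Index holds the index label,
    # the other fields are named after the columns.
    rows.append((row.Index, row.a, row.b))
    print(row)
```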

knh190

I know others have suggested `iterrows`, but no one has yet suggested combining it with `iloc`. This lets you select whichever rows you want by row number:

for i, row in df.iloc[:101].iterrows():
    print(row)

As others have noted, though, if speed is essential, `apply` or a vectorized operation would probably be better.

>>> df
     a    b
0  1.0  5.0
1  2.0  4.0
2  3.0  3.0
3  4.0  2.0
4  5.0  1.0
5  6.0  NaN
>>> for i, row in df.iloc[:3].iterrows():
...     print(row)
... 
a    1.0
b    5.0
Name: 0, dtype: float64
a    2.0
b    4.0
Name: 1, dtype: float64
a    3.0
b    3.0
Name: 2, dtype: float64
>>>
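The positional slice matters when the index is not a plain 0..n-1 range: the question's `df.loc[n]` looks rows up by label, while `iloc` slices by position. A small sketch with hypothetical index labels:

```python
import pandas as pd

# Hypothetical frame whose index labels are not 0..n-1
# (as can happen after filtering or sorting).
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0]}, index=[10, 20, 30, 40])
N = 2

# iloc slices by position, so this yields the first N rows
# whatever their index labels are.
first_labels = [label for label, row in df.iloc[:N].iterrows()]
print(first_labels)

# Label-based lookup such as df.loc[0] raises KeyError here,
# because no row is labelled 0.
try:
    df.loc[0]
    loc_found_row = True
except KeyError:
    loc_found_row = False
print(loc_found_row)
```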
user3062260

There are three options: `values`, `itertuples`, and `iterrows`. Of these, `itertuples` performs best, as benchmarked by fast-pandas.

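To show how each of the three approaches can be limited to the first N rows, here is a minimal sketch on a small made-up DataFrame. It makes no timing claims; it only shows that all three can iterate over the same positional slice. (`to_numpy()` is the spelling the pandas documentation now recommends over `.values`.)

```python
import pandas as pd

# Made-up example data.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [5, 4, 3, 2, 1]})
N = 3
head = df.iloc[:N]  # first N rows by position

# 1. .values: a NumPy array; each row is a 1-D array without its index label.
via_values = [row.tolist() for row in head.values]

# 2. itertuples(): namedtuples; index=False drops the Index field.
via_itertuples = [list(row) for row in head.itertuples(index=False)]

# 3. iterrows(): (label, Series) pairs; each row is built as a Series.
via_iterrows = [row.tolist() for _, row in head.iterrows()]

print(via_values, via_itertuples, via_iterrows)
```

One design note: because `iterrows` builds a Series per row, it does not preserve dtypes across a row when columns have mixed types, which is another reason to prefer `itertuples`.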

meW
  • @timgeb perhaps you can show each of the three approaches in code and I can answer your question? – David542 Dec 20 '18 at 16:58
  • friendly ping: @timegb you can edit my answer further if you feel it is incomplete. I tried helping from my end. :) – meW Dec 20 '18 at 16:59
  • @meW no worries, I could have made myself clearer. Reminding us which method is faster is valuable, but it does not explain how to iterate only the first N rows. – timgeb Dec 20 '18 at 17:10
  • @timgeb I'll ensure answer completeness from next time :) – meW Dec 20 '18 at 17:13

You can use `itertools.islice` to take the first n items from `iterrows()`:

import itertools
limit = 5
for index, row in itertools.islice(df.iterrows(), limit):
    ...
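A slightly fuller sketch of the same idea on a made-up DataFrame. `islice` stops pulling from the `iterrows()` generator after `limit` items, so the remaining rows are never turned into Series; the same call works with `itertuples()`.

```python
import itertools

import pandas as pd

# Made-up example data with 10 rows.
df = pd.DataFrame({"a": range(10)})
limit = 5

# Take only the first `limit` (index, row) pairs from iterrows().
labels = [index for index, row in itertools.islice(df.iterrows(), limit)]
print(labels)

# islice works the same way with itertuples().
values = [row.a for row in itertools.islice(df.itertuples(), limit)]
print(values)
```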
Joe Halliwell

Since you said you want to use something like an `if`, I would do the following:

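import pandas as pd

limit = 2
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6], "col3": [7, 8, 9]})
head = df.iloc[:limit]   # first `limit` rows by position
head[head["col3"] == 7]  # rows among them where col3 equals 7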

This selects the first two rows of the DataFrame and then returns those of them whose `col3` value equals 7. The point is that you should use `iterrows` only in very specific situations; otherwise, the solution can usually be vectorized.

I don't know exactly what you are trying to achieve, so I just made up an example.
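To make the contrast concrete, here is a sketch of the same filter written both ways on that example DataFrame: an explicit `iterrows` loop with an `if`, in the spirit of the question, and a vectorized boolean mask. Both should select the same index labels; the loop is only there for comparison.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6], "col3": [7, 8, 9]})
limit = 2

# Row-by-row version: inspect each of the first `limit` rows with an if.
looped = []
for index, row in df.iloc[:limit].iterrows():
    if row["col3"] == 7:
        looped.append(index)

# Vectorized version: one boolean mask over the first `limit` rows.
head = df.iloc[:limit]
vectorized = head.index[head["col3"] == 7].tolist()

print(looped, vectorized)
```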

gorjan

If you must iterate over the DataFrame, use the `iterrows()` method:

for index, row in df.iterrows():
    ...
Tim
  • thanks, can you limit it within the `iterrows()` or do you need to use the `limit` approach? – David542 Dec 20 '18 at 16:53
  • You'll need to use a limit approach in one form or another. Because `iterrows` returns a generator, you can call the built-in `next` function on it `N` times to take the first N rows. – Tim Dec 20 '18 at 16:54
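A sketch of the approach from that last comment, using the built-in `next()` on the generator (in Python 3, generators have no `.next()` method). The `try/except` guard for frames with fewer than N rows is an addition for illustration, not part of the comment; the DataFrame is made up.

```python
import pandas as pd

# Made-up example data.
df = pd.DataFrame({"a": [1, 2, 3]})
N = 2

rows = df.iterrows()  # a generator of (index, Series) pairs
first_n = []
for _ in range(N):
    try:
        # The built-in next() pulls one (index, row) pair from the generator.
        first_n.append(next(rows))
    except StopIteration:
        # Fewer than N rows in the frame: stop instead of raising.
        break

print([index for index, row in first_n])
```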