Selecting a row of pandas series/dataframe by integer index

Question

I am curious as to why df[2] is not supported, while df.ix[2] and df[2:3] both work.

In [26]: df.ix[2]
Out[26]: 
A    1.027680
B    1.514210
C   -1.466963
D   -0.162339
Name: 2000-01-03 00:00:00

In [27]: df[2:3]
Out[27]: 
                  A        B         C         D
2000-01-03  1.02768  1.51421 -1.466963 -0.162339

I would expect df[2] to work the same way as df[2:3], to be consistent with Python indexing conventions. Is there a design reason for not supporting indexing a row by a single integer?

`df.ix[2]` does not work - at least not in `pandas version '0.19.2'` — Zahra, May 04 '17 at 19:54
To see the difference between row and column selection via the indexing operator, `[]`, [see this answer below](https://stackoverflow.com/a/46920450/3707607). Also **NEVER USE `.ix`, it is deprecated** — Ted Petrou, Nov 05 '17 at 19:43
Not sure if it helps, but if just reading/viewing is intended, one can use `df.values[n]` to view the n'th row. — 0xc0de, Feb 05 '21 at 07:59

score 733 · Accepted Answer · edited Jun 05 '18 at 12:40

echoing @HYRY, see the new docs in 0.11

http://pandas.pydata.org/pandas-docs/stable/indexing.html

Here we have new operators: .iloc, to explicitly support only integer indexing, and .loc, to explicitly support only label indexing.

e.g. imagine this scenario

In [1]: df = pd.DataFrame(np.random.randn(5,2),index=range(0,10,2),columns=list('AB'))

In [2]: df
Out[2]: 
          A         B
0  1.068932 -0.794307
2 -0.470056  1.192211
4 -0.284561  0.756029
6  1.037563 -0.267820
8 -0.538478 -0.800654

In [5]: df.iloc[[2]]
Out[5]: 
          A         B
4 -0.284561  0.756029

In [6]: df.loc[[2]]
Out[6]: 
          A         B
2 -0.470056  1.192211

[] slices the rows (by label location) only
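
The contrast between position and label can be reproduced with fixed values instead of random ones. The following is only a sketch: the cell values are made up for illustration, and only the index `0, 2, 4, 6, 8` follows the example above.

```python
import pandas as pd

# Same index as the example above (0, 2, 4, 6, 8), but with fixed,
# made-up values so the result is reproducible.
df = pd.DataFrame(
    {"A": [10, 11, 12, 13, 14], "B": [20, 21, 22, 23, 24]},
    index=range(0, 10, 2),
)

by_position = df.iloc[[2]]  # row at position 2 (third row), whatever its label
by_label = df.loc[[2]]      # row whose index label equals 2

print(by_position)
print(by_label)
```

Here `by_position` holds the row labeled 4, while `by_label` holds the row labeled 2.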

answered Apr 19 '13 at 12:20 by Jeff · edited Jun 05 '18 at 12:40 by marc_aragones
What if you wanted the 2nd AND 3rd AND 4th row? – FaCoffee Nov 07 '16 at 20:36
you can simply pass a list of indexers; docs are pointed to above – Jeff Nov 07 '16 at 20:37
Does anyone have a justification for these names? I find these hard to remember because I'm not sure why `iloc` is rows and `loc` is labels. – kilojoules Apr 05 '17 at 17:58
@kilojoules `.iloc` looks things up by their order in the index, e.g. `.iloc[[2]]` is the third "row" in `df` (position 2, counting from 0). That row happens to be at _index_ location `4`. `.loc` looks them up by their index value. So maybe "iloc" is like "i" as in `A[i]`? :) – Jim K. Nov 07 '17 at 21:47
@Jeff - this works great, but what happens when you want to duplicate a row from your data frame, such as `df.loc[-1] = df.iloc[[0]]`, and insert that? The frame comes with an added index column, giving the error `ValueError: cannot set a row with mismatched columns` (see https://stackoverflow.com/questions/47340571/adding-duplicate-row-to-dataframe-cannot-set-a-row-with-mismatched-columns) – user3871 Nov 16 '17 at 23:14
Am I correct in believing that `df.iloc[[2]]` returns a dataframe and `df.iloc[2]` returns a `pandas.core.series.Series`? Why should I use one over the other? – robertspierre Dec 13 '18 at 15:21
please explain what a label is. – Arrow_Raider Jul 01 '20 at 21:08
@kilojoules loc stands for 'location', iloc stands for 'integer location' – L H Aug 13 '20 at 03:47
@Arrow_Raider I believe it is the value of the index for a given row. In this case they are numbers but it could just as well be a string. – lakeside Nov 10 '21 at 18:38
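
Two of the questions in these comments (selecting several rows at once, and `df.iloc[2]` versus `df.iloc[[2]]`) can be checked with a small sketch. The frame below uses made-up values and is an assumption for illustration, not data from the thread.

```python
import pandas as pd

# Illustrative frame with a non-consecutive integer index.
df = pd.DataFrame(
    {"A": [10, 11, 12, 13, 14], "B": [20, 21, 22, 23, 24]},
    index=range(0, 10, 2),
)

rows_2_to_4 = df.iloc[[1, 2, 3]]  # 2nd, 3rd and 4th rows, by position
same_rows = df.iloc[1:4]          # the same rows as a positional slice

as_series = df.iloc[2]    # scalar indexer: one row returned as a Series
as_frame = df.iloc[[2]]   # list indexer: a one-row DataFrame

print(type(as_series).__name__, type(as_frame).__name__)
```

The Series form is convenient when the values of a single row are needed; the one-row DataFrame keeps the two-dimensional shape, which matters when the result is passed on to code that expects a frame.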

Ted Petrou · Answer 2 · 2017-11-05T19:41:40.397

The primary purpose of the DataFrame indexing operator, `[]`, is to select columns.

When the indexing operator is passed a string or integer, it attempts to find a column with that particular name and return it as a Series.

So, in the question above: df[2] searches for a column name matching the integer value 2. This column does not exist and a KeyError is raised.
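
A minimal sketch of this behavior, assuming a small frame with columns `A` and `B` (made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})

col = df["A"]  # a column label: returns that column as a Series

try:
    df[2]  # treated as a column label; there is no column named 2
    raised = False
except KeyError:
    raised = True

print(type(col).__name__, raised)
```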

The DataFrame indexing operator completely changes behavior to select rows when slice notation is used

Strangely, when given a slice, the DataFrame indexing operator selects rows and can do so by integer location or by index label.

df[2:3]

This will slice beginning from the row with integer location 2 up to 3, exclusive of the last element. So, just a single row. The following selects rows beginning at integer location 6 up to but not including 20 by every third row.

df[6:20:3]

You can also use slices consisting of string labels if your DataFrame index has strings in it. For more details, see this solution on .iloc vs .loc.

I almost never use this slice notation with the indexing operator, as it's not explicit and hardly ever used. When slicing by rows, stick with .loc/.iloc.
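
To make the two slice flavors concrete, here is a sketch on a frame with a string index (the frame itself is an assumption for illustration). Note that integer slices exclude the end position, while label slices include the end label.

```python
import pandas as pd

df = pd.DataFrame({"x": range(5)}, index=list("abcde"))

by_position = df[1:3]   # integer slice: positions 1 and 2, end excluded
by_label = df["b":"d"]  # label slice: 'b' through 'd', end included

# The explicit equivalents recommended above.
explicit_position = df.iloc[1:3]
explicit_label = df.loc["b":"d"]

print(by_position)
print(by_label)
```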

Trying to add rows to another dataframe using the indexing operator, but the other dataframe remains empty. Why? – FindOutIslamNow, Sep 10 '18 at 10:10

score 30 · Answer 3 · answered Apr 19 '13 at 07:33

You can think of a DataFrame as a dict of Series. df[key] tries to select the column whose label is key and returns a Series object.

However, slicing inside [] slices the rows, because that is a very common operation.

See the documentation for details:

http://pandas.pydata.org/pandas-docs/stable/indexing.html#basics
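
The dict-of-Series analogy can be seen in a few lines. This is a sketch on a made-up frame; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

keys = list(df)      # iterating a DataFrame yields its column labels
has_a = "A" in df    # membership tests look at column labels
has_row_0 = 0 in df  # a row label is not a "key" in this sense
column = df["A"]     # lookup by key returns the column as a Series

print(keys, has_a, has_row_0)
```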

answered Apr 19 '13 at 07:33 by HYRY

Thank you for the hint. Funny, this kind of thing is what still makes me question pandas at times. Adding exceptions to the behavior in certain situations... to me it feels like sacrificing consistency for a little bit of convenience. – Carl Berger May 13 '20 at 10:14

score 16 · Answer 4 · answered May 23 '16 at 06:53

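For position-based access to the pandas table, one can also convert it to a NumPy array with DataFrame.to_numpy() (the as_matrix() method originally suggested here was deprecated and later removed from pandas; df.values also works):

np_df = df.to_numpy()

and then

np_df[i]

returns the i-th row (counting from 0) as an array.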
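
A short sketch of the conversion, assuming a pandas version that provides `DataFrame.to_numpy()` (0.24 or later); the frame and its labels are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [1.5, 2.5, 3.5], "B": [4.0, 5.0, 6.0]},
    index=[10, 20, 30],
)

np_df = df.to_numpy()  # plain 2-D array; index and column labels are dropped
i = 1
row = np_df[i]         # second row, by position

print(row)
```

The array carries no labels, so `np_df[i]` is always positional, matching `df.iloc[i]` rather than `df.loc[i]`.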

answered May 23 '16 at 06:53 by Pavel Prochazka

that defeats the whole purpose of the dataframes indexes and everything else pandas offers – Fábio Dias Nov 29 '17 at 16:56

score 7 · Answer 5 · answered Apr 19 '13 at 10:47

You can take a look at the source code.

DataFrame has a private function _slice() to slice the DataFrame, and its axis parameter determines which axis to slice. DataFrame.__getitem__() doesn't pass an axis when invoking _slice(), so _slice() slices along the default axis, 0 (rows).

You can try a simple experiment that might help:

# _slice() is private API; shown only to illustrate the axis argument
print(df._slice(slice(0, 2)))
print(df._slice(slice(0, 2), 0))
print(df._slice(slice(0, 2), 1))
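
Since `_slice()` is private, the same per-axis slicing is probably better written with the public `.iloc` indexer. The sketch below uses a made-up frame and is not taken from the pandas source.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

first_two_rows = df.iloc[0:2]     # slice along axis 0 (rows)
first_two_cols = df.iloc[:, 0:2]  # slice along axis 1 (columns)

print(first_two_rows)
print(first_two_cols)
```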

score 7 · Answer 6 · edited Apr 13 '17 at 10:56

You can loop through the data frame like this.

# len(dataframe_c) is the number of rows; .size would be rows * columns
for ad in range(len(dataframe_c)):
    print(dataframe_c.values[ad])
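
As an alternative sketch, `DataFrame.itertuples()` visits the rows in order without indexing into `.values` on every iteration. The contents of `dataframe_c` below are made up for illustration.

```python
import pandas as pd

dataframe_c = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

rows = []
# index=False leaves the index label out of each row tuple.
for row in dataframe_c.itertuples(index=False):
    rows.append(tuple(row))
    print(row)
```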

answered Mar 19 '16 at 08:15 by user1401491 · edited Apr 13 '17 at 10:56 by Derlin

Marc Steffen · Answer 7 · 2021-10-05T23:31:06.770

1

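I would normally go for .loc/.iloc as suggested by Ted, but one may also select a row by transposing the DataFrame. To stay with the example above, df.T[2] gives you the row of df whose index label is 2. This is a label lookup on the columns of the transposed frame, not a positional one.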
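
A small sketch of what the transpose lookup returns, using the index from the accepted answer (0, 2, 4, 6, 8) with made-up values:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [10, 11, 12, 13, 14], "B": [20, 21, 22, 23, 24]},
    index=range(0, 10, 2),
)

via_transpose = df.T[2]  # column labeled 2 of the transposed frame
via_loc = df.loc[2]      # row labeled 2 of the original frame
via_iloc = df.iloc[2]    # row at position 2, whose label is 4

print(via_transpose.tolist(), via_iloc.tolist())
```

The transpose route agrees with `.loc`, not `.iloc`, and it also copies the whole frame first, so `.loc` is usually the cheaper way to get the same row.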

answered Jan 24 '21 at 00:40 · edited Oct 05 '21 at 23:31
