Indexing Pandas data frames: integer rows, named columns

Question

Say df is a pandas dataframe.

df.loc[] only accepts names
df.iloc[] only accepts integers (actual positions)
df.ix[] accepts both names and integers:

When referencing rows, df.ix[row_idx, ] only wants to be given names. e.g.

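import numpy as np
import pandas as pd

df = pd.DataFrame({'a' : ['one', 'two', 'three','four', 'five', 'six'],
                   '1' : np.arange(6)})
df = df.ix[2:6]   # .ix was removed in pandas 1.0; this line needs an older pandas
print(df)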

   1      a
2  2  three
3  3   four
4  4   five
5  5    six

df.ix[0, 'a']

throws an error; it doesn't return 'three'.
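Because .ix was deprecated in pandas 0.20 and removed in pandas 1.0, the snippet above only runs on old versions. Here is a minimal sketch of the same situation on current pandas, assuming .iloc[2:6] is an acceptable stand-in for the removed .ix[2:6] (on this frame both keep the rows labelled 2 to 5). It shows that .loc is purely label-based while .iloc is purely positional:

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame; .iloc replaces the removed .ix for the slice
df = pd.DataFrame({'a': ['one', 'two', 'three', 'four', 'five', 'six'],
                   '1': np.arange(6)})
df = df.iloc[2:6]          # row labels are now 2, 3, 4, 5

# .loc is label-based: there is no row labelled 0, so this raises KeyError
try:
    df.loc[0, 'a']
except KeyError:
    print("KeyError: no row label 0")

# .iloc is positional: position 0 is the row labelled 2
print(df.iloc[0]['a'])     # three
```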

When referencing columns, .ix prefers integers (positions), not names, e.g.

df.ix[2, 1]

returns 'three', not 2 (although df.ix[2, '1'] does return 2).

Oddly, I'd like the exact opposite functionality. Usually my column names are very meaningful, so in my code I reference them directly. But due to a lot of observation cleaning, the row names in my pandas data frames don't usually correspond to range(len(df)).

I realize I can use:

df.iloc[0].loc['a'] # returns three

But it seems ugly! Does anyone know of a better way to do this, so that the code would look like this?

df.foo[0, 'a'] # returns three

In fact, is it possible to add my own method to pandas.core.frame.DataFrame, so that e.g. df.idx(rows, cols) is in fact df.iloc[rows].loc[cols]?
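Since DataFrame is an ordinary Python class, assigning a function to it is one way to get such a method. A minimal sketch, where the method name idx is hypothetical (taken from the question) and not part of pandas. It translates column labels to positions with Index.get_loc / Index.get_indexer and then does a single .iloc call, which, unlike df.iloc[rows].loc[cols], also behaves sensibly when rows is a list (there, .loc[cols] would be applied to the row labels):

```python
import numpy as np
import pandas as pd


def idx(self, rows, cols):
    """Hypothetical helper: integer row position(s), column label(s)."""
    if pd.api.types.is_list_like(cols):
        col_pos = self.columns.get_indexer(cols)
        # get_indexer marks unknown labels with -1, which .iloc would
        # silently treat as "last column", so fail loudly instead
        if (col_pos == -1).any():
            raise KeyError(f"unknown column label(s) in {list(cols)}")
    else:
        col_pos = self.columns.get_loc(cols)
    # One .iloc call, no chained indexing
    return self.iloc[rows, col_pos]


# Attach the function as a method on every DataFrame (monkey-patching)
pd.DataFrame.idx = idx

df = pd.DataFrame({'a': ['one', 'two', 'three', 'four', 'five', 'six'],
                   '1': np.arange(6)}).iloc[2:6]
print(df.idx(0, 'a'))              # three
print(df.idx([0, 1], ['a', '1']))  # 2x2 sub-frame
```

Monkey-patching affects every DataFrame in the process and could clash with a future pandas attribute of the same name, so a plain function called as idx(df, rows, cols) is a less intrusive variant of the same idea.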

See also [GH 9213](https://github.com/pydata/pandas/issues/9213#issuecomment-72076683), which suggests `df.loc[df.index[0], 'a']`. This has the [advantage of not using chained indexing](http://pandas.pydata.org/pandas-docs/stable/indexing.html#why-does-the-assignment-when-using-chained-indexing-fail), which means it will work when making assignments, whereas `df[['a','b']].iloc[0] = val` would not. — unutbu, Feb 27 '15 at 22:46
doesn't really solve your problem but very good answer here: https://stackoverflow.com/questions/31593201/pandas-iloc-vs-ix-vs-loc-explanation — JohnE, Aug 15 '17 at 14:19
Or the other way around, too: df.iloc[0, df.columns.get_loc("a")] — Landmaster, Aug 18 '17 at 00:43

score 63 · Answer 1 · answered Aug 18 '17 at 00:02


It's a late answer, but @unutbu's comment is still valid and a great solution to this problem.

To index a DataFrame with integer rows and named columns (labeled columns):

df.loc[df.index[#], 'NAME'], where # is a valid integer position and NAME is the name of the column.
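A short sketch of this pattern on the question's frame (rebuilt with .iloc because .ix no longer exists; .copy() is added only to avoid a SettingWithCopyWarning when assigning to a slice). Because it is a single .loc call rather than chained indexing, it also works for assignment:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['one', 'two', 'three', 'four', 'five', 'six'],
                   '1': np.arange(6)}).iloc[2:6].copy()

n = 0                        # integer position of the wanted row
label = df.index[n]          # translate position -> row label (here: 2)
print(df.loc[label, 'a'])    # three

# A single .loc call (no chaining), so the assignment modifies df itself
df.loc[df.index[n], 'a'] = 'THREE'
print(df.iloc[0]['a'])       # THREE
```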


brunston



Seems very slow on long dataframes. – ConanG Nov 08 '17 at 17:06
But it works splendidly. I stumbled on this yesterday and it is the exact syntax I needed to update a copy of a dataframe, linking back to the original by the index and by column name. – horcle_buzz Apr 01 '18 at 15:35

Your method requires the index values to be unique. Otherwise it will return a Series with all rows matching the index label "#". – Yingbo Miao Apr 03 '19 at 13:30
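A small sketch of the caveat above, using a hypothetical frame with a repeated row label:

```python
import pandas as pd

# Hypothetical frame whose first two rows share the label 7
df = pd.DataFrame({'a': ['x', 'y', 'z']}, index=[7, 7, 8])

# df.index[0] is the label 7, which matches two rows -> a Series, not a scalar
print(df.loc[df.index[0], 'a'])

# A positional lookup is unaffected by duplicate row labels
print(df.iloc[0, df.columns.get_loc('a')])   # x
```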

Ben · Answer 2 · 2022-03-16T15:00:40.943

The existing answers seem short-sighted to me.

Problematic Solutions

df.loc[df.index[0], 'a']
The strategy here is to get the row label of the 0th row and then use .loc as normal. I see two issues.
1. If df has repeated row labels, df.loc[df.index[0], 'a'] could return multiple rows.
2. .loc is slower than .iloc so you're sacrificing speed here.
df.reset_index(drop=True).loc[0, 'a']
The strategy here is to reset the index so the row labels become 0, 1, 2, ... thus .loc[0] gives the same result as .iloc[0]. Still, the problem here is runtime, as .loc is slower than .iloc and you'll incur a cost for resetting the index.

Better Solution

I suggest following @Landmaster's comment:

df.iloc[0, df.columns.get_loc("a")]

Essentially, this is plain positional indexing with df.iloc[0, j], except that the column position j is looked up dynamically using df.columns.get_loc("a").

To index multiple columns such as ['a', 'b', 'c'], use:

df.iloc[0, [df.columns.get_loc(c) for c in ['a', 'b', 'c']]]
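A sketch of both forms on the question's frame (which only has the columns 'a' and '1'). For several columns, Index.get_indexer maps a whole list of labels to positions in one call and, for unique column labels, gives the same positions as the list comprehension. Note that it returns -1 for labels it cannot find, which .iloc would interpret as the last column, so the labels should be known to exist:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['one', 'two', 'three', 'four', 'five', 'six'],
                   '1': np.arange(6)}).iloc[2:6]

# Single column: get_loc turns the label into its integer position
print(df.iloc[0, df.columns.get_loc('a')])    # three

# Several columns: get_indexer maps all labels to positions at once
cols = ['a', '1']
pos = df.columns.get_indexer(cols)
row = df.iloc[0, pos]                         # Series indexed by 'a', '1'
print(row)
```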

Update

This is discussed here as part of my course on Pandas.

Your preferred solution `df.iloc[0, df.columns.get_loc("a")]` isn't exempt from duplicate labels, as column labels can be duplicated too. So you gain nothing, but it's more verbose and slower than `df.loc[df.index[0], 'a']`. For single value access you should use neither of them anyway. — Darkonaut, Jan 23 '20 at 00:37
@Darkonaut duplicated column names are much *much* less likely to occur than duplicated row labels. Also, `df.iloc[0, df.columns.get_loc("a")]` and `df.loc[df.index[0], 'a']` should be nearly identical in their runtime unless df has thousands of columns, but even then the difference should be marginal. — Ben, Jan 23 '20 at 04:02

score 6 · Krishna · Answer 3 · 2018-09-26T20:19:35.280

We can reset the index and then use 0-based indexing like this:

df.reset_index(drop=True).loc[0,'a']

edit: removed [] from col name index 'a' so it just outputs the value
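A sketch of what reset_index(drop=True) does to the question's frame: it returns a new frame whose row labels are 0, 1, 2, ..., so .loc[0] on it matches .iloc[0] on the original. drop=True discards the old labels instead of inserting them as a column, and the original df is left unchanged:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['one', 'two', 'three', 'four', 'five', 'six'],
                   '1': np.arange(6)}).iloc[2:6]

# New frame with labels 0..3; the old labels 2..5 are thrown away
flat = df.reset_index(drop=True)
print(flat.index.tolist())   # [0, 1, 2, 3]
print(flat.loc[0, 'a'])      # three
```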

edited Sep 26 '18 at 20:19

answered Sep 24 '18 at 05:43

Krishna


That would not return a valid result, because there is no '0' in the index. – Hillary Sanders Sep 25 '18 at 16:29
understand the question now, thank you! please see if the edited code seems clean enough... – Krishna Sep 26 '18 at 03:53

@KrishnaBandhakavi, however, it will return just the value if you remove `[]` from `'a'`: `df.reset_index().loc[0,'a']` – ipramusinto Sep 26 '18 at 06:09
This is the only answer that works for making assignments in the case of non-unique indices. Although, in that case you'll want to keep the original index around and put it back afterwards. – user2561747 Jul 19 '19 at 01:14
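The reset, assign, restore workflow described in the last comment could look like this. The frame with a repeated label 7 is hypothetical:

```python
import pandas as pd

# Hypothetical frame with a non-unique index
df = pd.DataFrame({'a': ['x', 'y', 'z']}, index=[7, 7, 8])

original_index = df.index            # remember the (non-unique) labels
tmp = df.reset_index(drop=True)      # labels become 0, 1, 2
tmp.loc[0, 'a'] = 'X'                # touches only the first row
tmp.index = original_index           # put the old labels back
df = tmp
print(df)
```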

score 6 · Darkonaut · Answer 4 · 2020-01-23T01:04:47.250

For getting or setting a single value in a DataFrame by row/column labels, you are better off using DataFrame.at instead of DataFrame.loc, because

it is faster, and
it makes explicit that you want to access only a single value.

As others have already shown, if you start out with an integer position for the row, you still have to look up the row label first with DataFrame.index, since DataFrame.at only accepts labels:

df.at[df.index[0], 'a']
# Out: 'three'
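A sketch showing .at for both reading and writing on the question's frame (rebuilt with .iloc since .ix is gone, and copied to avoid a SettingWithCopyWarning):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['one', 'two', 'three', 'four', 'five', 'six'],
                   '1': np.arange(6)}).iloc[2:6].copy()

# .at takes exactly one row label and one column label
print(df.at[df.index[0], 'a'])   # three

# The same accessor can set a single value in place
df.at[df.index[0], 'a'] = 'THREE'
print(df.at[2, 'a'])             # THREE
```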

Benchmark:

%timeit df.at[df.index[0], 'a']
# 7.57 µs ± 30.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit df.loc[df.index[0], 'a']
# 10.9 µs ± 53.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit df.iloc[0, df.columns.get_loc("a")]
# 13.3 µs ± 24 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

For completeness:

DataFrame.iat for accessing a single value for a row/column pair by integer position.
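Combining .iat with Index.get_loc gives a purely positional single-value lookup that still lets you name the column. A sketch, assuming the column labels are unique so that get_loc returns a plain integer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['one', 'two', 'three', 'four', 'five', 'six'],
                   '1': np.arange(6)}).iloc[2:6].copy()

j = df.columns.get_loc('a')   # column label -> integer position
print(df.iat[0, j])           # three

# .iat can also set a single value by position
df.iat[0, j] = 'THREE'
```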

edited Jan 23 '20 at 01:04

answered Nov 13 '19 at 15:11

Darkonaut


How big are the DataFrames? For indexes that aren't just ordered integers, I assume `df.index` would need to do a reverse lookup and that would likely require `O(n)` iteration over the `n` rows. How would it deal with duplicates? Wouldn't `iat` be the fastest of all the solutions and also `O(1)`? – Mateen Ulhaq Mar 01 '21 at 04:52
@MateenUlhaq It must have been the same `df` the OP gave as an example. `df.index` is hashed, so `O(1)`. Duplicates won't be ignored, so always make sure you have filtered out duplicates beforehand. I don't recall the timings for `iat`, but in general positional lookup just isn't always an option. – Darkonaut Mar 01 '21 at 10:20

score 5 · Accepted Answer · answered Apr 26 '21 at 02:08

A very late answer, but it amazed me that pandas still doesn't have such a function after all these years. If it irks you a lot, you can monkey-patch a custom indexer into the DataFrame:

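import pandas as pd

class XLocIndexer:
    def __init__(self, frame):
        self.frame = frame

    def __getitem__(self, key):
        # key is a (row_position, column_label) tuple
        row, col = key
        return self.frame.iloc[row][col]

# Attach xloc as a property on every DataFrame (and Series)
pd.core.indexing.IndexingMixin.xloc = property(lambda frame: XLocIndexer(frame))

# Usage
df.xloc[0, 'a'] # 'three' for the question's df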
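As an alternative to patching pandas internals, pandas provides a public extension hook, pd.api.extensions.register_dataframe_accessor, which attaches a named accessor to every DataFrame. A sketch of the same xloc idea built on it (the accessor name and class are hypothetical). It uses a single .iloc call with Index.get_loc instead of the chained .iloc[row][col]:

```python
import numpy as np
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("xloc")
class XLocAccessor:
    """Hypothetical accessor: df.xloc[row_position, column_label]."""

    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def __getitem__(self, key):
        row, col = key
        frame = self._obj
        # Map the column label to its position, then do one .iloc lookup
        return frame.iloc[row, frame.columns.get_loc(col)]


df = pd.DataFrame({'a': ['one', 'two', 'three', 'four', 'five', 'six'],
                   '1': np.arange(6)}).iloc[2:6]
print(df.xloc[0, 'a'])   # three
```

If the chosen name already exists on DataFrame, pandas emits a warning at registration time, which helps avoid silently shadowing a built-in attribute.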

score -2 · Answer 6 · answered Mar 24 '19 at 14:08


Something like df["a"][0] is working fine for me. You may try it out!
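This works when the index contains the label 0 (for example a freshly created frame with the default RangeIndex), because df["a"][0] is a label lookup on the column Series. A sketch on the question's frame, whose row labels start at 2:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['one', 'two', 'three', 'four', 'five', 'six'],
                   '1': np.arange(6)}).iloc[2:6]

col = df["a"]          # a Series that keeps the row labels 2, 3, 4, 5
try:
    col[0]             # integer key on an integer index is a label lookup
except KeyError:
    print("KeyError: no row label 0")
print(col.iloc[0])     # three, explicitly positional
```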


prashansa agrawal


It would be a better answer if you explained why this works for you and why it would work for the author. – flppv Mar 24 '19 at 14:27
