76

Say df is a pandas dataframe.

  • df.loc[] only accepts names
  • df.iloc[] only accepts integers (actual placements)
  • df.ix[] accepts both names and integers:

When referencing rows, df.ix[row_idx, ] only wants to be given names. e.g.

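import numpy as np
import pandas as pd

df = pd.DataFrame({'a' : ['one', 'two', 'three','four', 'five', 'six'],
                   '1' : np.arange(6)})
df = df.ix[2:6]  # note: .ix was removed in pandas 1.0, so this needs an older version
print(df)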

   1      a
2  2  three
3  3   four
4  4   five
5  5    six

df.ix[0, 'a']

throws an error; it doesn't return 'three'.

When referencing columns, ix prefers integers, not names. e.g.

df.ix[2, 1]

returns 'three', not 2. (Although df.ix[2, '1'] does return 2).

Oddly, I'd like the exact opposite functionality. Usually my column names are very meaningful, so in my code I reference them directly. But due to a lot of observation cleaning, the row names in my pandas data frames don't usually correspond to range(len(df)).

I realize I can use:

df.iloc[0].loc['a'] # returns three

But it seems ugly! Does anyone know of a better way to do this, so that the code would look like this?

df.foo[0, 'a'] # returns three

In fact, is it possible to add my own new method to pandas.core.frame.DataFrame, so that e.g. df.idx(rows, cols) is in fact df.iloc[rows].loc[cols]?

unutbu
Hillary Sanders
  • 17
    You could use `df['a'].iloc[0]`. – unutbu Feb 27 '15 at 02:58
  • 14
    See also [GH 9213](https://github.com/pydata/pandas/issues/9213#issuecomment-72076683), which suggests `df.loc[df.index[0], 'a']`. This has the [advantage of not using chained indexing](http://pandas.pydata.org/pandas-docs/stable/indexing.html#why-does-the-assignment-when-using-chained-indexing-fail), which means it will work when making assignments, whereas `df[['a','b']].iloc[0] = val` would not. – unutbu Feb 27 '15 at 22:46
  • 1
    doesn't really solve your problem but very good answer here: https://stackoverflow.com/questions/31593201/pandas-iloc-vs-ix-vs-loc-explanation – JohnE Aug 15 '17 at 14:19
  • 5
    Or the other way around, too: df.iloc[0, df.columns.get_loc("a")] – Landmaster Aug 18 '17 at 00:43

6 Answers

63

It's a late answer, but @unutbu's comment is still valid and a great solution to this problem.

To index a DataFrame with integer rows and named columns (labeled columns):

df.loc[df.index[#], 'NAME'] where # is a valid integer index and NAME is the name of the column.
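
As a minimal sketch of this pattern on the question's data (assuming a pandas version without .ix, so .iloc[2:6] stands in for the original df.ix[2:6], and .copy() is added so the assignment does not touch a slice of another frame):

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame; row labels end up as 2, 3, 4, 5.
df = pd.DataFrame({'a': ['one', 'two', 'three', 'four', 'five', 'six'],
                   '1': np.arange(6)}).iloc[2:6].copy()

row_label = df.index[0]          # label of the row at position 0
value = df.loc[row_label, 'a']   # one .loc call, no chained indexing
print(value)

# Because this is a single indexing operation, assignment writes into df itself.
df.loc[df.index[0], 'a'] = 'THREE'
print(df)
```

The lookup goes from position to label and then back through .loc, so it behaves like ordinary label-based access from that point on.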

brunston
  • 1
    Seems very slow on long dataframes. – ConanG Nov 08 '17 at 17:06
  • But it works splendidly. I stumbled on this yesterday and it is the exact syntax I needed to update a copy of a dataframe, linking back to the original by the index and by column name. – horcle_buzz Apr 01 '18 at 15:35
  • 4
    Your method requires the values in the index to be unique. Otherwise it will return a Series with all rows matching index "#". – Yingbo Miao Apr 03 '19 at 13:30
34

The existing answers seem short-sighted to me.

Problematic Solutions

  1. df.loc[df.index[0], 'a']
    The strategy here is to get the row label of the 0th row and then use .loc as normal. I see two issues.

    1. If df has repeated row labels, df.loc[df.index[0], 'a'] could return multiple rows.
    2. .loc is slower than .iloc so you're sacrificing speed here.
  2. df.reset_index(drop=True).loc[0, 'a']
    The strategy here is to reset the index so the row labels become 0, 1, 2, ... thus .loc[0] gives the same result as .iloc[0]. Still, the problem here is runtime, as .loc is slower than .iloc and you'll incur a cost for resetting the index.
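
The duplicate-label issue from point 1 can be reproduced with a small made-up frame (the name dup and its values are illustrative, not taken from the question):

```python
import pandas as pd

# Hypothetical frame whose first two rows share the label 10.
dup = pd.DataFrame({'a': ['x', 'y', 'z']}, index=[10, 10, 20])

# Label-based lookup: df.index[0] is 10, and two rows carry that label,
# so the result is a Series rather than a single value.
by_label = dup.loc[dup.index[0], 'a']
print(type(by_label).__name__)
print(by_label.tolist())

# Purely positional lookup on the same frame returns one scalar.
by_position = dup.iloc[0, dup.columns.get_loc('a')]
print(by_position)
```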

Better Solution

I suggest following @Landmaster's comment:

df.iloc[0, df.columns.get_loc("a")]

Essentially, this is the same as df.iloc[0, 0] except we get the column index dynamically using df.columns.get_loc("a").

To index multiple columns such as ['a', 'b', 'c'], use:

df.iloc[0, [df.columns.get_loc(c) for c in ['a', 'b', 'c']]]
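
A short sketch of the multi-column case, using a made-up frame (names and values are assumptions for illustration). It also shows Index.get_indexer, which as far as I know does the same label-to-position translation for a whole list in one call:

```python
import pandas as pd

# Hypothetical frame with non-default row labels.
frame = pd.DataFrame({'a': ['x', 'y'], 'b': [1, 2], 'c': [3.0, 4.0]},
                     index=[7, 9])

wanted = ['a', 'c']

# Translate each column label to its integer position, then index by position.
positions = [frame.columns.get_loc(c) for c in wanted]
row = frame.iloc[0, positions]
print(row)

# Alternative: get_indexer maps a list of labels to an array of positions.
alt = frame.iloc[0, frame.columns.get_indexer(wanted)]
print(alt)
```

Note that if column labels are duplicated, get_loc returns a slice or boolean mask instead of an integer, so the list comprehension above assumes unique column names.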

Update

This is discussed here as part of my course on Pandas.

Ben
  • 2
    Your preferred solution `df.iloc[0, df.columns.get_loc("a")]` isn't exempt from duplicate labels, as column labels can be duplicated too. So you gain nothing but it's more verbose and slower than `df.loc[df.index[0], 'a']`. For single value access you should use neither of them anyway. – Darkonaut Jan 23 '20 at 00:37
  • @Darkonaut duplicated column names are much *much* less likely to occur than duplicated row labels. Also, `df.iloc[0, df.columns.get_loc("a")]` and `df.loc[df.index[0], 'a']` should be nearly identical in their runtime unless df has thousands of columns, but even then the difference should be marginal. – Ben Jan 23 '20 at 04:02
6

We can reset the index and then use 0-based indexing like this:

df.reset_index(drop=True).loc[0,'a']

edit: removed [] from col name index 'a' so it just outputs the value
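
If the goal is assignment and the original row labels still matter afterwards, one possible sketch is to keep the old index and put it back once the value is set (the frame dup and the variable names here are made up for illustration):

```python
import pandas as pd

# Hypothetical frame with a non-unique index.
dup = pd.DataFrame({'a': ['x', 'y', 'z']}, index=[10, 10, 20])

original_index = dup.index             # remember the real row labels
tmp = dup.reset_index(drop=True)       # labels become 0, 1, 2
tmp.loc[0, 'a'] = 'changed'            # label 0 now means "first row"
tmp.index = original_index             # restore the original labels
print(tmp)
```

reset_index returns a new frame, so dup itself is left unchanged here; that copy is part of the cost of this approach.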

Krishna
  • That would not return a valid result, because there is no '0' in the index. – Hillary Sanders Sep 25 '18 at 16:29
  • understand the question now, thank you! please see if the edited code seems clean enough... – Krishna Sep 26 '18 at 03:53
  • 1
    @KrishnaBandhakavi , However, it will return more exactly if you remove `[]` from `'a'`. => `df.reset_index().loc[0,'a']` – ipramusinto Sep 26 '18 at 06:09
  • This is the only answer that works for making assignments in the case of non-unique indices. Although, in that case you'll want to keep the original index around and put it back afterwards. – user2561747 Jul 19 '19 at 01:14
6

For getting or setting a single value in a DataFrame by row/column labels, you should use DataFrame.at instead of DataFrame.loc, because it is ...

  1. faster, and
  2. more explicit about wanting to access only a single value.

As others have already shown, if you start out with an integer position for the row, you still have to find the row label first with DataFrame.index, as DataFrame.at only accepts labels:

df.at[df.index[0], 'a']
# Out: 'three'

Benchmark:

%timeit df.at[df.index[0], 'a']
# 7.57 µs ± 30.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit df.loc[df.index[0], 'a']
# 10.9 µs ± 53.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit df.iloc[0, df.columns.get_loc("a")]
# 13.3 µs ± 24 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

For completeness:

DataFrame.iat for accessing a single value for a row/column pair by integer position.
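
A minimal sketch of both scalar accessors side by side, on a rebuilt copy of the question's frame (assuming .iloc[2:6] in place of the removed .ix[2:6]; combining iat with get_loc is my own illustration, not something benchmarked above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['one', 'two', 'three', 'four', 'five', 'six'],
                   '1': np.arange(6)}).iloc[2:6].copy()

# .at takes labels: translate the row position to its label first.
via_at = df.at[df.index[0], 'a']

# .iat takes positions: translate the column label to its position first.
col_pos = df.columns.get_loc('a')
via_iat = df.iat[0, col_pos]
print(via_at, via_iat)

# Both accessors also support setting a single value.
df.iat[0, col_pos] = 'THREE'
print(df.at[df.index[0], 'a'])
```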

Darkonaut
  • How big are the DataFrames? For indexes that aren't just ordered integers, I assume `df.index` would need to do a reverse lookup and that would likely require `O(n)` iteration over the `n` rows. How would it deal with duplicates? Wouldn't `iat` be the fastest of all the solutions and also `O(1)`? – Mateen Ulhaq Mar 01 '21 at 04:52
  • @MateenUlhaq Must have been the same `df` OP gave as example. `df.index` is hashed, so `O(1)`. Duplicates won't be ignored, so always ensure you filtered for duplicates before. I don't recall timings for `iat`, but in general positional lookup just isn't always an option. – Darkonaut Mar 01 '21 at 10:20
5

A very late answer, but it amazed me that pandas still doesn't have such a function after all these years. If it irks you a lot, you can monkey-patch a custom indexer into the DataFrame:

import pandas as pd

class XLocIndexer:
    def __init__(self, frame):
        self.frame = frame
    
    def __getitem__(self, key):
        row, col = key
        return self.frame.iloc[row][col]

pd.core.indexing.IndexingMixin.xloc = property(lambda frame: XLocIndexer(frame))

# Usage
df.xloc[0, 'a'] # one
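
A possible alternative that avoids patching pandas internals is the public accessor registration API, pandas.api.extensions.register_dataframe_accessor. The sketch below is only an illustration: the accessor name xrow and the class name PositionalRowIndexer are made up, and it handles a single column label only.

```python
import numpy as np
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("xrow")  # hypothetical name
class PositionalRowIndexer:
    """Index rows by integer position and a column by its label."""

    def __init__(self, frame):
        self._frame = frame

    def __getitem__(self, key):
        row, col = key
        # Translate the column label to a position, then do one .iloc lookup.
        return self._frame.iloc[row, self._frame.columns.get_loc(col)]


# Rebuild the question's frame (.iloc[2:6] stands in for the removed .ix[2:6]).
df = pd.DataFrame({'a': ['one', 'two', 'three', 'four', 'five', 'six'],
                   '1': np.arange(6)}).iloc[2:6]

print(df.xrow[0, 'a'])
print(df.xrow[-1, 'a'])
```

The registered accessor is available on every DataFrame in the session, which is convenient but also global state, so a plain helper function may be the simpler choice in shared code.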
Code Different
-2

Something like df["a"][0] is working fine for me. You may try it out!

  • 1
    It will be a better answer if you explain why this work for you and why it will work for author – flppv Mar 24 '19 at 14:27