Suppose I have a dataframe with columns a, b and c. I want to sort the dataframe by column b in ascending order and by column c in descending order. How do I do this?
- check this answer http://stackoverflow.com/a/14946246/1948860 – richie Jun 17 '13 at 06:44
- Does this answer your question? [Pandas sort by group aggregate and column](https://stackoverflow.com/questions/14941366/pandas-sort-by-group-aggregate-and-column) – vestland Aug 14 '20 at 05:57
3 Answers
As of the 0.17.0 release, the sort method was deprecated in favor of sort_values. sort was completely removed in the 0.20.0 release. The arguments (and results) remain the same:
df.sort_values(['a', 'b'], ascending=[True, False])
You can use the ascending argument of sort:
df.sort(['a', 'b'], ascending=[True, False])
For example:
In [11]: df1 = pd.DataFrame(np.random.randint(1, 5, (10,2)), columns=['a','b'])
In [12]: df1.sort(['a', 'b'], ascending=[True, False])
Out[12]:
   a  b
2  1  4
7  1  3
1  1  2
3  1  2
4  3  2
6  4  4
0  4  3
9  4  3
5  4  1
8  4  1
As commented by @renadeen
Sort isn't in place by default! So you should assign result of the sort method to a variable or add inplace=True to method call.
that is, if you want to reuse df1 as a sorted DataFrame:
df1 = df1.sort(['a', 'b'], ascending=[True, False])
or
df1.sort(['a', 'b'], ascending=[True, False], inplace=True)
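Since sort is gone in current pandas, the same point can be checked with sort_values. This is a minimal sketch on a small made-up frame; the data and variable names are illustrative assumptions, not part of the original answer:

```python
import pandas as pd

# Hypothetical data, for illustration only.
df1 = pd.DataFrame({"a": [4, 1, 3, 1], "b": [1, 2, 2, 4]})

# sort_values returns a new DataFrame; df1 itself keeps its original order.
result = df1.sort_values(["a", "b"], ascending=[True, False])
unchanged_index = df1.index.tolist()
sorted_index = result.index.tolist()

# With inplace=True the call returns None and df1 is reordered instead.
returned = df1.sort_values(["a", "b"], ascending=[True, False], inplace=True)
inplace_index = df1.index.tolist()
```

Assigning the result back to a variable is usually the easier form to read and chain.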
- Sort isn't in place by default! So you should assign result of the `sort` method to a variable or add `inplace=True` to method call. – renadeen Sep 22 '14 at 16:58
- @renadeen very good point, I've updated my answer with that comment. – Andy Hayden Sep 22 '14 at 17:51
- I was surprised to learn today that sort has been deprecated! Based on some of the opinions in this meta post: http://meta.stackoverflow.com/questions/297404/if-a-correct-answer-is-deprecated-should-i-vote-it-down I decided to add a new answer rather than attempt an edit to yours – Kyle Heuton Nov 20 '15 at 23:14
- @Snoozer Yeah, I don't think sort's ever going to go away (mainly as it's used extensively in Wes' book), but there have been [some big changes in calling sort](https://github.com/pydata/pandas/pull/10726). Thanks! .. I really need to automate going through all my 1000s of pandas answers for deprecations! – Andy Hayden Nov 21 '15 at 00:47
As of pandas 0.17.0, DataFrame.sort() is deprecated and set to be removed in a future version of pandas. The way to sort a dataframe by its values is now DataFrame.sort_values.
As such, the answer to your question would now be
df.sort_values(['b', 'c'], ascending=[True, False], inplace=True)
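To make the behaviour concrete, here is a small sketch with the three columns from the question. The values are made up for illustration, and the non-inplace form is used so the original frame stays available:

```python
import pandas as pd

# Hypothetical data with the three columns from the question.
df = pd.DataFrame({
    "a": [10, 20, 30, 40, 50],
    "b": [2, 1, 2, 1, 2],
    "c": [5, 7, 9, 3, 1],
})

# b ascending first, then c descending within equal values of b.
out = df.sort_values(["b", "c"], ascending=[True, False])
print(out)
```

Each entry in ascending lines up with the column at the same position in the list of sort keys.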
For large dataframes of numeric data, you may see a significant performance improvement via numpy.lexsort, which performs an indirect sort using a sequence of keys:
import pandas as pd
import numpy as np
np.random.seed(0)
df1 = pd.DataFrame(np.random.randint(1, 5, (10,2)), columns=['a','b'])
df1 = pd.concat([df1]*100000)
def pdsort(df1):
    return df1.sort_values(['a', 'b'], ascending=[True, False])
def lex(df1):
    arr = df1.values
    return pd.DataFrame(arr[np.lexsort((-arr[:, 1], arr[:, 0]))])
assert (pdsort(df1).values == lex(df1).values).all()
%timeit pdsort(df1) # 193 ms per loop
%timeit lex(df1) # 143 ms per loop
One peculiarity is that numpy.lexsort takes its keys in reverse order of priority: the last key is the primary one, so (-'b', 'a') sorts by series a first. We negate series b because we want that series in descending order.
Be aware that the negation used here for descending order only works with numeric values, while pd.DataFrame.sort_values works with either string or numeric values. Negating strings will give: TypeError: bad operand type for unary -: 'str'.
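The key ordering is easy to get backwards, so here is a minimal sketch on two small made-up arrays standing in for columns a and b (the names and values are assumptions for illustration):

```python
import numpy as np

# Hypothetical arrays standing in for columns a and b.
a = np.array([4, 1, 3, 1])
b = np.array([1, 2, 2, 4])

# The LAST key passed is the primary key: sort by a ascending,
# then by b descending (via negation) within equal values of a.
order = np.lexsort((-b, a))

# Apply the resulting index array to both columns.
sorted_rows = np.column_stack((a[order], b[order]))
```

np.lexsort returns positions, not sorted values, so the same order array can be reused to reorder any other column of the same length.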