Sorting arrays in NumPy by column

Question

How can I sort an array in NumPy by the nth column?

For example,

a = array([[9, 2, 3],
           [4, 5, 6],
           [7, 0, 5]])

I'd like to sort rows by the second column, such that I get back:

array([[7, 0, 5],
       [9, 2, 3],
       [4, 5, 6]])

This is a really bad example since `np.sort(a, axis=0)` would be a satisfactory solution for the given matrix. I suggested an edit with a better example but was rejected, although actually the question would be much more clear. The example should be something like `a = numpy.array([[1, 2, 3], [6, 5, 2], [3, 1, 1]])` with desired output `array([[3, 1, 1], [1, 2, 3], [6, 5, 2]])` — David, Aug 04 '17 at 16:16
David, you don't get the point of the question. He wants to keep the order within each row the same. — marcorossi, Nov 08 '17 at 23:22
@marcorossi I did get the point, but the example was very badly formulated because, as I said, there were multiple possible answers (which, however, wouldn't have satisfied the OP's request). A later edit based on my comment has indeed been approved (funny that mine got rejected, though). So now everything is fine. — David, Jun 09 '20 at 09:51
If the answers could be sorted by order of decreasing interest... — mins, Apr 08 '21 at 14:18
I think using a structured array could be a way to make the code more readable. I attached a possible answer here: https://stackoverflow.com/a/67788660/13890678 — lhoupert, Jun 01 '21 at 12:15

score 926 · Answer 1 · edited Apr 02 '21 at 11:24

To sort by the second column of a:

a[a[:, 1].argsort()]
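
As a minimal, self-contained sketch of the one-liner above: the array is the one from the question, and the names `order`, `by_second` and `by_second_desc` are made up for this illustration.

```python
import numpy as np

a = np.array([[9, 2, 3],
              [4, 5, 6],
              [7, 0, 5]])

# Indices that would sort the second column (index 1) in ascending order.
order = a[:, 1].argsort()          # array([2, 0, 1])

# Indexing with those indices reorders whole rows and returns a new array;
# `a` itself is left unchanged.
by_second = a[order]
print(by_second)
# [[7 0 5]
#  [9 2 3]
#  [4 5 6]]

# Reversing the index array gives a descending sort on the same column.
by_second_desc = a[order[::-1]]
print(by_second_desc)
# [[4 5 6]
#  [9 2 3]
#  [7 0 5]]
```

Because the result is a new array, assign it back (`a = a[order]`) if you want to replace the original.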

answered May 13 '10 at 15:39 by Steve Tjoa · edited Apr 02 '21 at 11:24 by Mateen Ulhaq

This is not clear, what is `1` in here? The index to be sorted by? – orezvani Apr 14 '14 at 05:30
`[:,1]` indicates the second column of `a`. – Steve Tjoa Apr 17 '14 at 20:49
If you want the reverse sort, modify this to be `a[a[:,1].argsort()[::-1]]` – Steven C. Howell May 14 '15 at 14:49
Looks simple and works! Is it faster than `np.sort` or not? – Václav Pavlík Feb 04 '16 at 12:34
I find this easier to read: `ind = np.argsort( a[:,1] ); a = a[ind]` – poppie Feb 13 '17 at 03:40
I found the `1` a little confusing, `a[a[:,0].argsort()]` – Deepak Sharma Apr 17 '17 at 14:02
For a more general solution that can manage non-square arrays, use: `a = array([[1,6,2],[5,3,7]]); print(a); c = a[0,:].argsort(); print(c); d = a[:,c]; print(d); c = a[:,1].argsort(); print(c); e = a[c,:]; print(e)` – Robert Jul 13 '17 at 05:35
`a[a[:,k].argsort()]` is the same as `a[a[:,k].argsort(),:]`. This generalizes to the other dimension (sort cols using a row): `a[:,a[j,:].argsort()]` (hope I typed that right.) – bean Feb 04 '18 at 17:40
As far as I can see this does not generalize to the case where one wants to sort by multiple columns `cols`, e.g. all columns. For that case I do `a[np.lexsort(a.T[cols])]`, where `[cols]` can be left out to sort by all columns, from right to left. This also lets you define the order of the columns to sort by, e.g. `cols = range(a.shape[1])[::-1]`. – Radio Controlled Apr 11 '18 at 11:05
@Robert You telling me this solution doesn't work for non-square arrays??? – NoName May 19 '20 at 18:42
I needed to use `b = a[a[:, 1].argsort()]`; then `b` is the sorted one. – pippo1980 Jul 06 '21 at 13:19

score 175 · Accepted Answer · edited Jun 16 '20 at 00:22

@steve's answer is actually the most elegant way of doing it.

For the "correct" way see the order keyword argument of numpy.ndarray.sort

However, you'll need to view your array as an array with fields (a structured array).

The "correct" way is quite ugly if you didn't initially define your array with fields...

As a quick example, to sort it and return a copy:

In [1]: import numpy as np

In [2]: a = np.array([[1,2,3],[4,5,6],[0,0,1]])

In [3]: np.sort(a.view('i8,i8,i8'), order=['f1'], axis=0).view(np.int64)
Out[3]: 
array([[0, 0, 1],
       [1, 2, 3],
       [4, 5, 6]])

To sort it in-place:

In [6]: a.view('i8,i8,i8').sort(order=['f1'], axis=0) #<-- returns None

In [7]: a
Out[7]: 
array([[0, 0, 1],
       [1, 2, 3],
       [4, 5, 6]])

@Steve's really is the most elegant way to do it, as far as I know...

The only advantage of this method is that the "order" argument is a list of the fields to sort by, in priority order. For example, you can sort by the second column, then the third column, then the first column by supplying order=['f1','f2','f0'].
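
Several comments ask what `'i8,i8,i8'` means and what to change for other dtypes. The sketch below builds the field list from the array itself instead of hard-coding it. It assumes a C-contiguous 2-D array whose columns all share one dtype; `row_dtype` and `structured` are names invented here, not part of NumPy.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [0, 0, 1]])

# The view trick needs the rows laid out contiguously in memory.
a = np.ascontiguousarray(a)

# One field per column (f0, f1, f2, ...), each with the array's own dtype,
# so the same code works for int64, float64, and so on.
row_dtype = np.dtype([(f"f{i}", a.dtype) for i in range(a.shape[1])])

# Each row becomes a single structured element; the view has shape (3, 1)
# and shares memory with `a`.
structured = a.view(row_dtype)

# Sort in place: second column first, ties broken by the third, then the first.
structured.sort(order=["f1", "f2", "f0"], axis=0)

print(a)
# [[0 0 1]
#  [1 2 3]
#  [4 5 6]]
```

The field names `f0`, `f1`, ... here are chosen to match the defaults NumPy assigns to a comma-separated dtype string, so `f1` is simply the second column.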

answered May 13 '10 at 16:10 by Joe Kington · edited Jun 16 '20 at 00:22 by Trenton McKinney

In my numpy 1.6.1rc1, it raises `ValueError: new type not compatible with array.` – Clippit Oct 05 '11 at 17:40
Would it make sense to file a feature request that the "correct" way be made less ugly? – endolith Aug 21 '13 at 03:15
What if the values in the array are `float`? Should I change anything? – Marco Mar 23 '14 at 09:23
If you have transposed ``a`` before getting to this point in the code, you will need to take a full copy so as to get the proper meaning from the ``view`` method. [See this question](http://stackoverflow.com/a/19826315/2399799). – dan-man Jul 14 '14 at 11:18
In reply to @Marco's question, you can use ``a.dtype.str`` to get the ``'i8'`` or ``'f8'`` string, which you then need to build into the full comma-delimited string. – dan-man Jul 14 '14 at 11:24
And for a hybrid type like `a = np.array([['a',1,2,3],['b',4,5,6],['c',0,0,1]])`, what approach should I follow? – ePascoal May 09 '15 at 16:50
@MrMartin - Based on this and some of your other comments, it sounds like you have data with a common type in each column (e.g. spreadsheet-like data). If so, have a look at `pandas`. It will simplify a lot of the type of operations you're wanting to do. – Joe Kington May 09 '15 at 20:16
One major advantage of this method over Steve's is that it allows very large arrays to be sorted in place. For a sufficiently large array, the indices returned by `np.argsort` may themselves take up quite a lot of memory, and on top of that, indexing with an array will also generate a copy of the array that is being sorted. – ali_m Jul 11 '15 at 23:38
@Clippit `new type not compatible with array.` can also be caused by NaN values. Use `np.nan_to_num(a)` to fix. – jaycode Dec 19 '15 at 11:25
@dan-man: Thanks for the link (two years later...). If `array.flags.c_contiguous` is true, is that a guarantee that the view method will work? – Linuxios Nov 29 '16 at 06:36
Can someone explain the `'i8,i8,i8'`? Is this for each column or each row? What should change if sorting a different dtype? How do I find out how many bits are being used? Thank you – evn Nov 28 '20 at 23:52
This sorts `[[2,4], [1, 3], [2, 2], [1, 1]]` into `[[1,1], [2, 2], [1, 3], [2, 4]]` instead of `[[2,1], [1, 2], [2, 3], [1, 4]]` if sorted based on 2nd col. – gargoylebident Apr 05 '21 at 19:02
Sorry, bad example. This sorts `[[2, 3], [1, 2], [2, 1]]` into `[[1, 1], [2, 3], [2, 2]]` instead of `[[2, 3], [1, 1], [2, 2]]` if sorted based on 1st col. – gargoylebident Apr 05 '21 at 19:09
@JoeKington What does `order=['f1']` stand for? Field 1? I can't figure it out from https://numpy.org/doc/stable/user/basics.rec.html – pippo1980 Jul 07 '21 at 18:13

score 48 · Answer 3 · edited Feb 25 '17 at 22:37

You can sort on multiple columns as per Steve Tjoa's method by using a stable sort like mergesort and sorting the indices from the least significant to the most significant columns:

a = a[a[:,2].argsort()] # First sort doesn't need to be stable.
a = a[a[:,1].argsort(kind='mergesort')]
a = a[a[:,0].argsort(kind='mergesort')]

This sorts by column 0, then 1, then 2.
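
To see the chained stable sorts in action, here is a small sketch on made-up data with ties in the first two columns. It cross-checks the result against `np.lexsort`, used here only as an independent reference; the variable names are illustrative.

```python
import numpy as np

# Made-up data with repeated values in columns 0 and 1.
a = np.array([[1, 2, 9],
              [0, 5, 3],
              [1, 2, 4],
              [0, 5, 1],
              [1, 0, 7]])

# Least significant key first; this first pass does not need to be stable.
out = a[a[:, 2].argsort()]
# Later passes must be stable so that ties keep the order set by earlier passes.
out = out[out[:, 1].argsort(kind="stable")]
out = out[out[:, 0].argsort(kind="stable")]

print(out)
# [[0 5 1]
#  [0 5 3]
#  [1 0 7]
#  [1 2 4]
#  [1 2 9]]

# Reference: lexsort with column 0 as the primary (last) key.
expected = a[np.lexsort((a[:, 2], a[:, 1], a[:, 0]))]
print(np.array_equal(out, expected))  # True
```

`kind="stable"` and `kind="mergesort"` both request a stable sort in `argsort`.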

answered Jul 05 '16 at 01:42 by J.J · edited Feb 25 '17 at 22:37

Why does the first sort not need to be stable? – Little Bobby Tables Oct 26 '16 at 20:59
Good question - stable means that when there's a tie you maintain the original order, and the original order of the unsorted file is irrelevant. – J.J Oct 27 '16 at 13:06
This seems like a really super important point. Having a list that silently doesn't sort would be bad. – Clumsy cat May 21 '18 at 09:07

score 24 · Answer 4 · answered Feb 25 '16 at 10:37

In case someone wants to make use of sorting at a critical part of their programs here's a performance comparison for the different proposals:

import numpy as np
table = np.random.rand(5000, 10)

%timeit table.view('f8,f8,f8,f8,f8,f8,f8,f8,f8,f8').sort(order=['f9'], axis=0)
1000 loops, best of 3: 1.88 ms per loop

%timeit table[table[:,9].argsort()]
10000 loops, best of 3: 180 µs per loop

import pandas as pd
df = pd.DataFrame(table)
%timeit df.sort_values(9, ascending=True)
1000 loops, best of 3: 400 µs per loop

So, it looks like indexing with argsort is the quickest method so far...
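
The `%timeit` lines above only run inside IPython. For a plain script, a rough equivalent using the standard-library `timeit` module might look like the sketch below. The helper names `by_argsort` and `by_structured` are invented for this example, and the timings depend on the machine and NumPy version, so none are quoted.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
table = rng.random((5000, 10))


def by_argsort(t):
    # Index with the permutation that sorts the last column; returns a copy.
    return t[t[:, 9].argsort()]


def by_structured(t):
    # Work on a copy so repeated timing runs always start from unsorted data.
    # Note that the copy itself is included in the measured time.
    t = t.copy()
    row_dtype = np.dtype([(f"f{i}", t.dtype) for i in range(t.shape[1])])
    t.view(row_dtype).sort(order=["f9"], axis=0)
    return t


for fn in (by_argsort, by_structured):
    seconds = timeit.timeit(lambda: fn(table), number=20)
    print(f"{fn.__name__}: {seconds / 20 * 1e6:.0f} µs per call")
```

Both helpers return the same array here because the random values in the last column are all distinct.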

score 23 · Answer 5 · edited May 26 '17 at 10:00

From the Python documentation wiki, I think you can do:

a = [[1, 2, 3], [4, 5, 6], [0, 0, 1]]
a = sorted(a, key=lambda a_entry: a_entry[1])  # sort rows by their second element
print(a)

The output is:

[[0, 0, 1], [1, 2, 3], [4, 5, 6]]

answered Sep 28 '11 at 20:05 by user541064 · edited May 26 '17 at 10:00 by Peter Mortensen

With this solution, one gets a list instead of a NumPy array, so this might not always be convenient (takes more memory, is probably slower, etc.). – Eric O Lebigot Sep 28 '11 at 20:13
This "solution" is slower than the most-upvoted answer by a factor of ... well, close to infinity actually – Jivan Jun 18 '20 at 12:03
@Jivan Actually, this solution is faster than the most-upvoted answer by a factor of 5 https://imgur.com/a/IbqtPBL – Antony Hatchkins Nov 26 '20 at 16:43

score 22 · Answer 6 · edited May 26 '17 at 10:00

From the NumPy mailing list, here's another solution:

>>> a
array([[1, 2],
       [0, 0],
       [1, 0],
       [0, 2],
       [2, 1],
       [1, 0],
       [1, 0],
       [0, 0],
       [1, 0],
       [2, 2]])
>>> a[np.lexsort(np.fliplr(a).T)]
array([[0, 0],
       [0, 0],
       [0, 2],
       [1, 0],
       [1, 0],
       [1, 0],
       [1, 0],
       [1, 2],
       [2, 1],
       [2, 2]])
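
The `np.fliplr(a).T` part is there because `np.lexsort` treats its last key as the primary one. A shorter sketch with the keys spelled out (a subset of the rows above; `order` is an illustrative name) may make that easier to see:

```python
import numpy as np

a = np.array([[1, 2],
              [0, 0],
              [1, 0],
              [0, 2],
              [2, 1]])

# Keys are given from least to most significant:
# column 1 breaks ties, column 0 is the primary key.
order = np.lexsort((a[:, 1], a[:, 0]))
print(a[order])
# [[0 0]
#  [0 2]
#  [1 0]
#  [1 2]
#  [2 1]]

# Same result as flipping the columns and passing them as rows.
print(np.array_equal(a[order], a[np.lexsort(np.fliplr(a).T)]))  # True
```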

answered Jun 03 '15 at 15:03 by fgregg · edited May 26 '17 at 10:00 by Peter Mortensen

The correct generalization is `a[np.lexsort(a.T[cols])]`, where `cols=[1]` in the original question. – Radio Controlled Apr 11 '18 at 13:12

score 8 · Answer 7 · edited May 26 '17 at 10:09

I had a similar problem.

My Problem:

I want to calculate an SVD and need to sort my eigenvalues in descending order. But I want to keep the mapping between eigenvalues and eigenvectors. My eigenvalues were in the first row and the corresponding eigenvector below it in the same column.

So I want to sort a two-dimensional array column-wise by the first row in descending order.

My Solution

a = a[::, a[0,].argsort()[::-1]]

So how does this work?

a[0,] is just the first row I want to sort by.

Now I use argsort to get the order of indices.

I use [::-1] because I need descending order.

Lastly I use a[::, ...] to get the columns in the right order. Note that indexing with an index array returns a copy, not a view.
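
A small sketch of those steps on made-up numbers: the first row plays the role of the eigenvalues and each column below it stands in for an eigenvector. The values are arbitrary and not the output of any real decomposition.

```python
import numpy as np

a = np.array([[2.0, 9.0, 4.0],    # row 0: the values to sort by
              [0.1, 0.2, 0.3],    # rows below: data that must move with them
              [1.1, 1.2, 1.3]])

# Column order that puts the first row in descending order.
order = a[0].argsort()[::-1]      # array([1, 2, 0])

# Reorder whole columns so each value keeps its column of data.
sorted_cols = a[:, order]
print(sorted_cols)
# [[9.  4.  2. ]
#  [0.2 0.3 0.1]
#  [1.2 1.3 1.1]]
```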

score 4 · Answer 8 · edited Jun 27 '20 at 09:14

import numpy as np
a=np.array([[21,20,19,18,17],[16,15,14,13,12],[11,10,9,8,7],[6,5,4,3,2]])
y=np.argsort(a[:,2],kind='mergesort')# a[:,2]=[19,14,9,4]
a=a[y]
print(a)

Desired output is [[6,5,4,3,2],[11,10,9,8,7],[16,15,14,13,12],[21,20,19,18,17]]

Note that argsort(numArray) returns the indices that would put numArray in sorted order.

Example:

x=np.array([8,1,5])
z=np.argsort(x) #[1,2,0] are the indices that sort x
print(x[z]) #integer-array indexing, which reorders x using the indices saved in z

The result would be [1,5,8]

You sure its not [1,2,0]? – adir abargil Dec 20 '20 at 06:04

score 3 · Answer 9 · answered Aug 07 '16 at 16:33

A little more complicated lexsort example - descending on the 1st column, secondarily ascending on the 2nd. The tricks with lexsort are that it sorts on rows (hence the .T), and gives priority to the last.

In [120]: b=np.array([[1,2,1],[3,1,2],[1,1,3],[2,3,4],[3,2,5],[2,1,6]])
In [121]: b
Out[121]: 
array([[1, 2, 1],
       [3, 1, 2],
       [1, 1, 3],
       [2, 3, 4],
       [3, 2, 5],
       [2, 1, 6]])
In [122]: b[np.lexsort(([1,-1]*b[:,[1,0]]).T)]
Out[122]: 
array([[3, 1, 2],
       [3, 2, 5],
       [2, 1, 6],
       [2, 3, 4],
       [1, 1, 3],
       [1, 2, 1]])
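
The same ordering can also be written with explicit keys, which some may find easier to read. This is only a sketch and assumes numeric columns, since negating a key is what flips its order; `order` is an illustrative name.

```python
import numpy as np

b = np.array([[1, 2, 1],
              [3, 1, 2],
              [1, 1, 3],
              [2, 3, 4],
              [3, 2, 5],
              [2, 1, 6]])

# lexsort uses the LAST key as the primary one:
# primary = column 0 descending (negated), secondary = column 1 ascending.
order = np.lexsort((b[:, 1], -b[:, 0]))
print(b[order])
# [[3 1 2]
#  [3 2 5]
#  [2 1 6]
#  [2 3 4]
#  [1 1 3]
#  [1 2 1]]
```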

score 1 · Answer 10 · answered Jan 30 '18 at 19:36

Here is another solution considering all columns (more compact way of J.J's answer);

ar=np.array([[0, 0, 0, 1],
             [1, 0, 1, 0],
             [0, 1, 0, 0],
             [1, 0, 0, 1],
             [0, 0, 1, 0],
             [1, 1, 0, 0]])

Sort with lexsort,

ar[np.lexsort(([ar[:, i] for i in range(ar.shape[1]-1, -1, -1)]))]

Output:

array([[0, 0, 0, 1],
       [0, 0, 1, 0],
       [0, 1, 0, 0],
       [1, 0, 0, 1],
       [1, 0, 1, 0],
       [1, 1, 0, 0]])

score 0 · Answer 11 · edited Mar 05 '22 at 08:17

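Simply use the built-in sorted, with the index of the column you want to sort by in the key function.

import numpy as np

a = np.array([[1,1], [1,-1], [-1,1], [-1,-1]])
print(a)
a = a.tolist()  # sorted works on a list of rows
a = np.array(sorted(a, key=lambda a_entry: a_entry[0]))  # sort by the first column
print(a)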

answered Apr 19 '20 at 17:13 by Jerin Antony · edited Mar 05 '22 at 08:17 by marc_s

score 0 · Answer 12 · answered Apr 27 '20 at 04:59

This is an old question, but if you need to generalize this to arrays with more than two dimensions, here is a solution that can be easily generalized:

np.einsum('ij->ij', a[a[:,1].argsort(),:])

This is overkill for two dimensions, where a[a[:,1].argsort()] would be enough per @steve's answer; however, that answer cannot be generalized to higher dimensions. You can find an example of a 3D array in this question.

Output:

[[7 0 5]
 [9 2 3]
 [4 5 6]]
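
For the 3-D case, one possible reading of "higher dimensions" is sorting the rows of every 2-D slice by that slice's own second column. The sketch below does this with `np.take_along_axis`, which is a different technique from the `einsum` call above and is shown only as an illustration; the second slice of the array is made up.

```python
import numpy as np

a = np.array([[[9, 2, 3],
               [4, 5, 6],
               [7, 0, 5]],
              [[1, 8, 1],
               [2, 3, 2],
               [3, 1, 3]]])

# For each 2-D slice, the row order that sorts its second column.
idx = a[:, :, 1].argsort(axis=1)               # shape (2, 3)

# Add a trailing axis so the row indices broadcast across all columns.
result = np.take_along_axis(a, idx[:, :, None], axis=1)
print(result)
# [[[7 0 5]
#   [9 2 3]
#   [4 5 6]]
#
#  [[3 1 3]
#   [2 3 2]
#   [1 8 1]]]
```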

score 0 · Answer 13 · answered Aug 15 '20 at 08:45

# sort rows by the first column (index 0)

indexofsort=np.argsort(dataset[:,0],axis=-1,kind='stable') 
dataset   = dataset[indexofsort,:]

answered Aug 15 '20 at 08:45 by umair ali

Answer 14 · Arkady · Jan 31 '21 at 14:58

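import numpy as np

def sort_np_array(x, column=0, flip=False):
    # Reorder the rows by the values in the given column (ascending).
    x = x[np.argsort(x[:, column])]
    if flip:
        # Reverse the row order to get a descending sort.
        x = np.flip(x, axis=0)
    return x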

Array in the original question:

a = np.array([[9, 2, 3],
              [4, 5, 6],
              [7, 0, 5]])

The result of the sort_np_array function as expected by the author of the question:

sort_np_array(a, column=1, flip=False)

[2]: array([[7, 0, 5],
            [9, 2, 3],
            [4, 5, 6]])

score 0 · Answer 15 · answered Jun 01 '21 at 12:12

Thanks to this post: https://stackoverflow.com/a/5204280/13890678

I found a more "generic" answer using structured array. I think one advantage of this method is that the code is easier to read.

import numpy as np
a = np.array([[9, 2, 3],
              [4, 5, 6],
              [7, 0, 5]])

# Build a record array with one named field per column.
struct_a = np.rec.fromarrays(
    a.transpose(), names="col1, col2, col3", formats="i8, i8, i8"
)
struct_a.sort(order="col2")

print(struct_a)

[(7, 0, 5) (9, 2, 3) (4, 5, 6)]
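
If a plain 2-D array is needed again afterwards, the fields can be stacked back together. The sketch below shows one way to do that, assuming all fields share a dtype; `plain` is a name made up here, and `formats` is omitted so the field types are taken from `a` itself.

```python
import numpy as np

a = np.array([[9, 2, 3],
              [4, 5, 6],
              [7, 0, 5]])

# Record array with one named field per column.
struct_a = np.rec.fromarrays(a.transpose(), names="col1, col2, col3")
struct_a.sort(order="col2")

# Pull each field out as a 1-D array and put them side by side as columns.
plain = np.column_stack([struct_a[name] for name in struct_a.dtype.names])
print(plain)
# [[7 0 5]
#  [9 2 3]
#  [4 5 6]]
```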

score 0 · Answer 16 · answered Mar 04 '22 at 23:09

Pandas approach, just for completeness:

import numpy as np
import pandas as pd

a = np.array([[9, 2, 3],
              [4, 5, 6],
              [7, 0, 5]])
a = pd.DataFrame(a)

a.sort_values(1, ascending=True).to_numpy()  # 1 is the label of the second column
array([[7, 0, 5],
       [9, 2, 3],
       [4, 5, 6]])

prl900 did the benchmark, comparing with the top-voted argsort answer:

%timeit pandas_df.sort_values(9, ascending=True)
1000 loops, best of 3: 400 µs per loop

%timeit numpy_table[numpy_table[:,9].argsort()]
10000 loops, best of 3: 180 µs per loop
