0

Does anyone know how to sort rows into ["D9", "D10", "E9P", "E10P"]? I want to sort by the leading letters first, and then by the number that follows them.

In [2]: rows
Out[2]: ['D10', 'D9', 'E9P', 'E10P']

In [3]: sorted(rows)
Out[3]: ['D10', 'D9', 'E10P', 'E9P']


1. I can sort 9 ahead of 10 like this, but then the letters are ignored:
In [9]: sorted(rows, key=lambda row: int(re.search(r'(\d+)', row).group(1)))
Out[9]: ['D9', 'E9P', 'D10', 'E10P']

2. Adding the row itself as the first key element doesn't work for me:
In [10]: sorted(rows, key=lambda row: (row, int(re.search(r'(\d+)', row).group(1))))
Out[10]: ['D10', 'D9', 'E10P', 'E9P']
Jon
    [natsort](https://pypi.org/project/natsort/) is good for this. Here is a [SO](https://stackoverflow.com/questions/4836710/does-python-have-a-built-in-function-for-string-natural-sort) about this – han solo Oct 04 '19 at 18:14
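For reference, the natural-sort idea can be approximated with the standard library alone. The following is a minimal sketch, not natsort's actual implementation; the helper name `natural_key` is made up for illustration.

```python
import re

def natural_key(text):
    """Hypothetical helper: split text into alternating non-digit and digit chunks."""
    # A capturing group makes re.split keep the digit runs,
    # e.g. 'E10P' -> ['E', '10', 'P']. Digit runs always land at odd indexes.
    chunks = re.split(r'(\d+)', text)
    # Digit runs become ints so that 9 sorts before 10; other chunks stay strings.
    return [int(chunk) if i % 2 else chunk for i, chunk in enumerate(chunks)]

rows = ['D10', 'D9', 'E9P', 'E10P']
natural_sorted = sorted(rows, key=natural_key)
print(natural_sorted)
```

Because string and integer chunks always sit at matching positions in every key, the lists compare without mixing types.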

3 Answers

1

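This accepts any number of uppercase letters at the front, followed by any number of digits. Anything after the digits is ignored.

import re

def key(x):
    # Split into the leading uppercase letters and the digits that follow.
    alpha, num_str = re.match(r'([A-Z]+)(\d+)', x).groups()
    num = int(num_str)
    return (alpha, num)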

>>> sorted(["AC40", "AB55", "D9", "D10", "E9P", "E10P"], key=key)
['AB55', 'AC40', 'D9', 'D10', 'E9P', 'E10P']
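Note that `re.match` returns `None` when a string does not start with uppercase letters followed by digits, so the key above raises `AttributeError` on such input. Here is a hedged sketch of one way to handle that. The name `safe_key`, the case-insensitive matching, and the policy of sorting unparseable strings last are all assumptions, not part of the answer above.

```python
import re

# Assumption: lowercase prefixes should be accepted and treated like uppercase.
PATTERN = re.compile(r'([A-Za-z]+)(\d+)')

def safe_key(x):
    """Hypothetical variant of key() that tolerates non-matching strings."""
    match = PATTERN.match(x)
    if match is None:
        # Assumed policy: unparseable strings sort after all parseable ones.
        return (1, x, 0)
    alpha, num_str = match.groups()
    return (0, alpha.upper(), int(num_str))

safe_sorted = sorted(["d10", "D9", "??", "E9P", "E10P"], key=safe_key)
print(safe_sorted)
```

The leading 0 or 1 in each tuple keeps the two groups apart, so a malformed string is never compared against a parsed prefix.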
Evan
0

Extending what you already have, you could use row[0] instead of row as your primary sort key (this assumes the prefix is a single letter):

In [8]: sorted(rows, key=lambda row: (row[0], int(re.search(r'(\d+)', row).group(1))))
Out[8]: ['D9', 'D10', 'E9P', 'E10P']
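The reason the full row fails as the first key element is that tuples compare element by element: distinct strings never tie, so the number in the second position is never consulted. A small illustration, where the helper `number` and the variable names exist only for this demonstration:

```python
import re

def number(row):
    """Hypothetical helper: first run of digits in the row, as an int."""
    return int(re.search(r'\d+', row).group())

# Full string first: 'D10' < 'D9' is decided by '1' < '9', the numbers are never compared.
full_row_says_d10_first = ('D10', number('D10')) < ('D9', number('D9'))

# Prefix only first: 'D' ties with 'D', so the comparison falls through to 10 < 9.
prefix_says_d10_first = ('D', number('D10')) < ('D', number('D9'))

print(full_row_says_d10_first, prefix_says_d10_first)
```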
fuglede
0

You could do:

lst = ["D9", "D10", "E9P", "E10P"]

def keys(val):
    first = val[0]
    number = int(''.join(filter(str.isdigit, val)))
    return first, number

result = sorted(lst, key=keys)
print(result)

Output

['D9', 'D10', 'E9P', 'E10P']

Or if you want to use regex:

def keys(val):
    first = val[0]
    number = int(re.search(r'\d+', val).group())
    return first, number

Or also using regex:

def keys(val):
    alpha, digits = re.search(r'^([^\d]+)(\d+)', val).groups()
    return alpha, int(digits)

This last function has the advantage that it accommodates multiple non-digit characters at the beginning of the string.
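To make the difference concrete, here is a small sketch comparing a first-character key with a full-prefix key on made-up input. The function names and the sample list are chosen for illustration only.

```python
import re

def first_char_key(val):
    # Only the first character is used as the alphabetic part.
    return val[0], int(re.search(r'\d+', val).group())

def prefix_key(val):
    # The whole run of non-digit characters is used as the alphabetic part.
    alpha, digits = re.search(r'^(\D+)(\d+)', val).groups()
    return alpha, int(digits)

lst = ["AB2", "AA10", "D9"]
by_first_char = sorted(lst, key=first_char_key)
by_prefix = sorted(lst, key=prefix_key)
print(by_first_char)
print(by_prefix)
```

With the first-character key, "AB2" and "AA10" share the prefix "A" and are ordered by number alone, whereas the full-prefix key places "AA" before "AB".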

Dani Mesejo