Do Python's str.__lt__ and sorted() order strings by the Unicode code points of their characters, or by some locale-dependent collation rules?

jfs
Aivar

1 Answer

No, string ordering does not take locale into account. It is based entirely on the Unicode codepoint sort order.
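To see what code point ordering means in practice, here is a minimal sketch. The sample words are made up for illustration; the point is that an uppercase "Z" (code point 90) sorts before a lowercase "a" (97), and an accented "é" (233) sorts after both:

```python
# Sample data chosen for illustration; "\u00e9" is the precomposed "é".
words = ["apple", "Zebra", "\u00e9clair"]

# Code points of the first characters: 'a' = 97, 'Z' = 90, 'é' = 233
first_code_points = [ord(w[0]) for w in words]
print(first_code_points)  # [97, 90, 233]

# sorted() and the comparison operators compare code point by code point,
# regardless of the current locale.
result = sorted(words)
print(result)             # ['Zebra', 'apple', 'éclair']
print("Zebra" < "apple")  # True
```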

The locale module does provide you with a locale.strxfrm() function that can be used for locale-specific sorting:

import locale

sorted(list_of_strings, key=locale.strxfrm)
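Note that a Python process starts in the "C" locale, so locale.strxfrm only reflects a particular locale's rules after locale.setlocale() has been called. The following is a rough sketch of one way to wrap that. The helper name locale_sorted, the sample words, and the "en_US.UTF-8" locale name are assumptions for illustration, and which locale names are installed depends on the system:

```python
import locale

def locale_sorted(strings, locale_name=""):
    """Sort by the collation rules of locale_name ("" means the user's default).

    Hypothetical helper, not part of the locale module.
    """
    # Calling setlocale with only a category queries the current setting.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return sorted(strings, key=locale.strxfrm)
    except locale.Error:
        # The requested locale is not installed: fall back to code point order.
        return sorted(strings)
    finally:
        # Restore the previous collation locale.
        locale.setlocale(locale.LC_COLLATE, previous)

words = ["apple", "Zebra", "banana"]
result = locale_sorted(words, "en_US.UTF-8")  # assumed locale name
print(result)
```

Where that locale is available, "Zebra" will typically sort after "banana" rather than first. The locale setting is process-wide and changing it is not thread-safe, so a helper like this is best kept to single-threaded scripts.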

This tool is quite limited; for any serious collation task you probably want to use the PyICU library:

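import icu  # current PyICU releases install the module under the name icu

# locale_spec is a locale identifier string such as "de_DE"
collator = icu.Collator.createInstance(icu.Locale(locale_spec))
sorted(list_of_strings, key=collator.getSortKey)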
Martijn Pieters