16

I'm porting a C++ program to Python. There are some places where it uses std::set to store objects that define their own comparison operators. Since the Python standard library has no equivalent of std::set (a sorted key-value mapping data structure) I tried using a normal dictionary and then sorting it when iterating, like this:

def __iter__(self):
    items = self._data.items()
    items.sort()
    return iter(items)

However, profiling has shown that all the calls from .sort() to __cmp__ are a serious bottleneck. I need a better data structure - essentially a sorted dictionary. Does anyone know of an existing implementation? Failing that, any recommendations on how I should implement this? Read performance is more important than write performance and time is more important than memory.

Bonus points if it supports multiple values per key, like the C++ std::multimap.

Note that the OrderedDict class doesn't fit my needs, because it returns items in the order of insertion, whereas I need them sorted using their __cmp__ methods.

EMP
  • 55,280
  • 50
  • 161
  • 218

4 Answers4

5

You should use sort(key=...).
The key function you use will be related to the cmp you are already using. The advantage is that the key function is called n times whereas the cmp is called nlog n times, and typically key does half the work that cmp does

If you can include your __cmp__() we can probably show you how to convert it to a key function

If you are doing lots of iterations between modifications, you should cache the value of the sorted items.

John La Rooy
  • 281,034
  • 50
  • 354
  • 495
  • While not directly answering the question about data structures this has certainly helped to improve performance. +1 – EMP Feb 23 '10 at 22:23
5

For the sorted dictionary, you can (ab)use the stable nature of python's timsort: basically, keep the items partially sorted, append items at the end when needed, switching a "dirty" flag, and sort the remaining before iterating. See this entry for details and implementation (A Martelli's answer): Key-ordered dict in Python

Community
  • 1
  • 1
LeMiz
  • 5,318
  • 5
  • 26
  • 23
3

Python does not have built-in data-structures for this, though the bisect module provides functionality for keeping a sorted list with appropriately efficient algorithms.

If you have a list of sorted keys, you can couple it with a collections.defaultdict(list) to provide multimap-like functionality.

Mike Graham
  • 69,495
  • 14
  • 96
  • 129
0

In his book "Programming in Python 3", Mark Summerfield introduces a sorted dictionary class. The source code is available in this zip archive - look for SortedDict.py. The SortedDict class is described in detail in the book (which I recommend very much). It supports arbitrary keys for comparison and multiple values per key (which any dictionary in Python does, so that's not that big a deal, I think).

Tim Pietzcker
  • 313,408
  • 56
  • 485
  • 544