36

I would like to write a custom list class in Python (let's call it MyCollection) where I can eventually call:

for x in myCollectionInstance:
    #do something here

How would I go about doing that? Is there some class I have to extend, or are there any functions I must override in order to do so?

K Mehta
  • 10,083
  • 4
  • 43
  • 73
  • Could you clarify better your requirements? If you subclass any iterable class (list, dict, etc...) it should work without problems. But maybe I am missing something? – mac Jul 03 '11 at 00:33
  • @mac: If I subclassed an iterable class, I'd also want a way to be able to access the underlying list object so that I can provide additional functions that act on it. I don't want a key-value pair (dict), so something that emulates an indexed collection (list) would suffice. – K Mehta Jul 03 '11 at 00:46

5 Answers5

39

Your can subclass list if your collection basically behaves like a list:

class MyCollection(list):
    def __init__(self, *args, **kwargs):
        super(MyCollection, self).__init__(args[0])

However, if your main wish is that your collection supports the iterator protocol, you just have to provide an __iter__ method:

class MyCollection(object):
    def __init__(self):
        self._data = [4, 8, 15, 16, 23, 42]

    def __iter__(self):
        for elem in self._data:
            yield elem

This allows you to iterate over any instance of MyCollection.

K Mehta
  • 10,083
  • 4
  • 43
  • 73
jena
  • 7,377
  • 1
  • 21
  • 23
  • 1
    I think the signature of your super method is wrong (only 99% sure, but doesn't `list()` accept only one iterable as argument? Also, is there a special reason for the final colon on the `super` call? – mac Jul 03 '11 at 01:00
  • 1
    Yes, you are of course right, I edited to code. Thanks for the hint. – jena Jul 03 '11 at 01:08
21

I like to subclass MutableSequence, as recommended by Alex Martelli. This works well... I frequently need to add custom methods on top of the list I'm building.

#####################################################################
# For more complete methods, refer to UserList() in the CPython source...
# https://github.com/python/cpython/blob/208a7e957b812ad3b3733791845447677a704f3e/Lib/collections/__init__.py#L1215
#####################################################################

try:
    # Python 3
    from collections.abc import MutableSequence
except ImportError:
    # Python 2.7
    from collections import MutableSequence

class MyList(MutableSequence):
    """A container for manipulating lists of hosts"""
    def __init__(self, data=None):
        """Initialize the class"""
        super(MyList, self).__init__()
        if (data is not None):
            self._list = list(data)
        else:
            self._list = list()

    def __repr__(self):
        return "<{0} {1}>".format(self.__class__.__name__, self._list)

    def __len__(self):
        """List length"""
        return len(self._list)

    def __getitem__(self, ii):
        """Get a list item"""
        if isinstance(ii, slice):
            return self.__class__(self._list[ii])
        else:
            return self._list[ii]

    def __delitem__(self, ii):
        """Delete an item"""
        del self._list[ii]

    def __setitem__(self, ii, val):
        # optional: self._acl_check(val)
        self._list[ii] = val

    def __str__(self):
        return str(self._list)

    def insert(self, ii, val):
        # optional: self._acl_check(val)
        self._list.insert(ii, val)

    def append(self, val):
        self.insert(len(self._list), val)

if __name__=='__main__':
    foo = MyList([1,2,3,4,5])
    foo.append(6)
    print(foo)  # <MyList [1, 2, 3, 4, 5, 6]>

    for idx, ii in enumerate(foo):
        print("MyList[%s] = %s" % (idx, ii))
Mike Pennington
  • 40,496
  • 17
  • 132
  • 170
  • 9
    `if not (data is None):` that's not how you write your python code, until a Grand Jedi Master you are. `if data is not None` -- this looks good. – byashimov Feb 05 '16 at 13:56
  • 2
    True, in addition, I guess only `if data:` should work fine. – Grzegorz Sep 17 '19 at 18:44
5

In Python 3 we have beautiful collections.UserList([list]):

Class that simulates a list. The instance’s contents are kept in a regular list, which is accessible via the data attribute of UserList instances. The instance’s contents are initially set to a copy of list, defaulting to the empty list []. list can be any iterable, for example a real Python list or a UserList object.

In addition to supporting the methods and operations of mutable sequences, UserList instances provide the following attribute: data A real list object used to store the contents of the UserList class.

https://docs.python.org/3/library/collections.html#userlist-objects

ramusus
  • 6,999
  • 5
  • 36
  • 44
4

You could extend the list class:

class MyList(list):

    def __init__(self, *args):
        super(MyList, self).__init__(args[0])
        # Do something with the other args (and potentially kwars)

Example usage:

a = MyList((1,2,3), 35, 22)
print(a)
for x in a:
    print(x)

Expected output:

[1, 2, 3]
1
2
3
mac
  • 40,746
  • 24
  • 119
  • 129
2

Implementing a list from scratch requires you to implement the full container protocol:

__len__()

__iter__()    __reversed__()

_getitem__()  __contains__()
__setitem__() __delitem__()

__eq__()      __ne__()       __gt__()
__lt__()      __ge__()       __le__()

__add__()     __radd__()     __iadd__()
__mul__()     __rmul__()     __imul__()

__str__()     __repr__()     __hash__

But the crux of the list is its read-only protocol, as captured by collections.abc.Sequence's 3 methods:

  • __len__()
  • __getitem__()
  • __iter__()

To see that in action, here it is a lazy read-only list backed by a range instance (super handy because it knows how to do slicing gymnastics), where any materialized values are stored in a cache (e.g. a dictionary):

import copy
from collections.abc import Sequence
from typing import Dict, Union

class LazyListView(Sequence):
    def __init__(self, length):
        self._range = range(length)
        self._cache: Dict[int, Value] = {}

    def __len__(self) -> int:
        return len(self._range)

    def __getitem__(self, ix: Union[int, slice]) -> Value:
        length = len(self)

        if isinstance(ix, slice):
            clone = copy.copy(self)
            clone._range = self._range[slice(*ix.indices(length))]  # slicing
            return clone
        else:
            if ix < 0:
                ix += len(self)  # negative indices count from the end
            if not (0 <= ix < length):
                raise IndexError(f"list index {ix} out of range [0, {length})")
            if ix not in self._cache:
                ...  # update cache
            return self._cache[ix]

    def __iter__(self) -> dict:
        for i, _row_ix in enumerate(self._range):
            yield self[i]

Although the above class is still missing the write-protocol and all the rest methods like __eq__(), __add__(), it is already quite functional.

>>> alist = LazyListView(12)
>>> type(alist[3:])
LazyListView

A nice thing is that slices retain the class, so they refrain breaking laziness and materialize elements (e.g. by coding an appropriate repr() method).

Yet the class still fails miserably in simple tests:

>>> alist == alist[:]
False

You have to implement __eq__() to fix this, and use facilities like functools.total_ordering() to implement __gt__() etc:

from functools import total_ordering

@total_ordering
class LaxyListView
    def __eq__(self, other):
        if self is other:
            return True
        if len(self) != len(other):
            return False
        return all(a == b for a, b in zip(self, other)

    def __lt__(self, other):
        if self is other:
            return 0
        res = all(self < other for a, b in zip(self, other)
        if res:
            return len(self) < len(other)

But that is indeed considerable effort.

NOTICE: if you try to bypass the effort and inherit list (instead of Sequence), more modifications are needed because, e.g. copy.copy() would now try to copy also the underlying list and end up calling __iter__(), destroying laziness; furthermore, __add__() method fills-in internally list, breaking adding of slices.

ankostis
  • 7,558
  • 3
  • 41
  • 57