
I am curious what would be an efficient way of de-duplicating such data objects:

testdata = [
    ['9034968', 'ETH'], ['14160113', 'ETH'], ['9034968', 'ETH'], ['11111', 'NOT'],
    ['9555269', 'NOT'], ['15724032', 'ETH'], ['15481740', 'ETH'], ['15481757', 'ETH'],
    ['15481724', 'ETH'], ['10307528', 'ETH'], ['15481757', 'ETH'], ['15481724', 'ETH'],
    ['15481740', 'ETH'], ['15379365', 'ETH'], ['11111', 'NOT'], ['9555269', 'NOT'],
    ['15379365', 'ETH'],
]

For each data pair, the numeric string on the left together with the type on the right determines the uniqueness of a data element. The return value should be a list of lists, structured like testdata, but with only the unique values kept.

Shaido
Hellnar

7 Answers


You can use a set:

unique_data = [list(x) for x in set(tuple(x) for x in testdata)]

You can also see this page which benchmarks a variety of methods that either preserve or don't preserve order.

Mark Byers
  • Do note that you lose the ordering with this method. If it's relevant, then you'll have to sort it afterwards or remove the items manually. – Wolph Sep 16 '10 at 07:31
    I am getting an error: `TypeError: unhashable type: 'list'`. Python 2.6.2, Ubuntu Jaunty. – Manoj Govindan Sep 16 '10 at 07:31
  • @Hellnar: he just updated the code to use a tuple, now you won't get that problem anymore :) – Wolph Sep 16 '10 at 07:32
    @Manoj Govindan: The problem occurs because lists aren't hashable and only hashable types can be used in a set. I have fixed it by converting to tuples and then converting back to a list afterwards. Probably though the OP should be using a list of tuples. – Mark Byers Sep 16 '10 at 07:35
  • @Khan: Python sets are unordered. That doesn't mean you won't get a consistent result from `list(some_set)` but it means that you cannot set or influence the sort order in any way. For more info: https://stackoverflow.com/questions/12165200/order-of-unordered-python-sets – Wolph Mar 03 '19 at 00:22
  • @Wolph: Replace `set` with `dict.fromkeys`, and leave everything else the same, and on CPython/PyPy 3.6+ (or any Python 3.7+), you'll preserve order (the first copy of each duplicated value is kept in the original order, subsequent duplicates are discarded). – ShadowRanger Mar 09 '21 at 02:34
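ShadowRanger's `dict.fromkeys` variant from the comments, sketched out as runnable code (assumes Python 3.7+, where dict keys are guaranteed to preserve insertion order; the shortened testdata is just for illustration):

```python
# dict.fromkeys keeps the first occurrence of each key in insertion order,
# so this de-duplicates without losing the original ordering.
testdata = [['9034968', 'ETH'], ['14160113', 'ETH'], ['9034968', 'ETH'],
            ['11111', 'NOT'], ['9555269', 'NOT'], ['11111', 'NOT']]

unique_data = [list(x) for x in dict.fromkeys(tuple(x) for x in testdata)]
print(unique_data)
# [['9034968', 'ETH'], ['14160113', 'ETH'], ['11111', 'NOT'], ['9555269', 'NOT']]
```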

I tried @Mark's answer and got an error. Converting the list and each element into a tuple made it work. Not sure if this is the best way though.

list(map(list, set(map(tuple, testdata))))

Of course the same thing can be expressed using a list comprehension instead.

[list(i) for i in set(tuple(i) for i in testdata)]

I am using Python 2.6.2.

Update

@Mark has since changed his answer. His current answer uses tuples and will work. So will mine :)

Update 2

Thanks to @Mark. I have changed my answer to return a list of lists rather than a list of tuples.

Manoj Govindan
testdata = [['9034968', 'ETH'], ['14160113', 'ETH'], ['9034968', 'ETH'], ['11111', 'NOT'],
            ['9555269', 'NOT'], ['15724032', 'ETH'], ['15481740', 'ETH'], ['15481757', 'ETH'],
            ['15481724', 'ETH'], ['10307528', 'ETH'], ['15481757', 'ETH'], ['15481724', 'ETH'],
            ['15481740', 'ETH'], ['15379365', 'ETH'], ['11111', 'NOT'], ['9555269', 'NOT'],
            ['15379365', 'ETH']]
concatData = [x[0] + x[1] for x in testdata]
print(concatData)
uniqueSet = set(concatData)  # the deprecated `sets` module is unnecessary; use the built-in set
# the type code is always the last three characters ('ETH' or 'NOT')
uniqueList = [[t[:-3], t[-3:]] for t in uniqueSet]
print(uniqueList)
pyfunc

Expanding a bit on @Mark Byers' solution, you can also do it in one line with a cast to get what you need:

testdata = list(set(tuple(x) for x in testdata))

Also, if you don't like list comprehensions, as many find them confusing, you can do the same in a for loop:

for i, e in enumerate(testdata):
    testdata[i] = tuple(e)
testdata = list(set(testdata))
Sam Morgan

Use unique in numpy to solve this:

import numpy as np

np.unique(np.array(testdata), axis=0)

Note that the axis keyword needs to be specified, otherwise the array is first flattened. Also note that np.unique returns the rows sorted, so the original input order is not preserved.
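A minimal sketch of that behavior, using a shortened testdata for illustration:

```python
import numpy as np

testdata = [['9034968', 'ETH'], ['11111', 'NOT'], ['9034968', 'ETH']]

# axis=0 treats each row as one element; without it the 2-D array would be
# flattened and the strings de-duplicated individually
result = np.unique(np.array(testdata), axis=0)
print(result.tolist())
# [['11111', 'NOT'], ['9034968', 'ETH']] -- rows come back sorted, not in input order
```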

Alternatively, use vstack (newer NumPy versions require a sequence rather than a set, so wrap the set comprehension in a list):

np.vstack(list({tuple(row) for row in testdata}))
Shaido

If you have a list of objects, you can modify @Mark Byers' answer to:

unique_data = [list(x) for x in set(tuple(x.testList) for x in testdata)]

where testdata is a list of objects, each of which has a list testList as an attribute.
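A self-contained sketch; `Record` is a hypothetical stand-in for the objects described above, and the only assumption is a `testList` attribute holding a list:

```python
# Record is a hypothetical example class; any object exposing a
# `testList` list attribute works the same way.
class Record:
    def __init__(self, testList):
        self.testList = testList

testdata = [Record(['9034968', 'ETH']),
            Record(['9034968', 'ETH']),
            Record(['11111', 'NOT'])]

unique_data = [list(x) for x in set(tuple(x.testList) for x in testdata)]
print(sorted(unique_data))  # sorted only because set iteration order is arbitrary
# [['11111', 'NOT'], ['9034968', 'ETH']]
```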

Khan

I was about to post my own take on this until I noticed that @pyfunc had already come up with something similar. I'll post my take on this problem anyway in case it's helpful.

testdata = [
    ['9034968', 'ETH'], ['14160113', 'ETH'], ['9034968', 'ETH'], ['11111', 'NOT'],
    ['9555269', 'NOT'], ['15724032', 'ETH'], ['15481740', 'ETH'], ['15481757', 'ETH'],
    ['15481724', 'ETH'], ['10307528', 'ETH'], ['15481757', 'ETH'], ['15481724', 'ETH'],
    ['15481740', 'ETH'], ['15379365', 'ETH'], ['11111', 'NOT'], ['9555269', 'NOT'],
    ['15379365', 'ETH'],
]
flatdata = [p[0] + "%" + p[1] for p in testdata]
flatdata = list(set(flatdata))
testdata = [p.split("%") for p in flatdata]
print(testdata)

Basically, you concatenate each element of your list into a single string using a list comprehension, so that you have a list of single strings. This is then much easier to turn into a set, which makes it unique. Then you simply split it on the other end and convert it back to your original list.

I don't know how this compares in terms of performance, but I think it's a simple and easy-to-understand solution.

Lou