18

I'd like to know if the absence of element ordering of the Python's built-in set structure is "random enough". For instance, taking the iterator of a set, can it be considered a shuffled view of its elements?

(If it matters, I'm running Python 2.6.5 on a Windows host.)

sblundy
  • 59,536
  • 22
  • 119
  • 123
Chuim
  • 1,893
  • 2
  • 15
  • 20

5 Answers5

34

No, it is not random. It is "arbitrarily ordered", which means that you cannot depend on it being either ordered or random.

Ignacio Vazquez-Abrams
  • 740,318
  • 145
  • 1,296
  • 1,325
  • 4
    It's important to understand the difference between "undefined" and "random". – Matti Virkkunen May 18 '10 at 19:22
  • 7
    Indeed, the order is predictable from the ID's of the various objects in the set. It's quite rigorously defined by the code. BUT -- bonus -- the details are none of your business, making them "arbitrary" and "implementation-specific" and "undependable for anything". And "undefined as far as you're allowed to care." – S.Lott May 18 '10 at 19:25
  • OK. The hash function will determine the order. For instance, for integer elements we'll get the natural order. So, I conclude that we'll have an "undefined", "arbitrary" and "repeatable" ordering for the same set of elements. – Chuim May 18 '10 at 20:01
  • 2
    It might only be repeatable under a single implementation of Python. If the spec says it's undefined, don't assume anything else about it (not even repeatability). – Matti Virkkunen May 18 '10 at 20:03
  • 1
    Undefined means "changeable without notice". So an upgrade from 2.6.1 to 2.6.2 i allowed to change things that are otherwise undefined. – S.Lott May 19 '10 at 11:03
  • The comments of this answer are as good as the answer it self. Too bad you can't share rep. – e-satis Nov 02 '11 at 14:20
6

In a word, no:

>>> list(set(range(10000))) == list(range(10000))
True
Daniel Stutzbach
  • 70,140
  • 17
  • 84
  • 76
4

No, you can not rely on it for any real statistical purpose. The implementation of sets in Python is in terms of a hash table, and can cause the element distribution to display some very non-random properties. There's a large gap between "not having a guaranteed order" and "guaranteed to be unordered in a uniform-random manner".

Use random.shuffle to really shuffle elements of a sequence.

Eli Bendersky
  • 248,051
  • 86
  • 340
  • 401
  • The thing is `random.shuffle` may only be used for sequences, which a `set` is not. One may convert it to a `list` but for a big number of elements and performance sensitive code it may be an issue... – Chuim May 18 '10 at 20:04
4

Arbitrariness is central when designing programs, each of these freedoms that you reserve is like a joker card that you can use when you implement, develop, or rewrite your program. The more of these free-cards you collect, the more efficiency can you deliver from your code (probably), since you have more freedom to change it.

It is not random, it's only freedom. If it's a better set that way, the order can be forwards on Wednesdays and "backwards" on Fridays.

u0b34a0f6ae
  • 45,417
  • 14
  • 88
  • 100
4

Just a note about the rigorously of the order. It seems that it is very unreliable even in the same running environment.

For example this code gives different answers:

data = 'KSRNDOW3GQ'
chars = set(data)
print(list(chars))

enter image description here

xslittlegrass
  • 4,346
  • 4
  • 23
  • 32