30

I am trying to get a random object from a model A

For now, it is working well with this code:

random_idx = random.randint(0, A.objects.count() - 1)
random_object = A.objects.all()[random_idx]

But I feel this code is better:

random_object = A.objects.order_by('?')[0]

Which one is best? Could the first version have a problem with deleted objects? For example, I can have 10 objects, but the object with id 10 may no longer exist. Have I misunderstood something about A.objects.all()[random_idx]?

Erwan
  • Why would you make 2 queries (one for count, one for actual select) instead of 1? – Selcuk Apr 02 '14 at 15:51
  • I think the second one is probably better, but the first one isn't subject to the problem you describe, because it's indexing a list you've already bounded, not selecting by the database ID. Also, why not `random.choice(A.objects.all())`? – Two-Bit Alchemist Apr 02 '14 at 15:51
  • possible duplicate of [How to pull a random record using Django's ORM?](http://stackoverflow.com/questions/962619/how-to-pull-a-random-record-using-djangos-orm) – alecxe Apr 02 '14 at 15:54
  • @Two-BitAlchemist blergh, that's the worst of all: getting all rows from the database in order to return just one. – Daniel Roseman Apr 02 '14 at 15:55
  • @DanielRoseman It's also plenty readable, leaves `A.objects.all()` in order (unlike solution 2) if it's used somewhere else, and concisely illustrates another potential use case. I don't see anything asking about _performance_, just what will work, and for a small number of objects, readability is more important. – Two-Bit Alchemist Apr 02 '14 at 15:58
  • @alecxe I don't think it is a duplicate. I already read the answers on this thread before submitting mine but my question is more accurate, and the answers and comments here are more interesting. Just my opinion... – Erwan Apr 02 '14 at 16:16
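
A minimal plain-Python sketch of the point raised in the comments above: indexing a queryset selects by position (an OFFSET in SQL), not by primary key, so gaps left by deleted rows should not cause a miss. The list `existing_ids` and the helper `pick_by_position` are hypothetical stand-ins for the table and the first snippet, not Django APIs.

```python
import random

# Hypothetical stand-in for a table whose rows 3, 5 and 10 were deleted.
existing_ids = [1, 2, 4, 6, 7, 8, 9, 11, 12, 13]


def pick_by_position(ids, rng=random):
    """Mimic A.objects.all()[random_idx]: choose a position, not an id."""
    random_idx = rng.randint(0, len(ids) - 1)  # randint is inclusive at both ends
    return ids[random_idx]


picked = [pick_by_position(existing_ids) for _ in range(1000)]

# Every pick is a row that actually exists, despite the gaps in the ids.
print(all(pk in existing_ids for pk in picked))
```

The position is always between 0 and count - 1, so the only way this can miss is if rows are deleted between the count query and the select.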

7 Answers

62

Just been looking at this. The line:

random_object = A.objects.order_by('?')[0]

has reportedly brought down many servers.

Unfortunately, Erwan's code caused an error when accessing non-sequential ids.

There is another short way to do this:

import random

items = list(Product.objects.all())

# change 3 to how many random items you want
random_items = random.sample(items, 3)
# if you want only a single random item
random_item = random.choice(items)

The good thing about this is that it handles non-sequential ids without error.

nik_m
lukeaus
  • Looking at the documentation of the `random` module, `random.sample(items, 1)[0]` can be avoided by using `random.choice(items)`. See [random.choice](https://docs.python.org/3/library/random.html#random.choice). – Acsor Aug 11 '17 at 19:15
  • If you want to get the object from `random.choice(items)`, use `items = list(Product.objects.all())` – therealak12 Jun 19 '20 at 17:40
13

Improving on all of the above:

from random import choice

pks = A.objects.values_list('pk', flat=True)
random_pk = choice(pks)
random_obj = A.objects.get(pk=random_pk)
km6
11

The second bit of code is correct, but can be slower, because in SQL that generates an ORDER BY RANDOM() clause that shuffles the entire set of results, and then takes a LIMIT based on that.

The first bit of code still has to evaluate the entire set of results. E.g., what if your random_idx is near the last possible index?

A better approach is to pick a random ID from your database and look that up (a primary key lookup, so it's fast). We can't assume that every id between 1 and MAX(id) exists, since you may have deleted something. So the following is an approximation that works out well:

import random

# grab the max id in the database
max_id = A.objects.order_by('-id')[0].id

# grab a random candidate id; it may not exist in the database if rows were deleted
# randint is inclusive on both ends, so max_id itself is the upper bound
random_id = random.randint(1, max_id)

# return the object with that id, or the first object with a greater id
# this is a fast lookup, because the primary key is indexed
random_object = A.objects.filter(id__gte=random_id).order_by('id')[0]
Sohan Jain
  • The first code does not evaluate the entire list. Slices in Django querysets are translated into LIMIT/OFFSET calls in the SQL. – Daniel Roseman Apr 02 '14 at 16:05
  • What I meant is: LIMIT/OFFSET in SQL is notoriously slow, because it has to nearly evaluate the entire list. – Sohan Jain Apr 02 '14 at 16:06
  • You should replace the `get` by `filter`. Now you get the following error: `TypeError: 'A' object does not support indexing` – J. Ghyllebert Aug 08 '14 at 13:31
  • I would replace all "id"s with "pk"s. For more information, take a look at http://stackoverflow.com/questions/2165865/django-queries-id-vs-pk – 1man May 09 '16 at 23:59
  • This is not working if there are too many gaps in the PKs, like in a table that is constantly re-imported. – Risadinha Sep 22 '17 at 09:27
  • Not a very good random. Imagine you have 3 objects with ids 1, 2 and 99 (the others were removed). In this case there is about a 98% chance that your algorithm returns 99 – rluts Feb 24 '20 at 20:46
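
The bias described in the last comment can be checked with a small plain-Python simulation. This is only a sketch: the sorted list `ids` stands in for the table, the hypothetical helper `pick_first_gte` mimics "first row with id greater than or equal to a random id", and it assumes the random id is drawn from 1 up to the largest id.

```python
import bisect
import random
from collections import Counter


def pick_first_gte(sorted_ids, rng):
    """Return the first id >= a random id drawn from 1..max(sorted_ids)."""
    random_id = rng.randint(1, sorted_ids[-1])  # inclusive on both ends
    return sorted_ids[bisect.bisect_left(sorted_ids, random_id)]


rng = random.Random(0)   # fixed seed so the run is repeatable
ids = [1, 2, 99]         # ids 3..98 were deleted
counts = Counter(pick_first_gte(ids, rng) for _ in range(10_000))
share_of_99 = counts[99] / 10_000
print(counts, share_of_99)
```

With these ids, 97 of the 99 possible draws land on 99, so its share should come out close to 97/99. The approach is fast, but each row's chance is proportional to the gap in front of it.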
4

How about calculating maximal primary key and getting random pk?

The book ‘Django ORM Cookbook’ compares the execution time of the following functions for getting a random object from a given model.

import random

from django.db.models import Max
from myapp.models import Category

def get_random():
    return Category.objects.order_by("?").first()

def get_random3():
    max_id = Category.objects.all().aggregate(max_id=Max("id"))['max_id']
    while True:
        pk = random.randint(1, max_id)
        category = Category.objects.filter(pk=pk).first()
        if category:
            return category

The test was run against a million DB entries:

In [14]: timeit.timeit(get_random3, number=100)
Out[14]: 0.20055226399563253

In [15]: timeit.timeit(get_random, number=100)
Out[15]: 56.92513192095794

See source.

After seeing those results I started using the following snippet:

from django.db.models import Max
import random

def get_random_obj_from_queryset(queryset):
    max_pk = queryset.aggregate(max_pk=Max("pk"))['max_pk']
    while True:
        obj = queryset.filter(pk=random.randint(1, max_pk)).first()
        if obj:
            return obj

So far it has done the job, as long as the model has an integer id. Note that get_random3 (and get_random_obj_from_queryset) won't work if you replace the model's id with a UUID or another non-integer key. Also, if too many instances have been deleted, the while loop will slow things down.
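
To get a feel for how much the retry loop slows down when many rows are deleted, here is a plain-Python sketch of the same idea. The set `sparse_pks` is a hypothetical stand-in for the primary keys left in the table, and each attempt in `pick_with_retries` corresponds to one database query in the real function.

```python
import random


def pick_with_retries(existing_pks, rng):
    """Draw pks in 1..max until one exists; return (pk, attempts)."""
    max_pk = max(existing_pks)
    attempts = 0
    while True:
        attempts += 1
        pk = rng.randint(1, max_pk)  # inclusive on both ends
        if pk in existing_pks:
            return pk, attempts


rng = random.Random(0)                # fixed seed so the run is repeatable
sparse_pks = set(range(1, 1001, 10))  # 100 rows left, largest pk is 991
results = [pick_with_retries(sparse_pks, rng) for _ in range(2000)]
average_attempts = sum(attempts for _, attempts in results) / len(results)
print(average_attempts)
```

The expected number of attempts is the largest pk divided by the number of remaining rows, so roughly 10 here. Unlike picking the first id above a random value, every existing row is equally likely to be returned.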

Pawel Kam
1

Yet another way:

from random import randint

pks = A.objects.values_list('pk', flat=True)
random_idx = randint(0, len(pks)-1)
random_obj = A.objects.get(pk=pks[random_idx])

Works even if there are larger gaps in the pks, for example if you want to filter the queryset before picking one of the remaining objects at random.

EDIT: fixed call of randint (thanks to @Quique). The stop arg is inclusive.

https://docs.python.org/3/library/random.html#random.randint

Risadinha
0

I'm sharing my latest test result with Django 2.1.7, PostgreSQL 10.

students = Student.objects.all()
for i in range(500):
    student = random.choice(students)
    print(student)

# 0.021996498107910156 seconds

for i in range(500):
    student = Student.objects.order_by('?')[0]
    print(student)

# 0.41299867630004883 seconds

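In this test, fetching with random.choice() was roughly 19x faster (about 0.022 s versus 0.413 s for 500 picks). Note that it evaluates the whole queryset once and then picks from the cached results.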

Exis Zhang
-1

You can use "choice" from the "random" module:

from .models import MyModel
from random import choice    

MyRandomChoice = choice(MyModel.objects.all())
  • While this code snippet may solve the question, [including an explanation](//meta.stackoverflow.com/q/392712/4733879) really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. Please also try not to crowd your code with explanatory comments, this reduces the readability of both the code and the explanations! – Filnor Sep 03 '20 at 09:13