
Initially, the query was fast, but as the rows increased the performance degraded. My table has 600K rows, and we now face a major performance issue with the RAND() function.

SELECT id,postalCode,location 
from jobs_feed 
WHERE check_parsed= 1 AND similarjob_status=0 AND is_updated != 0
ORDER BY RAND() LIMIT 1;
  • Please explain why you think this is an issue with the RAND function, because I think the issue is with the number of records (and maybe a missing index? Can you add the output of `SHOW CREATE TABLE jobs_feed`?) – Luuk May 22 '22 at 08:25
  • Yes, this question has been asked before, see https://stackoverflow.com/questions/211329/quick-selection-of-a-random-row-from-a-large-table-in-mysql, or https://stackoverflow.com/questions/4329396/mysql-select-10-random-rows-from-600k-rows-fast – Shadow May 22 '22 at 08:32

3 Answers


MySQL has to generate a random number for every one of the 600K+ rows and sort on it, just to determine which row comes first.

A possible workaround:

SELECT id, postalCode, location
FROM (
   SELECT id, postalCode, location
   FROM jobs_feed
   WHERE check_parsed = 1 AND similarjob_status = 0 AND is_updated != 0
   ORDER BY id LIMIT 1000) x
ORDER BY RAND() LIMIT 1;

This first selects the first 1000 matching records, then picks a random row from those.

NOTE: This behavior differs from what you were doing, but it is faster because only the first 1000 records need to be fetched.

NOTE2: You can change 1000 to another (bigger) value, as long as performance stays acceptable (and this alternative way of selecting is acceptable).
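The limited-pool approach can be sketched end to end. This illustration uses SQLite through Python's sqlite3 module as a stand-in for MySQL (the table and column names are taken from the question; the sample data is invented, and SQLite spells the function RANDOM() where MySQL uses RAND()):

```python
import sqlite3

# Toy stand-in for the MySQL table from the question (sample data invented).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE jobs_feed (
    id INTEGER PRIMARY KEY, postalCode TEXT, location TEXT,
    check_parsed INTEGER, similarjob_status INTEGER, is_updated INTEGER)""")
conn.executemany(
    "INSERT INTO jobs_feed VALUES (?,?,?,?,?,?)",
    [(i, str(10000 + i), f"city{i}", 1, 0, 1) for i in range(1, 5001)])

# Limited pool: shuffle only the first 1000 matching rows, not all of them.
row = conn.execute("""
    SELECT id, postalCode, location
    FROM (SELECT id, postalCode, location
          FROM jobs_feed
          WHERE check_parsed = 1 AND similarjob_status = 0 AND is_updated != 0
          ORDER BY id LIMIT 1000) x
    ORDER BY RANDOM() LIMIT 1""").fetchone()
print(row)
```

As the NOTE warns, the result can only ever come from the first 1000 matching rows.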

EDIT: An alternative approach would be to do a dynamic (read RANDOM) offset, like given in this answer: https://stackoverflow.com/a/5873705/724039

This should be done in a stored procedure, and would look something like this:

CREATE DEFINER=`root`@`localhost` PROCEDURE `getRandomRow`()
BEGIN
    DECLARE v_offset BIGINT;
    -- Derive the offset from the actual number of matching rows,
    -- instead of hard-coding 600000.
    SELECT FLOOR(RAND() * COUNT(*)) INTO v_offset
      FROM jobs_feed
     WHERE check_parsed = 1 AND similarjob_status = 0 AND is_updated != 0;
    SELECT id, postalCode, location
      FROM jobs_feed
     WHERE check_parsed = 1 AND similarjob_status = 0 AND is_updated != 0
     ORDER BY id LIMIT v_offset, 1;
END
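The same random-offset idea can be exercised outside a stored procedure. A minimal sketch using Python's sqlite3 (schema assumed from the question, sample data invented), counting the matching rows first so the offset is never out of range:

```python
import sqlite3
import random

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE jobs_feed (
    id INTEGER PRIMARY KEY, postalCode TEXT, location TEXT,
    check_parsed INTEGER, similarjob_status INTEGER, is_updated INTEGER)""")
conn.executemany(
    "INSERT INTO jobs_feed VALUES (?,?,?,?,?,?)",
    [(i, str(10000 + i), f"city{i}", 1, 0, 1) for i in range(1, 601)])

where = "check_parsed = 1 AND similarjob_status = 0 AND is_updated != 0"
# Count the matching rows, then skip a random number of them.
total = conn.execute(f"SELECT COUNT(*) FROM jobs_feed WHERE {where}").fetchone()[0]
offset = random.randrange(total)   # 0 .. total-1, always in range
row = conn.execute(
    f"SELECT id, postalCode, location FROM jobs_feed WHERE {where} "
    "ORDER BY id LIMIT 1 OFFSET ?", (offset,)).fetchone()
print(row)
```

Every matching row is now reachable, though the server still has to step over `offset` rows to satisfy the LIMIT.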
Luuk
  • Thanks, Luuk, but I'm looking to fetch a random id from the whole 600K records. – Chowdary May 22 '22 at 08:55
  • That restricts you to the first (or last) 1000 rows. It is not practical to use this technique to sometimes get rows from the 'middle' of the table. – Rick James May 23 '22 at 01:01
  • True, that's why I wrote the NOTE, because yesterday I did not have enough time to consider whether one could add a dynamic (=random) OFFSET. An answer using a dynamic OFFSET is given here: https://stackoverflow.com/a/5873705/724039 With some changes it could be adapted to a random OFFSET with a LIMIT of 1 record. – Luuk May 23 '22 at 17:43

Add a composite index as already discussed, then pick the appropriate algorithm from this set: Random
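One technique from that family is the random-id probe: jump to a random point in the id range and take the first qualifying row at or after it. It is fast because it seeks on the primary key instead of sorting everything, but it is only uniform when the qualifying ids have no large gaps. A hedged sketch using Python's sqlite3 (schema assumed from the question, sample data invented):

```python
import sqlite3
import random

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE jobs_feed (
    id INTEGER PRIMARY KEY, postalCode TEXT, location TEXT,
    check_parsed INTEGER, similarjob_status INTEGER, is_updated INTEGER)""")
conn.executemany(
    "INSERT INTO jobs_feed VALUES (?,?,?,?,?,?)",
    [(i, str(10000 + i), f"city{i}", 1, 0, 1) for i in range(1, 1001)])

lo, hi = conn.execute("SELECT MIN(id), MAX(id) FROM jobs_feed").fetchone()
probe = random.randint(lo, hi)
# Index-friendly: a range seek on the primary key instead of a full sort.
row = conn.execute(
    "SELECT id, postalCode, location FROM jobs_feed "
    "WHERE id >= ? AND check_parsed = 1 AND similarjob_status = 0 "
    "AND is_updated != 0 ORDER BY id LIMIT 1", (probe,)).fetchone()
print(row)
```

Rows that sit just after a gap in the id sequence are picked more often than the rest, which is the trade-off the linked algorithm set discusses.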

Rick James

There isn't much of a way to optimize ORDER BY RAND() LIMIT 1 itself, but we can try adding the following index to speed up the WHERE filter:

CREATE INDEX idx ON jobs_feed (check_parsed, similarjob_status, is_updated);

This index, if used, can potentially allow MySQL to discard many records from the query pipeline, leaving behind only a small subset to randomly shuffle.
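The effect of the index shows up in the query plan. A sketch with SQLite's EXPLAIN QUERY PLAN standing in for MySQL's EXPLAIN (schema assumed from the question); with the composite index in place, the filter becomes an index search rather than a full table scan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE jobs_feed (
    id INTEGER PRIMARY KEY, postalCode TEXT, location TEXT,
    check_parsed INTEGER, similarjob_status INTEGER, is_updated INTEGER)""")
conn.execute(
    "CREATE INDEX idx ON jobs_feed (check_parsed, similarjob_status, is_updated)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id, postalCode, location FROM jobs_feed "
    "WHERE check_parsed = 1 AND similarjob_status = 0 AND is_updated != 0"
).fetchall()
for step in plan:
    print(step[-1])   # e.g. SEARCH jobs_feed USING INDEX idx (...)
```

Note that only the two equality columns narrow the search; the `is_updated != 0` inequality is checked per row after the index seek.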

Tim Biegeleisen