MySQL Sort Randomly and Efficiently with Stored Procedure
Sort Randomly and Efficiently Purpose
There are many things on websites that developers need to display randomly and efficiently. Examples include classifieds, news feed items, and pictures. As a human, it makes sense to see a random list of the newest content posted to a website. If the website is getting thousands of images an hour, could someone really keep up with all of them? Probably not, so it is better to display the user a random list. If the website has many levels of categories, it would be difficult to know everything that is on the site. Using random sorting when displaying content to the user will increase their coverage of the website.
To get this random data, you will need to get content from your data source randomly. Let’s use the example of storing image names in a database table called pictures.
At first, this may seem challenging, because what’s random about rows of images in a table? Assuming you’re using an RDMS, especially MySQL, this can be accomplished very easily with the following code:
SELECT pic FROM pictures ORDER BY RAND() LIMIT 10;
The way this works is MySQL reads every row in that table and generates a random number between 0 and 1 for each row. It then sorts by that number and returns 10 rows with the lowest number. This is not efficient at all though. The problem comes with an increase in number of rows. Even if the query returns 10 rows, MySQL needs to hit every row in the table and generate a random number. This can become unusable when the size of the table gets larger (100’s of millions)
As a human, we can look at any number of things and efficiently pick things randomly. For example, pick 3 random cards from a deck after it falls on the floor. Computers can only see one thing at a time though. That would be equivalent to the deck being brand new and not fallen on the floor. You can only see one card at a time and every card needs to be pulled out of the deck before deciding if it will be part of the random group.
How can we tell MySQL to throw all the rows on the ground and just pick up 10? Maybe instead of telling it to randomly select 10 from 100, we could generate 10 random numbers first, and then have the query fetch rows with matching ids? Imagine if each card in a deck had a number from 1 to 52 on it. Intead of MySQL generating 52 random numbers, ordering them, and then pulling the top 10, lets generate 10 numbers between 1 and 52 and then have MySQL go get the cards with those numbers on them. Below is the code to generate a random number between a range in MySQL.
FLOOR(low + (RAND() * (high - low)));
With this code we can generate numbers that can match the numbers on our cards by setting low = 1 and high = 52. We can get 52 by getting the largest number on the deck. If we do this ten times, we can tell MySQL exactly which cards to get so it can use an index and not a table scan.
* If a card is removed from the deck, we need to make sure we don’t try to search for it.
* Searching for the same card more than once still only means one card
Stored Procedure Implementation
CREATE PROCEDURE getPictures( _numPics INT UNSIGNED ) MODIFIES SQL DATA SQL SECURITY DEFINER BEGIN DECLARE minId BIGINT UNSIGNED; DECLARE maxId BIGINT UNSIGNED; DECLARE _id BIGINT UNSIGNED; SELECT MIN(picId), MAX(picId) INTO minId, maxId FROM pictures; DROP TABLE IF EXISTS randIds;CREATE TEMPORARY TABLE randIds(id BIGINT UNSIGNED NOT NULL PRIMARY KEY); WHILE (SELECT COUNT(*) < _numPics FROM randIds) DO SET _id = FLOOR(minId + (RAND() * (maxId - minId))); INSERT IGNORE INTO randIds (id) SELECT picId FROM pictures WHERE picId = _id ; END WHILE; SELECT p.picId, pic FROM randIds AS r JOIN pictures AS p ON p.picId = r.id ; END~
This MySQL Stored Procedure takes a number of random results as an input and outputs that many results very efficiently.
- The first select statement gets the smallest and largest ids from the pictures table. This is 1 and 52 in the deck of cards.
- The next line creates a temporary table to hold these random ids. It has a constraint of uniqueness so we don’t attempt to search for the same id twice
- The loop will continue to execute until there are _numPics ids in the temp table. Within the loop we generate a random id within the range and put that into random ids table if it exists in pictures. This will help if we generate an id that does not exist.
- Finally we join the random ids on the pictures table and return those results.
This is a very efficient way to get a random data set from a table and will scale with your application. Instead of using a computer to sort randomly but not efficiently, we are using human logic to sort randomly and efficiently.