Storing Images with Unique Names in MySQL
Our applications can get 10’s of thousands of images uploaded daily. The business requirements asked that the image names be obscured, so someone doesn’t write a simple crawler to get image 001, 002, 003, etc and that the images are stored in an S3 bucket. We came to the conclusion to create and store the image file names in the db. The backend code takes care of the uploading. SHA2 (256) will generate a binary string between 0 and 2^256. That’s a large enough number that crawling it will not be so easy. 2^64 = 18,446,744,073,709,551,615. Raise that to the 4th power and you have the number of possibilities stored in an 256 bit value. The basic plan is (completely in MySQL):
1. Generate the SHA2(256) hash of some random data.
2. Compress this value into base64 (*01) and use it as the image name.
To do step one accurately, we had to first get random data. MySQL has a RAND() function. The actual randomness of this data is not important as it is in encryption, because there is no security involved. Psuedo-random will be ok.
Below is the function I am using to convert a base16 number (hex) into base64. The returned string may look similar to a YouTube url. Now that we have that, we can send this function the hex output from the SHA2 hash function and get back a compressed format.
CREATE FUNCTION base16_base64( _textIn VARCHAR(255) ) RETURNS VARCHAR(255) NO SQL SQL SECURITY DEFINER BEGIN DECLARE base64Num VARCHAR(255) DEFAULT ''; DECLARE chars CHAR(64) DEFAULT '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_'; DECLARE curPos INT DEFAULT 1; DECLARE curConversion INT UNSIGNED; SET curPos = CHAR_LENGTH(_textIn) - 3; WHILE curPos > -3 DO SET curConversion = CONV(SUBSTRING(_textIn, IF(curPos > 0, curPos, 1), LEAST(3, curPos + 3)), 16, 10); SET base64Num = CONCAT(SUBSTRING(chars, FLOOR(curConversion/64) + 1, 1), SUBSTRING(chars, curConversion%64 + 1, 1), base64Num); SET curPos = curPos - 3; END WHILE; RETURN TRIM(LEADING '0' FROM base64Num); END~
Here is the function I am using to get a new image name:
CREATE FUNCTION nextImage( ) RETURNS CHAR(42) READS SQL DATA SQL SECURITY DEFINER BEGIN DECLARE picExists BOOLEAN DEFAULT TRUE; DECLARE base64Num VARCHAR(100) DEFAULT ''; WHILE picExists DO SET base64Num = LPAD(base16_base64(SHA2(RAND(), 256)), 42, '0'); SELECT COUNT(*) > 0 INTO picExists FROM images WHERE imageURL = base64Num; END WHILE; RETURN base64Num; END~
With these two functions made, and an images table, you can start making randomized image file names and storing them. This could certainly be expanded on to generate and compress urls for anything else that needs to be stored. We are also using it to store files that need processing. If you are concerned about concurrent calls and the possibility of naming two images the same, wrap the calls in a GET_LOCK and RELEASE_LOCK.
Be sure to index the image name column (at least the first few bytes) or lookups will be very slow (table scans).
(*01) Base64 in the sense of base10, base2, etc. This is not the equivalent of base64 encoding.