InnoDB Primary Key Clustering In Action @ Flickr

InnoDB clustered indexes is one of the most powerful, yet poorly understood optimizations.

The idea is that InnoDB will try to place items in primary key sort order together on disk. Thus, if your primary key is a numeric column called id, rows with id 1, 2, and 3 will be adjacent on disk, as will rows with id 21, 22, 23 etc. If you’re fetching rows that are adjacent when sorted on primary key, they will be adjacent on disk, resulting in sequential disk access. This sort of thinking is important when you make the decision to put on your big-boy pants and not just put your entire data set in memory.

The part that people don’t often think about is that composite primary keys follow the same rule. Tim Denike, Flickr’s DBA, demonstrates just how powerful the effect of clustered indexes can be:

The trick, as explained in the comments on the photo, is that most queries on Flickr end up being queries for multiple photos by the same owner (I suspect narcissism has a lot to do with that..) Thus if you have an index on owner_id, photo_id, the fact that the owner_id is the prefix causes photos from the same owner to be placed together on disk.

Obviously, since not all the photos are written to the database at the same time, some fragmentation is inevitable in the real world; nonetheless, it’s clearly a huge performance win. I believe some routine maintenance commands (OPTIMIZE? I can never remember all the different commands, someone will have to fact-check this..) will effectively “defrag” the table, moving all the rows into sequential primary-key order.

InnoDB clustering is covered in great detail in Ch. 3 of “High Performance MySQL

Tags: , , , , ,

Leave a Reply