THE BLOG IS DEAD! LONG LIVE THE BLOG!

September 1st, 2012

The first post on this blog was made on September 13th 2007 – almost 5 years ago. I’m actually very surprised it’s been this long.

Over these 5 years, a lot has changed. Most importantly, I’ve grown weary of WordPress, and GitHub Pages has emerged. Jekyll is the nail in the coffin. I’ve gotten tired enough of upgrading WP and of jumping through hoops just to be able to write my posts in Markdown.

Without further ado, here are the links to my new blog:

This blog will be left up and set to read only (including the database). There are a few decent posts here (I’ll especially want to keep the Year in Review posts), and some traffic does still come through here from Google.

It’s been a good run.

Catch you on the flip side!

Come to Surge 2012, it’ll be awesome (bonus: I’m speaking)

July 19th, 2012

Surge is the practitioner’s conference. The speakers are encouraged to share real stories and connect those to actionable recommendations; they oblige. I came away from last year’s talks with a brutal list of things I felt embarrassed I didn’t know more about. Identifying important holes in my knowledge/training is one of the most important motivations for going to conferences in the first place, so Surge ranks highly on my list of conferences to get to.

Naturally, I am unbelievably flattered to have been selected to speak there again. I’ll be showing off a system I built at Urban Airship for performing complex queries against heterogeneous data stores and narrating its development. Some more details here.

Photo courtesy of the Surge 2012 website: http://omniti.com/surge/2012

Be a better technical communicator

March 22nd, 2012

This is mostly a reminder to myself, but I know I’m not the only one that suffers from this. Partially due to my non-technical education, but mostly due to lack of attention, I often find myself frustrated while trying to explain an idea or a problem to a co-worker. It either takes me way too long or feels way too clumsy in the first place, or requires iteration, repetition, and drawing things in the air. And I have some smart coworkers.

Be a better communicator

A lot of the concepts discussed are generic; this post is merely an application to a familiar domain. The principles were taught to me (and since gradually forgotten) in debate and speech classes in high school many moons ago.

Own the vocabulary

There is a widespread epidemic of overstatement and term misuse in the industry. I won’t call anybody out specifically, but the themes will be familiar.

What is “scale”? Which direction are you scaling?

What is “fast”? Are you talking about throughput or latency? (Unsure you understand the difference? Shut the fuck up, Donny, you’re out of your element.)

What did you measure, in what units, over what time?

Your quotidian interactions with colleagues will be much improved by a more faithful diction as well. If you’re a database engineer, learn the terms. I was guilty of this for a long time while working on indexing at SimpleGeo, and still am to some extent. Stand in your interlocutor’s shoes: it’s like listening to your parents explain a computer problem, or a person that knows nothing about cars explain a mechanical issue they’re “hearing” or “feeling.” If you don’t know what the Cs in CAP and ACID stand for, no time like yesterday to find out.

The above is especially true of high-pressure situations. Systems don’t “freak out” or “shit themselves.” If you find yourself personifying a deterministic system frequently, you should do the due diligence of understanding the events or abstain from commenting during an outage – you’re only going to frustrate the folks involved.

Which brings us to…

Know your audience

Just as you have to downgrade your vocabulary when explaining computers to your parents, you have to pump the brakes when dealing with someone new to the domain in question. Last week, I was trying to explain a system I was building to a co-worker. He called me out, and was right to do so: within seconds, I used cursor pagination, predicate tree, merge join, transform, conjunctive operator, and the list went on. No term in and of itself was unfamiliar to him, but he had just spent 4 years at Flickr, working on much higher-level stuff. To top it off, I later realized I wasn’t using “predicate tree” in a widely-understood way. Womp!

By that same token, you may reach (and I have reached) a point where you have to read academic papers while researching problems. Many will have the feeling of being useful that you will initially be unable to confirm or dispel due to the frustrating onslaught of jargon. Your choices are simple: give up and hack something together, or learn the argot and join the intended audience. If you choose the latter, you will be able to turn the aforementioned feeling into certainty. It’s a time-consuming endeavor that is not often easily justified.

Pause

Most importantly, just take your time. If the formulation feels wrong in your head, just pause and think about it. A technical discussion is easily derailed by an incorrect word choice; attention and credibility are lost quickly and are difficult to regain. You were encouraged to favor a pause over saying “um” back in school – nothing’s changed.

Be direct and explicit

All of the above will help you improve your signal-to-noise ratio, raising your odds of being understood and listened to. To get an idea of the effect of delivery, consult the plethora of literature on the role of cultural differences in aviation accidents. Some examples are referenced here.

2011 – one to remember

January 14th, 2012

This post, as I write it, reads a bit #humblebrag-ey, but I’m going to keep it that way. 2011 was an amazing year in my life, and I can only hope there’ll be more like it. The post is also pretty boring, so don’t read this unless you actually want to know what I was up to last year. I mostly write these for myself.

Grades

Fitness: A-

Had to give myself a slight bump from last year: been playing soccer consistently twice a week. Added an 11v11 full field 90 minute game on Sundays. We never had subs, and I was eventually able to play through the entire game without being completely out of gas – something I could never do even at the end of a full season in high school. Fixing my feet last year is really paying off. Guessing my asthma is slowly getting less severe as I age, though it’s still there. The end of the year was marred by a really shitty leg injury that I brought entirely upon myself during a 6v6 game (pro tip: don’t cut people off from the ball aggressively when you’re already up 9-0 and they’re in a full sprint); I spent most of November and December rehabbing it. I’ll be ready for the start of the January season, and actually might come back with better balance due to all the rehab work I did.

I also started riding my bike to and from work – initially almost every day. I have since gotten a bit lazy, even before the leg injury put it out of the question. I suspect I’ll get back into it, especially as it gets warmer.

Travel: B-

I finally made it to a SXSW. It was an absolutely epic week (did not go to a single talk.. didn’t even have a badge), but I probably won’t go again – Austin is too small and boring of a place to spend that much time at, and I can go to parties and get drunk with nerds anywhere. There is talk of a trip to Vegas instead.

Went to Amsterdam with Alexa and a couple of friends, which was awesome. The last full day we were there, we rented a bunch of bikes and went on a ride through the Dutch countryside – probably one of the most beautiful experiences of my life. Gabe came up from Paris and hung out for a few days.

Finally made it to an away Blues game in Denver with Wade. We also got hit on by a cougar – always a treat.

As part of the Urban Airship acquisition of SimpleGeo, I got to explore Portland on a couple of trips, and actually did a decent job – tried a bunch of local restaurants (including the food carts!), and went on a big brewery crawl with coworkers. Portland is way too clean a city for me, but it is a great place in general. Looking forward to spending more time there.

After attending Surge, I spent a couple of nights in Philly, hanging out with my old high school friend Blaise. Ended up being a very relaxing trip, though I did not explore much of the city during the daytime – Blaise was too hung over, and slept most of the day.

I still wasn’t as opportunistic with my trips as I would like. 2012 promises to be an even better travel year – in the middle of planning a trip to Belgium around the Fosdem conference! Also on the list are Vegas, Seattle, DC, and LA.

Work: A-

During the first part of the year, we deployed our awesome distributed index to production and started iterating on it. I was heavily involved, though I will admit that I was pretty clueless at the time. For a large part of it, I was blindly implementing things I only half understood. However, as active development slowed down due to external distractions, I was able to go back, read some books and papers, and actually get a grip on what was happening. I feel like I ended the year solid on Java, concurrency (particularly on the JVM), and broader topics related to distributed systems. By the time acquisition talks came around, I felt like I could speak intelligently to an erudite audience about the work we were doing and possible future directions. As I thought about various problems, things others had said before that I just took for granted started to make sense. I refactored a lot of old code, both mine and otherwise, with unequivocally positive results. Feels good, man! I have also identified tons of other things that need to be refactored or rewritten in the process – need more hours in the day, as always.

I got a bit obsessed with parallels between complex systems in various industries and web architecture. That led to a fairly productive cycle of research and reflection, resulting in the Surge 2011 talk.

I believe it was the best talk I had ever given, low as the bar may be, and the feedback I got on it confirms that. I will look to build on that. Also moderated a panel and took a smaller part in a couple other multi-speaker events. Getting up on stage for a keynote with Matt at Where 2.0 was probably the most intimidating public speaking experience to date, albeit a short one.

A big goal for 2012 is to get accepted at Strangeloop again, so I’ll likely be submitting multiple talks to that.

I’m keeping the minus, because I feel like I could have worked a lot smarter (not harder) – several books and papers that I read halfway through the year would have made a tremendous difference in my productivity if read sooner.

Writing: C-

I feel about the same about this year as I do about last year: a couple of the posts seem worthwhile, but one of them is piggybacking on Surge and another on someone else’s research. I could do a lot more, and hopefully will in 2012. Since writing more was one of my goals from last year, I have to give myself a downgrade.

People: B

I made quite a few new friends this year, which I’m very happy about. I caught up with an old friend I hadn’t spoken to in nearly 4 years in Denver – that’s pretty big! I did a decent job running into people as I traveled to their cities. I also fell out of touch with others, but the overall trend is good.

Music: F

Absolutely zero improvement there. Just did not do anything.

General Learning: B+

I’ve been reading tons of papers and slowly working my way through a pretty low level computer systems book. I feel like I’ve grown more in the past year as an engineer than I did in the 3 years before that.

Outside computahs, I read a few general interest books (mostly related to business and the financial crisis). Taking advantage of any opportunities I have to pick the brains of more experienced people, I learned a bit about startups, the venture model, and other general financial topics. I’ve also started regularly reading the Economist and paying attention to financial and political happenings. For the most part, it’s just depressing – it’s been all about the Euro crisis and the US election since I started reading, but I do feel parts of my brain that have long been dormant coming back to life. Reading about non-technical subjects has caused me to generally feel more sharp, since my mind isn’t completely tunneled all the time.

Goals for 2012

  • I want this to be the year I start to feel comfortable on a snowboard. Alexa and I are both getting all our own gear this year, and I’m putting a roof rack on the car this week. No more excuses. Mother nature, of course, is shitting on my plan by withholding the snow.
  • Getting a talk accepted at Strangeloop. I will obviously apply to speak at and attend many other conferences as well, but Strangeloop is my annual Moby Dick.
  • Fucking pick up the guitar. It’s just sitting there in the corner, mocking me.
  • Start participating in some sort of regular exercise that requires the use of French. Just added a “I can also communicate in french, if that’s easier” to the end of an email with a Belgian landlord, and it took like 5 minutes to put that together and double check.
  • Get through the Computer Systems book; keep reading books, technical and not.
  • Write some meaningful code in assembly, C, and a functional language (probably going to end up being Clojure).
  • (Re)learn me some discrete math, with particular focus on set theory and logic.
  • Learn R, re-learn statistics, advanced regressions. Wonder if I still have those notebooks.
  • Go broader on databases. People apparently consider me a Big Data guy now. I think that label is fucking stupid, but I should probably get a better grip on databases that aren’t MySQL and Cassandra.
  • Write more – technical and not. Keep up the private travel log I started.
  • Have fun in St. Louis. Every year I go, and every year I hate it. There are fun things to do in St. Louis, we just have to do them.
  • Meet more strangers. Wade and I took a chance and hung out with two dudes we met at a bar playing ping pong after the hockey game, and it was fucking awesome. Mike, Richard and I also nearly got in a fight with some xenophobes in Amsterdam; need more of the former, less of the latter.
  • Be a better editor. I used to write laconically, and people liked it. This reads like a 12 year old wrote it.
  • Spend less time reading Twitter and other internet horseshit.
  • Actually dress up for Halloween. I got all the stuff to make an awesome Bender costume this year, and then got lazy and didn’t go out at all.

Photos from 2011

portrait of a drunk , aka @mjmalone Bacon log ftw, awesome job @cap Yeah were into the portion of the evening in which @mjmalone dances with the airmattres Um @formspring better pay me for making them look this sexy cc @cap Wearing sunglasses indoors. @cap is keeping it real. Oh man rickhouse keeping it mad real True CEO grit: @jayadelson delivering ice cold beers to employees in need of refreshment View from @typekit office Yeah, I'm picking my lady up in a photo all cliche and shit big woop wanna fight about it Hipster @rcrowley is hipster View from our hotel window Gabe + Julianna - Photo of the Year Bicycle status: mounted. /cc @rcrowley Step 2 Beard Update 06/14/2011 Eric turns thirty somethin Cameron and Larissa Grimaldi's mroth and mroth Labor Day BBQ @ Crissy FIeld Ingrid Visits SF SG Office Silliness Blaise: "last time I saw you in 2008 I came up to NY and you were zoned in on @alexaguerra. I was pissed." #mybad Fleet Week 2011 Fleet Week 2011 Random Dinner Cardinals World Series Win 2011 Cardinals World Series Win 2011 Im on a boat Ladies?... King shit lunch with @nolancaudull Get to Know a Portland Get to Know a Portland In snowy denver with @wadey

Surge 2011

October 21st, 2011

Surge was, as expected, an awesome conference. I’m so glad I got to go in only its second year, as I will now forever be able to say “I went to one of the first Surge conferences!” It has quickly become a must-go devops/scalability conference. I’ve sat on this post for a little too long, but better late than never.

“Also, cloud” – Hindsight

The talk I delivered was about the challenges of building a hosted service on a cloud infrastructure. The abstract is here, the slides are here or here if you prefer a PDF. Apart from some poor clock management early on, the talk went fairly well – we’ll see what the audience thought of the content when the ratings come out.

As always, there were things I realized I missed after the talk was over. Here are some follow up thoughts on topics I did not address to my satisfaction – that damn hindsight!

Multi-Provider Strategy

During the talk, I was a bit dismissive about the idea of a multi-provider cloud strategy. Jay Janssen rightfully called me out on it in the Q&A (small aside: I had seen Jay’s name all over various internal Yahoo! mailing lists during my tenure there – conferences are awesome for putting names with faces like that). Jay pointed out that my distaste for multi-provider strategies was in direct conflict with the theme of the talk – decoupling. If you’re trying to decouple from everything, shouldn’t you also decouple yourself from your provider?

Jay is correct – philosophically, I should be all for multi-provider. However, as with many efforts at decoupling, going multi-provider introduces a substantial amount of complexity; an unreasonable amount in my opinion. Let’s think about what this entails.

First and foremost, any redundancy strategy would have to be hot-hot. I do not believe in hot-cold or “spares” because I’ve seen them fail every single time. It’s just too easy to let the spare deteriorate and find it in a state of utter disrepair when you need it most.

So now you’ve got essentially two copies of your infrastructure. Suddenly you have to manage:

  • Network connectivity between the datacenters
  • Data replication between the datacenters (related to the above; admittedly, you’d have this problem going “multi-region” on EC2 only)
  • Different versions of base OS’s
  • Different kernel builds
  • Different underlying performance characteristics
    • You have to come up with a formula for deploying equivalent capacity across providers
    • You now have two capacity plans
  • Multiple provisioning pipelines (yes, you can abstract most of this)
  • Multiple sources of Heisenbugs (different virtualization hosts)

There are likely other things that will come up. Where do your load balancers go? How do you compensate for different providers’ divergent feature sets? It could be that you have a hefty budget and availability is your highest concern. In this case, the additional complexity introduced by the additional provider could well be worth it to your organization.

However, at that point you might take a step back and re-evaluate – should you even be using the cloud? People use cloud infrastructures for all sorts of reasons, but one of the big ones is the lowered barrier to entry for building up a distributed infrastructure – no need to deal with datacenter leases, uplinks, hiring a site-ops team, and all sorts of related unpleasantness. Having to manage multiple cloud providers increases the complexity of your deployment to the point that I start to wonder if it’s worthwhile.

Additional Examples of Simplified Datacenters

As part of my emphasis on the importance of reducing complexity, I made references to a talk I had attended by Yahoo!’s Mike Christian at geekSessions 2.2, in which he touted things like datacenters built in colder climates with no HVAC system at all – one less component to potentially fail (I’m pretty sure similar strategies have been in deployment since at least 2008). Since the talk, a few folks pointed me to additional impressive examples:

  • AOL has been showing off its “human-free datacenters” – The Register article with additional links within here
  • Google has also been doing clever things with datacenters. A good, albeit dated, example can be found here

“Amortizing Complexity”

(not to be confused with amortized computational complexity)

There’s an awesome term found in Sidney Dekker’s “Ten Questions about Human Error” – “amortizing complexity” (it is used as part of a discussion of G. Ross’s “Flight strip survey report” (1995), but I can’t find that paper). The term refers to techniques that allow the operator to interface with a complex system using only a handful of parameters. I really wish I had thought to use this term in my talk. Those that saw it might recall me pulling on an imaginary lever to symbolize an operator removing an entire datacenter from operation in the event of an unexpected failure. “Amortizing” describes this action perfectly – instead of having to figure out an unexpected, possibly baffling interaction in a complex system under production pressure, the operator is able to just disable the entire piece of infrastructure and investigate leisurely. Instead of having N possible courses of action in the event of a failure, the operator has one attractive option that enables the problem to be sorted out (amortized) over time.

Of course, amortizing complexity almost inevitably leads to some corner cases where some variable that is normally unimportant is ignored. However, this shouldn’t stop us from trying to improve the normal case and relying on monitoring techniques such as confidence bands to alert us when more obscure metrics are not what we should expect.
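To make the confidence-band idea concrete, here is a rough sketch. It assumes a deliberately crude band (mean plus or minus two standard deviations of recent samples), and the function name and sample numbers are made up for illustration; real monitoring systems use more careful models.

```shell
#!/usr/bin/env bash
# Reads one historical sample per line on stdin and checks NEW_VALUE ($1)
# against mean +/- 2 standard deviations. This band definition is an
# assumption for illustration, not a recommendation.
# Returns 0 (and prints "outside band") when the value is out of range.
outside_band() {
  awk -v x="$1" '
    { sum += $1; sumsq += $1 * $1; n++ }
    END {
      mean = sum / n
      var = sumsq / n - mean * mean
      if (var < 0) var = 0              # guard against rounding error
      sd = sqrt(var)
      if (x < mean - 2 * sd || x > mean + 2 * sd) {
        print "outside band"
        exit 0
      }
      print "within band"
      exit 1
    }'
}

# Hypothetical history of an "obscure" metric, then a new reading of 25.
printf '%s\n' 10 11 9 10 10 12 8 | outside_band 25 && echo "worth a look"
```

The point is that nobody has to stare at the obscure metric day to day; the check only speaks up when the value leaves its usual range.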

Thoughts about the conference

I’m going to write down some general themes from the conference; the themes are in no particular order.

Theme: Node.js

With Joyent heavily represented and the likes of Bryan Cantrill in the crowd, there was lots of talk of Node.js. I was encouraged by the fact that most of the applications mentioned were for use cases that I actually believe to be appropriate for the platform – small, near-stateless services. Time will tell if Node.js fulfills its destiny of being the next Rails, but for now, it’s making for some hilarious internet shit talking to be sure.

Theme: AWS Bashing

One thing that soured the event for me was the constant bashing of AWS by just about everyone in attendance. Being the most popular girl is hard, especially when you don’t come to the party.

Theme: Disaster Porn, aka Generalist Porn

The part about engineers and ops folks loving to talk about the gnarly shit they have lived through is not new. The thing that I found refreshing (possibly because I wasn’t at last year’s conference) is the emphasis on being able to go up and down the stack. From Ben Fried’s keynote to Artur Bergman’s “full stack” talk to Theo’s closing notes, the theme of being able to follow a bug through the entrails of your platform kept showing up, accompanied by a bewildering story of the descent. The recurrence of “baffling” failures set up my talk and the larger theme of humans interacting with complex systems nicely. One thing I’ll say is, listening to guys like Artur and Theo describe the debugging process for some of these low level bugs inspires me to dig deeper and better know a computer.

Overall, the conference was what a conference should be – a thought-provoking experience where I got to meet tons of awesome people. Definitely looking forward to going again next year.

Things I can’t believe still exist, First Installment

July 17th, 2011

I’m going to start documenting the shit that just blows my mind on a daily basis.

Hosts that allow you to use FTP to upload files

Seriously. FTP is just a fucking mess. Every client I’ve worked for that used FTP has been compromised. EVERY TIME I find a folder full of bullshit scripts of unknown origin, the existence of which nobody can explain. Fucking turn that shit off. There is a sufficient number of SFTP clients out there, it’s well worth the marginal extra hassle.

Restrictions on characters allowed in passwords

Seriously. Get the fuck out. If you were responsible for security for Verified by Visa and you decided it was appropriate to even give a shit about what’s inside a password, you should be fired. I’m ok with requiring multiple classes of characters (like at least one number), whatever, that’s fine. But why the fuck can’t I have the @ symbol in my password? What the fuck do you care? You should be one-way hashing that shit and forgetting about it. SO MAD RIGHT NOW.

Unix-y things that make me happy: disown

July 16th, 2011

I’m on a perpetual quest to better know my operating system, so I’m constantly discovering awesome little bits that make life easier. Recently, I discovered the disown command, which turns out most of my friends didn’t know about either, so here’s the rundown:

disown can be used to detach jobs from the session they were started in. It is the solution to when you forgot to nohup/screen a long running process, and now want to disconnect from the server and go home. It’s the retroactive nohup.
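A minimal sketch of the mechanics, assuming bash (disown is a shell builtin, not a standalone binary). The sleep is just a stand-in for whatever long-running process you forgot to nohup:

```shell
#!/usr/bin/env bash
# `sleep` stands in for the long-running process started without nohup/screen.
sleep 3 >/dev/null 2>&1 &
job_pid=$!

# At this point the shell still tracks the job in its job table.
jobs_before=$(jobs -p)

# Remove the job from the job table; the process itself keeps running.
disown "$job_pid"
jobs_after=$(jobs -p)

echo "tracked before disown: ${jobs_before:-none}"
echo "tracked after disown:  ${jobs_after:-none}"
```

In an interactive session the usual sequence is Ctrl-Z to suspend the foreground process, bg to resume it in the background, and then disown. The bash manual also documents disown -h, which leaves the job in the table but marks it so the shell does not pass SIGHUP on to it.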

There’s a decent writeup of nohup, disown, and related concepts here and some more specific info here.

A quick note on cron utilities

July 10th, 2011

Apparently my destiny is to wrangle cronjobs, because they are back in my life again.

I was discussing overlapping crons with a coworker, and the options for preventing them. I’ve long been a huge fan of flock(1) for this purpose; since we were talking about a metrics-collecting cron that occasionally hung due to resource starvation and ended up spawning copy upon copy of itself, he suggested it might be easier to just make it time out. He talked about forking a process and having the parent time the execution and kill it. “There has to be a simpler way,” I thought.

A bit of research on a Sunday afternoon while nursing some minor whiplash revealed that there is. It also revealed that people are fucking crazy, far as I can tell anyway.

The “state of the art”

I tweeted about my search for a utility that would wrap a process and kill it if it was misbehaving. I got a bunch of responses back, including a suggestion to check out the sysadvent article about cron practices. I checked it out and found a shitload of random Ruby and Shell scripts. I know we all love writing our own code to do shit, but I always prefer decades-old C code do my bidding for me.

Since wrangling crons using previously-invented wheels appears to be a lost art, here’s my part for bringin’ it back.

flock(1) – prevent jobs from trampling on themselves

The flock util is available on every Linux box I’ve ever logged onto; it is mind-numbingly simple to use. It has only a few options, most of which you only need if your case is special, like if your job operates on the file you’re using to lock. I don’t believe that’s the common case – usually, the lockfile is used to simply indicate that a process is already running. Your crontab line will look roughly like this:

* * * * * /usr/bin/flock -n /tmp/lockfileforyakshaver /usr/bin/yakshaver

With -n, flock gives up immediately if another process already holds a lock on the same file, and the job fails with exit code 1. Alternatively, you could allow flock to wait a few seconds (let’s say 5) before failing:

* * * * * /usr/bin/flock -w 5 /tmp/lockfileforyakshaver /usr/bin/yakshaver

That’s it. No need to write a custom script around it, no need to distribute this script to boxes and force your coworkers to read it to audit the crontab. All they have to do is man 1 flock. And you know this code works – it’s been around for fucking ever. Even if you need some more complicated behavior, you would be better off using flock inside a shellscript than writing your own implementation.
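For the "more complicated behavior" case, here is a sketch of the file-descriptor form that the flock(1) man page describes, wrapped in a small function. The function name and the lock path are placeholders of mine, not anything standard:

```shell
#!/usr/bin/env bash
# Hypothetical lock path; any file the job's user can write to will do.
lockfile="${TMPDIR:-/tmp}/lockfileforyakshaver"

# Run a command only if nobody else currently holds the lock.
run_locked() {
  (
    # Try for an exclusive lock on fd 9 without waiting; bail out if it is held.
    flock -n 9 || exit 1
    "$@"
  ) 9>"$lockfile"   # fd 9 lives only as long as the subshell, so the lock goes with it
}

run_locked echo "shaving the yak"
```

The lock is tied to the open file descriptor rather than to the existence of the file, so a crashed job cannot leave a stale lock behind. That is the main thing hand-rolled "does the pidfile exist" scripts tend to get wrong.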

Note: this will obviously only work for jobs that only need to lock locally. Jobs that need to share, say, a database, will require something more complex. An article I’ve linked to before discusses what you might do with a MySQL database. You can also use something like Zookeeper for this more complicated usecase, but then you might have two problems. Just noticed Peter’s article references a PHP script he uses that essentially replicates the behavior of flock(1). Sigh.

timeout(1) – prevent jobs from running for too long

This utility is an administrator’s dream, far as I can see. Check out the two options it has:

   -k, --kill-after=DURATION
          also send a KILL signal if COMMAND is still running this long after the initial signal was sent.

   -s, --signal=SIGNAL
          specify the signal to be sent on timeout.  SIGNAL may be a name like `HUP' or a number.  See `kill -l` for a list of signals

So you tell it what signal to use normally using -s (defaults to TERM), but you can then add a -k option to make sure that if it doesn’t catch your drift on the first go around, it is swiftly KILLed. How fucking nice, right? So instead of using flock, you might time your yakshaver:

* * * * * /usr/bin/timeout -k 29s 30s /usr/bin/yakshaver

This will send yakshaver the TERM signal after 30 seconds and, since the -k duration is counted from that first signal, follow up with a full-on KILL 29 seconds later (59 seconds in), presumably just in time for the next iteration of the cronjob. Nice and easy. And it’s all just part of Linux, ain’t that fuckin’ great?
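To see what the wrapper reports back, here is a small sketch assuming the GNU coreutils timeout. The sleep stands in for a hung job, and the durations are shortened so it finishes quickly:

```shell
#!/usr/bin/env bash
# `sleep 10` stands in for a hung job. Send TERM after 1 second, and follow
# up with KILL if the process is still alive 2 seconds after that.
status=0
timeout -k 2s 1s sleep 10 || status=$?

# GNU timeout exits with 124 when the command timed out.
echo "exit status: $status"
```

The distinct exit status means cron mail or whatever wraps the job can tell "timed out" apart from "failed on its own" without any extra bookkeeping.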

But I like ruby!

Yeah, that’s fine. However, these are succinct, declarative, straightforward options that require minimal cognitive overhead to understand and debug. There is enormous value in that. They are also part of what one might refer to as a canon – a long tradition of simple, short utilities that perform one specific task exceptionally well. While it’s en vogue these days to throw out “the old ways” in favor of something shiny, we’re starting to see that maybe that’s not the best way to go. Simple is good, especially when the requirement is also simple.

Small aside: solving the problem from the other end

I realized as I was re-reading this post just before hitting “publish” that I would be remiss if I didn’t mention another solution we’re considering to the problem of monitoring crons being unable to run to completion: a company called Librato, which is run by some very smart friends of mine, offers a service called “Silverline,” which wraps processes in containers that ensure that sufficient resources are always available. Think of it as a rigid allowance for the observer effect. It does quite a few other things, but I’ll let you make up your own mind, lest I start gushing: https://silverline.librato.com/.

My view on “BitCoin” summarized in a conversation

June 25th, 2011

Thanks to my friend Zooko for always being a good sport, and to exquisitetweets.com for making it marginally easier to put the damn graphic together.

InnoDB Primary Key Clustering In Action @ Flickr

May 8th, 2011

InnoDB clustered indexes are one of the most powerful, yet poorly understood optimizations.

The idea is that InnoDB will try to place items in primary key sort order together on disk. Thus, if your primary key is a numeric column called id, rows with id 1, 2, and 3 will be adjacent on disk, as will rows with id 21, 22, 23 etc. If you’re fetching rows that are adjacent when sorted on primary key, they will be adjacent on disk, resulting in sequential disk access. This sort of thinking is important when you make the decision to put on your big-boy pants and not just put your entire data set in memory.

The part that people don’t often think about is that composite primary keys follow the same rule. Tim Denike, Flickr’s DBA, demonstrates just how powerful the effect of clustered indexes can be:

The trick, as explained in the comments on the photo, is that most queries on Flickr end up being queries for multiple photos by the same owner (I suspect narcissism has a lot to do with that..) Thus if your primary key is (owner_id, photo_id), the fact that the owner_id is the prefix causes photos from the same owner to be placed together on disk.
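A toy illustration of the effect, with sort standing in for the clustered index. This is not InnoDB, and the IDs are made up; it only shows how ordering by the (owner_id, photo_id) pair makes one owner's rows contiguous:

```shell
#!/usr/bin/env bash
# Hypothetical owner_id,photo_id rows in the order they were inserted.
rows='7,103
2,101
7,101
2,105
7,102'

# Order the rows the way a clustered (owner_id, photo_id) key would.
clustered=$(printf '%s\n' "$rows" | sort -t, -k1,1n -k2,2n)

# Print the positions (line numbers) at which owner 7's rows appear.
positions_for_owner_7() {
  grep -n '^7,' | cut -d: -f1 | tr '\n' ' '
}

echo "insertion order, owner 7 at lines: $(printf '%s\n' "$rows" | positions_for_owner_7)"
echo "clustered order, owner 7 at lines: $(printf '%s\n' "$clustered" | positions_for_owner_7)"
```

In insertion order, owner 7's photos are scattered between other people's rows; in key order they sit next to each other, which is what turns "all photos by this owner" into one sequential read.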

Obviously, since not all the photos are written to the database at the same time, some fragmentation is inevitable in the real world; nonetheless, it’s clearly a huge performance win. I believe some routine maintenance commands (OPTIMIZE? I can never remember all the different commands, someone will have to fact-check this..) will effectively “defrag” the table, moving all the rows into sequential primary-key order.

InnoDB clustering is covered in great detail in Ch. 3 of “High Performance MySQL