Abusing MySQL: The Federated Engine

I don’t have quite the experience that Kellan and Richard do with wrangling databases (yet), but I have seen some relatively unorthodox stuff. I’ll write a quick note about something quirky we did back when I worked on internal tools at Yahoo!

The Problem

We had one central database that contained a lot of the information about the company’s infrastructure (say, db_central). Among other things, it contained information about users, user groups, and inventory, for some definition of that word.

There were several other tools built around it that had their own databases, but still relied on some of the same data, particularly the user-related tables. We wanted to be able to do joins across the databases, but you can’t do that easily when the databases are on different physical boxes.

You could set up replication on the box that the auxilary database is on (say, db_app1), but we couldn’t do that either – we were doing dual-master replication for HA, and a single MySQL instance can only slave from one host at a time.

Federated Engine

One of the advantages of being an internal team is the ability to stay on top of the ‘new hotness’. Since our datasets and userbases were always relatively small, we were able to upgrade frequently; we were on 5.0 and 5.1 fairly soon after they were released.

With 5.0 came the Federated Engine. It allows you to create a sort of shim for a table on a remote machine and access it as if it were a local table; most notably it allows you to join against the remote table.

Obviously, this sounds like a performance nightmare. Though we never tested it in a straight-forward setup (you’ll see what I mean in a second), and it might have turned out OK for this particular use-case, even at our relatively small size and low traffic, slow joins were a serious problem (at Yahoo!’s size, even the internal apps had multi-million row tables). Adding network latency to that was not something we were interested in.

The Prestige

This is where one of the (other) crazy Russian guys on the team thought of something awesome (can’t remember if it was Andrey or Alex.. neither has a blog, unfortunately).

We would set up an additional instance of MySQL on another port on each of the db_app1 masters (call them db_app1_plus). This instance would be a slave for db_central. Then, on the db_app1 instance, we would set up a Federated Engine table (aha!) that would point to the db_central replica on the db_app1_plus port over localhost. Though we never scientifically benchmarked this setup, in our limited testing it worked like a charm and performed beautifully.

Of course, as I said before, this would not work for a high-traffic production setup. However, it did allow us to simplify the code in our internal apps (and you always want that code to be simple and readable) and did not cause any noticeable performance degradation or additional operational headaches. As far as I know, that setup is still in place.

I’ve recently come back to this idea for some uses at Flickr (mainly background data-mining jobs) and keep forgetting to talk to Kellan about it. Let’s see what he says :D

Leave a Reply