Posts Tagged ‘php’

Define Failure…

Saturday, December 20th, 2008

Just saw a salient example of something I notice quite a bit in various documentation sources.

Lots of manuals and API references will say stuff like “Returns FALSE on failure” with little to no clarification as to what that means. Though it is usually intuitive, there are frequent cases where a little more attention should be paid. Example: PHP manual page for Memcache::delete.

The method takes two parameters: the key to be deleted and the optional timeout. The second parameter specifies how long Memcache should wait before deleting the key. Like many others, the function “Returns TRUE on success or FALSE on failure.”

The primary use case of this method is obviously just deleting a value stored in memcached. But let’s actually examine the possible outcomes.

  • Key is found and successfully deleted. Obvious Success.
  • Memcached server cannot be reached. Obvious Failure.
  • Key is found, but some sort of network or server glitch prevents it from being deleted. Obvious Failure.
  • Key doesn’t exist. How do we classify this? On one hand, the “goal” of calling the method is accomplished – the key is not in Memcache. However, the method itself didn’t technically do what it was supposed to do. Its purpose is to delete a key, and it didn’t. Furthermore, it appears that the developer was mistaken as to the state of the cache. In a lot of cases that’s fine, but what if the developer is privately counting on the key actually being there? (This is fleshed out further in the second use case.) Matters are further complicated by things like memcachedb – a persistent storage backend that uses the memcache protocol. Here, a missing key could present a serious problem, and the developer should definitely know about it. Granted, this could just be another argument against putting persistent storage behind a protocol meant for the opposite. One way or another, it’s not clear from the documentation what the method would return in this event.

Things get a little more complicated once the second parameter is invoked. The timeout parameter allows the developer to delay the deletion of the key. The first three bullets above roughly apply the same way with minor obvious adjustments (e.g., in the third bullet replace “being deleted” with “having its ttl adjusted”).

The fourth bullet, however, is even more salient. The setting of a timeout value implies that the developer indeed not only expects the key to be there, but is counting on the key to be there n seconds later.

Naturally, I fully understand that one should rarely COUNT on certain things being in the cache, so the aforementioned concerns will likely be irrelevant in most applications. However, I’m not singling out PHP or Memcache: the same concern applies to plenty of other APIs. I remember wondering what “Failure” meant to the YUI Get utility (I thought it was just a non-200 HTTP code, but directing it to a non-existent URL didn’t seem to trigger it, so it’s unclear), and there are plenty of other cases.

I’m not a fan of returning error codes and having to use huge switch statements to determine what to do in the event of every failure in the application, nor do I advocate throwing fine-grained exceptions left and right (the two are nearly identical in my mind). However, I do believe that more care should be taken to document what constitutes a failure and, for functions that don’t have a clearly defined return value, what constitutes success. Perhaps @failure and @success tags can be added to the docblock spec to facilitate such documentation. For every place in a method where false is returned to signify failure, a @failure block would be added, and likewise for success.
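To make the idea concrete, here is a rough sketch of what such a docblock could look like. The @success and @failure tags are the hypothetical ones proposed above (no docblock tool is assumed to understand them), and cache_delete is a made-up function over a plain array, not the Memcache API. Treating a missing key as a failure is just the choice this toy function makes; it says nothing about what Memcache::delete actually does.

```php
<?php
/**
 * Delete a key from a simple array-backed cache.
 *
 * @param array  $cache The cache, passed by reference (stand-in for a real backend).
 * @param string $key   The key to delete.
 * @return bool
 *
 * @success The key existed and was removed.
 * @failure The key is empty or not a string.
 * @failure The key was not present in the cache.
 */
function cache_delete(array &$cache, $key)
{
    if (!is_string($key) || $key === '') {
        return false; // @failure: invalid key
    }
    if (!array_key_exists($key, $cache)) {
        return false; // @failure: nothing to delete
    }
    unset($cache[$key]);
    return true; // @success: key removed
}

$cache = array('user:12' => 'cached profile');
var_dump(cache_delete($cache, 'user:12')); // bool(true)
var_dump(cache_delete($cache, 'user:12')); // bool(false), the key is already gone
```

With the tags in place, the “key doesn’t exist” case is at least spelled out in the documentation instead of being left to guesswork.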

PS: if anybody happens to know the correct answer to my Memcache::delete question, let me know!

A Couple of Quick PHP Tricks

Friday, December 19th, 2008

More seasoned PHP hackers probably already know this stuff, but I thought I’d share a couple things that made me think “Man this is a cool fucking feature” when I used them in my work yesterday.

array_keys is smarter than it looks

The short description for this function in the PHP docs says “array_keys: Return all the keys of an array.” Unfortunately, that is as far as a lot of people read. However, there is an incredibly useful feature hidden in the optional parameters: “If the optional search_value is specified, then only the keys for that value are returned.” So apart from just dumbly getting all the keys the array contains, you can actually parse out some really useful stuff.

Use Case:

You have a table that lists a bunch of entities, and you have to give the user the ability to perform n actions where n > 1 and the actions are mutually exclusive. Simply defining a bunch of checkbox arrays won’t work, so you have to use radio buttons. (You could use checkboxes and use javascript to ensure the exclusivity, but then you’d be a jackass).

Radio buttons, as you know, group by name. So you have to have N radio buttons per row, each with a different value, and a name that indicates which entity the input pertains to. If you’re really smart, you could make each radio group a member of an array, where the array’s key indicates the ID of the entity:

<input type="radio" name="action[<?= $entity->id ?>]" value="delete" />

When the form is submitted, you’ll end up with $_POST['action'][12] => ‘delete’, etc.

Now that you’ve got your input in a nice, tidy array, you can start the magic:

$deletions = array_keys($_POST['action'], 'delete');
$approvals = array_keys($_POST['action'], 'approve');

Now you have just the IDs of the entities that need to be deleted or approved. Unless you have some really shitty logic libraries, that should make your life incredibly easy. I obviously skipped stuff like validation for the sake of brevity and readability, but I would also recommend mapping your actions to an indexed array, so that your radio values are actually more like 0, 1, 2, 3, etc., so as to minimize the amount of data posted. Some would also say that you shouldn’t use PHP short tags, but I really just don’t care.

I should also mention that in PHP 5 array_keys has a third parameter bool $strict, which causes it to use === comparison instead of == when parsing the array. Full manual here.
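A small self-contained sketch of both behaviors, using a made-up array in place of $_POST['action'] (the IDs and the extra 'ignore' action are invented for illustration):

```php
<?php
// Hypothetical submitted form data: entity ID => chosen action.
$actions = array(
    12 => 'delete',
    15 => 'approve',
    19 => 'delete',
    23 => 'ignore',
);

$deletions = array_keys($actions, 'delete');  // array(12, 19)
$approvals = array_keys($actions, 'approve'); // array(15)

// Loose vs. strict matching: '1' == 1 is true, '1' === 1 is not.
$mixed  = array('a' => '1', 'b' => 1);
$loose  = array_keys($mixed, '1');       // array('a', 'b')
$strict = array_keys($mixed, '1', true); // array('a')
```

The strict flag starts to matter if you do map your actions to integers, since everything arriving in $_POST is a string.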

array_splice’s 4th optional parameter might be its most useful

I found myself somewhat dumbfounded when a co-worker asked me if there was a standard function in PHP to insert a set of values at a given point inside an indexed array – I couldn’t think of it! My intuition drew me towards array_splice, even though I knew that the default behavior of that function was the opposite. Having just used the aforementioned obscure feature of array_keys, I guessed that array_splice would either have a similar feature or would direct me to the manual page for its complement function. The former was correct.

The 4th parameter of array_splice is $replacement. It is pretty self explanatory, but I’ll hold your hand like a small child and do an example anyway.

<?php
$alphabet = $alphabet_broken = range('a','z');
//default behavior. We just got rid of b and c
$missing = array_splice($alphabet_broken, 1, 2);
//same offset, 0 length so nothing is removed;
//the missing letters go in as $replacement
array_splice($alphabet_broken, 1, 0, $missing);
//if I don't suck at life, the two should be identical
//(!== also checks order, which array_diff would not)
if ($alphabet !== $alphabet_broken) {
    echo "Something got fucked up!";
}
?>

The full manual entry for array_splice is here.

My C is more better now!

Friday, December 12th, 2008

I have pushed an update for my silly php extension to github for anyone that wants to have a looksie. I’ve optimized the human_interval_precise function to only declare two variables (the long to hold the number of seconds passed in from userspace and the char array that gets passed back). Still need to figure out a sensible max length for the return value and replace the arbitrary char[60] I have in there now.

I’ve also added a human_interval (not precise) function for approximating in the largest possible units. However, I just realized as I was writing this that I forgot to make it round in any way, so it probably comes up with fairly bogus results right now. Fail. Good thing this is just an exercise, and I’ll have another 3 hours on a shuttle to kill on Monday… and on Tuesday… and on Wednesday…

As I said before, I’m also working on a redo of the WordPress theme, as well as another more “grown up” C project.

Also, did not get laid off. Best of luck to all that did. Crazy times we live in.

PS: just saw on my nifty little wordpress toolbar that 2.7 is available. FUCKING ROCK! New design coming with the quickness.

caching: it’s not just for twitter

Saturday, October 18th, 2008

I had an argument with a co-worker the other day, in which his point was that “APC isn’t the solution to everything.” No shit. Not sure what the point of it all was, but what started it was my desire to CACHE MORE.

While caching seems to have become widely accepted by the Web 2.0 “I read Cal Henderson’s book, so I know how to scale” crowd (because Cal says to use it), the rest of the world has decided that it’s just not needed outside of that milieu.

I work in operations. My audience is limited by my company’s payroll, which is somewhere in the neighborhood of 15,000, of which only a fraction actually uses the shit I write (surprise! the HR department does not give a fuck about SLA misses). I don’t work (directly) on a social network. I’ve never built anything that generated crazy traffic. But I know that even our internal tools have gotten to the point where we have to load balance read queries to multiple slaves and our customers complain about load time.

Caching is a good idea

Let’s review what caching is: storing a chunk of already-processed data in shared memory, where it can be accessed via a simple key lookup. No processing, no querying, no magic sauce – very straightforward. What part of accessing data without ever hitting the persistent storage layer, never mind any sort of database, sounds like a bad fucking idea? I don’t care if your app never has more than 5 simultaneous users, it’s fucking common sense. I’ll also talk about how caching can drastically improve the user experience, even in a low traffic environment.

Caching in PHP – something about candy and babies

If, like me, you work on PHP, your life is sweet: you get to use memcached (what everybody else is using) AND a nifty little PECL extension called APC – Alternative PHP Cache. I won’t cover the specifics of how to use APC or memcache (there are search engines for that), but I will hold your hand like a small child and explain why, indeed, APC isn’t the solution to everything, and why most applications can benefit from using both.

APC is mostly known as an opcode cache – it caches the opcodes for your php files, so that your poor webserver doesn’t have to keep re-parsing your shitty code. That’s super nice, but that’s only half of APC. The other half is just a legit, in-memory key-value store with all the functions you’d expect: apc_store and apc_fetch. Can you guess how they work? (http://www.php.net/apc if you can’t…)

Using The Right Hammer

memcached and APC perform, in essence, the same function (there are obviously vast implementation differences, but we don’t care right now). The major difference is that APC lives inside PHP, hanging off the underside of apache (or lighty or nginx or what the fuck ever you hip kids use these days), while memcached is its own server. That difference is what determines what each one of these tools is best suited for.

Since APC runs as part of your webserver, things you store in it will only be accessible by other scripts running on that same physical server. The flipside is that you don’t have to waste a network roundtrip. Using APC for the things it works for to avoid the trip to the memcache server is one of those “premature optimizations” that isn’t “evil” because it’s so fucking easy.

As my co-worker so aptly pointed out, APC isn’t the answer to everything, and neither is memcached. You might have already figured out that each one has an appropriate use: APC is perfect for storing things that the webserver doesn’t need to share with anything else. memcache should be used for things that can be manipulated by multiple webservers. Here are some examples:

APC:

  • configuration
  • database-backed items used in generating frequently accessed parts of the site
  • lists that get hammered in bursts (e.g., autocomplete lookups)

memcached:

  • data structures containing data modified by our business logic (e.g., user data in the case of social networks or host data in the case of some sort of ops-y app like the shit I write)
  • state/session info, so that your app knows what your user is up to no matter which app server they get bounced to

Remember that your cache shouldn’t contain any information you’re hoping to hold on to long term. It’s just a proxy to help reduce load on your persistent storage and improve load time. The second case (state/session info) can be the exception, but there you have to make sure your cache is redundant so that a failing memcache instance doesn’t interrupt your user’s workflow.

Naturally, at the outset, when your “startup” is in its infancy and you have just the one EC2 “instance” to work with until that “Series A” hits, it’s fine to dump everything into APC. That’s why it’s important to write or adopt a cache abstraction early on, and create two “identical” APC adapters – one for storing things that REALLY belong in APC, and one for things you will eventually want to move to memcached. The very early, untested stages of just such an abstraction can be found here.
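To illustrate the shape of such an abstraction (this is a sketch, not the code linked above): the interface and class names below are invented, and the adapter is backed by a plain PHP array so that it runs anywhere. A real APC adapter would presumably call apc_fetch and apc_store inside these methods instead.

```php
<?php
// Hypothetical minimal cache interface; names are illustrative only.
interface CacheAdapter
{
    public function get($key, &$found = null);
    public function set($key, $value, $ttl = 0);
    public function delete($key);
}

// Array-backed stand-in so the sketch runs without any extension.
// It only lives for the current request, unlike APC or memcached.
class ArrayCacheAdapter implements CacheAdapter
{
    private $items = array();

    public function get($key, &$found = null)
    {
        // An entry counts as found if it exists and has not expired
        // (an 'expires' value of 0 means "never expires").
        $found = array_key_exists($key, $this->items)
            && ($this->items[$key]['expires'] === 0
                || $this->items[$key]['expires'] > time());
        return $found ? $this->items[$key]['value'] : null;
    }

    public function set($key, $value, $ttl = 0)
    {
        $this->items[$key] = array(
            'value'   => $value,
            'expires' => $ttl > 0 ? time() + $ttl : 0,
        );
        return true;
    }

    public function delete($key)
    {
        unset($this->items[$key]);
        return true;
    }
}

// Two "identical" adapters today; only $sharedCache would later be
// swapped for a memcached-backed implementation.
$localCache  = new ArrayCacheAdapter(); // config, rarely changing lists
$sharedCache = new ArrayCacheAdapter(); // sessions, data shared across web servers

$localCache->set('config:site_name', 'Example');
$sharedCache->set('session:abc123', array('user_id' => 42), 1800);

echo $localCache->get('config:site_name'), "\n";
```

The point of the split is that calling code decides up front which bucket a value belongs in, so moving the shared bucket to memcached later means swapping one object rather than hunting through the codebase.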

Examples from the land of Operations

Like I said, I’m not about to tell you how to use caching to fix twitter – I haven’t tried it. But I’ll tell you how I use caching in our internal apps to remove bottlenecks. I have to be pretty generic, but I’ll try to make the examples clear anyway. All my examples involve APC because the app I’m working on fits into the above category: it’s an internal app, so we get an app server, a db server and a “fuck you” for a memcached server.

Lists

One of the forms we have contains two drop downs. The first represents a set of categories, while the second represents their sub-categories. Naturally, the second list depends on the first, and needs to be regenerated every time the first one changes. Initially, I just generated the first list in raw markup on the PHP end, then used some XHR sauce to regenerate the second list asynchronously. Using ajax makes my e-manhood feel heavier.

Not that I shouldn’t have been caching this already (these categories and subcategories change once in a blue moon), but I was given an additional reminder: the data for these drop downs came from an API of another internal application, which, I shit you not, takes as long as 5 seconds to process a simple request. All of a sudden, my page was choking hard.

Enter APC. Every time a user hits the form, it checks APC first and loads the first list from there if it’s cached. If it isn’t, it actually queries the ass-slow API and then stores the result for the next user. I then have a proxy function that queries the API for the second list. This proxy also does the APC dance, so only the first time a certain sub-list is loaded do we have time to make tea and bake a delicious pastry.

To further optimize page load, I ended up making the first list load asynchronously too; now even the user that arrives to a clean cache will see the whole page right away with a “loading” graphic in place of the drop down, while the external API eats its way out of its own shit.

Naturally, since I’m relying on an external system, I could get myself in minor trouble if the list gets updated, and my cache doesn’t know. To avoid this, I only cache for 5 minutes.
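The check-then-load dance with a five minute expiry looks roughly like this. It is a sketch under a few assumptions: cached_call and the loader are invented names, the category values are made up, and a plain array stands in for APC so the snippet runs without the extension. With APC loaded, the two marked lines would be calls to apc_fetch and apc_store.

```php
<?php
// Stand-in store; with APC this array goes away entirely.
$store = array();

function cached_call(array &$store, $key, $ttl, $loader)
{
    $now = time();
    if (isset($store[$key]) && $store[$key]['expires'] > $now) {
        return $store[$key]['value'];  // cache hit (apc_fetch)
    }
    $value = call_user_func($loader);  // slow path: ask the API
    $store[$key] = array(              // remember it (apc_store)
        'value'   => $value,
        'expires' => $now + $ttl,
    );
    return $value;
}

// Hypothetical loader standing in for the slow category API.
$apiCalls = 0;
$loadCategories = function () use (&$apiCalls) {
    $apiCalls++;
    return array('hardware', 'network', 'storage');
};

$first  = cached_call($store, 'categories', 300, $loadCategories); // hits the "API"
$second = cached_call($store, 'categories', 300, $loadCategories); // served from cache
echo "API calls made: $apiCalls\n"; // 1
```

The 300 second TTL is the staleness budget: at worst, a changed list takes five minutes to show up.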

Autocomplete

In another form, we have an autocomplete field (thanks, YUI) which, once again, queries an internal API. This API isn’t as slow, but it’s slow enough to ruin the autocomplete “update as you type” experience. Once again, APC to the rescue. This is one of those cases where it actually ended up easier to do the “processing” application side, leaving the database (in this case, the other app’s database) out in the cold altogether. Luckily, the external application had a method for fetching ALL of the potential items for this autocomplete. What we do now is just cache that full list on first “lookup” and parse out the relevant autocomplete results using PHP in our own code. It is FAST (granted the list is only a few hundred entries). It is definitely faster than waiting for a full API call at every keystroke.

Once again, since the data is external, I’m risking staleness. Just like above, I only cache it for a few minutes. Here in particular, we mostly care about not hitting the persistent storage at every keystroke. As long as it’s cached while the user is typing, we’re ok. The important thing to remember is that this isn’t just useful for slow external APIs. This should be done for your own calls as well – if you’re querying your database every time a user hits a key, you’re doing it wrong. If I have time during the QA phase (as if), I might actually add an XHR call to the page that pokes the API in the background and caches the results for a few minutes. This way the cache is ripe when the typing starts. With data from your own application, you can avoid staleness by adding some sauce to your data logic to invalidate the cache whenever that data is updated.
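For the “parse out the relevant results in PHP” part, something along these lines would do. This is a sketch, not our actual code: autocomplete_matches is an invented name, the host list is made up, and it assumes a case-insensitive prefix match is what the field wants.

```php
<?php
// Filter an already-cached full list by what the user has typed so far.
function autocomplete_matches(array $items, $typed, $limit = 10)
{
    $typed = trim($typed);
    if ($typed === '') {
        return array(); // nothing typed yet, nothing to suggest
    }
    $matches = array();
    foreach ($items as $item) {
        // Case-insensitive prefix match.
        if (stripos($item, $typed) === 0) {
            $matches[] = $item;
            if (count($matches) >= $limit) {
                break; // stop early once we have enough suggestions
            }
        }
    }
    return $matches;
}

// Hypothetical full list, as it might come out of the cache.
$hosts = array('db01', 'db02', 'web01', 'web02', 'Webmail01');
print_r(autocomplete_matches($hosts, 'we')); // web01, web02, Webmail01
```

On a list of a few hundred entries, a linear scan like this is plenty; the expensive part was always the API call, not the filtering.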

Hopefully I’ve convinced you to stop being such a pansy and learn how to cache properly.