Archive for the ‘dev’ Category

Drizzle: Time To Get Excited

Tuesday, March 3rd, 2009

Brian Aker was in San Francisco on Monday night to talk to the MySQL and PHP Meetup groups about Drizzle, the fork of MySQL aimed at meeting the needs of large web applications.

I had heard various tidbits here and there about Drizzle, including at MySQL2008, and had a pretty vague impression of how it was actually going to be different. The general theme seemed to be simpler, smaller, and faster.

If everything Brian described is true, it’s those things plus FUCKING AWESOME. I will join my friends (here, for example) in foaming at the mouth about it. Below is a rough list of things Brian mentioned that get me all hot’n'bothered about Drizzle, in no particular order:

  • No Query Cache. The best part here is the reasoning for its absence: “If you’re relying on the query cache, you probably should have just used memcached to begin with.”
  • In-Query Sharding Info. I’m not exactly sure of the details of the implementation, but it sounds like Drizzle will make it possible to route queries to the right shard automatically.
  • Pluggable Authentication. Finally. Authentication can also be completely turned off.
  • Serializable Query Plans. This feature will allow the parser to be bypassed entirely on most queries. You simply send the query to MySQL, get the execution plan back, cache that, and send the execution plan back the next time you need the same query. That is Fucking Bad Ass (TM).
  • Fewer Locks. Getting rid of a lot of the more advanced and rarely used features introduced in MySQL 5/5.1 (views, stored procedures, etc.), as well as some of the more basic stuff like authentication, has allowed Drizzle to lose 2/3 of the locks present in 5.0 (at least that’s what I think Brian said… sounds surreal…) which obviously opens the door for vast improvements on multicore architectures. It also sounded like Brian made the decision during his talk to discontinue MyISAM support in Drizzle to be able to get rid of another huge lock. I’m OK with it…
  • Discontinued support for antique hardware. Lots of code ripped out because it’s no longer needed.
  • Everything is pluggable, most things are optional. The Drizzle kernel is about 115kloc. Amazing.

Brian was very coy about benchmarks, leaving it to independent sources to run them, but it sounds like Drizzle will leave MySQL in the dust for most of the common applications seen in webapps. I can’t wait to try it and to use it on a few things.

APIMuni by Danny Roa – Bringing NextMuni To The Masses

Saturday, February 28th, 2009

Danny Roa, whom I met at the last Django Meetup, has put out a quick API for accessing Nextbus data.

It’s hosted on the App Engine and can be found here.

His writeup is here.

He recycled the scraping code from yourmuni, props to him for giving props :) Of course, that just means that when Nextbus gets angry, they’re going to come after me first!

Developers don’t create API’s for nothing, so I am eagerly anticipating what Danny is going to use this API for.

Reusable Logging in Django Apps

Saturday, February 28th, 2009

I have 3 drafts sitting in my queue – 1 really long post and 2 short ones. I’ve been picking away at the long post on the shuttle rides, but in the meantime I’m gonna try to push out the two quick ones. This is one of the quick ones.

I was trying to figure out how to set up reusable logging in my apps and have it fairly decoupled from the overall project. Here’s what I came up with:

  1. Set up a logger object using these instructions in settings.py and store it in the LOGGER variable.
  2. Grab it inside apps using django.conf.settings like so:

from django.conf import settings
try:
    logging = settings.LOGGER
except AttributeError:
    import logging

Then just use logging.debug, logging.info etc. Thus, if a LOGGER is configured inside the project’s settings.py, we use that (django.conf.settings points to the settings.py for whatever project you’re working inside of, so you can move your app from project to project no problem). Otherwise, we just use vanilla logging functions with the global logging configuration. Nice and sweet.
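
For step 1, here is a minimal sketch of what the settings.py side could look like. The logger name "myproject", the build_logger and get_logger helpers, and the SimpleNamespace stand-ins for django.conf.settings are all assumptions made for illustration, so the snippet runs without Django. getattr with a default does the same job as the try/except above.

```python
import logging
from types import SimpleNamespace


def build_logger(name="myproject"):
    """What settings.py might do: configure a logger once and expose it."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:  # avoid stacking handlers if settings is re-imported
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
        logger.addHandler(handler)
    return logger


def get_logger(settings):
    """What an app does: prefer settings.LOGGER, else fall back to logging."""
    return getattr(settings, "LOGGER", logging)


# Stand-ins for django.conf.settings with and without LOGGER defined.
configured = SimpleNamespace(LOGGER=build_logger())
bare = SimpleNamespace()

get_logger(configured).info("using the project logger")
get_logger(bare).warning("using the module-level logging functions")
```

Either way the app code only ever calls .debug, .info and friends, so it does not care which of the two it got.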

Suggestions on other ways to do this are, as always, welcome.

Example on django snippets: here.

Django Tip: Using Dictionaries For Model Method Parameters

Tuesday, February 3rd, 2009

I’ve been working a whole lot outside of my job, mostly writing Python and working with Django. I don’t have much energy for a real blog post about something awesome, but I do have a tip to share. Advanced “pythonistas” won’t be impressed, but I haven’t seen this documented prominently anywhere, so I’ll toss it up anyway.

As we all know, Python supports keyword arguments, and the Django ORM takes full advantage of this. When doing lookups, the ORM parses keyword parameters in order to determine what SQL query to execute. A typical ORM call will look like this:

all_oatmeal = Cookie.objects.filter(cookie_type='oatmeal')

That’s very cool and expressive. However, what if our search criteria depend somehow on user input? For example, what if we have a search form with multiple fields, but only want to search by the fields that a user entered something into?

We could have a series of convoluted if/else statements to determine which variables were set and have a corresponding .filter() call for each possibility, but that would be dumb, convoluted, and hard to read later. Also, dumb.

Instead, we can use an alternative way of passing keyword arguments provided by Python (details here): putting ** in front of a dictionary being passed to a function makes Python unpack the dictionary and pass the pairs as keyword arguments to the function. Using that technique, we can arbitrarily construct a dictionary of the search parameters, then pass it to a single .filter() call at the bottom.

An over-simplified example, in which I assume that our form fields match up exactly with model properties:

# Build a new dict instead of deleting keys while iterating over form_data,
# which raises a RuntimeError (dictionary changed size during iteration).
search_params = dict((key, value) for key, value in form_data.items()
                     if value != '')
wanted_cookies = Cookie.objects.filter(**search_params)

I’m sure there’s a more elegant way to do the empty value stripping too, but that’s not our focus (comments on the subject are welcome, though, for to make me smarter). The point is this: this technique allows for very clean, easy to read, efficient code.

Creating model instances

cookie_data = {
    'cookie_type': 'oatmeal',
    'cookie_size': '3in',
    'cookie_touched': True
}
c = Cookie(**cookie_data)
c.save()

Obviously, this applies to more than just Django – there are many many use cases where this trick can come in handy. Enjoy!
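
To show the trick outside of Django, here is a small sketch with a plain function. describe_cookie and drop_empty are hypothetical names made up for this example; nothing here touches the ORM.

```python
def describe_cookie(cookie_type="plain", cookie_size="2in", cookie_touched=False):
    """Hypothetical function that takes ordinary keyword arguments."""
    state = "touched" if cookie_touched else "untouched"
    return "%s %s cookie (%s)" % (cookie_size, cookie_type, state)


def drop_empty(params):
    """Return a copy of params without the entries whose value is ''."""
    return dict((key, value) for key, value in params.items() if value != "")


form_data = {"cookie_type": "oatmeal", "cookie_size": "", "cookie_touched": True}
search_params = drop_empty(form_data)

# These two calls are equivalent; ** unpacks the dict into keyword arguments.
unpacked = describe_cookie(**search_params)
explicit = describe_cookie(cookie_type="oatmeal", cookie_touched=True)
print(unpacked)
```

Because the empty cookie_size was dropped before the call, the function falls back to its own default for that argument.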

yourmuni makes commuting easier

Saturday, January 17th, 2009
[Screenshot: what PH's "to work" bookmark would look like]

I am proud to present my latest app yourmuni. It is a cross between momuni.com and Paul Hammond’s minimuni. Its purpose is to make it easier for people to get to and from places they frequent, such as jobs, gyms, favorite spots, and bootycalls. yourmuni lets you define bookmarks which represent collections of transit stops, and then view the bus/train arrival information for each bookmark on a single page. For example, if Paul didn’t already have his highly personalized minimuni app, he could log onto yourmuni, define a “To Work” bookmark, and assign to it the same stops that he currently scrapes. See the screenshot on the right for an example.

While it’s obviously not a “disruptive” innovation, I think it’s a nice incremental improvement on what most people do, which is look up multiple routes using momuni or nextbus.com while walking out the door. I know I’ve been using it, and it has saved me a tremendous amount of time/clicking around on my iPhone, looking like an idiot.

Though yourmuni was developed with my iPhone in mind, it appears to work just fine on most phones.

Still on the burner:

  • Instant stop lookup (ala momuni)
  • using other agencies that nextbus covers (including ones outside of NorCal)
  • deleting stops from bookmarks
  • better instructions while setting up bookmarks
  • cleaning up some code

yourmuni was demoed at the January Django Meetup, and everyone seemed to like it. I was very flattered by the positive feedback, since it’s a rather simple app.

Technical Details

yourmuni is written using the latest Django at the time of the start of the project, which was r9768. My previous post about getting the latest Django to work on the Google App Engine was the result of me setting yourmuni up on said App Engine, which is where it now lives. The source is on github. It’s far from perfect, as it was my first real Django/Python project, and I am aware of several precise places in the code that could use a minor rewrite. However, here are the parts that I put lots of thought into, and that I think might be useful to others.

App Engine userRequired Decorator

Since the login_required decorator from django.contrib is useless when using the App Engine, I wrote my own, which checks to see if the user is logged in and, if not, redirects them to the Google Accounts login page, while saving the URL they were trying to access as the callback URL. Here’s the source for all to enjoy (gist here):

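from django.http import HttpResponseRedirect
from google.appengine.api import users

def userRequired(fn):
    """decorator for forcing a login"""
    def new(*args, **kws):
        user = users.get_current_user()
        if not user:
            # args[0] is the request; send the user back here after login
            r = args[0]
            return HttpResponseRedirect(users.create_login_url(
                                            r.build_absolute_uri()))
        else:
            return fn(*args, **kws)
    return new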
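
To see the pattern in action without Django or the App Engine SDK, here is a rough sketch of the same idea with stand-ins. Redirect, FakeRequest, make_user_required and the "/login?continue=" URL are all made up for this example; the real decorator above uses users.get_current_user and users.create_login_url instead of the injected callables.

```python
from functools import wraps


class Redirect(object):
    """Stand-in for django.http.HttpResponseRedirect."""
    def __init__(self, url):
        self.url = url


class FakeRequest(object):
    """Stand-in for a Django request; only build_absolute_uri is needed."""
    def __init__(self, uri):
        self.uri = uri

    def build_absolute_uri(self):
        return self.uri


def make_user_required(get_current_user, create_login_url):
    """Build a userRequired-style decorator from two injected callables."""
    def user_required(fn):
        @wraps(fn)  # keep the view's name and docstring
        def wrapper(request, *args, **kws):
            if not get_current_user():
                return Redirect(create_login_url(request.build_absolute_uri()))
            return fn(request, *args, **kws)
        return wrapper
    return user_required


current_user = {"value": None}  # flipped below to simulate logging in
user_required = make_user_required(
    lambda: current_user["value"],
    lambda dest: "/login?continue=" + dest)


@user_required
def show_bookmark(request, name):
    return "bookmark %s" % name


anonymous = show_bookmark(FakeRequest("http://example.com/bmark/work"), "work")
current_user["value"] = "paul"
logged_in = show_bookmark(FakeRequest("http://example.com/bmark/work"), "work")
```

The anonymous call comes back as a redirect carrying the original URL, and the logged-in call reaches the view untouched.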

Encapsulating Slug Generation in Form Code

Since I only ask for one field (“Description”) when creating a bookmark and place no restrictions on that field (I want it to look like whatever the user wants to see in the interface), I need some way to generate an identifier for the bookmark. I could just give it a numeric or hash identifier, but then it would be useless to the user in terms of seeing it in their browser history (I want to allow the user to jump straight to the bookmark they want if their browser shows it as an option after they type ‘y’). I needed to create a slug. I could accept the “Description” field and then process it in my addBmark view, but instead I defined the form to have two fields, one optional, and used the contents of the description field to automatically populate the “name” field using Django’s built-in slugify filter, available in the template API (thanks to the folks at the Django Meetup who pointed this out). This allows me to encapsulate the validation within the form, so my view code looks very clean – I just have to call the is_valid() method on the form, and the form then has two properties that give me everything I need to create the bookmark. Here’s the code (full source here).

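from django import forms
from django.template.defaultfilters import slugify
from django.utils.translation import ugettext as _
from google.appengine.api import users
from google.appengine.ext import db

# Bmark is the app's bookmark model, defined elsewhere in the project.

class AddBmarkForm(forms.Form):
    name = forms.CharField(max_length=50, required=False)
    description = forms.CharField(max_length=255, required=True)

    def clean_description(self):
        desc = self.cleaned_data['description']
        name = slugify(desc).decode()
        q = db.Query(Bmark)
        q.filter('name =', name)
        q.filter('user =', users.get_current_user())
        if q.get():
            raise forms.ValidationError(
                _("A bookmark with that name exists already"))
        else:
            self.cleaned_data['name'] = name
            return desc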
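
For a feel of what the slug step does, here is a standalone sketch. slugify_sketch is only an approximation of Django's slugify written with the standard library, and unique_slug with its set of existing names is a made-up stand-in for the datastore query in the form above.

```python
import re
import unicodedata


def slugify_sketch(value):
    """Approximate Django's slugify: ASCII-fold, drop punctuation, hyphenate."""
    value = unicodedata.normalize("NFKD", value)
    value = value.encode("ascii", "ignore").decode("ascii")
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    return re.sub(r"[-\s]+", "-", value)


def unique_slug(description, existing):
    """Return the slug for description, or raise if the user already has it."""
    name = slugify_sketch(description)
    if name in existing:
        raise ValueError("A bookmark with that name exists already")
    return name


existing_names = {"to-work"}
new_name = unique_slug("Gym & Back!", existing_names)
```

Here "Gym & Back!" becomes "gym-back", while a second "To Work" bookmark would be rejected because its slug collides with the existing one.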

Scraping Nextbus

For an unclear reason, nextbus does not have a clean, public API. My assumption is that they want to sell their data, but that's sort of pointless since they provide a free, publicly accessible website everywhere they provide service. It just sucks. So in order to make something better, I basically had to scrape that same publicly accessible website. It wasn't easy, as apparently nextbus hired a live bear to write their markup. Though all of their pages look almost identical, each has its own quirky combination of li, a, nobr, and font tags. I still managed to write a single scrape function to handle all of them, but it ended up being a bit more complex than it needed to be. Thank the powers that be for the BeautifulSoup library. The scrape code is here.

Misc Stuff

As I had mentioned before, I used the latest Django available to me at the start of development. Though I don't get to play with the cool ORM stuff that's been added recently, I did get to use some of the new template tags, such as the {% empty %} tag to specify the behavior in the event of an empty {% for %} loop (docs here, used here).

All in all, I hope this helps people get to and from wherever it is they're going easier. This is the first project I've actually launched in a very long time, and certainly the most useful one.

Django SVN on Google App Engine

Sunday, January 4th, 2009

The Google How-To for using a version of Django other than the built in 0.96 appears to be a bit out of date, as signal handling has changed since the writing. Here’s what needs to be changed:

NOTE: These changes apply to this document dated April 2008. If the date at the top of the document is different, the how-to may have been updated since the writing of this post. Django Rev 9699 is what I’m using.

The main.py portion of the how-to says that two signal handlers need to be changed. Here’s the original source:

# Log errors.
django.dispatch.dispatcher.connect(
   log_exception, django.core.signals.got_request_exception)

# Unregister the rollback event handler.
django.dispatch.dispatcher.disconnect(
    django.db._rollback_on_exception,
    django.core.signals.got_request_exception)

Since the writing of the how-to, the connect and disconnect methods have been moved to the Signal object itself; the legacy functions have been removed (see diff). The code SHOULD be:

# Log errors.
django.core.signals.got_request_exception.connect(log_exception)

# Unregister the rollback event handler.
django.core.signals.got_request_exception.disconnect(
    django.db._rollback_on_exception)

TADA! Your shit should work on at least revision 9699 of Django.

Gettin My Mits on YUI 3 Widgets: Piemenu

Tuesday, December 23rd, 2008

I was given a small assignment at work – to develop a jazzed up front page for my group with links to all the apps/systems we offer (for those catching up, I work in Yahoo! Ops, making internal tools).

My manager suggested a pie menu as one of the ideas. “That’s crazy talk,” I thought at first. Pie menus are for non-profitable European video social networks. Then I thought about it, and decided to just write it – see if I’ve got some JavaScript chops (turns out, I do). If Schill and Dimitry can make pages with JavaScript that look like they were made in Flash, so can I, damn it!

I’d been taking an occasional look at YUI3 PR2, and decided I’d try it on for size for this project. I must say, it fits. Extending the Widget class properly using Y.mix to add attributes and Y.extend to add methods ends up handling a lot of the dirty work for you, such as parameters and defaults. This is definitely a case where the framework is well designed, and staying within its bounds has enormous benefits.

Though I’m a huge code hoarder by nature, I’m just gonna throw the initial wireframe up on Github and link it here, feedback is welcome.

Docs on extending the widget class are here.

A small sample widget is here.

UPDATE: a simple proof of concept is here (click on the wrenches, and don’t expect too much)

Define Failure…

Saturday, December 20th, 2008

Just saw a salient example of something I notice quite a bit in various documentation sources.

Lots of manuals and API references will say stuff like “Returns FALSE on failure” with little to no clarification as to what that means. Though it is usually intuitive, there are frequent cases where a little more attention should be paid. Example: PHP manual page for Memcache::delete.

The method takes two parameters: the key to be deleted and the optional timeout. The second parameter specifies how long Memcache should wait before deleting the key. Like many others, the function “Returns TRUE on success or FALSE on failure.”

The primary use case of this method is obviously just deleting a value stored in memcached. But let’s actually examine the possible outcomes.

  • Key is found and successfully deleted. Obvious Success.
  • Memcached server cannot be reached. Obvious Failure.
  • Key is found, but some sort of network or server glitch prevents it from being deleted. Obvious Failure.
  • Key doesn’t exist. How do we classify this? On one hand, the “goal” of calling the method is accomplished – the key is not in Memcache. However, the method itself didn’t technically do what it was supposed to do. Its purpose is to delete a key, and it didn’t. Furthermore, it appears that the developer was mistaken as to the state of the cache. Though in a lot of cases that’s fine, what if the developer is privately counting on the key actually being there (this will be more fleshed out in the second use case)? Matters are further complicated by things like memcachedb – a persistent storage backend that uses the memcache protocol. Here, a missing key could present a serious problem, and the developer should definitely know about it, though granted, this could just be another argument against putting persistent storage behind a protocol meant for the opposite. One way or another, it’s not clear from the documentation what the method would return in this event.

Things get a little more complicated once the second parameter is invoked. The timeout parameter allows the developer to delay the deletion of the key. The first three bullets above roughly apply the same way with minor obvious adjustments (i.e., in the third bullet replace “being deleted” with “having its ttl adjusted”).

The fourth bullet, however, is even more salient. The setting of a timeout value implies that the developer indeed not only expects the key to be there, but is counting on the key to be there n seconds later.

Naturally, I fully understand that one should rarely COUNT on certain things being in the cache, so the aforementioned concerns will likely be irrelevant in most applications. However, I’m not singling out PHP or Memcache: the same concern applies to plenty of other APIs. I remember wondering what “Failure” meant to the YUI Get utility (I thought it was just a non-200 HTTP code, but directing it to a non-existent URL didn’t seem to trigger it, so it’s unclear), and there are plenty of other cases.

I’m not a fan of returning error codes and having to use huge switch statements to determine what to do in the event of every failure in the application, nor do I advocate throwing fine-grained exceptions left and right (the two are nearly identical in my mind). However, I do believe that more care should be taken to document what constitutes a failure and, for functions that don’t have a clearly defined return value, success. Perhaps @failure and @success tags can be added to the docblock spec to facilitate such documentation. For every place in a method where false is returned to signify failure, a @failure block would be added, and likewise for success.
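
To make the @failure/@success idea concrete, here is a sketch of how such tags might read in a docstring. TinyCache is a hypothetical in-memory cache invented for this example, and treating a missing key as a failure is just one possible choice; it says nothing about what Memcache::delete actually does.

```python
class TinyCache(object):
    """Hypothetical in-memory cache, used only to illustrate the doc tags."""

    def __init__(self):
        self.data = {}
        self.online = True  # flip to False to simulate an unreachable server

    def delete(self, key):
        """Delete key from the cache.

        @success True: the key existed and has been removed.
        @failure False: the server could not be reached.
        @failure False: the key was not in the cache (nothing was deleted).
        """
        if not self.online:
            return False
        if key not in self.data:
            return False
        del self.data[key]
        return True


cache = TinyCache()
cache.data["session:12"] = "abc"
first = cache.delete("session:12")   # key present
second = cache.delete("session:12")  # key already gone
```

The return type stays a plain boolean, so nobody needs a switch statement, but every place that returns False has a matching line in the docs.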

PS: if anybody happens to know the correct answer to my Memcache::delete question, let me know!

A Couple of Quick PHP Tricks

Friday, December 19th, 2008

More seasoned PHP hackers probably already know this stuff, but I thought I’d share a couple things that made me think “Man this is a cool fucking feature” when I used them in my work yesterday.

array_keys is smarter than it looks

The short description for this function in the PHP docs says “Return all the keys of an array.” Unfortunately, that is as far as a lot of people read. However, there is an incredibly useful feature hidden in the optional parameters: “If the optional search_value is specified, then only the keys for that value are returned.” So apart from just dumbly getting all the keys the array contains, you can actually parse out some really useful stuff.

Use Case:

You have a table that lists a bunch of entities, and you have to give the user the ability to perform n actions where n > 1 and the actions are mutually exclusive. Simply defining a bunch of checkbox arrays won’t work, so you have to use radio buttons. (You could use checkboxes and use javascript to ensure the exclusivity, but then you’d be a jackass).

Radio buttons, as you know, group by name. So you have to have N radio buttons per row, each with a different value, and a name that indicates which entity the input pertains to. If you’re really smart, you could make each radio group a member of an array, where the array’s key indicates the ID of the entity:

<input type="radio" name="action[<?= $entity->id ?>]" value="delete" />

When the form is submitted, you’ll end up with $_POST['action'][12] => ‘delete’, etc.

Now that you’ve got your input in a nice, tidy array, you can start the magic:

$deletions = array_keys($_POST['action'], 'delete');
$approvals = array_keys($_POST['action'], 'approve');

Now you have just the ID’s of the entities that need to be deleted or approved. Unless you have some really shitty logic libraries, that should make your life incredibly easy. I obviously skipped stuff like validation for the sake of brevity and readability, but I would also recommend mapping your actions to an indexed array, so that your radio values were actually more like 0, 1, 2, 3, etc so as to minimize the amount of data posted. Some would also say that you shouldn’t use PHP short tags, but I really just don’t care.

I should also mention that in PHP 5 array_keys has a third parameter bool $strict, which causes it to use === comparison instead of == when parsing the array. Full manual here.
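
For anyone who thinks in Python rather than PHP, roughly the same filtering can be sketched with a list comprehension. keys_for_value is a hypothetical helper name, and posted_actions stands in for $_POST['action']. Note that Python's == does not coerce strings to numbers, so '1' and 1 are different values here.

```python
def keys_for_value(mapping, search_value):
    """Return the keys of mapping whose value equals search_value."""
    return [key for key, value in mapping.items() if value == search_value]


# Stand-in for $_POST['action']: entity ID -> chosen action.
posted_actions = {12: "delete", 15: "approve", 19: "delete", 23: "approve"}

deletions = keys_for_value(posted_actions, "delete")
approvals = keys_for_value(posted_actions, "approve")
```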

array_splice’s 4th optional parameter might be its most useful

I found myself somewhat dumbfounded when a co-worker asked me if there was a standard function in PHP to insert a set of values at a given point inside an indexed array – I couldn’t think of it! My intuition drew me towards array_splice, even though I knew that the default behavior of that function was the opposite. Having just used the aforementioned obscure feature of array_keys, I guessed that array_splice would either have a similar feature or would direct me to the manual page for its complement function. The former was correct.

The 4th parameter of array_splice is $replacement. It is pretty self explanatory, but I’ll hold your hand like a small child and do an example anyway.

<?php
$alphabet = $alphabet_broken = range('a','z');
//default behavior. We just got rid of b and c
$missing = array_splice($alphabet_broken, 1, 2);
//same offset, 0 length so nothing is removed;
//the missing letters go in as $replacement
array_splice($alphabet_broken, 1, 0, $missing);
//if I don't suck at life, the two should be identical
//(!== compares order as well as values; array_diff ignores order)
if ($alphabet !== $alphabet_broken) {
    echo "Something got fucked up!";
}
?>

The full manual entry for array_splice is here.
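
For comparison, the same remove-then-reinsert round trip can be sketched in Python with slices. This is only an analogue of the PHP example above, not a claim about how array_splice is implemented.

```python
import string

alphabet = list(string.ascii_lowercase)
alphabet_broken = list(alphabet)

# Remove two items starting at index 1 (b and c), keeping what was removed.
missing = alphabet_broken[1:3]
del alphabet_broken[1:3]

# Assigning to an empty slice inserts without removing anything.
alphabet_broken[1:1] = missing
```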

My C is more better now!

Friday, December 12th, 2008

I have pushed an update for my silly php extension to github for anyone that wants to have a looksie. I’ve optimized the human_interval_precise function to only declare two variables (the long to hold the number of seconds passed in from userspace and the char array that gets passed back). Still need to figure out a sensible max length for the return value and replace the arbitrary char[60] I have in there now.

I’ve also added a human_interval (not precise) function for approximating in the largest possible units. However, I just realized as I was writing this that I forgot to make it round in any way, so it probably comes up with fairly bogus results right now. Fail. Good thing this is just an exercise, and I’ll have another 3 hours on a shuttle to kill on Monday… and on Tuesday… and on Wednesday…

As I said before, I’m also working on a redo of the WordPress theme, as well as another more “grown up” C project.

Also, did not get laid off. Best of luck to all that did. Crazy times we live in.

PS: just saw on my nifty little wordpress toolbar that 2.7 is available. FUCKING ROCK! New design coming with the quickness.