2010 – One Crazy Fucking Year

January 1st, 2011

In hopes of keeping these yearly roundup posts a tradition, here’s one for this crazy roller coaster of a year.

First, a word on resolutions

My resolutions are always the same, so here are my grades for how I did on all of them this year, in no particular order:

Fitness: B+

Started playing much more soccer, including an incredibly competitive Tuesday night league; got a podiatrist to actually look at my feet and make some adjustments that have had amazing benefits; got contacts, so I can actually see the ball coming; had stretches of decent gym attendance, including late in the year; didn’t lose or gain any weight, still feel a bit heavy around the middle, and generally feel slow, so lots of work to do and lots of discipline to make up on that front.

Travel: D+

Didn’t really go anywhere new, unless you count Brooklyn; did spend a couple of days in South Lake Tahoe.

Work: A-

Actually found a job that made quitting Flickr an easy decision, which was no small feat considering how awesome it was to work at Flickr; spoke at two conferences – one talk went well, the other not so much – so lots of room for improvement there.

Writing: C

I wrote a few entries in 2010, but only a couple of them seem interesting even to me in retrospect.

People: C

I’ve started making a concerted effort at keeping up with people, and have reestablished contact with a few people; still not good at making lasting connections with new people I meet and generally remembering people/faces.

Music: F

Have not been playing my guitar at all; still have zero understanding of music theory, and have forgotten even the songs I did know how to play before; really dropped the ball here.

General Learning: C-

Started reading more (in part, thanks to the Kindle); learned a lot as part of completely switching specialties when joining SimpleGeo; still not doing as much math and CS learning as I should be; still not doing enough learning in other areas, like business; not doing enough with French, probably slowly losing that language.

Stuff that happened

  • Alexa and I spent a week in Brooklyn for our two-year anniversary; I loved exploring that part of NYC a little bit and getting to see all of our friends.
  • Flickr got back to its shipping ways and delivered a bunch of new features and enhancements, including the big photo page redesign. That whole stretch right in the middle of the year was just amazing to be part of. It really felt like we were Getting Shit Done.
  • During that same time period, Flickr also hired a whole bunch of new engineers, all of whom were awesome to meet.
  • I spoke about Flickr’s approach to spam at Web2.0 Expo in SF, and that went pretty well. I also gave a fairly forgettable talk about Flickr’s engineering practices at the Strangeloop 2010 conference in St. Louis. Definitely a “this is how you don’t do it” experience for me.
  • I left Flickr for SimpleGeo, switching from general web development/API mushing to systems-level database programming. It’s been insane. I’ve also started writing Java.
  • Alexa’s father Carlos passed away at the end of the year unexpectedly at the age of 63, making 2010 a tremendously sad year overall. He was a great man, and I only wish I had more time to get to know him better. A scholarship in his name was created shortly before his passing and can be found here.

Goals for 2011

These are largely based on the grades given above, and are but a slight adjustment from the goals I set for myself at the beginning of 2010.

  • More conferences, both speaking at and attending.
  • More traveling – hopefully will go hand in hand with the conferences.
  • More guitar – I’m going to try to set aside at least 15-30 minutes a day to just pick at it, if not practice systematically; I’ve even picked up a basic scale book to try to put some fundamentals underneath my fingers; who knows, maybe I’ll even go crazy and take a lesson or two.
  • More email, phone calls, and generally keeping up with people. There really are no excuses for how bad I’ve been at it.
  • More reading and studying; I’m starting to consider some off-hours schooling to brush up on some things I largely ignored during college.
  • More writing; I’m working on and learning about some pretty awesome stuff, so I’m going to try to find time to write about it more.

A year in photos

Random ass photos from this past year:

  • Schill's Birthday
  • Cal's Bday
  • Trevor's Party
  • @comradet participating in a blue hill tradition
  • Sunrise over SFO
  • You can do it, @ph
  • Brooklyn with Alexa
  • Brooklyn with Alexa
  • Flickr team building?
  • Comb at ATM... Weird
  • Derby Party 2010
  • Derby Party 2010
  • Simon and Tammy
  • Simon and Tammy
  • Memorial Day Panhandle BBQ of Justice
  • Muprhy's Pigroast
  • "Honey! I think my memory is breaking!"
  • Stinson Beach / Bolinas / Highway 1
  • Two Laurens, One Cup
  • IMAG0023.jpg
  • In st louis
  • Tahoe Getaway Nov 2010
  • Hawks 1 - Blues 3

All my 2010 photos can be found on flickr.

One Solution for IntelliJ IDEA Freezing

October 8th, 2010

DISCLAIMER: I do not pretend to understand all the JVM options, much less how the JVM manages memory. The following is just a report on my experience that hopes to help others having similar problems.

I’m gonna post something quick before I forget about it because it was a major pain in the ass for me.

Since I’ve been writing primarily Java at SimpleGeo, I’ve been using IntelliJ IDEA, which is a great IDE. However, Malone and I started noticing that the longer we had it open, the more often it would randomly stall.

Our guess is that because the IDE is written in Java, what we’re experiencing are GC storms and heap resizes, which bring the entire application to a grinding halt.

After googling around a bit, I was able to stop this from happening by adding the following runtime options:

-Xms1024m -Xmx1024m -XX:MaxPermSize=1024m

Those 3 options basically tell IntelliJ to just allocate 1GB of memory for itself no matter what: setting -Xms (the initial heap size) and -Xmx (the maximum heap size) to the same value means the heap never has to be resized. This is fine for us because we have 4GB laptops and the primary thing we’re doing on them is writing Java. Obviously any other size is fine, but the key is to make it allocate enough memory once and just stick to that. We are working with a decent-sized codebase, so 1GB suits us well. IntelliJ’s developers stated essentially the opposite here a few years back, but my experience does not correspond with their predictions. I’m guessing that if usage ever actually reaches 1GB, THEN these settings will cause major problems, but I’ve yet to come even close.

After reading that blog post and doing a bit more research, I’ve also added the -server option, though it’s not clear whether that’ll actually have any benefit. I’ll report back once I’ve had some time using it. More info here, here, and here.

It seems that Windows users may also benefit from -Dsun.awt.keepWorkingSetOnMinimize=true, which is supposed to prevent IntelliJ from becoming unresponsive when losing focus. Forum post explanation here.

Adding on Mac

The options are defined in the Info.plist file which is part of the .app bundle (for example, mine is in /Applications/IntelliJ IDEA 9.0.3 CE.app/Contents/Info.plist).

Find the VMOptions key and add the options to the value, like so:

<key>VMOptions</key>
<string>-Xms1024m -Xmx1024m -XX:MaxPermSize=1024m -ea -Xverify:none -server -XX:+UseCompressedOops -Xbootclasspath/a:../lib/boot.jar</string>
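
If you'd rather not hand-edit that string, here is a minimal Python sketch of the merge step. The `merge_vm_options` helper and the "existing" defaults in the usage example are made up for illustration; they are not something IntelliJ ships. It only handles space-separated options, so an option containing a path with spaces would break it.

```python
def merge_vm_options(existing: str, overrides: str) -> str:
    """Return `existing` with each option in `overrides` added or replaced.

    Hypothetical helper: works on the space-separated VMOptions string only.
    """
    def option_key(opt: str) -> str:
        # -Xms1024m / -Xmx1024m: the prefix identifies the setting
        if opt.startswith(("-Xms", "-Xmx", "-Xss")):
            return opt[:4]
        # -XX:MaxPermSize=1024m, -Dfoo=bar: the name is everything before '='
        if "=" in opt:
            return opt.split("=", 1)[0]
        # -XX:+Flag and -XX:-Flag toggle the same flag
        if opt.startswith(("-XX:+", "-XX:-")):
            return "-XX:" + opt[5:]
        # Anything else (-ea, -server, -Xverify:none, ...) is its own key
        return opt

    merged = {}
    for opt in existing.split() + overrides.split():
        # A replaced option keeps its original position; new ones go last
        merged[option_key(opt)] = opt
    return " ".join(merged.values())


if __name__ == "__main__":
    # The "existing" value below is an invented example, not a real default
    print(merge_vm_options(
        "-Xms128m -Xmx512m -XX:MaxPermSize=250m -ea",
        "-Xms1024m -Xmx1024m -XX:MaxPermSize=1024m -server",
    ))
```

Writing the result back into Info.plist is left out on purpose, since the exact layout of the file may differ between IDEA versions.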

I don’t have a Windows machine handy, so someone else will have to come up with those instructions.

Leaving Flickr, Joining SimpleGeo

August 2nd, 2010

Flickr has been a wonderful experience, as has Yahoo! as a whole. Some time soon I’ll write a longer, more thoughtful blog post about it.

For now, I’ll just say that I’m sad to be leaving an amazing team, but excited about a fresh start with another group of brilliant people.

Django Admin: Sorting on Related Object’s Property

June 20th, 2010

This is mostly a note to myself for the future, but I’m sure someone else out there will find it helpful.

The Problem

In one of the apps I work on, I have a pretty standard setup for extending the User object in Django: I have a UserProfile model with a ForeignKey to the auth.User model and have AUTH_PROFILE_MODULE pointing at that model, so that the appropriate row gets returned when I call user.get_profile().

I’ve populated the UserProfile model with a bunch of fields that the client wanted to be able to see in the admin view. Nice and easy so far.

I was then asked to add the date the user joined the site to that view, and also make it sortable. “Easy!” I thought. I immediately went and added user__date_joined to the ModelAdmin’s list_display. Turns out it’s not that easy! Doing this causes a 500.

At first I was baffled that a simple relationship could not be traversed. I went into the Django source and figured out that the error was being thrown by admin.validation.validate(). I took out the check to see what would happen if the field were just allowed to be in there (I was half expecting it to just work, since putting user__date_joined in search_fields worked fine; I thought maybe the check was accidentally too strict), and that’s when I understood why it’s not allowed: by allowing a field on another model to be included in a certain model’s admin class, an assumption is introduced that 1. that model/field have all the methods required to be displayed and 2. that those methods do what the writer of the ModelAdmin in question expects. Not a good idea.

The Solution

I brought this up in a django IRC channel, and Zain quickly suggested that I just add a callable to the ModelAdmin that simply returns the user’s date_joined. I already knew I could do that, and the reason I hadn’t was that I needed to be able to sort by that field – something that is impossible if it’s generated by a callable.

That’s when Zain pointed out admin_order_field to me – turns out you can tell the admin site to use a specific column to back sort queries against a callable. The resulting code looks like this:

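from django.contrib import admin

class UserProfileAdmin(admin.ModelAdmin):
    # Fetch the related User in the same query as the profile
    list_select_related = True
    list_display = ( [...] 'date_joined')

    def date_joined(self, profile):
        # Callable column: show the date from the related User
        return profile.user.date_joined

    # Sort this column by the related User's date_joined field
    date_joined.admin_order_field = 'user__date_joined'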

That’s all there is to it. Enjoy!

“Fighting Spam at Flickr” at Web2.0Expo

May 15th, 2010

I recently had the giddy honor of speaking at the 2010 Web2.0Expo in San Francisco. The topic was simple – spam. I shared some insights (or I hope they were insights, anyway) about combating the spam problem on a social website – something I had been doing quite a lot of since joining Flickr. The slides are now on Slideshare and embedded below.

Thanks to Brady and the rest of the w2e team for putting together a great conference. I didn’t get to go to as many sessions as I would have liked due to having to spend most of my time in the speakers’ lounge preparing, but the ones I did go to were excellent.

Things I forgot to say in the talk/slides that are important:

  • Keep track of recent rates for ALL activity that your users do. This gets a bit expensive in terms of storage, but if you prune the data furiously, it can be made sustainable. Having that information is key – it can be used at pretty much every step of spam mitigation. Also, be smart about this – if messages can be deleted from a table, don’t use that table to do the counting. Nobody I know has EVER done that……

  • Rate limit everything. There’s usually a sweet spot right between what 99% of real users will actually ever do and spam-land.
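
To make those two points a bit more concrete, here is a minimal in-memory sketch in Python. It is not what Flickr runs; the `RecentActivity` class, its method names, and the window/limit numbers are all invented for illustration. It keeps its own pruned log of recent events per (user, action) pair rather than counting rows in a table that users can delete from, and it uses the same log to enforce a limit.

```python
import time
from collections import defaultdict, deque


class RecentActivity:
    """Sliding-window activity counter and rate limiter (hypothetical sketch)."""

    def __init__(self, window_seconds, limit, clock=time.monotonic):
        self.window = window_seconds
        self.limit = limit
        self.clock = clock
        # (user_id, action) -> timestamps of recent events, oldest first
        self._events = defaultdict(deque)

    def _prune(self, key, now):
        # Prune furiously: drop everything that has fallen out of the window
        events = self._events[key]
        while events and events[0] <= now - self.window:
            events.popleft()

    def rate(self, user_id, action):
        """Number of recorded events for this user/action within the window."""
        key = (user_id, action)
        self._prune(key, self.clock())
        return len(self._events[key])

    def allow(self, user_id, action):
        """Record the event and return True, or return False if over the limit."""
        key = (user_id, action)
        now = self.clock()
        self._prune(key, now)
        events = self._events[key]
        if len(events) >= self.limit:
            return False
        events.append(now)
        return True


if __name__ == "__main__":
    # Made-up numbers: at most 5 messages per user per minute
    tracker = RecentActivity(window_seconds=60, limit=5)
    print([tracker.allow("user-1", "message") for _ in range(7)])
```

The `clock` argument is only there so the window can be tested without sleeping. A real version would need shared storage instead of a Python dict, and would probably want to log the denied attempts too.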

Anyway, here are the slides. Enjoy!

ishmael: A UI for mk-query-digest

April 4th, 2010
(screenshot; queries obscured)

UPDATE: Richard is far more clever and is generally on a roll with naming projects recently, so he suggested that the name should be “Ishmael” in honor of one of the world’s most famous whale hunters.

I’m not feeling very creative, so my latest project is exactly what its name implies: mk-query-digest-ui is a simple UI on top of the data that mk-query-digest produces. The project was born of me and Tim Denike, the Flickr DBA, spending hours and hours staring at the tool’s plaintext output while hunting for whale queries to optimize. Now that I think about it, I should have called it “Whale Hunter.”

The UI simply lets you sort the queries in the report by a few useful characteristics and facilitates more convenient access to data that is useful during the optimization process. As we keep using it, we’ll keep adding features.

I made it a point to work on this tool in my spare time so that I could release it without the normal ass-ache associated with open-sourcing something at a big company. Thus, the code is on github: http://github.com/mihasya/ishmael. Patches and feature requests are welcome.

Enjoy!

Kibera OSM Tiles on Flickr Maps

April 1st, 2010

I wasn’t able to make it to Where 2.0, but I was there in spirit. I distracted a few other Flickr employees and made Aaron hold my hand like a small child while we pulled down new OSM tiles for Nairobi and put them up on the Flickr map in time for Mikel’s talk about the Map Kibera Project.

You can read the Flickr blog post about it here.

You can go see the map on Flickr here.

You can find out way more about the Map Kibera project here.

Enjoy!

Abusing MySQL: The Federated Engine

February 8th, 2010

I don’t have quite the experience that Kellan and Richard do with wrangling databases (yet), but I have seen some relatively unorthodox stuff. I’ll write a quick note about something quirky we did back when I worked on internal tools at Yahoo!

The Problem

We had one central database that contained a lot of the information about the company’s infrastructure (say, db_central). Among other things, it contained information about users, user groups, and inventory, for some definition of that word.

There were several other tools built around it that had their own databases, but still relied on some of the same data, particularly the user-related tables. We wanted to be able to do joins across the databases, but you can’t do that easily when the databases are on different physical boxes.

You could set up replication on the box that the auxiliary database is on (say, db_app1), but we couldn’t do that either – we were doing dual-master replication for HA, and a single MySQL instance can only slave from one host at a time.

Federated Engine

One of the advantages of being an internal team is the ability to stay on top of the ‘new hotness’. Since our datasets and userbases were always relatively small, we were able to upgrade frequently; we were on 5.0 and 5.1 fairly soon after they were released.

With 5.0 came the Federated Engine. It allows you to create a sort of shim for a table on a remote machine and access it as if it were a local table; most notably it allows you to join against the remote table.

Obviously, this sounds like a performance nightmare. We never tested it in a straightforward setup (you’ll see what I mean in a second), and it might have turned out OK for this particular use case. But even at our relatively small size and low traffic, slow joins were a serious problem (at Yahoo!’s size, even the internal apps had multi-million row tables). Adding network latency to that was not something we were interested in.

The Prestige

This is where one of the (other) crazy Russian guys on the team thought of something awesome (can’t remember if it was Andrey or Alex.. neither has a blog, unfortunately).

We would set up an additional instance of MySQL on another port on each of the db_app1 masters (call them db_app1_plus). This instance would be a slave for db_central. Then, on the db_app1 instance, we would set up a Federated Engine table (aha!) that would point to the db_central replica on the db_app1_plus port over localhost. Though we never scientifically benchmarked this setup, in our limited testing it worked like a charm and performed beautifully.
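
For the curious, here is roughly what that shim table looks like, as a Python sketch that just builds the DDL string. The `federated_table_ddl` helper, the users table, its columns, the app_ro user, and port 3307 are all made up for illustration; only the ENGINE=FEDERATED / CONNECTION syntax comes from MySQL itself. The column definitions have to match the table on the db_app1_plus replica.

```python
def federated_table_ddl(table, columns_sql, remote_db, user,
                        host="127.0.0.1", port=3307):
    """Build a CREATE TABLE statement for a FEDERATED shim table.

    Hypothetical helper: host/port default to a second MySQL instance
    assumed to be running on the same box (the db_app1_plus replica).
    """
    # Connection format: mysql://user@host:port/database/table
    connection = f"mysql://{user}@{host}:{port}/{remote_db}/{table}"
    return (
        f"CREATE TABLE {table} (\n"
        f"    {columns_sql}\n"
        f") ENGINE=FEDERATED\n"
        f"CONNECTION='{connection}';"
    )


if __name__ == "__main__":
    # Invented table and user names; run the output on the db_app1 instance
    print(federated_table_ddl(
        table="users",
        columns_sql="id INT NOT NULL, name VARCHAR(64), PRIMARY KEY (id)",
        remote_db="db_central",
        user="app_ro",
    ))
```

Once the table exists on db_app1, the app can join against it like any local table, and the remote half of the join never leaves the box.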

Of course, as I said before, this would not work for a high-traffic production setup. However, it did allow us to simplify the code in our internal apps (and you always want that code to be simple and readable) and did not cause any noticeable performance degradation or additional operational headaches. As far as I know, that setup is still in place.

I’ve recently come back to this idea for some uses at Flickr (mainly background data-mining jobs) and keep forgetting to talk to Kellan about it. Let’s see what he says :D

2010: One More Than 2009

January 2nd, 2010

This seems like a good time to reflect and take stock.

One year ago, I was building internal tools at Yahoo! and starting to toy around with Django on the side. I was working on Yourmuni, which was later presented at the January 2009 Django SF Meetup. I was about to start working on a sizable side project (having met the founder at said meetup).

Today, I’m working at Flickr, inching ever so close to a launch of the aforementioned side project, working on an iPhone app, and toying around with all sorts of exciting stuff. Things feel very different. This is the first time that I actually CAN believe that a year has passed, because a lot of things have happened. I changed jobs, learning a ton and growing a whole lot as an engineer in the process; I took down Flickr, all by myself; I finally visited France after having been the only French major in my class not to have done so; we moved – it was the first time I painted a place; I got my US citizenship; my brother got married; I met a whole ton of awesome people, though I feel like I’ve fallen out of touch with more; I got my first real DSLR camera (thanks, Alexa!); I submitted minor patches to a couple of open-source projects, gave some small presentations, and submitted my first proposal for a talk at a conference – fingers crossed!

I feel good about 2010, and hope that it will be even more eventful and crazy.

Happy new year, everyone!

Awesome quotation from the MySQL Performance Blog

November 12th, 2009
“No matter how much you want it to fit, some things may not work (like the Godfather 3).”

http://www.mysqlperformanceblog.com/2009/10/15/mysql-memcached-or-nosql-tokyo-tyrant-part-1/