One Solution for IntelliJ IDEA Freezing

October 8th, 2010

DISCLAIMER: I do not pretend to understand all the JVM options, much less how the JVM manages memory. The following is just a report on my experience that hopes to help others having similar problems.

I’m gonna post something quick before I forget about it because it was a major pain in the ass for me.

Since I’ve been writing primarily Java at SimpleGeo, I’ve been using IntelliJ IDEA, which is a great IDE. However, Malone and I started noticing that the longer we had it open, the more it would randomly stall.

Our guess is that because the IDE is written in Java, what we’re experiencing are GC storms and heap resizes, which bring the entire application to a grinding halt.

After googling around a bit, I was able to stop this from happening by adding the following runtime options:

-Xms1024m -Xmx1024m -XX:MaxPermSize=1024m

Those 3 options basically tell IntelliJ to just allocate 1GB of memory for itself no matter what: the initial and maximum heap sizes are both set to 1GB, so the heap never has to be resized, and the permanent generation is allowed to grow up to 1GB as well. This is fine for us because we have 4GB laptops and the primary thing we’re doing on them is writing Java. Obviously any other size is fine, but the key is to make it allocate enough memory once and just stick to that. We are working with a decent-sized codebase, so 1GB suits us well. IntelliJ’s developers stated essentially the opposite here a few years back, but my experience does not correspond with their predictions. I’m guessing that if usage ever actually reaches 1GB, THEN these settings will cause major problems, but I’ve yet to come even close.
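For anyone who wants to sanity-check their own options string, here is a minimal Python sketch that parses -Xms and -Xmx and reports whether the heap is pinned to one size. The function names (parse_size, heap_bounds, heap_is_fixed) are made up for this example, and it only understands the plain k/m/g suffixes, so treat it as an illustration rather than a full JVM option parser.

```python
import re

# Multipliers for the size suffixes this sketch understands.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Convert a JVM size such as '1024m' or '1g' into bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", text)
    if match is None:
        raise ValueError("unrecognized size: %r" % text)
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


def heap_bounds(vm_options):
    """Return (initial, maximum) heap size in bytes; None where a flag is absent."""
    initial = maximum = None
    for option in vm_options.split():
        if option.startswith("-Xms"):
            initial = parse_size(option[4:])
        elif option.startswith("-Xmx"):
            maximum = parse_size(option[4:])
    return initial, maximum


def heap_is_fixed(vm_options):
    """True when -Xms and -Xmx are both present and equal."""
    initial, maximum = heap_bounds(vm_options)
    return initial is not None and initial == maximum


options = "-Xms1024m -Xmx1024m -XX:MaxPermSize=1024m"
print(heap_bounds(options))
print(heap_is_fixed(options))  # True
```

If heap_is_fixed comes back False, the JVM is free to grow and shrink the heap between the two bounds, which is the behavior we were trying to avoid.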

After reading that blog post and doing a bit more research, I’ve also added the -server option, though it’s not clear whether that’ll actually have any benefit. I’ll report back once I’ve had some time using it. More info here, here, and here.

It seems that Windows users may also benefit from -Dsun.awt.keepWorkingSetOnMinimize=true, which is supposed to prevent IntelliJ from becoming unresponsive when losing focus. Forum post explanation here.

Adding on Mac

The options are defined in the Info.plist file which is part of the .app bundle (for example, mine is in /Applications/IntelliJ IDEA 9.0.3 CE.app/Contents/Info.plist).

Find the VMOptions key and add the options to the value, like so:

<key>VMOptions</key>
<string>-Xms1024m -Xmx1024m -XX:MaxPermSize=1024m -ea -Xverify:none -server -XX:+UseCompressedOops -Xbootclasspath/a:../lib/boot.jar</string>
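If you would rather script the edit than do it by hand, something like the following Python sketch could work. It uses the standard plistlib module; the helper names are invented for this example, and I am assuming the VMOptions value is a single string that may sit inside a nested dictionary (the sample below nests it under a "Java" key, which is a guess at the layout, not something to rely on).

```python
import plistlib


def find_vm_options_holder(node):
    """Return the dict that directly contains a 'VMOptions' key, or None."""
    if isinstance(node, dict):
        if "VMOptions" in node:
            return node
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_vm_options_holder(child)
        if found is not None:
            return found
    return None


def prepend_vm_options(plist_bytes, extra_options):
    """Return new plist bytes with extra_options placed in front of VMOptions."""
    data = plistlib.loads(plist_bytes)
    holder = find_vm_options_holder(data)
    if holder is None:
        raise KeyError("no VMOptions key found")
    existing = holder["VMOptions"].split()
    # Skip options that are already present so the function is safe to re-run.
    added = [opt for opt in extra_options if opt not in existing]
    holder["VMOptions"] = " ".join(added + existing)
    return plistlib.dumps(data)


# Hypothetical stand-in for the real file contents.
sample = plistlib.dumps({"Java": {"VMOptions": "-ea -Xverify:none"}})
updated = prepend_vm_options(sample, ["-Xms1024m", "-Xmx1024m"])
print(plistlib.loads(updated)["Java"]["VMOptions"])
```

To apply it for real you would read Info.plist in binary mode, pass the bytes through prepend_vm_options, and write the result back. Keep a backup copy of the original file first.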

I don’t have a Windows machine handy, so someone else will have to come up with those instructions.

Leaving Flickr, Joining SimpleGeo

August 2nd, 2010

Flickr has been a wonderful experience, as has Yahoo! as a whole. Some time soon I’ll write a longer, more thoughtful blogpost about it.

For now, I’ll just say that I’m sad to be leaving an amazing team, but excited about a fresh start with another group of brilliant people.

Django Admin: Sorting on Related Object’s Property

June 20th, 2010

This is mostly a note to myself for the future, but I’m sure someone else out there will find it helpful.

The Problem

In one of the apps I work on, I have a pretty standard setup for extending the User object in Django: I have a UserProfile model with a ForeignKey to the auth.User model and have AUTH_PROFILE_MODULE pointing at that model, so that the appropriate row gets returned when I call user.get_profile().

I’ve populated the UserProfile model with a bunch of fields that the client wanted to be able to see in the admin view. Nice and easy so far.

I was then asked to add the date the user joined the site to that view, and also make it sortable. “Easy!” I thought. I immediately went and added user__date_joined to the ModelAdmin’s list_display. Turns out it’s not that easy! Doing this causes a 500.

At first I was baffled that a simple relationship could not be traversed. I went into the Django source and figured out that the error was being thrown by admin.validation.validate(). I took out the check to see what would happen if the field were just allowed to be in there (I was half expecting it to just work, since putting user__date_joined in search_fields worked fine; I thought maybe the check was accidentally too strict), and that’s when I understood why it’s not allowed: by allowing a field on another model to be included in a certain model’s admin class, an assumption is introduced that 1. that model and field have all the methods required to be displayed and 2. that those methods do what the writer of the ModelAdmin in question expects. Not a good idea.

The Solution

I brought this up in a Django IRC channel, and Zain quickly suggested that I just add a callable to the ModelAdmin that simply returns the user’s date_joined. I already knew I could do that, and the reason I hadn’t was that I needed to be able to sort by that field – something that is impossible if it’s generated by a callable.

That’s when Zain pointed out admin_order_field to me – turns out you can tell the admin site to use a specific column to back sort queries against a callable. The resulting code looks like this:

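from django.contrib import admin


class UserProfileAdmin(admin.ModelAdmin):
    # Join the related User row in the list query instead of one query per row.
    list_select_related = True
    list_display = ( [...] 'date_joined')

    def date_joined(self, profile):
        # Display callable: pull the value off the related User.
        return profile.user.date_joined

    # Tell the admin which database column to sort by for this callable.
    date_joined.admin_order_field = 'user__date_joined'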
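The part that looked like magic to me at first is that the sort hint is just an attribute hung on a plain function. Here is a Django-free sketch of the idea. To be clear, ordering_lookup is a simplified stand-in I made up to show the pattern, not Django's actual implementation, and the profile is modeled as a plain dictionary purely for illustration.

```python
def date_joined(profile):
    """Display callable: returns the value shown in the column."""
    return profile["user"]["date_joined"]


# Functions are objects, so metadata can be attached to them directly.
date_joined.admin_order_field = "user__date_joined"


def no_hint(profile):
    """A display callable without a sort hint."""
    return "n/a"


def ordering_lookup(column):
    """Hypothetical helper: decide what a list view could sort a column by.

    A callable is sortable only if it carries an admin_order_field hint;
    a plain field name sorts by itself.
    """
    if callable(column):
        return getattr(column, "admin_order_field", None)
    return column


print(ordering_lookup(date_joined))
print(ordering_lookup(no_hint))
print(ordering_lookup("email"))
```

A callable without the attribute comes back as None here, which corresponds to the unsortable column I was stuck with before Zain's suggestion.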

That’s all there is to it. Enjoy!

“Fighting Spam at Flickr” at Web2.0Expo

May 15th, 2010

I recently had the giddy honor of speaking at the 2010 Web2.0Expo in San Francisco. The topic was simple – spam. I shared some insights (or I hope they were insights, anyway) about combating the spam problem on a social website – something I had been doing quite a lot of since joining Flickr. The slides are now on Slideshare and embedded below.

Thanks to Brady and the rest of the w2e team for putting together a great conference. I didn’t get to go to as many sessions as I would have liked due to having to spend most of my time in the speakers lounge preparing, but the ones I did go to were excellent.

Things I forgot to say in the talk/slides that are important:

  • Keep track of recent rates for ALL activity that your users do. This gets a bit expensive in terms of storage, but if you prune the data furiously, it can be made sustainable. Having that information is key – it can be used at pretty much every step of spam mitigation. Also, be smart about this – if messages can be deleted from a table, don’t use that table to do the counting. Nobody I know has EVER done that……

  • Rate limit everything. There’s usually a sweet spot right between what 99% of real users will actually ever do and spam-land.
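To make those two points a bit more concrete, here is a minimal in-memory sketch of a sliding-window counter that doubles as a rate limiter. It is not what we run at Flickr; the class name, the (user, action) keying, and the limits are all assumptions chosen for illustration, and a real deployment would need shared storage rather than a Python dictionary.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Track recent activity per (user, action) and cap it per time window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        # The limiter keeps its own event log, so deleting the underlying
        # content (a message, a comment) never lowers the count.
        self._events = defaultdict(deque)

    def _prune(self, key, now):
        """Drop timestamps that have aged out of the window."""
        events = self._events[key]
        while events and now - events[0] >= self.window:
            events.popleft()
        return events

    def recent_count(self, user_id, action):
        """How many times the user did this action within the window."""
        return len(self._prune((user_id, action), self.clock()))

    def allow(self, user_id, action):
        """Record the action and return True, or return False if over the limit."""
        now = self.clock()
        events = self._prune((user_id, action), now)
        if len(events) >= self.limit:
            return False
        events.append(now)
        return True


limiter = SlidingWindowLimiter(limit=2, window_seconds=60)
print([limiter.allow("user-1", "send_message") for _ in range(3)])
```

Pruning on every read is what keeps the storage bounded: only events inside the window are ever retained.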

Anyway, here are the slides. Enjoy!

ishmael: A UI for mk-query-digest

April 4th, 2010
[Screenshot of the UI, queries obscured]

UPDATE: Richard is far more clever and is generally on a roll with naming projects recently, so he suggested that the name should be “Ishmael” in honor of one of the world’s most famous whale hunters.

I’m not feeling very creative, so my latest project is exactly what its name implies: mk-query-digest-ui is a simple UI on top of the data that mk-query-digest produces. The project was born of me and Tim Denike, the Flickr DBA, spending hours and hours staring at the tool’s plaintext output while hunting for whale queries to optimize. Now that I think about it, I should have called it “Whale Hunter.”

The UI simply lets you sort the queries in the report by a few useful characteristics and facilitates more convenient access to data that is useful during the optimization process. As we keep using it, we’ll keep adding features.

I made it a point to work on this tool in my spare time so that I could release it without the normal ass-ache associated with open-sourcing something at a big company. Thus, the code is on github: http://github.com/mihasya/ishmael. Patches and feature requests are welcome.

Enjoy!

Kibera OSM Tiles on Flickr Maps

April 1st, 2010

[Image: Kibera on Flickr]

I’m not going to be able to make it to Where 2.0, but I was there in spirit. I distracted a few other Flickr employees and made Aaron hold my hand like a small child while we pulled down new OSM tiles for Nairobi and put them up on the Flickr map in time for Mikel’s talk about the Map Kibera Project.

You can read the Flickr blog post about it here.

You can go see the map on Flickr here.

You can find out way more about the Map Kibera project here.

Enjoy!

Abusing MySQL: The Federated Engine

February 8th, 2010

I don’t have quite the experience that Kellan and Richard do with wrangling databases (yet), but I have seen some relatively unorthodox stuff. I’ll write a quick note about something quirky we did back when I worked on internal tools at Yahoo!

The Problem

We had one central database that contained a lot of the information about the company’s infrastructure (say, db_central). Among other things, it contained information about users, user groups, and inventory, for some definition of that word.

There were several other tools built around it that had their own databases, but still relied on some of the same data, particularly the user-related tables. We wanted to be able to do joins across the databases, but you can’t do that easily when the databases are on different physical boxes.

You could set up replication on the box that the auxiliary database is on (say, db_app1), but we couldn’t do that either – we were doing dual-master replication for HA, and a single MySQL instance can only slave from one host at a time.

Federated Engine

One of the advantages of being an internal team is the ability to stay on top of the ‘new hotness’. Since our datasets and userbases were always relatively small, we were able to upgrade frequently; we were on 5.0 and 5.1 fairly soon after they were released.

With 5.0 came the Federated Engine. It allows you to create a sort of shim for a table on a remote machine and access it as if it were a local table; most notably it allows you to join against the remote table.

Obviously, this sounds like a performance nightmare. We never tested it in a straightforward setup (you’ll see what I mean in a second), and it might have turned out OK for this particular use case. But even at our relatively small size and low traffic, slow joins were a serious problem (at Yahoo!’s size, even the internal apps had multi-million row tables), and adding network latency to that was not something we were interested in.

The Prestige

This is where one of the (other) crazy Russian guys on the team thought of something awesome (can’t remember if it was Andrey or Alex.. neither has a blog, unfortunately).

We would set up an additional instance of MySQL on another port on each of the db_app1 masters (call them db_app1_plus). This instance would be a slave for db_central. Then, on the db_app1 instance, we would set up a Federated Engine table (aha!) that would point to the db_central replica on the db_app1_plus port over localhost. Though we never scientifically benchmarked this setup, in our limited testing it worked like a charm and performed beautifully.
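For the curious, here is roughly what the Federated table definition on db_app1 could look like, written as a small Python helper that builds the DDL string. Everything specific in it is hypothetical: the port 3307 for the db_app1_plus instance, the fed_user account, and the users table and its columns are placeholders, not our actual schema. I use 127.0.0.1 instead of the name localhost on the assumption that the connection should go over TCP to the second port. The helper does no escaping, so it is only suitable for hand-written, trusted input.

```python
def federated_connection(user, host, port, database, table, password=None):
    """Build a connection string of the form mysql://user[:password]@host:port/db/table."""
    credentials = user if password is None else "%s:%s" % (user, password)
    return "mysql://%s@%s:%d/%s/%s" % (credentials, host, port, database, table)


def federated_table_ddl(local_table, column_definitions, connection):
    """Build a CREATE TABLE statement for a FEDERATED shim table.

    The column definitions are expected to match the remote table.
    """
    columns = ",\n  ".join(column_definitions)
    return (
        "CREATE TABLE %s (\n  %s\n) ENGINE=FEDERATED\nCONNECTION='%s';"
        % (local_table, columns, connection)
    )


# Point at the db_central replica running on the (assumed) second local port.
connection = federated_connection("fed_user", "127.0.0.1", 3307, "db_central", "users")
print(
    federated_table_ddl(
        "central_users",
        ["id INT NOT NULL", "name VARCHAR(64)", "PRIMARY KEY (id)"],
        connection,
    )
)
```

Once a table like that exists on the db_app1 instance, queries there can join against central_users as if it were local, while the data actually comes from the replica on the same box.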

Of course, as I said before, this would not work for a high-traffic production setup. However, it did allow us to simplify the code in our internal apps (and you always want that code to be simple and readable) and did not cause any noticeable performance degradation or additional operational headaches. As far as I know, that setup is still in place.

I’ve recently come back to this idea for some uses at Flickr (mainly background data-mining jobs) and keep forgetting to talk to Kellan about it. Let’s see what he says :D

2010: One More Than 2009

January 2nd, 2010

This seems like a good time to reflect and take stock.

One year ago, I was building internal tools at Yahoo! and starting to toy around with Django on the side. I was working on Yourmuni, which was later presented at the January 2009 Django SF Meetup. I was about to start working on a sizable side project (having met the founder at said meetup).

Today, I’m working at Flickr, inching ever so close to a launch of the aforementioned side project, working on an iPhone app, and toying around with all sorts of exciting stuff. Things feel very different. This is the first time that I actually CAN believe that a year has passed, because a lot of things have happened. I changed jobs, learning a ton and growing a whole lot as an engineer in the process; I took down Flickr, all by myself; I finally visited France after having been the only French major in my class not to have done so; we moved – it was the first time I painted a place; I got my US citizenship; my brother got married; I met a whole ton of awesome people, though I feel like I’ve fallen out of touch with more; I got my first real DSLR camera (thanks, Alexa!); I submitted minor patches to a couple of open-source projects, gave some small presentations, and submitted my first proposal for a talk at a conference – fingers crossed!

I feel good about 2010, and hope that it will be even more eventful and crazy.

Happy new year, everyone!

Awesome quotation from the MySQL Performance Blog

November 12th, 2009
“No matter how much you want it to fit, some things may not work (like the Godfather 3).”

http://www.mysqlperformanceblog.com/2009/10/15/mysql-memcached-or-nosql-tokyo-tyrant-part-1/

Cluelessness, expressed in under 140 characters

October 20th, 2009

ORLY?!

In other news, this is post #100 on this blog.