<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>mikhail panchenko / blog &#187; perl</title>
	<atom:link href="http://mihasya.com/blog/tag/perl/feed/" rel="self" type="application/rss+xml" />
	<link>http://mihasya.com/blog</link>
	<description>good things now come in packages of three</description>
	<lastBuildDate>Mon, 02 Aug 2010 18:47:47 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Streamstats status update</title>
		<link>http://mihasya.com/blog/streamstats-status-update/</link>
		<comments>http://mihasya.com/blog/streamstats-status-update/#comments</comments>
		<pubDate>Tue, 20 Oct 2009 04:30:51 +0000</pubDate>
		<dc:creator>mihasya</dc:creator>
				<category><![CDATA[dev]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[streamstats]]></category>

		<guid isPermaLink="false">http://mihasya.com/blog/?p=310</guid>
		<description><![CDATA[Just wanted to post a quick note about the state of streamstats, the little tool I&#8217;ve been working on for analyzing logs/data files. Things stalled a bit when I started trying to implement time awareness, as it turned out that Python&#8217;s time parsing capabilities are limited, to put it nicely. I even tried to use [...]]]></description>
			<content:encoded><![CDATA[<p>Just wanted to post a quick note about the state of streamstats, the little tool I&#8217;ve been working on for analyzing logs/data files. Things stalled a bit when I started trying to implement time awareness, as it turned out that Python&#8217;s time parsing capabilities are limited, to put it nicely. I even tried using a regex to find a matching pattern before parsing the date, but I still couldn&#8217;t parse the common date formats found in the logs this tool is meant for (namely, Apache logs; and no, changing the date format for all of Flickr&#8217;s hosts is not a fucking option, ok?). This was unacceptable.</p>

<p>I quickly recreated the basic functionality in PHP, using the famed strtotime function. Then I looked at the getopt() implementation that ships with PHP and realized I was either going to have to package a third-party option (the pickings there were also slim), write my own lib to do it, or write a whole shitload of custom code specifically for streamstats. The first option was not attractive because I&#8217;d have to create a package for it for Yahoo!&#8217;s packaging system, and the other two are unattractive because&#8230; well, I&#8217;m trying to write a fucking stats analysis function, not options handling code.</p>

<p>That means streamstats is being rewritten, for the third time, in Perl. I&#8217;ll be using Getopt::Long and Date::Manip to keep the auxiliary logic out of the script. Luckily, the basic functionality won&#8217;t take long to recreate, and the features I&#8217;ve been trying to add shouldn&#8217;t be too bad either. Plus, I finally get to re-learn Perl.</p>

<h3>Things to look forward to:</h3>

<ul>
<li>Time awareness (i.e., calculating how many times per second a certain value occurs, a distribution of frequencies, limiting the timeframe of interest)</li>
<li>Proper histograms, with buckets etc.</li>
<li>Custom patterns (i.e., being able to specify which part of the incoming string is the time and which part is the value, for those who don&#8217;t want to use grep/awk to narrow it down beforehand)</li>
<li>Multiple column comparison and relevant stats (gonna have to bust out the textbook on this one)</li>
</ul>
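
<p>A minimal sketch of how the first two items could work, written in Python purely for illustration. This is not the streamstats code: the function names are made up, and it assumes timestamps have already been parsed into epoch seconds, which sidesteps the date parsing problem entirely.</p>

```python
from collections import Counter


def per_second_counts(events):
    """Count how many times each value occurs in each one-second window.

    events: iterable of (epoch_seconds, value) pairs, already parsed
    (an assumption; real log lines would need date parsing first).
    """
    counts = Counter()
    for ts, value in events:
        counts[(int(ts), value)] += 1
    return counts


def histogram(values, bucket_width):
    """Group numeric values into fixed-width buckets keyed by lower bound."""
    buckets = Counter()
    for v in values:
        lower = (v // bucket_width) * bucket_width
        buckets[lower] += 1
    return dict(sorted(buckets.items()))


# Hypothetical sample data: (epoch second, value) pairs.
events = [(100, "GET"), (100, "GET"), (100, "POST"), (101, "GET")]
rates = per_second_counts(events)

# Distribution of frequencies: how often each per-second count was seen.
freq_distribution = histogram(rates.values(), 1)
```

<p>With the sample events above, GET occurs twice in second 100, and the frequency distribution shows two (second, value) pairs seen once and one pair seen twice. Limiting the timeframe of interest would then be a matter of filtering the events by timestamp before counting.</p>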

<p>The next couple of months promise to be exciting in terms of shipping things, both at Flickr and outside. I&#8217;m looking forward to posting actual code for streamstats.pl.</p>
]]></content:encoded>
			<wfw:commentRss>http://mihasya.com/blog/streamstats-status-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
