Paul Hammond's Journal
Some things of interest to Paul Hammond, arranged in reverse chronological order, with a feed
Version 0.2 of jp is out. It fixes a small bug where significant whitespace was
removed from invalid JSON documents. More importantly, it also adds some color to the output.
Color didn’t make it into the first version of jp, but it was one of the
reasons I built a new parser when I started writing the code. Adding
ANSI escape codes to one of
the many existing general-purpose JSON libraries would be possible, but I’m
not convinced it would be the right thing to do. Most JSON code is now
part of the core library of any language, and adding extra code to every
JSON-generating application to handle this one specific use case is a waste of CPU
and future developer debugging time. Even if a patch were appropriate, it
would take a really long time before I could rely on the functionality being
available on a given system, so I wrote a new library for this use case.
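The escape codes themselves are tiny, which is part of why a small special-purpose tool makes sense. A minimal illustration of how they work (the colorize helper and color choice here are my own, not jp's):

```python
# ANSI escape codes are in-band byte sequences: ESC (0x1b), "[", a
# parameter, then "m". Parameter 32 selects green; 0 resets the color.
GREEN = "\x1b[32m"
RESET = "\x1b[0m"

def colorize(text):
    """Wrap text so an ANSI terminal renders it in green."""
    return GREEN + text + RESET

print(colorize('"key"'))  # shows up green in a terminal
```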
In general it’s hard to argue against code reuse as a concept; none of the
computer systems we use today would be possible without it. But sometimes we
take that concept too far, and try to reuse code in a context it wasn’t
designed for, or write code to handle every use case when just one is needed.
“What We Actually Know About Software Development”
is, in my view, one of the best presentations ever given about code. If you
haven’t watched it, you should. Around 32 minutes in, Greg Wilson talks about some
research from Boeing that suggests “if you have to rewrite more than about a
quarter of a software component, you’re actually better off rewriting it from
scratch”. This seems like a small but useful example of that effect in action.
But I digress. If you want your JSON in color
you should get the latest jp.
Logtime is a small service that makes timestamps human readable.
Anyone who's spent any time debugging production systems has had the frustrating experience of trying to correlate the timestamps in a log file with something that happened in the real world.
The log files are usually in UTC when you want them in localtime, or worse, the other way around. Even if you can remember that San Francisco is 8 hours behind UTC in the winter, actually doing the mental arithmetic is annoying. And some log files helpfully use unreadable timestamps like @4000000052d7c9e300000000 or 1389873600. If you're lucky you can remember the right incantations to the
date command to convert what you want; I can't, so I made something instead.
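For the record, the incantations differ by platform, which is half the annoyance. A sketch of the two common variants (assuming GNU coreutils on Linux and the stock date on macOS):

```shell
# GNU date (Linux) converts a Unix timestamp with -d @SECONDS:
LC_ALL=C date -u -d @1389873600   # prints: Thu Jan 16 12:00:00 UTC 2014
# BSD/macOS date uses -r SECONDS instead:
#   date -u -r 1389873600
# TAI64N stamps like @4000000052d7c9e300000000 need tai64nlocal
# from daemontools, if you happen to have it installed.
```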
It's not quite done. I'm sure I've missed a few common time formats, and I'd like to see how, and if, it gets used before working out what to add next.
s3simple is a small Bash script to fetch files from and upload files to Amazon’s S3 service.
It was written to help with a fairly common case: you want to download some data from a protected S3 bucket but you don’t have a full featured configuration management system to help you install the software to do so. Perhaps the data is part of your server bootstrapping process, perhaps you just want to download and run an application tarball on a temporary server as quickly as possible, perhaps this is part of a build script that needs to run on developer laptops.
Usually in this scenario s3cmd is used. S3cmd is great; it’s powerful, feature-complete, flexible, and available in most distributions. But it’s not set up for ad-hoc use. To run it with a set of one-off IAM keys you need to create a configuration file, then run s3cmd with extra command-line options pointing to that configuration file, then clean up the configuration file when you’re done.
In comparison, s3simple takes an S3 URL and uses the same AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables as most AWS tools. It has only two dependencies (openssl and curl), which are installed or easily available on all modern Unixes. And it’s a simple Bash function that’s easy to integrate into a typical bootstrapping script.
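Under the hood this boils down to S3’s REST signature scheme (signature version 2, in this era): HMAC-SHA1 a description of the request with your secret key, base64 it, and send it in an Authorization header. A rough sketch of the idea, not s3simple’s actual code; the bucket, key, and fallback credentials below are made up:

```shell
# Fall back to obviously fake example credentials if none are set.
: "${AWS_ACCESS_KEY_ID:=AKIAEXAMPLE}"
: "${AWS_SECRET_ACCESS_KEY:=example-secret}"
bucket="example-bucket"
key="path/to/file"

# Signature v2 signs "VERB\nContent-MD5\nContent-Type\nDate\nResource".
date_hdr=$(date -u '+%a, %d %b %Y %H:%M:%S GMT')
string_to_sign=$(printf 'GET\n\n\n%s\n/%s/%s' "$date_hdr" "$bucket" "$key")
signature=$(printf '%s' "$string_to_sign" \
  | openssl sha1 -hmac "$AWS_SECRET_ACCESS_KEY" -binary | base64)

# The download itself is then one curl call (echoed here, not run):
echo curl -H "Date: $date_hdr" \
  -H "Authorization: AWS $AWS_ACCESS_KEY_ID:$signature" \
  "https://$bucket.s3.amazonaws.com/$key"
```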
I’ve found it useful. I hope you do too.
Most JSON APIs return data in a format optimized for speed over human
readability. This is good, except when things are broken.
But that’s OK, because we have Unix pipes and we can pipe the JSON into a prettifier. But all of the prettifiers start with a JSON parser, so they don’t work if the data is invalid. Most of them also make small changes to the data as it passes through – strings get normalized, numbers get reformatted, and so on – which means you can’t be totally sure your output is accurate. And they’re usually slow.
Last night I wrote a tool called jp which doesn’t have these problems. It works
by doing a single pass through the JSON data adding or removing whitespace as needed.
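To make the single-pass idea concrete, here is a hedged toy sketch in Python; jp itself is a separate implementation, and this version ignores edge cases (empty objects come out with a blank line, for instance):

```python
# Single-pass JSON reformatting: no parsing, no data changes, just
# whitespace adjusted around structural characters. A string/escape
# state machine keeps whitespace inside strings untouched.
def prettify(data, indent="  "):
    out = []
    depth = 0
    in_string = False
    escaped = False
    for ch in data:
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch in "{[":
            depth += 1
            out.append(ch + "\n" + indent * depth)
        elif ch in "}]":
            depth = max(depth - 1, 0)
            out.append("\n" + indent * depth + ch)
        elif ch == ",":
            out.append(",\n" + indent * depth)
        elif ch == ":":
            out.append(": ")
        elif ch in " \t\n\r":
            pass  # drop existing insignificant whitespace
        else:
            out.append(ch)
    return "".join(out)
```

Because it never interprets values, broken documents still pass through mostly formatted, which is exactly when you need the tool.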
It’s fast. Insanely fast. Reformatting a 161M file on my laptop takes 13 seconds, compared to 44 seconds for JSONPP, or several minutes for the Ruby or Python alternatives. And it’s accurate; it doesn’t change any of the data, just the spaces around it.
You should use it.
I just released version 0.6 of webkit2png.
It’s been 4 years since the last release of webkit2png. That’s too long, particularly when you consider how many people have contributed fixes and features since the code moved to GitHub. In no particular order:
I’m amazed and grateful that every one of these people took their time to make webkit2png better. Thank you.
Oh, and there’s one more experimental feature that didn’t make the release. I’d love to hear if it works for anyone.
Today is my last day at Typekit. When I joined the team I talked about how web typography was underappreciated and ignored. Less than three years later that idea seems almost unimaginable. We set out to change the web, and we succeeded.
It’s been a privilege to work alongside so many incredibly talented people. They are all smart, funny, passionate, generous, and humble. Everyone on the team is amazing at what they do, and I've learned a lot from all of them. It is because of them that I feel confident enough to do what's next.
It’s time for me to start working for myself.
Some common themes have emerged from my work at both Flickr and Typekit: infrastructure engineering, development processes, multi-disciplinary collaboration, and — for want of a better word — devops. I want to continue to explore these ideas, but in new contexts. I’ve got some product sketches I want to develop, and plans to collaborate with some co-conspirators on a few projects.
I’m also interested in working with other companies who are trying to understand this space. To be clear, I’m not ready to take a full time role at your startup, but if you've got some challenging infrastructure or process problems, and you think I might be able to help out, please get in touch.
I can’t wait to find out what happens next.
I talked about job queues today at the 2012 Velocity London Conference.
I wanted to give the talk I wish someone had given a few years ago, before I learned a bunch of lessons the hard way. In particular I don't know of any examples of people talking about how to handle errors in a job queue, which seems important. I also talked about why idempotency is awesome, and how to monitor your queues.
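The idempotency point is easy to sketch. If a worker crashes after doing the work but before acknowledging the job, the queue redelivers it; an idempotent handler makes that retry harmless. The names below are illustrative, not taken from the talk:

```python
# A toy idempotent job handler: record completed job ids so that a
# redelivered job becomes a no-op instead of a duplicated side effect.
processed = set()  # in a real system: a table keyed by job id

def handle(job_id, side_effects):
    if job_id in processed:
        return  # already done; the retry is harmless
    side_effects.append(job_id)  # stand-in for the real work
    processed.add(job_id)

effects = []
handle("job-1", effects)
handle("job-1", effects)  # redelivery after a crash or timeout
print(effects)  # the work happened exactly once
```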
The talk wasn't recorded, but the full set of slides is online as a 3MB HTML page or a 900KB PDF file.
The Internet wasn’t always this way.
I think it’s hard to work on the web for over a decade and not have some nostalgia for the way it used to be. Nobody knew what they were doing but we had youth, naïveté, ignorance, and optimism on our side. The medium was not yet fully formed and we were still learning what it was capable of.
Conferences weren’t always this way either. Those who went to early iterations of SXSW interactive and Etech (or was it Etcon?) reminisce about a space where all that mattered was bouncing around interesting ideas. Everyone there was doing interesting things and the conversations outside were often more inspiring than the sessions themselves.
Back then things were simpler, more honest, and more fun.
Many want to keep that spirit alive. But we’ve grown too experienced, too good at what we do, too good at deconstructing ideas, too absorbed in the implementation details, too good at making money, and too jaded. And the world has changed around us: the web has become ubiquitous, well understood, and somewhat boring.
And, of course, the past was never actually as good as we remember it.
This creates an impossibly high bar for any conference that aims to talk about more than just technical implementation. It doesn’t matter how inspiring your speakers are or how good the hallway track is; you still won’t live up to the legend of previous events that never really happened.
But somehow Andy and Andy managed the impossible with XOXO. It was, by far, the best conference I have ever been to, including the imaginary ones.
Everything about the setting contributed perfectly to the tone of the conference. The City of Portland, the low-road architecture of the Yale Union, the custom-made signs, the afternoon sunlight, the distant passing trains, and even the fancy bathroom created a great sense of space for the event. It could not have happened anywhere else.
It was obvious how much care had been put into every aspect of the production, from the visual identity, signs (again), and badges through to the AV setup, Wi-Fi, and other logistics. The high quality reinforced so many of the themes of the conference itself.
They got exactly the right balance between structured sessions and room to talk, between the fringe and the conference, between the planned and the improvised. The schedule covered a huge range of topics but established a compelling and inspiring narrative without being repetitive. It was authentic and honest. There was sponsorship but at no point did I feel like I was being marketed to. There was debate but no cynicism or snark. Above all the stories told were so very inspiring; I’m sure more than a few attendees are already planning to quit their jobs.
Ultimately the attendees made the event. James Duncan Davidson said “The attendee list at XOXO couldn’t have been better curated if one had tried to”. The unique way in which the conference was funded created a unique audience. I caught up with people I hadn’t seen in over five years, made many new friends, and finally got to meet some of the people that inspired me to start working on the web. Everyone was engaged and excited with interesting thoughts, commentary and ideas.
But the way the tickets were sold was also the event’s biggest problem. As great as the crowd was, almost everyone there had committed to spending a huge chunk of cash on tickets and travel with less than 48 hours notice. This inevitably led to a lack of diversity, and was especially worrying when money was one of the primary themes of the conference. I have to assume it also led to a lack of participation from some of the people the conference was notionally aimed at. Even then it still feels like the experiment was worthwhile, and I know the Andys will be thinking about ways to avoid this if they run the event again.
And that leads to the other problem: I can’t see how it’s possible to make this happen again. It was just too good.
Listening to the Kleptones fill the dancefloor, the surprisingly emotional “Indie Game: The Movie”, Dan Harmon’s lecture on money and creativity, and Julia Nunes’s beautiful story of luck and seized opportunities. Someone paying for everyone’s drinks on Friday, the unattended iPad with Square on the Cards Against Humanity stall, the spontaneous applause when Ron Carmel said he didn’t need to work for a decade, hiding under the table with Finn, and everything else that happened. I have never laughed or cried so much at a conference, and it’s a long time since I’ve felt so proud to be part of this community.
Thank you Andy and Andy and everyone else for making XOXO possible. It was incredible.
This morning I talked at the 2012 Velocity Conference about some of the lessons
we learned building the infrastructure for Typekit.
The talk was 90 minutes long and covered a lot of ground, some specific and some theoretical, so it’s hard to summarize. But there were two main
points I wanted to make.
The first is that the important lesson to get from any startup scaling story
is not the specific mistakes they made, but the process used to solve problems.
Optimizing for change is more important than getting everything right the first time.
The second is that it's easy to watch talks at a conference like Velocity and
assume you have to build huge amounts of supporting infrastructure before you
can launch. The reality is that, with a team of 2 or 3, you just don't have
time, and that the actual minimum viable infrastructure is a lot simpler than
you might think.
The full set of slides is online as a very long HTML page or an 8MB PDF file.
The recently rebuilt Favcol presented a surprisingly interesting challenge: how to analyze the images.
Image processing at scale is effectively a solved problem. The algorithms are well optimized, and it's trivial to scale horizontally by adding more hardware to your image processing cluster. Sites like Flickr and Picasa have optimized the process enough to resize images on the fly if needed while serving thousands of requests a second.
Scaling image processing down is a different story. I think everyone I've ever talked to about processing images on a small site has a horror story. The story of Favcol is fairly typical.
The first version of Favcol was a Rails application, and used RMagick to manipulate images in memory. It was a disaster. Memory leaks caused processes to grow until the box crashed hard. Reaping processes helped a little, but the server I was running it on was supposed to be doing other things at the same time, and couldn't really wait 60 seconds to recover.
The next version shelled out to the gm GraphicsMagick command to manipulate files, then read the results back from disk. In theory this should have been slower and more expensive; in practice it was significantly more efficient. If there's one piece of advice I can give to anyone thinking about doing any kind of handling of large images, it's to do the hard work in a separate process unless you really know what you're doing. And if you think you know what you're doing, do the hard work in a separate process anyway, because you're probably wrong.
Even so, reading a few hundred huge files every five minutes was still killing my server. One day Favcol crashed the machine again. The cron job got disabled. The intent was to fix it quickly, but kids and work and life got in the way and that never happened.
Eventually I started looking at alternatives. Upgrading my virtual server was more expensive than I'm willing to pay to host something like Favcol. I could make the bills cheaper by bringing up an EC2 instance to batch process images for half an hour each day, but part of the fun of Favcol is seeing your photo appear within a few minutes. I looked for online services for image processing, and found many different ways to resize or post-process images but no services to give me an average color. I even briefly considered doing the work in visitors' browsers.
Google App Engine kept bubbling up as a potential solution - it's free if you stay below a quota and has a built in image manipulation API. The only problem was that App Engine offers no easy way to get at the raw pixel data for an image that has been processed, which is the only data I needed.
Eventually I realized there is a workaround.
The trick is that PNG files are easy to read, even from high-level scripting languages like Python. So you can use the App Engine Image Manipulation Service to convert an image into a smallish PNG, then read the raw data using a pure-Python library like pypng:
import urllib2
import png  # pypng, a pure-python PNG decoder
from google.appengine.api import images

# go grab the image
result = urllib2.urlopen(url)
# resize to a 20px thumbnail
img = images.Image(result.read())
img.resize(width=20, height=20)
thumbnail = img.execute_transforms(output_encoding=images.PNG)
# read the thumbnail
r = png.Reader(bytes=thumbnail)
png_w, png_h, pixels, info = r.asDirect()
It's a hack, but it works well enough to process a few thousand images throughout the day without costing me any money.
The full code I use is up on GitHub. It only does a basic RGB mean at the moment, but it should be easy to add other metrics like dominant colour.
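The mean itself is only a few lines. A hedged sketch of the calculation (mean_rgb is my name for it here, not necessarily the one in the repository), working on the flat pixel rows that pypng's asDirect() yields:

```python
# Average each channel across every pixel. Rows are flat sequences of
# channel values: [r, g, b, r, g, b, ...] for an RGB image.
def mean_rgb(rows, channels=3):
    totals = [0] * channels
    count = 0
    for row in rows:
        for i in range(0, len(row), channels):
            for c in range(channels):
                totals[c] += row[i + c]
            count += 1
    return tuple(t // count for t in totals)

# Two pixels, pure red and pure blue, average to a dark magenta:
mean_rgb([[255, 0, 0, 0, 0, 255]])  # (127, 0, 127)
```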
I hope it's useful.