Caffeinated Simpleton

6th: Clojure Agents

Flockr is slow. I would profile it, but profiling in Java is a pain, so I’m going to commit a cardinal sin and guess at what is causing the slowdown. Every time a user loads their Flockr page, a batch of synchronous HTTP requests goes out to Twitter one after another, and the page is rendered only once every response has come back. These requests could easily run in parallel, and my guess is that issuing them sequentially is what makes the page slow.

“The Right Way” to solve this problem is non-blocking I/O. However, that doesn’t let me play with fancy Clojure features. Clojure has agents built in. An agent holds a piece of state, and functions sent to it run asynchronously and in a thread-safe manner, each one producing the agent’s next value. Agents allow concurrency without most of the usual pains.

Internally, I believe the functions sent to agents are simply handed to a thread pool. Since values are immutable in Clojure, it’s pretty trivial to make that arrangement thread safe. However, this is not the optimal solution for my problem, as every in-flight request still ties up a pool thread while it blocks. The highest throughput would come from non-blocking I/O, but we’ll use agents for now for the sake of learning Clojure.

Ok. This might get complicated. However, Berlin Brown was very helpful in getting this all figured out.

First we need to construct the agents. Since the number of channels is dynamic, the agents need to live in a data structure. I construct one agent for every Twitter channel in column 1 and collect them in a list (the output of map), then do the same for the second column.

Constructing the agents is pretty easy. All we need to do is declare an agent with agent and its initial value, here the empty string. If I were to read it right after creating it (by doing @<agent-ref>), I would get the empty string.
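
As a minimal sketch of that step (the name ch-agent is only illustrative), creating an agent and reading it back immediately looks roughly like this:

```clojure
;; Create an agent whose initial state is the empty string.
(def ch-agent (agent ""))

;; @ch-agent is shorthand for (deref ch-agent).
;; Nothing has been sent to the agent yet, so this reads the initial state.
(def initial-value @ch-agent)

(println (empty? initial-value)) ; prints true
```

Reading an agent never blocks; it simply returns whatever state the agent holds at that moment.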

I then pass the freshly constructed agent to send-off. send-off takes an agent and a function that computes the agent’s new value. That function is called with the agent’s current value, and whatever it returns becomes the new value of the agent at some unspecified time in the future. send-off itself returns its agent immediately.
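
Here is a rough sketch of building one column of agents this way. The render-channel function and the channel names are hypothetical stand-ins for Flockr’s real Twitter request and rendering code, which may look quite different:

```clojure
;; Hypothetical stand-in for fetching and rendering one Twitter channel.
(defn render-channel [channel]
  (Thread/sleep 100) ; simulate a slow HTTP round trip
  (str "<div>" channel "</div>"))

;; Hypothetical channel names for one column.
(def channels ["clojure" "lisp" "jvm"])

;; One agent per channel. The action receives the agent's current state,
;; ignores it (_), and returns the rendered channel as the new state.
;; map is lazy, so doall forces every send-off to be issued right away.
(def ch-agents
  (doall
    (map (fn [channel]
           (send-off (agent "") (fn [_] (render-channel channel))))
         channels)))
```

send-off is the variant intended for actions that may block, such as network I/O; send is meant for CPU-bound work.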

After running those first two maps I have two lists of agents, which represent the two columns of content. I then need to wait for all that content to get filled in. To do that, I use await. await takes any number of agents and blocks until the actions sent to them so far have completed. If I did not wait for the agents to finish, I would return a blank page to the user! Not wanting to do that, I take my two lists of agents, concatenate them, and then use them as the arguments to await using apply.
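
The waiting step can be sketched as follows. The slow-render and start-agents helpers and the channel names are assumptions made up for this example, not Flockr’s actual code:

```clojure
;; Hypothetical stand-in for the real request and rendering step.
(defn slow-render [channel]
  (Thread/sleep 100) ; simulate network latency
  (str "<div>" channel "</div>"))

;; Start one agent per channel and return the realized list of agents.
(defn start-agents [channels]
  (doall
    (map (fn [channel]
           (send-off (agent "") (fn [_] (slow-render channel))))
         channels)))

(def column-1 (start-agents ["clojure" "lisp"]))
(def column-2 (start-agents ["jvm"]))

;; await is variadic, so apply spreads the combined list into its arguments.
;; This call blocks until every action sent above has finished.
(apply await (concat column-1 column-2))

;; After await returns, none of the agents still holds the initial "".
(def all-filled?
  (every? seq (map deref (concat column-1 column-2))))

;; Lets a standalone script exit promptly; skip this at the REPL.
(shutdown-agents)
```

Because the three simulated requests overlap, the wait should be roughly as long as one request rather than the sum of all three.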

After that, it’s easy! I have all the rendered channels in my lists of agents, so I iterate through each one, stick them in their columns by dereferencing (@ch-agent) and then send the whole thing off.
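
To illustrate the final step, here is a small sketch that assumes the agents have already finished, so they are created directly with their rendered content. The render-column helper and the markup are hypothetical:

```clojure
;; Stand-ins for agents whose actions have already completed.
(def column-1 [(agent "<p>clojure</p>") (agent "<p>lisp</p>")])
(def column-2 [(agent "<p>jvm</p>")])

;; Dereference every agent in a column and wrap the results in one div.
(defn render-column [agents]
  (str "<div class=\"column\">"
       (apply str (map deref agents))
       "</div>"))

;; The finished page body is just the two columns side by side.
(def page
  (str (render-column column-1)
       (render-column column-2)))

(println page)
```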

The question is whether it really improves performance. However well the thread pool behaves, the page still cannot be returned before the slowest response from Twitter arrives. This is very unfortunate, and in the end, this should be handled on the client side with JavaScript. That’s no fun though, so we’ll just optimize the server side and see what kind of performance we can squeeze out of the thread pool (I’m pretty sure mine is still sub-optimal). Even this fairly straightforward default configuration cut the average response time in half, however, so that’s a pretty good start.

As always, the entirety of the code is available on GitHub.
