A better "Who's Online" with Redis & Python

Updated on December 17, 2011: I found a better solution. Head on over to the new post to check it out.

Who's What?

My website, like many others, has a "who's online" feature. It displays the names of authenticated users that have been seen over the course of the last ten minutes or so. It may seem a minor feature at first, but I find it really does a lot to "humanize" the site and make it seem more like a community gathering place.

My first implementation of this feature used the MySQL database to update a per-user timestamp whenever a request from an authenticated user arrived. Actually, this seemed excessive to me, so I used a strategy involving an "online" cookie that has a five minute expiration time. Whenever I see an authenticated user without the online cookie I update their timestamp and then hand them back a cookie that will expire in five minutes. In this way I don't have to hit the database on every single request.

This approach worked fine but it has some aspects that didn't sit right with me:

  • It seems like overkill to use the database to store temporary, trivial information like this. It doesn't feel like a good use of a full-featured relational database management system (RDBMS).
  • I am writing to the database during a GET request. Ideally, all GET requests should be idempotent. Of course if this is strictly followed, it would be impossible to create a "who's online" feature in the first place. You'd have to require the user to POST data periodically. However, writing to an RDBMS during a GET request is something I feel guilty about and try to avoid when I can.

Redis

Enter Redis. I discovered Redis recently, and it is pure, white-hot awesomeness. What is Redis? It's one of those projects that gets slapped with the "NoSQL" label. And while I'm still trying to figure that buzzword out, Redis makes sense to me when described as a lightweight data structure server. Memcached can store key-value pairs very fast, where the value is always a string. Redis goes one step further and stores not only strings, but data structures like lists, sets, and hashes. For a great overview of what Redis is and what you can do with it, check out Simon Willison's Redis tutorial.

Another reason why I like Redis is that it is easy to install and deploy. It is straight C code without any dependencies. Thus you can build it from source just about anywhere. Your Linux distro may have a package for it, but it is just as easy to grab the latest tarball and build it yourself.

I've really come to appreciate Redis for being such a small and lightweight tool. At the same time, it is very powerful and effective for filling those tasks that a traditional RDBMS is not good at.

For working with Redis in Python, you'll need to grab Andy McCurdy's redis-py client library. It can be installed with a simple:

$ sudo pip install redis

Who's Online with Redis

Now that we are going to use Redis, how do we implement a "who's online" feature? The first step is to get familiar with the Redis API.

One approach to the "who's online" problem is to add a user name to a set whenever we see a request from that user. That's fine but how do we know when they have stopped browsing the site? We have to periodically clean out the set in order to time people out. A cron job, for example, could delete the set every five minutes.

A small problem with deleting the set is that people will abruptly disappear from the site every five minutes. In order to give more gradual behavior we could utilize two sets, a "current" set and an "old" set. As users are seen, we add their names to the current set. Every five minutes or so (season to taste), we simply overwrite the old set with the contents of the current set, then clear out the current set. At any given time, the set of who's online is the union of these two sets.

This approach doesn't give exact results of course, but it is perfectly fine for my site.
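To make the two-set idea concrete before bringing Redis into it, here is a minimal sketch in plain Python. The class name WhosOnlineModel and its method names are made up for illustration; it only models the aging logic described above and is not the code that runs on my site.

```python
class WhosOnlineModel(object):
    """In-memory model of the "current" and "old" sets (illustration only)."""

    def __init__(self):
        self.current = set()
        self.old = set()

    def report_user(self, username):
        # A user was seen: remember them in the current set.
        self.current.add(username)

    def tick(self):
        # Age out: the current set becomes the old set, and a fresh,
        # empty current set takes its place.
        self.old = self.current
        self.current = set()

    def users_online(self):
        # Who's online is the union of the two sets.
        return self.current | self.old


model = WhosOnlineModel()
model.report_user("alice")
model.tick()                 # alice moves to the old set
model.report_user("bob")
print(sorted(model.users_online()))
```

A user who stops browsing survives one more tick in the old set and disappears on the tick after that, which is what gives the more gradual behavior.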

Looking over the Redis API, we see that we'll be making use of the following commands:

  • SADD for adding members to the current set.
  • RENAME for copying the current set to the old, as well as destroying the current set all in one step.
  • SUNION for performing a union on the current and old sets to produce the set of who's online.

And that's it! With these three primitives we have everything we need. This is because of the following useful Redis behaviors:

  • Performing a SADD against a set that doesn't exist creates the set and is not an error.
  • Performing a SUNION with sets that don't exist is fine; they are simply treated as empty sets.

The one caveat involves the RENAME command. If the key you wish to rename does not exist, the Python Redis client treats this as an error and an exception is thrown.
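If you don't have a Redis server handy, the missing-key behaviors listed above can be mimicked with a toy stand-in. ToyRedis below is hypothetical and is not part of redis-py. It is only a sketch of how the three commands treat keys that don't exist, and it raises KeyError where the real client would raise its own error.

```python
class ToyRedis(object):
    """Toy in-memory stand-in for SADD, SUNION and RENAME (illustration only)."""

    def __init__(self):
        self._keys = {}

    def sadd(self, key, member):
        # SADD against a missing key quietly creates the set.
        members = self._keys.setdefault(key, set())
        if member in members:
            return 0
        members.add(member)
        return 1

    def sunion(self, keys):
        # Missing keys are treated as empty sets.
        result = set()
        for key in keys:
            result |= self._keys.get(key, set())
        return result

    def rename(self, src, dst):
        # Renaming a missing key is an error. KeyError is this toy's
        # stand-in for the error the real client reports.
        if src not in self._keys:
            raise KeyError("no such key: %s" % src)
        # The destination is overwritten and the source key disappears.
        self._keys[dst] = self._keys.pop(src)


toy = ToyRedis()
toy.sadd("current", "alice")          # creates the set
toy.rename("current", "old")          # "current" no longer exists
print(toy.sunion(["current", "old"])) # the missing key counts as empty
```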

Experimenting with algorithms and ideas is quite easy with Redis. You can either use the Python Redis client in a Python interactive interpreter shell, or you can use the command-line client that comes with Redis. Either way you can quickly try out commands and refine your approach.

Implementation

My website is powered by Django, but I am not going to show any Django specific code here. Instead I'll show just the pure Python parts, and hopefully you can adapt it to whatever framework, if any, you are using.

I created a Python module to hold this functionality: whos_online.py. Throughout this module I use a lot of exception handling, mainly because if the Redis server has crashed (or if I forgot to start it, say in development) I don't want my website to be unusable. If Redis is unavailable, I simply log an error and drive on. Note that in my limited experience Redis is very stable and has not crashed on me once, but it is good to be defensive.

The first important function used throughout this module is a function to obtain a connection to the Redis server:

import logging
import redis

logger = logging.getLogger(__name__)

def _get_connection():
    """
    Create and return a Redis connection. Returns None on failure.
    """
    try:
        conn = redis.Redis(host=HOST, port=PORT, db=DB)
        return conn
    except redis.RedisError as e:
        logger.error(e)

    return None

The HOST, PORT, and DB constants can come from a configuration file or they could be module-level constants. In my case they are set in my Django settings.py file. Once we have this connection object, we are free to use the Redis API exposed via the Python Redis client.

To update the current set whenever we see a user, I call this function:

# Redis key names:
USER_CURRENT_KEY = "wo_user_current"
USER_OLD_KEY = "wo_user_old"

def report_user(username):
    """
    Call this function when a user has been seen. The username will be added to
    the current set.
    """
    conn = _get_connection()
    if conn:
        try:
            conn.sadd(USER_CURRENT_KEY, username)
        except redis.RedisError as e:
            logger.error(e)

If you are using Django, a good spot to call this function is from a piece of custom middleware. I kept my "5 minute cookie" algorithm to avoid doing this on every request although it is probably unnecessary on my low traffic site.
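The "5 minute cookie" check can be sketched without any framework. Everything named here is an assumption for illustration: the cookie name, the note_request function, and the idea of passing in a plain dict of request cookies and a report callback. In a real middleware you would read the cookies from the request and set the returned cookie on the response.

```python
ONLINE_COOKIE = "online"        # hypothetical cookie name
ONLINE_COOKIE_MAX_AGE = 5 * 60  # five minutes, in seconds


def note_request(username, cookies, report):
    """
    Decide whether to report this user as seen.

    username is None for anonymous requests, cookies is a dict of the
    request's cookies, and report is a callable such as report_user.
    Returns a (name, value, max_age) tuple describing the cookie to set
    on the response, or None if nothing needs to be done.
    """
    if username is None or ONLINE_COOKIE in cookies:
        # Anonymous, or already reported within the last five minutes.
        return None
    report(username)
    return (ONLINE_COOKIE, "1", ONLINE_COOKIE_MAX_AGE)


seen = []
print(note_request("alice", {}, seen.append))               # cookie to set
print(note_request("alice", {"online": "1"}, seen.append))  # None
```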

Periodically you need to "age out" the sets by destroying the old set, moving the current set to the old set, and then emptying the current set.

def tick():
    """
    Call this function to "age out" the old set by renaming the current set
    to the old.
    """
    conn = _get_connection()
    if conn:
        # An exception may be raised if the current key doesn't exist; if that
        # happens we have to delete the old set because no one is online.
        try:
            conn.rename(USER_CURRENT_KEY, USER_OLD_KEY)
        except redis.ResponseError:
            try:
                del conn[USER_OLD_KEY]
            except redis.RedisError as e:
                logger.error(e)
        except redis.RedisError as e:
            logger.error(e)

As mentioned previously, if no one is on your site, eventually your current set will cease to exist as it is renamed and not populated further. If you attempt to rename a non-existent key, the Python Redis client raises a ResponseError exception. If this occurs we just manually delete the old set. In a bit of Pythonic cleverness, the Python Redis client lets you delete a key with the del statement.

The tick() function can be called periodically by a cron job, for example. If you are using Django, you could create a custom management command that calls tick() and schedule cron to execute it. Alternatively, you could use something like Celery to schedule a job to do the same. (As an aside, Redis can be used as a back-end for Celery, something that I hope to explore in the near future.)

Finally, you need a way to obtain the current "who's online" set, which again is a union of the current and old sets.

def get_users_online():
    """
    Returns a set of user names which is the union of the current and old
    sets.
    """
    conn = _get_connection()
    if conn:
        try:
            # Note that keys that do not exist are considered empty sets
            return conn.sunion([USER_CURRENT_KEY, USER_OLD_KEY])
        except redis.RedisError as e:
            logger.error(e)

    return set()

In my Django application, I call this function from a custom inclusion template tag.

Conclusion

I hope this blog post gives you some idea of the usefulness of Redis. I expanded on this example to also keep track of non-authenticated "guest" users. I simply added another pair of sets to track IP addresses.
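One way to picture the guest tracking is to look up the pair of key names by kind. This is only a sketch: the guest key names, the KEY_PAIRS table, and the report and get_online functions are made up for illustration, and the connection is passed in as an argument. The exception handling shown earlier is left out to keep it short.

```python
# Redis key names, one (current, old) pair per kind of visitor.
KEY_PAIRS = {
    "user": ("wo_user_current", "wo_user_old"),
    "guest": ("wo_guest_current", "wo_guest_old"),  # hypothetical names
}


def report(conn, kind, member):
    """Add a username (kind "user") or IP address (kind "guest") to the
    current set for that kind."""
    current_key, _old_key = KEY_PAIRS[kind]
    conn.sadd(current_key, member)


def get_online(conn, kind):
    """Return the union of the current and old sets for that kind."""
    return conn.sunion(list(KEY_PAIRS[kind]))
```

Here conn is assumed to be a connection object like the one returned by _get_connection(), so a call might look like report(conn, "guest", ip_address). Aging out works the same way as before, with one rename per pair.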

If you are like me, you are probably already thinking about shifting some functions that you awkwardly jammed onto a traditional database to Redis and other "NoSQL" technologies.


Blogofile, reStructuredText, and Pygments

Blogofile has support out-of-the-box for reStructuredText and Pygments. Blogofile's syntax_highlight.py filter wants you to mark your code blocks with a token such as $$code(lang=python). I wanted to use the method I am more familiar with, by configuring reStructuredText with a custom directive. Luckily this is very easy. Here is how I did it.

First of all, I checked which version of Pygments I had, since I had installed it with Ubuntu's package manager. I then visited Pygments on Bitbucket, switched to the tag that matched my version, and drilled into the external directory. From there I saved the rst-directive.py file to my blog's local repository under the name _rst_directive.py. I named it with a leading underscore so that Blogofile would ignore it. If this bothers you, you could also add it to Blogofile's site.file_ignore_patterns setting.

Next, I tweaked the settings in _rst_directive.py by un-commenting the linenos variant.

All we have to do now is to get Blogofile to import this module. This can be accomplished by making use of the pre_build() hook in your _config.py file. This is a convenient place to hang custom code that will run before your blog is built. I added the following code to my _config.py module:

def pre_build():
    # Register the Pygments Docutils directive
    import _rst_directive

This allows me to embed code in my .rst files with the sourcecode directive. For example, here is what I typed to create the source code snippet above:

.. sourcecode:: python

   def pre_build():
       # Register the Pygments Docutils directive
       import _rst_directive

Of course to get it to look nice, we'll need some CSS. I used this Pygments command to generate a .css file for the blog.

$ pygmentize -f html -S monokai -a .highlight > pygments.css

I saved pygments.css in my css directory and updated my site template to link it in. Blogofile will copy this file into my _site directory when I build the blog.

Here is what I added to my blog's main .css file to style the code snippets. The important thing for me was to add an overflow: auto; setting. This will ensure that a scrollbar will appear on long lines instead of the code being truncated.

.highlight {
   width: 96%;
   padding: 0.5em 0.5em;
   border: 1px solid #00ff00;
   margin: 1.0em auto;
   overflow: auto;
}

That's it!


Blog reboot with Blogofile

Welcome to my new blog. I've been meaning to start blogging again for some time, especially since the new version of SurfGuitar101.com went live almost two months ago. But the idea of dealing with WordPress was putting me off. Don't get me wrong, WordPress really is a nice general purpose blogging platform, but it didn't really suit me anymore.

I considered creating a new blog in Django, but I really want to spend all my time and energy on improving SurfGuitar101 and not tweaking my blog. I started thinking about doing something simpler.

Almost by accident, I discovered Blogofile by seeing it mentioned in my Twitter feed. Blogofile is a static blog generator written in Python. After playing with it for a while, I decided to use it for a blog reboot. It is simple to use, Pythonic, and very configurable. The advantages for me to go with a static blog are:

  1. No more dealing with WordPress and plugin updates. To be fair, WordPress is very easy to update these days. Plugins are still a pain, and are often needed to display source code.
  2. I can write my blog posts in Markdown or reStructuredText using my favorite editor instead of some lame JavaScript editor. Formatting source code is dead simple now.
  3. All of my blog content is under version control.
  4. Easier to work offline.
  5. Easier to deploy. Very little (if any) server configuration.
  6. I can use version control with a post-commit hook to deploy the site.

Disadvantages:

  1. Not as "dynamic". For my blog, this isn't really a problem. Comments can be handled by a service like Disqus.
  2. Regenerating the entire site can take time. This is only an issue if you have a huge blog with years of content. A fresh blog takes a fraction of a second to build, and I don't anticipate this affecting me for some time, if ever. I suspect Blogofile will be improved to include caching and smarter rebuilds in the future.

It should be noted that Blogofile seems to require Python 2.6 or later. My production server is still running 2.5, and I can't easily change this for a while. This really only means I can't use Mercurial with a changegroup hook to automatically deploy the site. This should only be a temporary issue; I hope to upgrade the server in the future.

Blogofile comes with some scripts for importing WordPress blogs. Looking over my old posts, some of them make me cringe. I think I'll save importing them for a rainy day.

The bottom line is, this style of blogging suits me as a programmer. I get to use all the same tools I use to write code: a good text editor, the same markup I use for documentation, and version control. Deployment is a snap, and I don't have a database or complicated server setup to maintain. Hopefully this means I will blog more.

Finally, I'd like to give a shout-out to my friend Trevor Oke who just switched to a static blog for many of the same reasons.
