Integrating Django and MoinMoin with Redis
We want a Wiki!
Over at SurfGuitar101.com, we decided we'd like to have a wiki to capture community knowledge. I briefly looked at candidate wiki engines with an eye towards integrating them with Django, the framework that powers SurfGuitar101.com. And of course I was biased towards a wiki solution that was written in Python. I had tried a few wikis in the past, including the behemoth MediaWiki. MediaWiki is a very powerful piece of software, but it is also quite complex, and I didn't want to have to maintain a PHP infrastructure to run it.
Enter MoinMoin. This is a mature wiki platform that is actively maintained and written in Python. It is full featured but did not seem overly complex to me. It stores its pages in flat files, which seemed appealing for our likely small wiki needs. It turns out I had been a user of MoinMoin for many years without really knowing it. The Python.org wiki, Mercurial wiki, and omniORB wiki are all powered by MoinMoin. We'd certainly be in good company.
Single Sign-On
The feature that clinched it was MoinMoin's flexible authentication system. It would be very desirable if my users did not have to sign into Django and then sign in again to the wiki with possibly a different username. Managing two different password databases would be a real headache. The ideal solution would mean signing into Django would log the user into MoinMoin with the same username automatically.
MoinMoin supports this with its external cookie authentication mechanism. The details are in the MoinMoin external cookie documentation; in short, a Django site needs to do the following:
- Set an external cookie for MoinMoin to read whenever a user logs into Django.
- To prevent spoofing, the Django application should create a record that the cookie was created in some shared storage area accessible to MoinMoin. This allows MoinMoin to validate that the cookie is legitimate and not a fake.
- When the user logs out of the Django application, it should delete this external cookie and the entry in the shared storage area.
- Periodically the Django application should expire entries in the shared storage area for clean-up purposes. Otherwise this storage would grow and grow if users never logged out. Deleting entries older than the external cookie's age should suffice.
My Django Implementation
There are of course many ways to approach this problem. Here is what I came up with. I created a Django application called wiki to hold this integration code. There is quite a lot of code here, too much to conveniently show in this blog post. I will post snippets below, but you can refer to the complete code in my Bitbucket repository, where the wiki application can also be browsed online.
Getting notified of when users log into or out of Django is made easy thanks to Django's login and logout signals. By creating a signal handler I can be notified when a user logs in or out. The signal handler code looks like this:
import logging

from django.contrib.auth.signals import user_logged_in, user_logged_out

from wiki.constants import SESSION_SET_MEMBER

logger = logging.getLogger(__name__)


def login_callback(sender, request, user, **kwargs):
    """Signal callback function for a user logging in.

    Sets a flag for the middleware to create an external cookie.

    """
    logger.info('User login: %s', user.username)
    request.wiki_set_cookie = True


def logout_callback(sender, request, user, **kwargs):
    """Signal callback function for a user logging out.

    Sets a flag for the middleware to delete the external cookie.

    Since the user is about to logout, her session will be wiped out after
    this function returns. This forces us to set an attribute on the request
    object so that the response middleware can delete the wiki's cookie.

    """
    if user:
        logger.info('User logout: %s', user.username)

        # Remember what Redis set member to delete by adding an attribute to the
        # request object:
        request.wiki_delete_cookie = request.session.get(SESSION_SET_MEMBER)


user_logged_in.connect(login_callback, dispatch_uid='wiki.signals.login')
user_logged_out.connect(logout_callback, dispatch_uid='wiki.signals.logout')
When a user logs in I want to create an external cookie for MoinMoin. But cookies can only be created on HttpResponse objects, and all we have access to here in the signal handler is the request object. The solution here is to set an attribute on the request object that a later piece of middleware will process. I at first resisted this approach, thinking it was kind of hacky. I initially decided to set a flag in the session, but then found out that the session is not always available. I then reviewed some of the Django supplied middleware classes and saw that they also set attributes on the request object, so this must be an acceptable practice.
My middleware looks like this.
class WikiMiddleware(object):
    """
    Check for flags on the request object to determine when to set or delete an
    external cookie for the wiki application. When creating a cookie, also
    set an entry in Redis that the wiki application can validate to prevent
    spoofing.

    """
    def process_response(self, request, response):

        if hasattr(request, 'wiki_set_cookie'):
            create_wiki_session(request, response)
        elif hasattr(request, 'wiki_delete_cookie'):
            destroy_wiki_session(request.wiki_delete_cookie, response)

        return response
The create_wiki_session() function creates the cookie for MoinMoin and stores a hash of the cookie in a shared storage area for MoinMoin to validate. In our case, Redis makes an excellent shared storage area. We create a sorted set in Redis to store our cookie hashes. The score for each hash is the timestamp of when the cookie was created. This allows us to easily delete expired cookies by score periodically.
def create_wiki_session(request, response):
    """Sets up the session for the external wiki application.

    Creates the external cookie for the Wiki.
    Updates the Redis set so the Wiki can verify the cookie.

    """
    now = datetime.datetime.utcnow()
    value = cookie_value(request.user, now)
    response.set_cookie(settings.WIKI_COOKIE_NAME,
                        value=value,
                        max_age=settings.WIKI_COOKIE_AGE,
                        domain=settings.WIKI_COOKIE_DOMAIN)

    # Update a sorted set in Redis with a hash of our cookie and a score
    # of the current time as a timestamp. This allows us to delete old
    # entries by score periodically. To verify the cookie, the external wiki
    # application computes a hash of the cookie value and checks to see if
    # it is in our Redis set.

    h = hashlib.sha256()
    h.update(value)
    name = h.hexdigest()
    score = time.mktime(now.utctimetuple())
    conn = get_redis_connection()

    try:
        conn.zadd(settings.WIKI_REDIS_SET, score, name)
    except redis.RedisError:
        logger.error("Error adding wiki cookie key")

    # Store the set member name in the session so we can delete it when the
    # user logs out:
    request.session[SESSION_SET_MEMBER] = name
We store the name of the Redis set member in the user's session so we can delete it from Redis when the user logs out. During logout, this set member is retrieved from the session in the logout signal handler and stored on the request object. This is because the session will be destroyed after the logout signal handler runs and before the middleware can access it. The middleware can check for the existence of this attribute as its cue to delete the wiki session.
def destroy_wiki_session(set_member, response):
    """Destroys the session for the external wiki application.

    Delete the external cookie.
    Deletes the member from the Redis set as this entry is no longer valid.

    """
    response.delete_cookie(settings.WIKI_COOKIE_NAME,
                           domain=settings.WIKI_COOKIE_DOMAIN)

    if set_member:
        conn = get_redis_connection()
        try:
            conn.zrem(settings.WIKI_REDIS_SET, set_member)
        except redis.RedisError:
            logger.error("Error deleting wiki cookie set member")
As suggested in the MoinMoin external cookie documentation, I create a cookie whose value consists of the username, email address, and a key separated by the # character. The key is just a string of stuff that makes it difficult for a spoofer to recreate.
def cookie_value(user, now):
    """Creates the value for the external wiki cookie."""

    # The key part of the cookie is just a string that would make things
    # difficult for a spoofer; something that can't be easily made up:

    h = hashlib.sha256()
    h.update(user.username + user.email)
    h.update(now.isoformat())
    h.update(''.join(random.sample(string.printable, 64)))
    h.update(settings.SECRET_KEY)
    key = h.hexdigest()

    parts = (user.username, user.email, key)
    return '#'.join(parts)
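To make the cookie round trip concrete, here is a minimal, self-contained sketch of both halves: building the username#email#key value and checking its hash on the wiki side. It is not the code from my repository. The function names are made up for illustration, a plain Python set stands in for the Redis sorted set, and the random key comes from the standard library's secrets module (Python 3.6+) rather than the random.sample() approach shown above.

```python
import hashlib
import secrets


def make_cookie_value(username, email):
    """Build 'username#email#key' where key is an unguessable random string."""
    # Assumption: secrets.token_hex() replaces the hand-rolled key above.
    key = secrets.token_hex(32)
    return "#".join((username, email, key))


def cookie_digest(value):
    """Hash the cookie value; the digest is what gets stored in shared storage."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def parse_cookie_value(value):
    """Return (username, email), or None if the value is malformed."""
    parts = value.split("#")
    if len(parts) != 3:
        return None
    return parts[0], parts[1]


# A plain set stands in for the shared Redis sorted set in this sketch.
known_digests = set()

# Django side: create the cookie value and record its digest.
value = make_cookie_value("sally", "sally@example.com")
known_digests.add(cookie_digest(value))

# Wiki side: recompute the digest and check that Django recorded it.
is_valid = cookie_digest(value) in known_digests
forged_is_valid = cookie_digest("sally#sally@example.com#guess") in known_digests
```

Note that a username or email address containing a # character would make the value split into more than three parts, so it would be rejected as malformed.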
Finally on the Django side we should periodically delete expired Redis set members in case users do not log out. Since I am using Celery with my Django application, I created a Celery task that runs periodically to delete old set members. This function is a bit longer than it probably needs to be, but I wanted to log how big this set is before and after we cull the expired entries.
@task
def expire_cookies():
    """
    Periodically run this task to remove expired cookies from the Redis set
    that is shared between this Django application & the MoinMoin wiki for
    authentication.

    """
    now = datetime.datetime.utcnow()
    cutoff = now - datetime.timedelta(seconds=settings.WIKI_COOKIE_AGE)
    min_score = time.mktime(cutoff.utctimetuple())

    conn = get_redis_connection()

    set_name = settings.WIKI_REDIS_SET
    try:
        count = conn.zcard(set_name)
    except redis.RedisError:
        logger.error("Error getting zcard")
        return

    try:
        removed = conn.zremrangebyscore(set_name, 0.0, min_score)
    except redis.RedisError:
        logger.error("Error removing by score")
        return

    total = count - removed
    logger.info("Expire wiki cookies: removed %d, total is now %d",
                removed, total)
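A note on the scores: time.mktime() interprets a time tuple as local time, so feeding it a UTC tuple yields a number that is shifted by the server's UTC offset. Because the cookie creation code and the expiry task use the same conversion, the comparison still works. If you want the scores to be true POSIX timestamps, a sketch along these lines should do it; the function name utc_timestamp is just an illustration.

```python
import calendar
import datetime


def utc_timestamp(dt):
    """Convert a naive datetime that represents UTC into seconds since the epoch."""
    # calendar.timegm() is the UTC counterpart of time.mktime().
    return calendar.timegm(dt.utctimetuple())


epoch = utc_timestamp(datetime.datetime(1970, 1, 1))
one_day_later = utc_timestamp(datetime.datetime(1970, 1, 2))
```

Here epoch is 0 and one_day_later is 86400, regardless of the time zone the server runs in.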
MoinMoin Implementation
As described in the MoinMoin external cookie documentation, you have to configure MoinMoin to use your external cookie authentication mechanism. It is also nice to disable the ability for the MoinMoin user to change their username and email address since that is being managed by the Django application. These changes to the MoinMoin Config class are shown below.
class Config(multiconfig.DefaultConfig):

    # ...

    # Use ExternalCookie method for integration authentication with Django:
    auth = [ExternalCookie(autocreate=True)]

    # remove ability to change username & email, etc.
    user_form_disable = ['name', 'aliasname', 'email',]
    user_form_remove = ['password', 'password2', 'css_url', 'logout', 'create',
                        'account_sendmail', 'jid']
Next we create an ExternalCookie class and associated helper functions to process the cookie and verify it in Redis. This code is shown in its entirety below. It is based off the example in the MoinMoin external cookie documentation, but uses Redis as the shared storage area.
import hashlib
import Cookie
import logging

from MoinMoin.auth import BaseAuth
from MoinMoin.user import User
import redis

COOKIE_NAME = 'YOUR_COOKIE_NAME_HERE'

# Redis connection and database settings
REDIS_HOST = 'localhost'
REDIS_PORT = 6379
REDIS_DB = 0

# The name of the set in Redis that holds cookie hashes
REDIS_SET = 'wiki_cookie_keys'

logger = logging.getLogger(__name__)


def get_cookie_value(cookie):
    """Returns the value of the Django cookie from the cookie.
    None is returned if the cookie is invalid or the value cannot be
    determined.

    This function works around an issue with different Python versions.
    In Python 2.5, if you construct a SimpleCookie with a dict, then
        type(cookie[key]) == unicode
    whereas in later versions of Python:
        type(cookie[key]) == Cookie.Morsel
    """
    if cookie:
        try:
            morsel = cookie[COOKIE_NAME]
        except KeyError:
            return None

        if isinstance(morsel, unicode):             # Python 2.5
            return morsel
        elif isinstance(morsel, Cookie.Morsel):     # Python 2.6+
            return morsel.value

    return None


def get_redis_connection(host=REDIS_HOST, port=REDIS_PORT, db=REDIS_DB):
    """
    Create and return a Redis connection using the supplied parameters.

    """
    return redis.StrictRedis(host=host, port=port, db=db)


def validate_cookie(value):
    """Determines if cookie was created by Django. Returns True on success,
    False on failure.

    Looks up the hash of the cookie value in Redis. If present, cookie
    is deemed legit.

    """
    h = hashlib.sha256()
    h.update(value)
    set_member = h.hexdigest()

    conn = get_redis_connection()
    success = False
    try:
        score = conn.zscore(REDIS_SET, set_member)
        success = score is not None
    except redis.RedisError:
        logger.error('Could not check Redis for ExternalCookie auth')

    return success


class ExternalCookie(BaseAuth):
    name = 'external_cookie'

    def __init__(self, autocreate=False):
        self.autocreate = autocreate
        BaseAuth.__init__(self)

    def request(self, request, user_obj, **kwargs):
        user = None
        try_next = True

        try:
            cookie = Cookie.SimpleCookie(request.cookies)
        except Cookie.CookieError:
            cookie = None

        val = get_cookie_value(cookie)
        if val:
            try:
                username, email, _ = val.split('#')
            except ValueError:
                return user, try_next

            if validate_cookie(val):
                user = User(request, name=username, auth_username=username,
                            auth_method=self.name)

                changed = False
                if email != user.email:
                    user.email = email
                    changed = True

                if user:
                    user.create_or_update(changed)
                if user and user.valid:
                    try_next = False

        return user, try_next
Conclusion
I've been running this setup for a month now and it is working great. My users and I are enjoying our shiny new MoinMoin wiki integrated with our Django powered community website. The single sign-on experience is quite seamless and eliminates the need for separate accounts.
Who's Online with Redis & Python, a slight return
In a previous post, I blogged about building a "Who's Online" feature using Redis and Python with redis-py. I've been integrating Celery into my website, and I stumbled across this old code. Since I made that post, I discovered yet another cool feature in Redis: sorted sets. So here is an even better way of implementing this feature using Redis sorted sets.
A sorted set in Redis is like a regular set, but each member has a numeric score. When you add a member to a sorted set, you also specify the score for that member. You can then retrieve set members if their score falls into a certain range. You can also easily remove members outside a given score range.
For a "Who's Online" feature, we need a sorted set to represent the set of all users online. Whenever we see a user, we insert that user into the set along with the current time as their score. This is accomplished with the Redis zadd command. If the user is already in the set, zadd simply updates their score with the current time.
To obtain the current list of who's online, we use the zrangebyscore command to retrieve the list of users whose score (time) lies between, say, 15 minutes ago and now.
Periodically, we need to remove stale members from the set. This can be accomplished by using the zremrangebyscore command. This command will remove all members that have a score between minimum and maximum values. In this case, we can use the beginning of time for the minimum, and 15 minutes ago for the maximum.
That's really it in a nutshell. This is much simpler than my previous solution which used two sets.
So let's look at some code. The first problem we need to solve is how to convert a Python datetime object into a score. This can be accomplished by converting the datetime into a POSIX timestamp integer, which is the number of seconds from the UNIX epoch of January 1, 1970.
import datetime
import time


def to_timestamp(dt):
    """
    Turn the supplied datetime object into a UNIX timestamp integer.

    """
    return int(time.mktime(dt.timetuple()))
With that handy function, here are some examples of the operations described above.
import redis
# Redis set keys:
USER_SET_KEY = "whos_online:users"
# the period over which we collect who's online stats:
MAX_AGE = datetime.timedelta(minutes=15)
# obtain a connection to redis:
conn = redis.StrictRedis()
# add/update a user to the who's online set:
username = "sally"
ts = to_timestamp(datetime.datetime.now())
conn.zadd(USER_SET_KEY, ts, username)
# retrieve the list of users who have been active in the last MAX_AGE minutes
now = datetime.datetime.now()
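# (named min_score/max_score to avoid shadowing the built-in min() and max())
min_score = to_timestamp(now - MAX_AGE)
max_score = to_timestamp(now)
whos_online = conn.zrangebyscore(USER_SET_KEY, min_score, max_score)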
# e.g. whos_online = ['sally', 'harry', 'joe']
# periodically remove stale members
cutoff = to_timestamp(datetime.datetime.now() - MAX_AGE)
conn.zremrangebyscore(USER_SET_KEY, 0, cutoff)
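One thing to watch for if you try this with a newer client: starting with redis-py 3.0, zadd() takes a {member: score} mapping instead of positional score and member arguments. Here is a rough sketch of the same three operations written that way. The function names and the conn parameter are my own framing for illustration; conn is assumed to be a redis.Redis instance (or anything with the same three methods).

```python
import time

USER_SET_KEY = "whos_online:users"
MAX_AGE_SECONDS = 15 * 60


def mark_seen(conn, username, now=None):
    """Add or refresh a user; the score is the time they were last seen."""
    now = time.time() if now is None else now
    # redis-py 3.0+ signature: zadd(name, {member: score})
    conn.zadd(USER_SET_KEY, {username: now})


def users_online(conn, now=None):
    """Return the users seen within the last MAX_AGE_SECONDS."""
    now = time.time() if now is None else now
    return conn.zrangebyscore(USER_SET_KEY, now - MAX_AGE_SECONDS, now)


def remove_stale(conn, now=None):
    """Remove users not seen within MAX_AGE_SECONDS; returns how many were removed."""
    now = time.time() if now is None else now
    # "-inf" is the Redis way of saying "from the beginning of time".
    return conn.zremrangebyscore(USER_SET_KEY, "-inf", now - MAX_AGE_SECONDS)
```

With a real connection you would call, for example, mark_seen(redis.Redis(decode_responses=True), "sally"). Without decode_responses=True the client returns member names as bytes on Python 3.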
A better "Who's Online" with Redis & Python
Updated on December 17, 2011: I found a better solution. Head on over to the new post to check it out.
Who's What?
My website, like many others, has a "who's online" feature. It displays the names of authenticated users that have been seen over the course of the last ten minutes or so. It may seem a minor feature at first, but I find it really does a lot to "humanize" the site and make it seem more like a community gathering place.
My first implementation of this feature used the MySQL database to update a per-user timestamp whenever a request from an authenticated user arrived. Actually, this seemed excessive to me, so I used a strategy involving an "online" cookie that has a five minute expiration time. Whenever I see an authenticated user without the online cookie I update their timestamp and then hand them back a cookie that will expire in five minutes. In this way I don't have to hit the database on every single request.
This approach worked fine but it has some aspects that didn't sit right with me:
- It seems like overkill to use the database to store temporary, trivial information like this. It doesn't feel like a good use of a full-featured relational database management system (RDBMS).
- I am writing to the database during a GET request. Ideally, all GET requests should be idempotent. Of course if this is strictly followed, it would be impossible to create a "who's online" feature in the first place. You'd have to require the user to POST data periodically. However, writing to a RDBMS during a GET request is something I feel guilty about and try to avoid when I can.
Redis
Enter Redis. I discovered Redis recently, and it is pure, white-hot awesomeness. What is Redis? It's one of those projects that gets slapped with the "NoSQL" label. And while I'm still trying to figure that buzzword out, Redis makes sense to me when described as a lightweight data structure server. Memcached can store key-value pairs very fast, where the value is always a string. Redis goes one step further and stores not only strings, but data structures like lists, sets, and hashes. For a great overview of what Redis is and what you can do with it, check out Simon Willison's Redis tutorial.
Another reason why I like Redis is that it is easy to install and deploy. It is straight C code without any dependencies. Thus you can build it from source just about anywhere. Your Linux distro may have a package for it, but it is just as easy to grab the latest tarball and build it yourself.
I've really come to appreciate Redis for being such a small and lightweight tool. At the same time, it is very powerful and effective for filling those tasks that a traditional RDBMS is not good at.
For working with Redis in Python, you'll need to grab Andy McCurdy's redis-py client library. It can be installed with a simple
$ sudo pip install redis
Who's Online with Redis
Now that we are going to use Redis, how do we implement a "who's online" feature? The first step is to get familiar with the Redis API.
One approach to the "who's online" problem is to add a user name to a set whenever we see a request from that user. That's fine but how do we know when they have stopped browsing the site? We have to periodically clean out the set in order to time people out. A cron job, for example, could delete the set every five minutes.
A small problem with deleting the set is that people will abruptly disappear from the site every five minutes. In order to give more gradual behavior we could utilize two sets, a "current" set and an "old" set. As users are seen, we add their names to the current set. Every five minutes or so (season to taste), we simply overwrite the old set with the contents of the current set, then clear out the current set. At any given time, the set of who's online is the union of these two sets.
This approach doesn't give exact results of course, but it is perfectly fine for my site.
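Before bringing Redis into it, the two-set idea can be sketched with ordinary Python sets. This is only a toy model to show the rotation; the class name TwoSetTracker is invented for this illustration and is not part of my module.

```python
class TwoSetTracker:
    """Toy in-memory model of the current/old set rotation."""

    def __init__(self):
        self.current = set()
        self.old = set()

    def report_user(self, username):
        # Called whenever we see a request from a user.
        self.current.add(username)

    def tick(self):
        # Called every five minutes or so: current becomes old,
        # and a fresh, empty current set takes its place.
        self.old = self.current
        self.current = set()

    def users_online(self):
        # Who's online is the union of the two sets.
        return self.current | self.old


tracker = TwoSetTracker()
tracker.report_user("sally")
tracker.tick()                    # sally moves to the old set
tracker.report_user("harry")
after_one_tick = tracker.users_online()
tracker.tick()                    # sally ages out; harry moves to old
after_two_ticks = tracker.users_online()
tracker.tick()                    # nobody has been seen; harry ages out
after_three_ticks = tracker.users_online()
```

A user who stops browsing therefore stays listed for somewhere between one and two tick intervals, which is the "more gradual behavior" described above.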
Looking over the Redis API, we see that we'll be making use of the following commands:
- SADD for adding members to the current set.
- RENAME for copying the current set to the old, as well as destroying the current set all in one step.
- SUNION for performing a union on the current and old sets to produce the set of who's online.
And that's it! With these three primitives we have everything we need. This is because of the following useful Redis behaviors:
- Performing a SADD against a set that doesn't exist creates the set and is not an error.
- Performing a SUNION with sets that don't exist is fine; they are simply treated as empty sets.
The one caveat involves the RENAME command. If the key you wish to rename does not exist, the Python Redis client treats this as an error and an exception is thrown.
Experimenting with algorithms and ideas is quite easy with Redis. You can either use the Python Redis client in a Python interactive interpreter shell, or you can use the command-line client that comes with Redis. Either way you can quickly try out commands and refine your approach.
Implementation
My website is powered by Django, but I am not going to show any Django specific code here. Instead I'll show just the pure Python parts, and hopefully you can adapt it to whatever framework, if any, you are using.
I created a Python module to hold this functionality: whos_online.py. Throughout this module I use a lot of exception handling, mainly because if the Redis server has crashed (or if I forgot to start it, say in development) I don't want my website to be unusable. If Redis is unavailable, I simply log an error and drive on. Note that in my limited experience Redis is very stable and has not crashed on me once, but it is good to be defensive.
The first important function used throughout this module is a function to obtain a connection to the Redis server:
import logging

import redis

logger = logging.getLogger(__name__)


def _get_connection():
    """
    Create and return a Redis connection. Returns None on failure.

    """
    try:
        conn = redis.Redis(host=HOST, port=PORT, db=DB)
        return conn
    except redis.RedisError as e:
        logger.error(e)

    return None
The HOST, PORT, and DB constants can come from a configuration file or they could be module-level constants. In my case they are set in my Django settings.py file. Once we have this connection object, we are free to use the Redis API exposed via the Python Redis client.
To update the current set whenever we see a user, I call this function:
# Redis key names:
USER_CURRENT_KEY = "wo_user_current"
USER_OLD_KEY = "wo_user_old"


def report_user(username):
    """
    Call this function when a user has been seen. The username will be added to
    the current set.

    """
    conn = _get_connection()
    if conn:
        try:
            conn.sadd(USER_CURRENT_KEY, username)
        except redis.RedisError as e:
            logger.error(e)
If you are using Django, a good spot to call this function is from a piece of custom middleware. I kept my "5 minute cookie" algorithm to avoid doing this on every request although it is probably unnecessary on my low traffic site.
Periodically you need to "age out" the sets by destroying the old set, moving the current set to the old set, and then emptying the current set.
def tick():
    """
    Call this function to "age out" the old set by renaming the current set
    to the old.

    """
    conn = _get_connection()
    if conn:
        # An exception may be raised if the current key doesn't exist; if that
        # happens we have to delete the old set because no one is online.
        try:
            conn.rename(USER_CURRENT_KEY, USER_OLD_KEY)
        except redis.ResponseError:
            try:
                del conn[USER_OLD_KEY]
            except redis.RedisError as e:
                logger.error(e)
        except redis.RedisError as e:
            logger.error(e)
As mentioned previously, if no one is on your site, eventually your current set will cease to exist as it is renamed and not populated further. If you attempt to rename a non-existent key, the Python Redis client raises a ResponseError exception. If this occurs we just manually delete the old set. In a bit of Pythonic cleverness, the Python Redis client supports the del syntax for this operation.
The tick() function can be called periodically by a cron job, for example. If you are using Django, you could create a custom management command that calls tick() and schedule cron to execute it. Alternatively, you could use something like Celery to schedule a job to do the same. (As an aside, Redis can be used as a back-end for Celery, something that I hope to explore in the near future).
Finally, you need a way to obtain the current "who's online" set, which again is a union of the current and old sets.
def get_users_online():
    """
    Returns a set of user names which is the union of the current and old
    sets.

    """
    conn = _get_connection()
    if conn:
        try:
            # Note that keys that do not exist are considered empty sets
            return conn.sunion([USER_CURRENT_KEY, USER_OLD_KEY])
        except redis.RedisError as e:
            logger.error(e)

    return set()
In my Django application, I call this function from a custom inclusion template tag.
Conclusion
I hope this blog post gives you some idea of the usefulness of Redis. I expanded on this example to also keep track of non-authenticated "guest" users. I simply added another pair of sets to track IP addresses.
If you are like me, you are probably already thinking about shifting some functions that you awkwardly jammed onto a traditional database to Redis and other "NoSQL" technologies.