Here’s a quick explanation of a gotcha I ran into while writing my own Django process_response middleware. I had written some middleware to perform a “who’s online” type function some time ago. The middleware checks to see if a cookie is present. If not, it updates the database with a timestamp for the user, and then sets a cookie with a lifespan of 10 minutes. In this way I only have to update the database with “last seen” information about every 10 minutes, per user. Because I need to set a cookie, this processing had to be done in a process_response middleware. I had tested and debugged this middleware and I thought everything was fine.
But then yesterday I was checking various links on my site, and one link in particular generated a traceback from my middleware: “AttributeError: ‘WSGIRequest’ object has no attribute ‘user’.” What? Why was this happening on this one particular view and only this view?
One of the first things I do in my process_response function is check to see if the user is authenticated:
if request.user.is_authenticated(): ...
After putting a breakpoint in this code with pdb, I observed that indeed the request object my middleware received had no “user” attribute. This had me stupefied until I suddenly realized that the link I had clicked on was of the form /xxx/yyy instead of /xxx/yyy/ due to a typo in my template. I had forgotten the trailing slash. Aha, this was a clue, but I still couldn’t piece it together, and I had to get some sleep. I don’t know about you, but I hate going to sleep with a nagging unsolved problem.
Tonight I tackled the problem again. I remembered that I have the APPEND_SLASH setting in my settings.py file set to True. This normalizes my URLs so that they all end in a trailing slash. It is itself implemented via a piece of middleware, the so-called “Common” middleware, which is built into Django. I knew that there was some interaction going on between the CommonMiddleware and my own middleware. After reviewing the middleware docs to refresh my memory on how all this works, I turned my attention to my MIDDLEWARE_CLASSES setting:
MIDDLEWARE_CLASSES = (
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'debug_toolbar.middleware.DebugToolbarMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'gpp.core.middleware.InactiveUserMiddleware',
    'gpp.core.middleware.WhosOnline',
    'django.contrib.flatpages.middleware.FlatpageFallbackMiddleware',
)
Because of the order above, the first piece of middleware to process a request is the CommonMiddleware. When it observed the missing trailing slash, it returned an HttpResponseRedirect object to redirect the user to the correct URL with a trailing slash. This immediately cut short the process_request chain of middleware processing. Thus, the AuthenticationMiddleware, which is the middleware that attaches the user object to the request, never got to run. Furthermore, since request processing ended, we now start the process_response processing. This works through the above middleware list backwards, and it calls the process_response method of every middleware in the list, including those whose process_request never ran. So when my WhosOnline middleware gets called, it is suddenly presented with a request object that has no user attribute. And bam, I hit my bug.
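To make that ordering concrete, here is a small, self-contained simulation of the old MIDDLEWARE_CLASSES handling. It is not Django's handler code. The classes Request, Redirect, Common, Auth, and WhosOnline and the handle() function are hypothetical stand-ins. The simulation models only the two rules that matter here. First, process_request runs top to bottom and stops at the first middleware that returns a response. Second, process_response then runs bottom to top over every middleware in the list.

```python
class Request(object):
    """Bare-bones stand-in for an HttpRequest."""
    def __init__(self, path):
        self.path = path


class Redirect(object):
    """Stand-in for HttpResponseRedirect."""
    def __init__(self, location):
        self.location = location


class Common(object):
    # Mimics APPEND_SLASH: redirect any path lacking a trailing slash.
    def process_request(self, request):
        if not request.path.endswith('/'):
            return Redirect(request.path + '/')
        return None

    def process_response(self, request, response):
        return response


class Auth(object):
    # Mimics AuthenticationMiddleware: attaches request.user.
    def process_request(self, request):
        request.user = 'some user'
        return None

    def process_response(self, request, response):
        return response


class WhosOnline(object):
    # Records whether request.user existed when the response passed by.
    def __init__(self):
        self.saw_user = None

    def process_request(self, request):
        return None

    def process_response(self, request, response):
        self.saw_user = hasattr(request, 'user')
        return response


def handle(middleware, request, view):
    """Simplified model of the pre-1.10 middleware processing order."""
    response = None
    # Request phase: top to bottom, stopping at the first response.
    for mw in middleware:
        response = mw.process_request(request)
        if response is not None:
            break
    if response is None:
        response = view(request)
    # Response phase: bottom to top, over *every* middleware in the list.
    for mw in reversed(middleware):
        response = mw.process_response(request, response)
    return response


who = WhosOnline()
chain = [Common(), Auth(), who]
handle(chain, Request('/xxx/yyy'), lambda request: 'page')
print(who.saw_user)  # False: the Auth stand-in never ran
handle(chain, Request('/xxx/yyy/'), lambda request: 'page')
print(who.saw_user)  # True
```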
To fix this problem, one could reorder the list of middleware. But finding a middleware ordering that satisfies all of your installed middleware is a maddening process that requires flipping through many doc pages, and it seemed too brittle a solution. I’ll probably add some middleware in the future and completely upset the apple cart again. In the end, I decided to guard against the case of a request object without a user attribute and just bail out early. In this use case, it isn’t important to see every single response; I can simply make it up on the next one. So I modified my code to look like this:
def process_response(self, request, response):
    """
    Keep track of who is online.
    """
    # Note that some requests may not have a user attribute
    # as these may have been redirected in the middleware chain before
    # the auth middleware got a chance to run. If this is the case, just
    # bail out. We also ignore AJAX requests.
    if not hasattr(request, 'user') or request.is_ajax():
        return response
    if request.user.is_authenticated():
        ...
This was a difficult and unexpected bug. However, it produced one of those rare, genuine “light bulb” moments, as it forced me to take a hard look at Django’s middleware processing and finally learn it front to back.
Tags: django, middleware
Django 1.2 is coming out soon. I had been sitting on a trunk version since last October or November, and I finally decided to update to try out some of the new features. This blog post will summarize my experience and report on any gotchas I ran into.
First of all, I have to mention the great new site Django Advent. Django Advent was created as a way to publicize the new features and exciting changes coming in version 1.2. I assume this site was inspired by The 14 Days of jQuery, which did a similar thing for the great JavaScript library jQuery. If you haven’t already, please visit Django Advent to get a nice overview of the changes. I developed a quick “shopping list” of features I wanted to add to my site after perusing Django Advent.
A more obvious way of finding out what changed is to check Django’s fine development documentation. In particular, check out the Django 1.2 release notes. Note that since 1.2 isn’t out yet, these notes are likely to change, so check back on them from time to time until the final release. These notes don’t tell the complete story. You’ll likely also want to read the Django Deprecation Timeline. The Django documentation is really great for an open-source project (or any project for that matter). I highly recommend spending some time familiarizing yourself with all the information that is available in the docs.
So with some trepidation and giddiness, I finally did the svn up command and pulled down a hot off-the-press copy of trunk. What happened next? My site under development still ran, and after clicking around randomly I found nothing obviously broken. Yes, I know I need to get some tests written for my site and applications to make this more scientific and repeatable.
One of the first things I did was cut my settings.py over to the new settings format for configuring your databases. That’s right, databases, plural. Although I don’t have an immediate use case for this feature, I can easily see it becoming useful in the future. In any event, I appreciate the work that went into this, and it should help Django gain acceptance in the enterprise world. Please read the Django Advent article on this feature for more background.
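For reference, here is roughly what the cut-over looks like. This is a sketch. The MySQL backend and the database name, user, and password are placeholders, not the values this site uses. The old single-database settings it replaces are shown in a comment.

```python
# Old (pre-1.2) style, one database only:
#   DATABASE_ENGINE = 'mysql'
#   DATABASE_NAME = 'mysite'
#   DATABASE_USER = 'mysite_user'
#   DATABASE_PASSWORD = 'secret'

# New style: a dict of named connections. The 'default' alias is required;
# more aliases can be added alongside it later.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'mysite',           # placeholder
        'USER': 'mysite_user',      # placeholder
        'PASSWORD': 'secret',       # placeholder
        'HOST': '',                 # empty string means localhost
        'PORT': '',                 # empty string means the default port
    },
}
```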
The second change I made was to try out the cached template loader. This loader caches the compiled templates in your site’s cache, and thus Django doesn’t need to go to the filesystem (often multiple times) on every request to fetch template files. Again, read the Django Advent article (and this one) for more explanation of this great new feature.
This was very easy to set up; however, I totally derailed myself while doing so. When I reviewed my settings.py and began to cut it over to the cached template loader, I threw out the “app_directories” loader from my list of template loaders. I didn’t need that; I have all my templates under a common templates directory (with sub-directories under it for my apps). I then happily confirmed my templates were being cached and went on to another task.
It wasn’t until a few days later that I noticed my admin wasn’t working; it couldn’t find the login template. Huh? And gee, my Admin docs stopped working too. Well after some flailing about, I realized that indeed *I* don’t use the app_directories loader, but several applications I didn’t write do. In particular, I was reminded that yes, the Django admin is, in fact, an application. Ha-ha! Whoops. Okay, I put the app_directories loader back and all was well.
The cached template loader will be useful in production, but I think I’m going to have to turn it off in development. I already noticed that when I change a template and then hit reload in my browser, gee, my change isn’t seen. This won’t be a problem: since settings.py is just a Python file, I can conditionally use the correct loader for my current environment.
Once again, I refer you to the corresponding Django Advent article for an overview. A new messaging system has been put into place, replacing the old functionality that was tied to contrib.auth. I liked how this new system took the lead from the Python logging module for its design. I can imagine situations where it may be useful to filter messages at certain levels, for example.
It was very straightforward to cut over to this new scheme. I appreciated being able to use the tags feature to tack on CSS styling to messages.
Many improvements were made to the syndication feed application. In particular, I liked the increased flexibility in the URL routing. It was easy to cut my syndication feed classes over to the new system, and along the way it looks like I was able to gain some additional RSS functionality thanks to the improvements in the base class.
I did run into one snag that was quickly resolved. I was using the cache_page decorator in my URLconf to cache the output of my feed classes. This stopped working after the upgrade. After I reported this problem on the Django Users mailing list, Django core developer Russell Keith-Magee confirmed it was a problem and wrote a ticket on this issue. Within hours it was resolved. Thanks Russell! Someday I hope to understand the root problem, which apparently is a bug in Python itself. I still need to do some more homework on how Python decorators work.
And finally, major updates to the CSRF protection system landed. Since I was not using this feature, I skipped over reading the release notes about it. Thus I was surprised when my login stopped working and started throwing CSRF-related errors. It turns out that even though I am not using the CSRF middleware, all of the contrib applications, including admin and auth, have been cut over to use it. Normally this is not a problem, but as the upgrade notes state, if you aren’t using the provided contrib templates and you POST to a contrib view, things will stop working. The solution is to add the {% csrf_token %} tag to your custom templates.
I am probably going to spend some time and cut all my applications over to use the CSRF protection. Django makes it easy to do, so it is really a no-brainer to add a bit of security to my site. It will be tedious to find all the existing forms in my templates to add the {% csrf_token %}, but that is a one-time task. I can easily add them to future forms as they are created.
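To make the hunt for existing forms less tedious, a small script can list templates whose POST forms lack the tag. This is a hypothetical helper, not part of Django. It is a regex heuristic, not an HTML parser. It will miss forms assembled across {% include %} tags or generated by template logic, so treat its output as a starting list.

```python
import os
import re

# An opening <form> tag whose method attribute is POST (any case).
POST_FORM_RE = re.compile(r'<form\b[^>]*\bmethod\s*=\s*["\']?post\b',
                          re.IGNORECASE)
# A whole <form>...</form> block; non-greedy so adjacent forms don't merge.
FORM_BLOCK_RE = re.compile(r'<form\b.*?</form>', re.IGNORECASE | re.DOTALL)
# The template tag, tolerating extra whitespace inside the braces.
CSRF_TAG_RE = re.compile(r'\{%\s*csrf_token\s*%\}')


def forms_missing_csrf(text):
    """Return the POST <form> blocks in text that lack {% csrf_token %}."""
    return [block for block in FORM_BLOCK_RE.findall(text)
            if POST_FORM_RE.match(block) and not CSRF_TAG_RE.search(block)]


def scan_templates(template_dir):
    """Yield (path, count) for each .html file with unprotected POST forms."""
    for root, dirs, files in os.walk(template_dir):
        for name in sorted(files):
            if not name.endswith('.html'):
                continue
            path = os.path.join(root, name)
            with open(path) as f:
                missing = forms_missing_csrf(f.read())
            if missing:
                yield path, len(missing)


# Usage (the 'templates' directory name is an assumption):
# for path, count in scan_templates('templates'):
#     print('%s: %d form(s) missing csrf_token' % (path, count))
```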
Well that is all the issues I’ve run into so far, and as you can see, they are pretty minor and/or self-inflicted. But I hope this write-up will help other people on the fence about upgrading, or to just give them pointers on where to find upgrade information. Again, the Django Advent site combined with Django’s comprehensive documentation makes this upgrade easy. The new features rock and I can’t wait to incorporate them into my site and get some mileage on them. Thanks to all the Django developers and contributors for such a great piece of software.
Dougal Matthews wrote a great blog post entitled “Testing Your First Django App”. This is something that I have been meaning to do for a long time now, but didn’t know how to get started. Much to my surprise (I guess I shouldn’t be), Python supports the xUnit style of testing via the standard library package unittest. Since we are now using CxxTest at work, this style is quite familiar to me. Dougal’s blog entry shows some nice ways of testing Django web applications without using a server by mocking up requests and examining responses from your view functions. Very cool!
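Dougal's post covers the Django specifics. Below is a framework-free sketch of the xUnit shape that unittest shares with CxxTest: a test class, a setUp fixture, and one method per test. The append_slash helper under test is hypothetical. Django's own test support builds on these same unittest classes.

```python
import unittest


def append_slash(path):
    """Hypothetical helper under test: add a trailing slash if missing."""
    return path if path.endswith('/') else path + '/'


class AppendSlashTests(unittest.TestCase):
    # As in CxxTest or JUnit, each test_* method is an independent test.

    def setUp(self):
        # setUp runs before every test method, giving each a fresh fixture.
        self.bare = '/xxx/yyy'

    def test_adds_missing_slash(self):
        self.assertEqual(append_slash(self.bare), '/xxx/yyy/')

    def test_leaves_existing_slash_alone(self):
        self.assertEqual(append_slash(self.bare + '/'), '/xxx/yyy/')
```

Saved in a module, this runs with `python -m unittest <module name>`.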
I was looking at the SQL my views were generating, and I came across a couple of places where I was using get_object_or_404(), and then later following some foreign keys in the returned object. Something like this:
forum = get_object_or_404(Forum, slug=slug)
if not forum.category.can_access(request.user):
    return HttpResponseForbidden()
The problem is that two SQL queries occur here, one during the get_object_or_404(), and then another in the following if statement, when we access category, a foreign key on the forum object. It would sure be nice to somehow use a select_related() there to avoid the extra SQL query. I did some googling, and found a quick tip on one of the This Week in Django podcast pages. And yes, the documentation confirmed that get_object_or_404() can now take as a first argument either a model, a manager, or a queryset!
So now you can keep using the handy get_object_or_404() idiom, and reduce the number of queries with a slight bit of refactoring:
forum = get_object_or_404(Forum.objects.select_related(), slug=slug)
if not forum.category.can_access(request.user):
    return HttpResponseForbidden()
Very cool!
I installed memcached on my production server a while back. It’s supposed to be thee way to get fast and efficient caching for your Django powered website. I remember the process as being somewhat less than satisfying. Tonight I decided to get it running on my development box, which is running Ubuntu 8.04. So I took a lot of notes and present them here for my own future reference. I hope this may help someone. And as you can see, I have a few questions myself that perhaps someone can help me with. Read on…
Tags: cmemcache, daemon, django, libmemcache, memcached
I started working on a simple Django application for accepting and recording PayPal donations. While I was working on the IPN code, it suddenly occurred to me that I really needed a way to log any errors that might occur. After all, the IPN process will be initiated by PayPal, completely out of my control (not counting the PayPal sandbox) and without any visual feedback. Thus, I’d like a record of the path through my code to make sure everything is working the way I expected.
I had learned about the brilliant Python logging module some time ago, and had even used it in my IRC bot application. But could you use this with Django?
I did some research and found a couple of blog posts and related Django projects. After studying them I came to the conclusion that indeed the Python logging module is an excellent way to add logging to your web application. And after working with it again, I am very impressed by the functionality that it offers and how easy it is to use. I would have killed to have something like this in my PHP days. So I laced some logging calls throughout my IPN listener code, and I’ll know soon enough if it works correctly. I’ll post more about this later. But for now, I’d like to add some links to the things I found useful related to logging in Django.
First there is this very useful blog post by Simon Willison titled “Debugging Django.” In addition to talking about logging, Simon also has tips on using the Python debugger, asserts, and some useful middleware.
Second, there is a Django application called django-logging. This application seems mainly aimed at getting your logging statements displayed at the bottom of your web pages while debugging a problem. Of course you could also hook in a logger to log to a file, which is more of what I wanted to do. Another useful feature of this application is that you can configure it to automatically log your application’s SQL queries.
And finally I looked at Fairview Computing’s Django request logging, or drlog. This project provides some middleware which adds the ability to add a unique identifier to each log entry to associate it with a given HTTP request. This allows you to easily trace a single request, even while multiple concurrent requests are happening.
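drlog's own middleware isn't reproduced here. As a rough sketch of the same idea using only the standard library, a logging.Filter can copy a per-thread request id onto every record so that the formatter can print it. The start_request() hook and the 'requests_demo' logger name are hypothetical. In a Django site, a process_request middleware would be the natural place to call start_request().

```python
import io
import logging
import threading
import uuid

# Per-thread storage for the id of the request this thread is serving.
_local = threading.local()


def start_request():
    """Hypothetical hook: tag the current thread's request with a fresh id."""
    _local.request_id = uuid.uuid4().hex[:8]
    return _local.request_id


class RequestIdFilter(logging.Filter):
    """Copy the current thread's request id onto every log record."""

    def filter(self, record):
        record.request_id = getattr(_local, 'request_id', '-')
        return True  # annotate only; never drop records


def make_logger(stream):
    logger = logging.getLogger('requests_demo')
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(RequestIdFilter())
    handler.setFormatter(logging.Formatter('[%(request_id)s] %(message)s'))
    logger.addHandler(handler)
    return logger


# Demo: two "requests" logging to an in-memory stream.
buf = io.StringIO()
log = make_logger(buf)
first = start_request()
log.info('IPN received')
second = start_request()
log.info('IPN received')
print(buf.getvalue())
```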
Studying Simon’s blog post and the source code for the above two applications was very enlightening. In the end I decided to start with the bare Python logging facility for now, configuring it to write to a file. I was reassured to read that the Python logging module is thread safe. If I start using the logging module heavily, I may use the drlog middleware to help me map log statements to HTTP requests.
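A minimal sketch of that file-based setup follows, under several assumptions. The logger name 'donations.ipn' is hypothetical, as is the temp-directory log path. The process_ipn function is a stand-in for the real IPN listener, not the author's code. It reads the PayPal IPN fields txn_id and payment_status. Configuring a named logger rather than the root logger keeps these messages separate.

```python
import logging
import os
import tempfile

# Assumed log location for this sketch; on a real server, pick a path the
# web server's user can write to.
LOG_FILE = os.path.join(tempfile.gettempdir(), 'ipn-demo.log')

# Hypothetical logger name for the donations app's IPN code.
logger = logging.getLogger('donations.ipn')
logger.setLevel(logging.DEBUG)

file_handler = logging.FileHandler(LOG_FILE)
file_handler.setFormatter(logging.Formatter(
    '%(asctime)s %(levelname)s %(name)s: %(message)s'))
logger.addHandler(file_handler)


def process_ipn(params):
    """Hypothetical IPN handler showing where log calls might go."""
    logger.info('IPN received: txn_id=%s', params.get('txn_id'))
    status = params.get('payment_status')
    if status != 'Completed':
        logger.warning('Ignoring payment_status=%s', status)
        return False
    logger.debug('Payment completed; recording donation')
    return True
```

If the configuring module can be imported more than once, check logger.handlers before adding the handler. Otherwise each line may be written twice.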
I just got finished integrating Leah Culver’s django-elsewhere application. Django-elsewhere was formerly Django-PSN (Portable Social Networks) and was originally created for the now defunct social networking site Pownce. This nifty application allows your users to add an arbitrary number of social networks, websites, and instant messengers to their profile. The application even comes with many icons for widely known sites.
In my previous design I had just stuck a few fields in my user profile for websites and a few of the common instant messengers. This was limiting, and I had been thinking about expanding it to a more general solution when I stumbled across this application.
To integrate it with my site, I created a template tag to display a user’s “elsewhere” sites, and I made a view and template to allow a user to edit their sites. This code was based on the example view and template that came with the application. In general the django-elsewhere code quality is quite high. There are still a few print statements in the code base, but that’s all I can find fault with right now.
Thank you django-elsewhere team for the big time saver!
Tags: django, django-elsewhere, sg101
Based on this blog post by Django co-BDFL Jacob Kaplan-Moss, I wanted to try using html5lib to sanitize user input. I’m using Markdown on most of the site. But in one particular place (news items), I am (currently) allowing users to submit HTML news stories with the TinyMCE JavaScript editor. This is mainly because my users like to copy and paste content from sites like MySpace, and TinyMCE might be easier for them to use than Markdown. I may revisit this decision, but for now we’ll go with it.
I was using the lxml sanitizer for this purpose. But because of the high praise html5lib received from Jacob, and after studying the source code of both, I have greater confidence in html5lib, even if it is an order of magnitude slower. That isn’t a concern, since this code isn’t going to be used more than a few times a day.
Never having used html5lib, or any other HTML/XML parser before, it was a bit confusing to figure out how to use it for this task. After studying the code and the html5lib news group, I came up with the following bit of code I thought I would share. Comments are extremely welcome.
# Written against the html5lib API of the time; newer releases replaced
# the sanitizing tokenizer with a sanitizer filter.
import html5lib
from html5lib import sanitizer, treebuilders, treewalkers, serializer

def sanitizer_factory(*args, **kwargs):
    san = sanitizer.HTMLSanitizer(*args, **kwargs)
    # This isn't available yet
    # san.strip_tokens = True
    return san

def clean_html(buf):
    """Cleans HTML of dangerous tags and content."""
    buf = buf.strip()
    if not buf:
        return buf

    p = html5lib.HTMLParser(tree=treebuilders.getTreeBuilder("dom"),
            tokenizer=sanitizer_factory)
    dom_tree = p.parseFragment(buf)

    walker = treewalkers.getTreeWalker("dom")
    stream = walker(dom_tree)

    s = serializer.htmlserializer.HTMLSerializer(
            omit_optional_tags=False,
            quote_attr_values=True)
    return s.render(stream)
I haven’t tested it extensively yet, but it seems to do the trick. I understand a future version of html5lib will have an option to completely strip out offending tags. Right now they are simply rendered harmless and remain in the input (escaped as &lt; and &gt;). This is fine, as I can see them in the admin when I review submitted stories.
Tags: django, html5lib, sanitization, sg101
Check out this ticket! Granted, it is just a typo fix to a Python docstring, but you got to start somewhere.
What a great feeling in any event.
I hope I can contribute more meaningful features and bug fixes in the future.
Tags: django
I ended up creating a time zone picker for the event calendar. I saw the idea on the web somewhere. The problem is that there are nearly 400 common time zones in the database. Since every time zone is named in the format “area/location”, I created an area select and a location select. That broke up the time zones nicely, although some of the areas still have far too many entries to be completely convenient. I wrote a short Python script that parsed the pytz common time zones and generated a JavaScript object literal containing the select menus’ contents. Here is a screenshot showing it in action:

When you select an area (the left-most control), the location select fills with the appropriate options. When the form is submitted, some JavaScript combines the two select values and populates a hidden time zone input field with the result. So, in the example above, when the form is submitted, the hidden field receives “US/Pacific”. Likewise, when the form is displayed, the hidden field is parsed and the two select controls are set accordingly. This works pretty well, although I think I could have done a better job of modularizing this code in case I need to use it elsewhere on the site (such as in a user’s profile). I will definitely do this later.
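The generator script isn't shown in the post. The sketch below shows the same idea, assuming the names come from pytz.common_timezones. A short hard-coded list stands in here so the example runs without pytz. It groups names on the first slash and dumps the result with json.dumps, whose output can be dropped into a page as a JavaScript object literal. The join_zone and split_zone helpers are hypothetical Python mirrors of the join/split that the page's JavaScript performs.

```python
import json


def group_time_zones(names):
    """Split "Area/Location" names into {area: [locations]}.

    Only the first slash separates area from location, so
    "America/Argentina/Buenos_Aires" gives area "America" and location
    "Argentina/Buenos_Aires". Names without a slash (such as "UTC") get
    an empty location.
    """
    groups = {}
    for name in names:
        area, _, location = name.partition('/')
        groups.setdefault(area, []).append(location)
    return groups


def join_zone(area, location):
    """Rebuild the hidden field's value from the two select values."""
    return area + '/' + location if location else area


def split_zone(name):
    """Inverse of join_zone, used to preselect the two controls."""
    area, _, location = name.partition('/')
    return area, location


# Stand-in for pytz.common_timezones so the sketch is self-contained.
sample = ['US/Pacific', 'US/Eastern', 'America/Argentina/Buenos_Aires', 'UTC']
zones = group_time_zones(sample)
print('var time_zones = %s;' % json.dumps(zones, sort_keys=True))
```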
I’ve decided to tackle recurring events later, as it seems a bit involved, and as I stated, very few events on the calendar need this capability. So with the time zone picker in place, and the corresponding code on the server side (thanks to pytz), I can now accurately add events to the event calendar without losing local time information.
I also sat down finally and converted The Madeira’s website from mod_python to mod_wsgi. This wouldn’t have been possible without the excellent documentation that mod_wsgi has. I feel this will scale better, and it will allow me to more easily run multiple Python web applications side by side. I am anxious to get a Trac issue tracker running as well as a beta version of the new site.
The rest of the weekend was spent working the “to-do” list for the site in preparation for deploying a beta version. I really do need to get an issue tracker going to capture all the ideas and work I need to complete.