by Brian Neal

20 Dec 09 SG101 2.0 Status Report

This is the obligatory “why haven’t you been blogging about your project in a while” post. Yes, I suppose it is time to give a quick update.

Things slowed down dramatically on SG101 2.0 this summer. There was the 2009 SG101 convention vacation and other fun summer things to distract me. I had put up a beta site for feedback but was still missing a forums application. I got the itch to start back up again sometime in the fall. I decided to write a forums application myself and see how it went. If it was too complicated, I would look for a third-party solution.

This is one case where I did look at several third-party applications, both for ideas and to check on their status. There doesn’t seem to be a single recognized forums application in Django-land. There are a number of them, ranging from very simple to moderately complex, and many of them seem unsupported and have obvious problems. So in the end I decided that, since the forums are probably the most important part of my site, it would be best to write them myself so that I could understand them completely, strengths and weaknesses alike. I did borrow many ideas from existing applications, and some of my initial momentum came from djangobb. However, I quickly stopped looking at other apps, because Django makes it really easy to write complex web applications once you get an idea and try a few things.

My forum app contains most of the functionality of the venerable phpBB-based board I have now. I added a few things, like the ability for users to flag posts as spam or abuse (I sure wish I had that now). I am considering requiring approval for a user’s first few posts to counter spam, but I’m not sure it is worth the effort with the “flag post” feature in place. I might just wait and see how well that works.

I also decided to save each user’s post read/unread status in the database instead of in cookies. Too many of my existing users complain that when their cookies expire, they lose track of which threads are new. It will cost some database space, no doubt, but fixing this is an often-requested feature. I implemented a rolling 7-day window of thread and post read status, and in initial tests it seems to work just fine. It did add significant complexity to the design, however, and I’m not looking forward to debugging that logic when a problem occurs.
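The rolling-window idea can be sketched in a few lines of plain Python. This is a minimal sketch under assumptions, not SG101’s actual schema: the per-user `last_read` dictionary (topic ID to last-visit time) and the `is_unread()` and `prune()` helpers are hypothetical, and in a real Django application this data would live in database tables. What it illustrates is that anything older than the window is simply treated as read, which keeps the stored data bounded.

```python
from datetime import datetime, timedelta

# Window size matching the 7 days described above
READ_WINDOW = timedelta(days=7)


def is_unread(topic_id, last_post_time, last_read, now):
    """Return True if the topic has a post the user has not seen.

    last_read is a hypothetical per-user dict of topic_id -> datetime of
    the user's last visit to that topic.
    """
    if last_post_time < now - READ_WINDOW:
        # Outside the rolling window: treat as read, nothing stored
        return False
    seen = last_read.get(topic_id)
    return seen is None or seen < last_post_time


def prune(last_read, now):
    """Drop read-status entries that have fallen out of the window."""
    cutoff = now - READ_WINDOW
    return {tid: ts for tid, ts in last_read.items() if ts >= cutoff}
```

Pruning never changes any answer `is_unread()` gives: an entry older than the window is always earlier than any post inside the window, so a periodic cleanup (hypothetical here) can delete old rows safely.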

After finishing the forums, I began working through my lengthy to-do list in my Trac issue tracker. I also spent a great deal of time refactoring some of the original code I wrote over a year ago; I’ve become so much more proficient with Python and jQuery since then that this was inevitable. My task list has become quite small, and I am thinking about wiping the existing beta site, putting up a new one over the holiday break, and launching an official beta test.

The one area that I am lacking in right now is a good design and layout. A few users have volunteered to help with that, and one in particular is showing me some really nice work. If I can just manage to implement his design we may be on to something. I may also try to reach out to someone who is familiar with Django.

There are a couple of interesting problems I either solved or worked around during this period that I should blog about. I’ll just have to find the time to do that. In particular, I wanted to share how I created an admin dashboard for user-created content that needs admin approval before being published.

I’ve also volunteered to give a “brown-bag” lunchtime talk at my employer on Python. I’ll have to prepare some slides over the holiday break for this.


15 Sep 09 Django Tip: get_object_or_404() and select_related()

I was looking at the SQL my views were generating, and I came across a couple of places where I was using get_object_or_404(), and then later following some foreign keys in the returned object. Something like this:

forum = get_object_or_404(Forum, slug=slug)
if not forum.category.can_access(request.user):
    return HttpResponseForbidden()

The problem is that two SQL queries occur here: one during the get_object_or_404() call, and another in the following if statement, when we access category, a foreign key on the forum object. It would sure be nice to somehow use select_related() there to avoid the extra SQL query. I did some googling and found a quick tip on one of the This Week in Django podcast pages. And yes, the documentation confirmed that get_object_or_404() can now take a model, a manager, or a queryset as its first argument!

So now you can keep using the handy get_object_or_404() idiom, and reduce the number of queries with a slight bit of refactoring:

forum = get_object_or_404(Forum.objects.select_related(), slug=slug)
if not forum.category.can_access(request.user):
    return HttpResponseForbidden()

Very cool!
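Passing a queryset works because the helper only needs something it can call `.get()` on. The following is a simplified, Django-free sketch of that dispatch, not Django’s source: `FakeQuerySet`, `FakeManager`, and `FakeModel` are hypothetical stand-ins, `NotFound` replaces Django’s `Http404`, and the multiple-match case is ignored. A model class is turned into a queryset through its default manager, while a manager or queryset is used as is, so a `select_related()` applied beforehand carries through to the final lookup.

```python
class DoesNotExist(Exception):
    """Raised by the fake queryset when no row matches."""


class NotFound(Exception):
    """Stand-in for Django's Http404."""


class FakeQuerySet:
    """Hypothetical queryset over a list of dicts."""

    def __init__(self, rows, related=False):
        self.rows = rows
        self.related = related  # True once select_related() was applied

    def all(self):
        return FakeQuerySet(self.rows, self.related)

    def select_related(self):
        return FakeQuerySet(self.rows, related=True)

    def get(self, **lookups):
        for row in self.rows:
            if all(row.get(k) == v for k, v in lookups.items()):
                # Report whether related data would have been joined in
                return dict(row, related=self.related)
        raise DoesNotExist(lookups)


class FakeManager(FakeQuerySet):
    """Managers hand out querysets; here it simply is one."""


class FakeModel:
    _default_manager = FakeManager([{"slug": "surf", "category": "music"}])


def get_or_404(klass, **lookups):
    if hasattr(klass, "_default_manager"):
        # A model class: start from its default manager's queryset
        queryset = klass._default_manager.all()
    else:
        # A manager or queryset: use it directly, keeping select_related()
        queryset = klass
    try:
        return queryset.get(**lookups)
    except DoesNotExist:
        raise NotFound(lookups)
```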


13 Jun 09 Installing memcached for use with Python and Django

I installed memcached on my production server a while back. It’s supposed to be the way to get fast and efficient caching for your Django-powered website. I remember the process as being somewhat less than satisfying. Tonight I decided to get it running on my development box, which runs Ubuntu 8.04, so I took a lot of notes and present them here for my own future reference. I hope they help someone else, too. And as you can see, I have a few questions myself that perhaps someone can help me with. Read on…


12 Jun 09 Django and Python Logging

I started working on a simple Django application for accepting and recording PayPal donations. While I was working on the IPN (Instant Payment Notification) code, it suddenly occurred to me that I really needed a way to log any errors that might occur. After all, the IPN process will be initiated by PayPal, completely out of my control (not counting the PayPal sandbox) and without any visual feedback. Thus, I’d like a record of the path through my code to make sure everything is working the way I expected.

I had learned about the brilliant Python logging module some time ago, and had even used it in my IRC bot application. But could you use this with Django?

I did some research and found a couple of blog posts and related Django projects. After studying them I came to the conclusion that indeed the Python logging module is an excellent way to add logging to your web application. And after working with it again, I am very impressed by the functionality that it offers and how easy it is to use. I would have killed to have something like this in my PHP days. So I laced some logging calls throughout my IPN listener code, and I’ll know soon enough if it works correctly. I’ll post more about this later. But for now, I’d like to add some links to the things I found useful related to logging in Django.

First there is this very useful blog post by Simon Willison titled “Debugging Django.” In addition to talking about logging, Simon also has tips on using the Python debugger, asserts, and some useful middleware.

Second, there is a Django application called django-logging. This application seems mainly aimed at getting your logging statements displayed at the bottom of your web pages while debugging a problem. Of course, you could also hook in a logger to log to a file, which is closer to what I wanted to do. Another useful feature of this application is that you can configure it to automatically log your application’s SQL queries.

And finally, I looked at Fairview Computing’s Django request logging, or drlog. This project provides middleware that adds a unique identifier to each log entry, associating it with a given HTTP request. This allows you to easily trace a single request, even while multiple concurrent requests are happening.
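The idea behind drlog can be approximated with the standard library alone. This is a hypothetical sketch, not drlog’s API: `start_request()` would be called from a middleware at the beginning of each request, a thread-local holds the current ID, and a `logging.Filter` stamps that ID onto every record so the formatter can print it. It assumes each request is handled start to finish on one thread, as in a threaded mod_wsgi setup.

```python
import logging
import threading
import uuid

# Per-thread storage for the identifier of the request being handled
_local = threading.local()


def start_request():
    """Assign a fresh ID; a middleware would call this per request."""
    _local.request_id = uuid.uuid4().hex[:8]
    return _local.request_id


class RequestIdFilter(logging.Filter):
    """Attach the current request ID to every record passing through."""

    def filter(self, record):
        record.request_id = getattr(_local, "request_id", "-")
        return True  # never drop records, only annotate them


def make_logger(stream):
    """Build a hypothetical 'sg101.requests' logger writing to stream."""
    logger = logging.getLogger("sg101.requests")
    handler = logging.StreamHandler(stream)
    handler.addFilter(RequestIdFilter())
    handler.setFormatter(
        logging.Formatter("[%(request_id)s] %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

With the ID in every line, grepping the log for one ID pulls out a single request’s path even when several requests interleave.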

Studying Simon’s blog post and the source code for the above two applications was very enlightening. In the end I decided to start with the bare Python logging facility for now, configuring it to write to a file. I was reassured to read that the Python logging module is thread safe. If I start using the logging module heavily, I may use the drlog middleware to help me map log statements to HTTP requests.
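A bare-bones version of that setup might look like the following. The logger names (`donations`, `donations.ipn`) and the `process_ipn()` function are hypothetical; the `txn_id`, `payment_status`, and `mc_gross` keys follow PayPal’s IPN variable names, but the verification round trip to PayPal is omitted. Records logged through the child logger propagate up to the parent’s file handler, so each module can use its own logger under a common prefix.

```python
import logging


def setup_file_logging(path, level=logging.DEBUG):
    """Attach a file handler to the hypothetical 'donations' logger."""
    logger = logging.getLogger("donations")
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(name)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    return handler


# Child logger for the IPN code; its records propagate to 'donations'
log = logging.getLogger("donations.ipn")


def process_ipn(params):
    """Hypothetical IPN handler that records its path through the code."""
    log.info("IPN received for txn %s", params.get("txn_id"))
    status = params.get("payment_status")
    if status != "Completed":
        log.warning("Ignoring payment with status %s", status)
        return False
    log.info("Recording donation of %s", params.get("mc_gross"))
    return True
```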


03 May 09 Django-Elsewhere

I just got finished integrating Leah Culver’s django-elsewhere application. Django-elsewhere was formerly Django-PSN (Portable Social Networks) and was originally created for the now defunct social networking site Pownce. This nifty application allows your users to add an arbitrary number of social networks, websites, and instant messengers to their profile. The application even comes with many icons for widely known sites.

In my previous design I had just stuck a few fields in my user profile for websites and a few of the common instant messengers. This was limiting, and I had been thinking about expanding it to a more general solution when I stumbled across this application.

To integrate it with my site, I created a template tag to display a user’s “elsewhere” sites, and I made a view and template to allow users to edit their sites. This code was based on the example view and template that came with the application. In general the django-elsewhere code quality is quite high. There are still a few print statements in the code base, but that’s all I can find fault with right now.

Thank you django-elsewhere team for the big time saver!


12 Apr 09 Using html5lib to Sanitize User Input

Based on this blog post by Django co-BDFL Jacob Kaplan-Moss, I wanted to try using html5lib to sanitize user input. I’m using Markdown on most of the site. But in one particular place (news items), I am (currently) allowing users to submit HTML news stories with the TinyMCE Javascript editor. This is mainly because my users like to copy and paste content from sites like MySpace, and TinyMCE might be easier for them to use than Markdown. I may revisit this decision, but for now we’ll go with it.

I was using the lxml sanitizer for this purpose. But between the high praise html5lib received from Jacob and my own study of the source code of both, html5lib gives me greater confidence, even if it is an order of magnitude slower. That isn’t a concern here, since this code won’t run more than a few times a day.

Never having used html5lib, or any other HTML/XML parser, before, I found it a bit confusing to figure out how to use it for this task. After studying the code and the html5lib newsgroup, I came up with the following bit of code, which I thought I would share. Comments are extremely welcome.

import html5lib
from html5lib import sanitizer, treebuilders, treewalkers, serializer

def sanitizer_factory(*args, **kwargs):
    # HTMLSanitizer is a tokenizer that neutralizes disallowed
    # elements and attributes as the input is tokenized
    san = sanitizer.HTMLSanitizer(*args, **kwargs)
    # This isn't available yet
    # san.strip_tokens = True
    return san

def clean_html(buf):
    """Cleans HTML of dangerous tags and content."""
    buf = buf.strip()
    if not buf:
        return buf

    # Parse the fragment into a DOM tree, sanitizing tokens as they are read
    p = html5lib.HTMLParser(tree=treebuilders.getTreeBuilder("dom"),
            tokenizer=sanitizer_factory)
    dom_tree = p.parseFragment(buf)

    # Walk the DOM tree to produce a token stream for the serializer
    walker = treewalkers.getTreeWalker("dom")
    stream = walker(dom_tree)

    # Serialize the token stream back into an HTML string
    s = serializer.htmlserializer.HTMLSerializer(
            omit_optional_tags=False,
            quote_attr_values=True)
    return s.render(stream)

I haven’t tested it extensively yet, but it seems to do the trick. I understand a future version of html5lib will have an option to completely strip out offending tags. Right now they are simply rendered harmless and remain in the input, escaped as &lt; and &gt;. This is fine, as I can see them in the admin as I review submitted stories.


07 Apr 09 I contributed to Django!

Check out this ticket! Granted, it is just a typo fix to a Python docstring, but you’ve got to start somewhere. :) What a great feeling in any event.

I hope I can contribute more meaningful features and bug fixes in the future.


06 Apr 09 Infrastructure: Trac & Subversion

I’ve been wanting to get some kind of issue tracker up and running for some time now. Trac seems like a great choice. We’ve used it where I work, and the Django project uses it. I even managed to install it on Windows at work. Still, I was kind of dreading trying to get it working on the dedicated server I rent. I finally gathered the strength and tackled this problem this weekend, and it went far easier than I imagined.

Subversion

First of all, I decided I might as well upgrade my Subversion (SVN) server while I am at it. I see that Subversion 1.6 is out now. However, reading the fine print, I noticed that they seemed to have changed their Python bindings in 1.6, and I wasn’t sure if Trac is compatible with this. So without doing any further research I decided to just run the last stable version before that, 1.5.6.

My dedicated server is running Fedora Core 6, which isn’t maintained anymore, so to my knowledge there is no way of getting a binary package for these recent builds of SVN. I need to build from source. I had done this once before, and I even took detailed notes (which I had forgotten about). Building from source is fairly easy, but there is one gotcha on the AMD64 server I run: you need to invoke the configure script with the --enable-shared switch. Luckily I wrote this down the first time I did this. Getting the required dependencies for the source build isn’t too hard. The Subversion folks helpfully package some of the less readily available dependencies, so it is just a matter of grabbing them and untarring them on top of the unpacked source tarball.

Since I wanted to integrate Trac with my Subversion repository, I needed to ensure I built the Python Subversion bindings. I used Yum, the package manager that comes with Fedora, to make sure I had SWIG installed before I ran configure to build SVN. Then it is a simple matter of building the Python SWIG bindings after Subversion proper is built. This is explained very well in the Subversion documentation.

This seemed to go well, although I had a minor heart attack when Apache crashed the first time I tried to restart it with the new SVN in place. Another restart and it was fine. Hmmm. In short order I had upgraded my existing repository and things seemed to be working fine.

A New Subdomain

I then created a new subdomain to host my issue tracker. I use the Plesk control panel to do the heavy lifting for me. It came installed with the server, and I rely on it heavily to configure Apache, the mail server, etc. I’m not a hard-core server admin, so this is a big help, although I can see the day when the training wheels come off as I become more familiar with Linux and these tools. I can sort of see what Plesk is doing by examining the config files it creates, and it doesn’t appear to be rocket science. Still, it is a big time saver for me.

Trac

Installing Trac requires getting all its dependencies in place first. In most cases, I was able to use Yum to get the dependencies in binary form from the Fedora repository. Despite the fact that Fedora Core 6 is pretty old, the versions of the dependencies in the repository were still compatible with the newest version of Trac. The one notable exception was Genshi, the template engine Trac uses. In this case a simple “easy_install Genshi” did the trick. Nice.

I might have been able to easy_install Trac, but the docs say that this only works for Python 2.5 and 2.6. I’m still running 2.4 on the dedicated server. Upgrading my OS is definitely on the long term to-do list, but I must take baby-steps for now. But it was a simple matter of grabbing the Trac tarball, untarring it, and doing the usual “python setup.py install”. It went flawlessly.

Now, luckily, I had set up Trac at work before, so I already knew what to do. I ran the command-line Trac admin tool to create a project and tied it to my new Subversion repository. Trac comes with a development server, and I ran that after configuring the project. I could then point my browser at my server and see my new Trac project for the first time! Things are cooking at this point.

Mod_WSGI

Of course I can’t use the development server for real work. So the next step was to get Apache to serve my Trac project. I once again chose mod_wsgi as the deployment method, after just recently converting The Madeira site from mod_python to mod_wsgi. The mod_wsgi documentation is excellent, and a wiki page covers integrating Trac and mod_wsgi in great detail. After studying the docs for a short while I had the magic Apache configuration down. I restarted Apache, and once again I was amazed that things were working on the first try. I had been pretty lucky so far. (In fact the most trouble I had that day was trying to change the logo on the Trac site!)
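For reference, the WSGI script that mod_wsgi loads for Trac is just a small Python file, which the Apache configuration points to with a `WSGIScriptAlias` directive. This sketch follows the general pattern on the Trac mod_wsgi wiki page as I understand it; the paths are hypothetical, and it assumes Trac is importable by the Python interpreter mod_wsgi uses.

```python
import os

# Hypothetical locations; adjust for the real project and egg cache
TRAC_ENV_PATH = "/var/trac/sg101"
EGG_CACHE = "/var/trac/.egg-cache"


def application(environ, start_response):
    """WSGI entry point that mod_wsgi calls for each request."""
    # Tell Trac which project environment to serve
    environ.setdefault("trac.env_path", TRAC_ENV_PATH)
    # setuptools needs a writable directory for unpacking plugin eggs
    os.environ.setdefault("PYTHON_EGG_CACHE", EGG_CACHE)
    # Import Trac only after the environment above is in place
    from trac.web.main import dispatch_request
    return dispatch_request(environ, start_response)
```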

At Last…

I was now ready to configure my Trac project and get my new Subversion repository loaded. I had an existing Subversion repository that I was doing all my work in. However I had checked in some settings files that contained database password information. Shortly after realizing this I just locked the whole repository down. Since then, I have learned the Django settings.py and local_settings.py trick, and have placed the sensitive information in the local_settings.py file (which is not controlled in SVN). Now I can have a public read-only repository again.
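The settings.py/local_settings.py trick boils down to a few lines at the bottom of settings.py. This is a sketch with placeholder values: the setting names are the Django 1.0-era database settings (newer Django uses a `DATABASES` dictionary), and local_settings.py is just a naming convention. The local file is kept out of the repository, for example with the svn:ignore property, so a public checkout only ever contains the harmless defaults.

```python
# settings.py (checked into Subversion): safe placeholder defaults only
DEBUG = False
DATABASE_ENGINE = "mysql"
DATABASE_NAME = "sg101"
DATABASE_USER = ""
DATABASE_PASSWORD = ""  # the real value lives in local_settings.py

# local_settings.py is not under version control; anything it defines
# overrides the defaults above on that machine.
try:
    from local_settings import *  # noqa: F401,F403
except ImportError:
    # No local overrides on this machine; keep the defaults
    pass
```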

So here it is, ready for beta testing: http://code.surfguitar101.com. Now there isn’t anything stopping me; I have to do the real work of deploying a beta version of SG101 2.0 for testing and feedback.


29 Mar 09 Event Calendar: Time Zone Picker and Updates

I ended up creating a time zone picker for the event calendar. I saw the idea on the web somewhere. The problem is that there are nearly 400 common time zones in the database. Since every time zone is named in the format “area/location”, I created an area select and a location select. That broke up the time zones nicely, although some of the areas still have far too many entries to be completely convenient. I wrote a short Python script that parsed the pytz common time zones and generated a Javascript object literal to contain the select menus contents. Here is a screen shot showing it in action:

Time Zone Picker
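The short script mentioned above might look something like this. It is a sketch under assumptions: the real input would be `pytz.common_timezones`, but a hard-coded sample keeps it self-contained, and the `tzData` variable name is hypothetical. It splits each name on its first slash only, because some zones (such as America/Argentina/Buenos_Aires) contain more than one, and it relies on JSON output also being valid Javascript object-literal syntax for data like this.

```python
import json

# Sample input; the real script would use pytz.common_timezones
SAMPLE_ZONES = [
    "America/Chicago",
    "America/Argentina/Buenos_Aires",
    "Europe/London",
    "US/Central",
    "US/Pacific",
    "UTC",
]


def group_zones(names):
    """Split 'area/location' names into {area: [locations]}."""
    groups = {}
    for name in names:
        if "/" not in name:
            # Names like 'UTC' have no area; this sketch skips them
            continue
        # Split on the first slash only, so nested locations stay intact
        area, location = name.split("/", 1)
        groups.setdefault(area, []).append(location)
    for locations in groups.values():
        locations.sort()
    return groups


def to_js(groups, var_name="tzData"):
    """Emit a Javascript assignment holding the grouped zones."""
    return "var %s = %s;" % (var_name, json.dumps(groups, sort_keys=True))
```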

When you select an area (the left-most control), the location select fills with the appropriate options. When the form is submitted, some Javascript takes the two select values, joins them, and populates a hidden time zone input field with the result. So, in the example above, when the form is submitted, the hidden field receives “US/Pacific”. Likewise, when the form is displayed, the hidden field is parsed and the two select controls are set accordingly. This works pretty well, although I think I could have done a better job of modularizing this code in case I need to use it elsewhere on the site (such as in a user’s profile). I will definitely do this later.
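The join and split steps are simple, but splitting has one subtlety. To keep all examples in one language, here is the same logic in Python as a hypothetical sketch (`join_zone()` and `split_zone()` are made-up names, not the site’s Javascript); the optional check shows how the server side could also confirm that a submitted value names a known zone.

```python
def join_zone(area, location, known_zones=None):
    """Combine the two menu values into the hidden field's value."""
    value = "%s/%s" % (area, location)
    # Optional server-side check against a set of valid zone names
    if known_zones is not None and value not in known_zones:
        raise ValueError("unknown time zone: %s" % value)
    return value


def split_zone(value):
    """Recover (area, location) so both menus can be preselected."""
    # partition() splits on the first slash only
    area, _, location = value.partition("/")
    return area, location
```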

I’ve decided to tackle recurring events later, as it seems a bit involved, and as I stated, very few events on the calendar need this capability. So with the time zone picker in place, and the corresponding code on the server side (thanks to pytz), I can now accurately add events to the event calendar without losing local time information.

I also sat down finally and converted The Madeira’s website from mod_python to mod_wsgi. This wouldn’t have been possible without the excellent documentation that mod_wsgi has. I feel this will scale better, and it will allow me to more easily run multiple Python web applications side by side. I am anxious to get a Trac issue tracker running as well as a beta version of the new site.

The rest of the weekend was spent working the “to-do” list for the site in preparation for deploying a beta version. I really do need to get an issue tracker going to capture all the ideas and work I need to complete.


18 Mar 09 Event Calendar: Oh yeah, time zones…

Very shortly after I wrote the last blog entry I began having some nagging doubts about time zones. The current PHP version of the calendar is really time zone agnostic. It is assumed that when you see an event for California, for example, that you understand the time for the event is local to the Pacific time zone. I was kind of hoping to dodge this problem, but it seems unavoidable now that I am using Google calendar for the back-end.

The Google calendar has a time zone associated with it that I set when I created it. I am using my own local time zone: Central Standard Time, aka CST, aka GMT-6. Well, currently we are in daylight saving time (ack), so it is actually CDT, or GMT-5. When I initially added events to Google calendar, I naively added them without providing any time zone information. Being somewhat new to Python, I haven’t totally come to grips with how Python deals with time zones, and I was just hacking with my blinders on. Needless to say, this didn’t work well, as Google simply assumed I was providing UTC times. Thus when they were displayed on my Google calendar, they had a nice 5 hour offset. Oops. Okay, to “fix” that, I peeked at django-cal and noticed it was using a Django utility to provide the required time zone information. Using that technique, I got my events to show up in the correct local time on my Google calendar. Hooray!

But wait, that isn’t the whole story. Users can look at my Google calendar, and choose to copy events to their own Google calendars, which may have different time zones. Therefore, if a user submits an event to my calendar that takes place in California, my current code sets the time zone incorrectly. However this isn’t obvious as the time on my calendar will still be the same as the user entered it. But, if that user chooses to copy the event back to her California-based calendar, the event will suddenly be off by 2 hours. Crap!
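Both mistakes can be reproduced with the standard library. This sketch uses fixed UTC offsets that are only correct for dates during daylight saving time (the March 2009 date below), purely to show the arithmetic; real code should use pytz’s `localize()` so that DST transitions are handled. The 7:00 pm California event is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, valid only while daylight saving time is in effect
CDT = timezone(timedelta(hours=-5), "CDT")
PDT = timezone(timedelta(hours=-7), "PDT")

# A hypothetical 7:00 pm event in California, as the user typed it
typed = datetime(2009, 3, 20, 19, 0)

# Mistake 1: no zone at all, so the time is treated as UTC
as_utc = typed.replace(tzinfo=timezone.utc)
# Mistake 2: tagged with the calendar owner's zone (Central)
as_central = typed.replace(tzinfo=CDT)
# Correct: tagged with the zone where the event takes place
as_pacific = typed.replace(tzinfo=PDT)


def shown_in(dt, tz):
    """Wall-clock time a calendar set to tz would display."""
    return dt.astimezone(tz).strftime("%H:%M")


print(shown_in(as_utc, CDT))      # 14:00 on my Central calendar: 5 hours off
print(shown_in(as_central, CDT))  # 19:00: looks right on my calendar...
print(shown_in(as_central, PDT))  # 17:00: ...but 2 hours off once copied west
print(shown_in(as_pacific, PDT))  # 19:00: correct
```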

So it looks like I do need to come to terms with time zones, and bite the bullet and ask the user what time zone the event is in. I am currently studying how Python handles time zones, and looking at pytz, a nice Python interface to the famous Olson database of timezones. Do I need to build some kind of time zone picker? I’m not finding a lot of those, which makes me wonder. In any event, this is seriously complicating my event calendar. But it is pretty interesting stuff and I hope I can get it right, as I really do want the events on my calendar to reflect accurate local times.
