Well, I’ve been doing some reading and thinking, and I think I have mentally prepared myself to re-image the server with Ubuntu 8.04 LTS. Everything seems to be manageable without Plesk. Along the way I discovered that the default Mail Transfer Agent (MTA – I learned a new acronym!) in Ubuntu is Postfix. I started reading its online documentation, and at first I found it quite inscrutable. I kind of panicked and ordered a book on Postfix. But after reading the docs some more, they are starting to make sense, and it seems quite easy to configure Postfix to just forward email. I’m glad I bought the book, though. It hasn’t arrived yet, but I’m sure it will be handy down the road. I don’t currently need locally hosted mailboxes, spam filters, etc. One thing I can definitely appreciate: administering mail for a large network is very complicated. My use case is pretty simple, though, or so I hope.
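One common way to make Postfix forward-only is a virtual alias map: a text file of "address destination" lines, compiled with postmap and referenced from main.cf through the virtual_alias_domains and virtual_alias_maps = hash:/etc/postfix/virtual settings. The sketch below only renders such a file from a Python dictionary; it assumes this is the forwarding approach used, and the domain and addresses are hypothetical placeholders.

```python
# Hypothetical example: render a Postfix virtual alias map for forwarding.
# The domain and addresses below are placeholders, not real settings.


def render_virtual_aliases(aliases):
    """Return virtual(5)-style lines: one recipient address per line,
    followed by one or more comma-separated destination addresses."""
    lines = []
    for address in sorted(aliases):
        destinations = ", ".join(aliases[address])
        lines.append("%s %s" % (address, destinations))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    aliases = {
        "info@example.com": ["me@example.net"],
        "webmaster@example.com": ["me@example.net", "friend@example.org"],
    }
    print(render_virtual_aliases(aliases), end="")
```

After writing the output to /etc/postfix/virtual, running postmap /etc/postfix/virtual builds the lookup table, and postfix reload picks up the change.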
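I did some poking around on my Ubuntu-equipped laptop. Configuring Apache should be easy. Ubuntu makes virtual hosts very easy to set up right out of the box through the way it arranges the configuration files: each site gets its own file in sites-available and is switched on by a symlink in sites-enabled.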
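To see what one of those per-site files contains, here is a small sketch that renders a name-based virtual host block of the kind that lives in /etc/apache2/sites-available/. The site name, document root, and log file names are hypothetical placeholders.

```python
# Hypothetical example: render a name-based virtual host for Ubuntu's
# /etc/apache2/sites-available/ directory. Paths and names are placeholders.

VHOST_TEMPLATE = """<VirtualHost *:80>
    ServerName {name}
    ServerAlias {aliases}
    DocumentRoot {docroot}
    ErrorLog /var/log/apache2/{name}-error.log
    CustomLog /var/log/apache2/{name}-access.log combined
</VirtualHost>
"""


def render_vhost(name, docroot, aliases=()):
    """Return an Apache <VirtualHost> block for one site."""
    # ServerAlias defaults to the www. variant of the site name.
    alias_list = " ".join(aliases) if aliases else "www." + name
    return VHOST_TEMPLATE.format(name=name, aliases=alias_list, docroot=docroot)


if __name__ == "__main__":
    print(render_vhost("example.com", "/var/www/example.com"))
```

Saved under sites-available (newer Apache 2.4 packages expect a .conf suffix), the site is enabled with a2ensite followed by an Apache reload.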
So right now I am taking notes about what to back up, making lists of databases, database users, collations, etc., all in preparation for RI-Day (Re-Imaging Day). I’ve got lots of stuff to juggle. I surprised myself with how much custom stuff I added in a little over a year, and it will take some time to set it all back up on the new server, so I am going to have to give myself plenty of time to do this. I’m thinking Memorial Day weekend would be a good target, as I have some additional days off around then.
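Part of that checklist can be scripted. The sketch below only builds command lines and SQL statements; it assumes mysqldump is used for the databases and SHOW GRANTS for recording each user’s privileges, and the database names, users, and output directory are hypothetical. Each list could be reviewed, given credentials (-u, -p), and then passed to subprocess.run.

```python
# Hypothetical example: build the commands for a pre-re-imaging backup.
# Database names, users and the output directory are placeholders.
import os


def dump_commands(databases, outdir, charset="utf8"):
    """Return one mysqldump argument list per database."""
    commands = []
    for db in databases:
        commands.append([
            "mysqldump",
            "--single-transaction",                 # consistent InnoDB snapshot
            "--routines",                           # include stored procedures
            "--default-character-set=" + charset,
            "--result-file=" + os.path.join(outdir, db + ".sql"),
            "--databases", db,                      # adds CREATE DATABASE / USE
        ])
    return commands


def grant_queries(users):
    """Return SHOW GRANTS statements for (user, host) pairs."""
    return ["SHOW GRANTS FOR '%s'@'%s';" % (user, host) for user, host in users]


if __name__ == "__main__":
    for cmd in dump_commands(["sg101", "madeira"], "/root/backup"):
        print(" ".join(cmd))
    for query in grant_queries([("sg101", "localhost")]):
        print(query)
```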
I think moving to Ubuntu will be a very good step, and I’m looking forward to it! I’m curious how my host has customized the image. Surely they have things like DNS and SSH configured; otherwise no one could log in remotely and administer it out of the box. But did they also set up mail? I guess I’ll find out.
I’ve been renting a dedicated server from 1&1 Hosting (yes, that’s my affiliate link) for a little over a year now. I’ve been very happy with it; it has allowed me to run several websites, host an IRC server, a TeamSpeak server, Subversion, and Trac. I even very briefly ran a Call of Duty server on it! It is currently running Fedora Core 6, which was old even when I got it a year ago. I’ve been itching to re-image it with something that is still supported, and I think I’m finally going to do it. It makes sense to get this done before I go live with SG101 2.0. The only question now is: which OS should I go with?
A year ago there weren’t very many OS options with 1&1. Now they have 17 variants, including CentOS, openSUSE, Debian, and Ubuntu, with various permutations of the Plesk control panel. I think the safe choice here is to switch to CentOS 5 with Plesk 9. This distribution is currently very similar to FC6 and comes with a newer version of Plesk, which I rely on a lot. However, it would be very cool to run Ubuntu 8.04 LTS, since that is also what I am developing with on my laptop. I think having the same OS in production and development would minimize surprises. And I’d really like to use at least Python 2.5; FC6 and CentOS 5 are still using Python 2.4.
Unfortunately, 1&1 does not offer Plesk with this configuration. Hmmm, can I take the training wheels off and manage a server without Plesk? Here are the things that Plesk makes easy for me with their web GUI:
So the question becomes, can I manage these things without Plesk? Let’s see:
So after thinking long and hard about this, I think going with Ubuntu will be a good choice for the long term. Having similar production and development environments will be important. Ubuntu 8.04 LTS still has (I think) about 4 years left for server support, and there should be a migration path to the next LTS release.
However, leaving Plesk behind will require me to become more of a server geek. I don’t actually mind this, as I am enjoying learning all this stuff slowly over time. I can practice some of these things on my laptop before I make the big switch. I need to figure out the mail server situation and do a dry run of moving the MySQL databases from production to my development server.
Another important question is when to do this. Ideally I will want to give myself plenty of time, perhaps over a long weekend. It will mean downtime for the currently hosted websites, and it will probably be pretty nerve-racking.
I’ll be thinking about these things and blogging about them over the course of the next few days.
Based on this blog post by Django co-BDFL Jacob Kaplan-Moss, I wanted to try using html5lib to sanitize user input. I’m using Markdown on most of the site, but in one particular place (news items) I am (currently) allowing users to submit HTML news stories with the TinyMCE JavaScript editor. This is mainly because my users like to copy and paste content from sites like MySpace, and TinyMCE might be easier for them to use than Markdown. I may revisit this decision, but for now we’ll go with it.
I was using the lxml sanitizer for this purpose. But given the high praise html5lib received from Jacob, and after studying the source code of both, I have greater confidence in html5lib, even if it is an order of magnitude slower. That isn’t a concern here, since this code won’t run more than a few times a day.
Never having used html5lib, or any other HTML/XML parser, before, I found it a bit confusing to figure out how to use it for this task. After studying the code and the html5lib newsgroup, I came up with the following bit of code, which I thought I would share. Comments are extremely welcome.
```python
import html5lib
from html5lib import sanitizer, treebuilders, treewalkers, serializer


def sanitizer_factory(*args, **kwargs):
    """Return an HTMLSanitizer tokenizer for the parser to use."""
    san = sanitizer.HTMLSanitizer(*args, **kwargs)
    # This isn't available yet
    # san.strip_tokens = True
    return san


def clean_html(buf):
    """Cleans HTML of dangerous tags and content."""
    buf = buf.strip()
    if not buf:
        return buf

    # Parse the fragment into a DOM tree, sanitizing tokens as they are read.
    p = html5lib.HTMLParser(tree=treebuilders.getTreeBuilder("dom"),
                            tokenizer=sanitizer_factory)
    dom_tree = p.parseFragment(buf)

    # Walk the DOM tree and serialize it back out to an HTML string.
    walker = treewalkers.getTreeWalker("dom")
    stream = walker(dom_tree)
    s = serializer.htmlserializer.HTMLSerializer(
        omit_optional_tags=False,
        quote_attr_values=True)
    return s.render(stream)
```
I haven’t tested it extensively yet, but it seems to do the trick. I understand a future version of html5lib will have an option to strip offending tags out completely. Right now they are simply rendered harmless and remain in the output, escaped as &lt; and &gt;. This is fine, as I can see them in the admin when I review submitted stories.
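To make the difference between escaping and stripping concrete, here is a small sketch built on the standard library’s html.parser instead of html5lib. It is an illustration only, not a replacement for a real sanitizer: the tag whitelist is hypothetical, every attribute is dropped, and unlike html5lib it does no tree building or tag balancing. In strip mode the text inside a dropped element (such as a script body) survives as plain, escaped text.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist, for illustration only.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li", "br"}


class TagFilter(HTMLParser):
    """Rebuild HTML, keeping whitelisted tags (without attributes) and
    either escaping or dropping every other tag."""

    def __init__(self, strip=False):
        super().__init__(convert_charrefs=True)
        self.strip = strip
        self.out = []

    def _disallowed(self, raw):
        # Escape the raw markup so it shows up harmlessly, or drop it.
        if not self.strip:
            self.out.append(escape(raw))

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append("<%s>" % tag)  # attributes are discarded
        else:
            self._disallowed(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append("<%s>" % tag)
        else:
            self._disallowed(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append("</%s>" % tag)
        else:
            self._disallowed("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(escape(data))


def filter_html(buf, strip=False):
    parser = TagFilter(strip=strip)
    parser.feed(buf)
    parser.close()
    return "".join(parser.out)
```

For example, filter_html('<p onclick="x()">Hi <script>alert(1)</script></p>') keeps the paragraph and escapes the script tags, while passing strip=True removes the script tags and leaves only their text.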
Tags: django, html5lib, sanitization, sg101
Check out this ticket! Granted, it is just a typo fix to a Python docstring, but you’ve got to start somewhere.
What a great feeling in any event.
I hope I can contribute more meaningful features and bug fixes in the future.
Tags: django
I’ve been wanting to get some kind of issue tracker up and running for some time now. Trac seems like a great choice. We’ve used it where I work, and the Django project uses it. I even managed to install it on Windows at work. Still, I was kind of dreading trying to get it working on the dedicated server I rent. I finally gathered the strength and tackled this problem this weekend, and it went far easier than I imagined.
First of all, I decided I might as well upgrade my Subversion (SVN) server while I was at it. Subversion 1.6 is out now; however, reading the fine print, I noticed that they seem to have changed the Python bindings in 1.6, and I wasn’t sure whether Trac is compatible with them. So without doing any further research I decided to just run the last stable version before that, 1.5.6.
My dedicated server is running Fedora Core 6, which isn’t maintained anymore, so to my knowledge there is no way to get a binary package for these recent builds of SVN. I need to build from source. I had done this once before, and I even took detailed notes (which I had forgotten about). Building from source is fairly easy, but there is one gotcha on the AMD64 server I run: you need to invoke the configure script with the --enable-shared switch. Luckily I wrote this down the first time. Getting the required dependencies for the source build isn’t too hard. The Subversion folks helpfully package some of the less readily available dependencies, so it is just a matter of grabbing them and untarring them on top of the unpacked source tarball.
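The core of that build can be written down as a short dry-run script, a sketch assuming it is run from the unpacked source tree with the dependency tarball already extracted on top. Only the --enable-shared switch comes from the notes above; the rest is the standard configure/make sequence, and make install typically needs root.

```python
# Hypothetical sketch: the Subversion source-build sequence, echoed as a
# dry run. Run it from the unpacked Subversion source directory.
import shlex
import subprocess

BUILD_STEPS = [
    ["./configure", "--enable-shared"],  # required on the AMD64 server
    ["make"],
    ["make", "install"],
]


def run_steps(steps, dry_run=True):
    """Echo each command and execute it only when dry_run is False.

    Returns the echoed command lines."""
    echoed = []
    for cmd in steps:
        line = " ".join(shlex.quote(part) for part in cmd)
        print("$ " + line)
        echoed.append(line)
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises on the first failure
    return echoed


if __name__ == "__main__":
    run_steps(BUILD_STEPS)
```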
Since I wanted to integrate Trac with my Subversion repository, I needed to ensure I built the Python Subversion bindings. I used Yum, the package manager that comes with Fedora, to make sure I had SWIG installed before I ran configure to build SVN. Then it is a simple matter of building the Python SWIG bindings after Subversion proper is built. This is explained very well in the Subversion documentation.
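Once make swig-py and make install-swig-py have run, a quick way to confirm that Python can actually see the bindings is to import them. This sketch assumes the version constants from svn_version.h are exposed on svn.core as SVN_VER_MAJOR, SVN_VER_MINOR, and SVN_VER_MICRO; it returns None when the bindings are not importable.

```python
# Check whether the Subversion Python (SWIG) bindings can be imported,
# which Trac needs in order to browse the repository.


def svn_bindings_version():
    """Return (major, minor, micro) of the importable bindings, or None."""
    try:
        from svn import core
    except ImportError:
        return None
    return (core.SVN_VER_MAJOR, core.SVN_VER_MINOR, core.SVN_VER_MICRO)


if __name__ == "__main__":
    version = svn_bindings_version()
    if version is None:
        print("Subversion Python bindings not found on sys.path")
    else:
        print("Subversion bindings %d.%d.%d" % version)
```

If this prints the not-found message, the bindings were usually installed outside the interpreter’s search path.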
This seemed to go well, although I had a minor heart attack when Apache crashed the first time I tried to restart it with the new SVN in place. Another restart and it was fine. Hmmm. In short order I had upgraded my existing repository and things seemed to be working fine.
I then created a new subdomain to host my issue tracker. I use the Plesk control panel to do this heavy lifting for me. It came installed with the server, and I rely on it heavily to configure Apache, the mail server, etc. I’m not a hard-core server admin, so this is a big help, although I can see the day when the training wheels come off as I become more familiar with Linux and these tools. I can sort of see what Plesk is doing by examining the config files it creates, and it doesn’t appear to be rocket science. Still, it is a big time saver for me.
To get Trac installed, I first had to get all its dependencies in place. I was able to use Yum to get most of them in binary form from the Fedora repository. Despite the fact that Fedora Core 6 is pretty old, the versions of the dependencies in the repository were still compatible with the newest version of Trac. The one notable exception was Genshi, the template engine Trac uses. In this case a simple “easy_install Genshi” did the trick. Nice.
I might have been able to easy_install Trac, but the docs say that this only works for Python 2.5 and 2.6, and I’m still running 2.4 on the dedicated server. Upgrading my OS is definitely on the long-term to-do list, but I must take baby steps for now. Instead, it was a simple matter of grabbing the Trac tarball, untarring it, and doing the usual “python setup.py install”. It went flawlessly.
Luckily, I had set up Trac at work before, so I already knew what to do. I ran the command-line Trac admin tool to create a project and tied it to my new Subversion repository. Trac comes with a development server, and I ran that after configuring the project. I could then point my browser at my server and see my new Trac project for the first time! Things were cooking at this point.
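Before running the Trac installer, a tiny script can report which dependencies Python still cannot find. The module list here is an assumption (setuptools and Genshi, the two named in this post); the Trac installation docs list the exact requirements for a given release.

```python
# Hypothetical pre-install check: report which of Trac's Python
# dependencies cannot be located yet. The module list is an assumption.
import importlib.util


def missing_modules(names):
    """Return the top-level module names that importlib cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for name in missing_modules(["setuptools", "genshi"]):
        print("missing:", name)
```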
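The two steps above boil down to two commands, sketched here as a dry run. The environment path and port are hypothetical. trac-admin’s initenv prompts for the project settings; in Trac releases of this era those prompts also include the repository type and path, which is where the Subversion repository gets tied in.

```python
# Hypothetical sketch of the commands described above, printed as a dry
# run. The environment path and port are placeholders.

TRAC_ENV = "/var/trac/sg101"


def trac_commands(env_path, port=8000):
    """Return the commands to create a project and serve it with tracd."""
    return [
        ["trac-admin", env_path, "initenv"],       # interactive project setup
        ["tracd", "--port", str(port), env_path],  # built-in development server
    ]


if __name__ == "__main__":
    for cmd in trac_commands(TRAC_ENV):
        print("$ " + " ".join(cmd))
```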
Of course I can’t use the development server for real work. So the next step was to get Apache to serve my Trac project. I once again chose mod_wsgi as the deployment method, after just recently converting The Madeira site from mod_python to mod_wsgi. The mod_wsgi documentation is excellent, and a wiki page covers integrating Trac and mod_wsgi in great detail. After studying the docs for a short while I had the magic Apache configuration down. I restarted Apache, and once again I was amazed that things were working on the first try. I had been pretty lucky so far. (In fact the most trouble I had that day was trying to change the logo on the Trac site!)
I was now ready to configure my Trac project and get my new Subversion repository loaded. I had an existing Subversion repository that I was doing all my work in. However I had checked in some settings files that contained database password information. Shortly after realizing this I just locked the whole repository down. Since then, I have learned the Django settings.py and local_settings.py trick, and have placed the sensitive information in the local_settings.py file (which is not controlled in SVN). Now I can have a public read-only repository again.
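The settings.py/local_settings.py trick usually looks something like the excerpt below. The DATABASES dict is the layout of later Django releases (older ones used flat names such as DATABASE_PASSWORD), and every value is a hypothetical placeholder. Because the star import rebinds names wholesale, local_settings.py must define the complete DATABASES dict rather than just the password. Running svn propset svn:ignore local_settings.py . keeps the file out of svn status.

```python
# settings.py (excerpt): checked into Subversion with no secrets.
# The database values below are hypothetical placeholders.
DEBUG = False

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "sg101",
        "USER": "sg101",
        "PASSWORD": "",  # the real value lives in local_settings.py
        "HOST": "localhost",
    }
}

# local_settings.py is not under version control; when it exists, its
# definitions replace the defaults above.
try:
    from local_settings import *  # noqa: F401,F403
except ImportError:
    pass
```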
So here it is, ready for beta testing: http://code.surfguitar101.com. Now there isn’t anything stopping me; I have to do the real work of deploying a beta version of SG101 2.0 for testing and feedback.