Something strange happened that I wish to document in case it helps others. I had to reboot my Ubuntu server while troubleshooting a disk problem. After the reboot, I began receiving internal server errors whenever someone tried to view a certain forum thread on my Django-powered website. After some detective work, I determined that the cause was a user who had posted in the thread: their avatar image had a filename containing non-ASCII characters. The image file had been there for months, and I still cannot explain why the error suddenly started happening.

The traceback I was getting ended with something like this:

File "/django/core/files/storage.py", line 159, in _open
return File(open(self.path(name), mode))

UnicodeEncodeError: 'ascii' codec can't encode characters in position 72-79: ordinal not in range(128)

So it appeared that the open() call was triggering the error. This led me on a twisty Google search with many dead ends, but eventually I found a suitable explanation. Linux filesystems don't enforce any particular encoding for filenames: to the kernel a filename is just a sequence of bytes, and each application must decide for itself how to interpret those bytes as text. On Linux, Python uses the locale environment variables (LANG, LC_ALL, LC_CTYPE) to choose the encoding it uses for filenames. If these variables are not set, Python falls back to ASCII, which was the source of my UnicodeEncodeError.
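To see what your own interpreter is doing, here is a small sketch. It prints the filename encoding Python derived from the locale and checks whether a given filename can be converted to bytes with a given encoding, which approximates the conversion that fails inside open(). The helper name `filename_encodable` and the sample avatar filename are hypothetical. It is written for Python 3; note that Python 3.7 and later switch to UTF-8 automatically when the locale is the bare C/POSIX one (PEP 538 and PEP 540), so a modern interpreter may report utf-8 even with the variables unset.

```python
import locale
import sys


def filename_encodable(name, encoding):
    """Return True if the str `name` can be encoded to bytes with `encoding`.

    Hypothetical helper: it approximates what happens when a str filename
    is handed to open() and must be encoded before the OS sees it.
    """
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # The encoding this process uses for filenames, derived from the locale.
    print("filesystem encoding:", sys.getfilesystemencoding())
    print("preferred locale encoding:", locale.getpreferredencoding(False))

    # A hypothetical avatar filename containing a non-ASCII character.
    avatar = "avatars/j\u00fcrgen.png"
    for enc in ("ascii", "utf-8"):
        print(enc, "->", filename_encodable(avatar, enc))
```

Running it with `LANG=C` versus `LANG=en_US.UTF-8` in the environment is a quick way to compare the two situations on a given machine.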

So how do you tell a Python instance that is running under Apache / mod_wsgi about these environment variables? It turns out the answer is in the Django documentation, albeit in the mod_python integration section.

To fix the issue, I added the following lines to my /etc/apache2/envvars file:

export LANG='en_US.UTF-8'
export LC_ALL='en_US.UTF-8'

Note that you must fully stop Apache and then start it again for these changes to take effect. I got tripped up at first because I ran apache2ctl graceful, and that was not sufficient to create a new environment.
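To confirm that the restart actually picked up the new variables, one option is a throwaway WSGI script like the sketch below, mounted temporarily (for example with its own WSGIScriptAlias; the mount point and file name are up to you and are assumptions here). It reports the filename encoding Python is using inside the Apache process, plus the locale variables it inherited. It reads os.environ rather than the WSGI `environ` argument, because the variables exported in envvars live in the process environment, not in the per-request dictionary.

```python
import locale
import os
import sys


def application(environ, start_response):
    """Minimal WSGI app reporting the locale Python sees under Apache."""
    lines = [
        "filesystem encoding: %s" % sys.getfilesystemencoding(),
        "preferred encoding: %s" % locale.getpreferredencoding(False),
    ]
    # Locale variables inherited from the Apache process environment,
    # e.g. those exported in /etc/apache2/envvars.
    for var in ("LANG", "LC_ALL", "LC_CTYPE"):
        lines.append("%s=%s" % (var, os.environ.get(var, "<unset>")))
    body = ("\n".join(lines) + "\n").encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Remove the script once you have checked the output, since it exposes details about the server's environment.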

