<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
     xmlns:atom="http://www.w3.org/2005/Atom"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:wfw="http://wellformedweb.org/CommentAPI/"
     >
  <channel>
    <title>Death of a Gremmie</title>
    <link>http://deathofagremmie.com</link>
    <description>Brian Neal's blog about programming.</description>
    <pubDate>Sat, 21 Jan 2012 05:35:56 GMT</pubDate>
    <generator>Blogofile</generator>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
    <item>
      <title>Django Uploads and UnicodeEncodeError</title>
      <link>http://deathofagremmie.com/2011/06/04/django-uploads-and-unicodeencodeerror</link>
      <pubDate>Sat, 04 Jun 2011 20:00:00 CDT</pubDate>
      <category><![CDATA[Python]]></category>
      <category><![CDATA[Linux]]></category>
      <category><![CDATA[Unicode]]></category>
      <category><![CDATA[Django]]></category>
      <guid isPermaLink="true">http://deathofagremmie.com/2011/06/04/django-uploads-and-unicodeencodeerror</guid>
      <description>Django Uploads and UnicodeEncodeError</description>
      <content:encoded><![CDATA[<div class="document">
<p>Something strange happened that I want to document in case it helps others. I
had to reboot my Ubuntu server while troubleshooting a disk problem. After the
reboot, I began receiving internal server errors whenever someone tried to view
a certain forum thread on my <a class="reference external" href="http://djangoproject.com">Django</a>-powered website. After some detective work,
I determined the cause: a user who had posted in the thread had an avatar
image whose filename contained non-ASCII characters. The image file had been
there for months, and I still cannot explain why the errors started only after
the reboot.</p>
<p>The traceback I was getting ended with something like this:</p>
<div class="highlight"><pre><span class="n">File</span> <span class="s">&quot;/django/core/files/storage.py&quot;</span><span class="p">,</span> <span class="n">line</span> <span class="mi">159</span><span class="p">,</span> <span class="ow">in</span> <span class="n">_open</span>
<span class="k">return</span> <span class="n">File</span><span class="p">(</span><span class="nb">open</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">path</span><span class="p">(</span><span class="n">name</span><span class="p">),</span> <span class="n">mode</span><span class="p">))</span>

<span class="ne">UnicodeEncodeError</span><span class="p">:</span> <span class="s">&#39;ascii&#39;</span> <span class="n">codec</span> <span class="n">can</span><span class="s">&#39;t encode characters in position 72-79: ordinal not in range(128)</span>
</pre></div>
<p>So it appeared that the <tt class="docutils literal">open()</tt> call was triggering the error. This led me on
a twisty Google search with many dead ends. Eventually I found a suitable
explanation. Apparently, Linux filesystems don't enforce a particular
encoding for filenames; they store them as plain bytes, so Linux applications
must decide how to interpret filenames on their own. Python's <tt class="docutils literal">os</tt> module (on Linux) uses
the locale environment variables to determine what locale you are in, and the
locale in turn determines the encoding for filenames. If these environment
variables are not set, Python falls back to ASCII by default, which was the
source of my <tt class="docutils literal">UnicodeEncodeError</tt>.</p>
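<p>To see which encoding Python has picked up from the environment, a minimal sketch like the following can help. It relies only on the standard <tt class="docutils literal">sys.getfilesystemencoding()</tt> and <tt class="docutils literal">locale.getpreferredencoding()</tt> calls; the helper names and the sample avatar filename are made up for illustration.</p>

```python
import locale
import sys


def describe_filename_encoding():
    """Return the encodings Python derived from the process environment."""
    return {
        "filesystem": sys.getfilesystemencoding(),
        "preferred": locale.getpreferredencoding(False),
    }


def can_encode(filename, encoding):
    """Report whether a text filename can be converted to bytes in `encoding`."""
    try:
        filename.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # Hypothetical avatar filename containing two non-ASCII characters.
    name = u"avatar_\u00e9\u00e8.png"
    print(describe_filename_encoding())
    print("ascii ok: %s" % can_encode(name, "ascii"))
    print("utf-8 ok: %s" % can_encode(name, "utf-8"))
```

<p>If the reported filesystem encoding is ASCII-based, a filename like the sample one cannot be converted to bytes, which is the same kind of failure that <tt class="docutils literal">open()</tt> hit in the traceback above.</p>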
<p>So how do you tell a Python instance that is running under Apache / <tt class="docutils literal">mod_wsgi</tt>
about these environment variables? It turns out the answer is in the <a class="reference external" href="https://docs.djangoproject.com/en/1.3/howto/deployment/modpython/#if-you-get-a-unicodeencodeerror">Django
documentation</a>, albeit in the <tt class="docutils literal">mod_python</tt> integration section.</p>
<p>So, to fix the issue, I added the following lines to my <tt class="docutils literal">/etc/apache2/envvars</tt>
file:</p>
<div class="highlight"><pre><span class="nb">export </span><span class="nv">LANG</span><span class="o">=</span><span class="s1">&#39;en_US.UTF-8&#39;</span>
<span class="nb">export </span><span class="nv">LC_ALL</span><span class="o">=</span><span class="s1">&#39;en_US.UTF-8&#39;</span>
</pre></div>
<p>Note that you must fully stop and then start Apache for these changes to take
effect. I got tripped up at first because I ran <tt class="docutils literal">apache2ctl
graceful</tt>, and a graceful restart was not sufficient to create a new environment.</p>
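<p>One way to confirm that the restarted server really sees the new variables is a small diagnostic WSGI application. This is only a sketch under assumptions: <tt class="docutils literal">locale_report</tt> and this <tt class="docutils literal">application</tt> are hypothetical names, not part of Django or <tt class="docutils literal">mod_wsgi</tt>, and it assumes the Python process inherits its environment from Apache.</p>

```python
import os
import sys


def locale_report(environ=None):
    """Collect the locale variables and the filename encoding Python chose."""
    environ = os.environ if environ is None else environ
    return {
        "LANG": environ.get("LANG"),
        "LC_ALL": environ.get("LC_ALL"),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


def application(wsgi_environ, start_response):
    """Hypothetical diagnostic WSGI app: reply with the report as plain text."""
    # Read the process environment, not the per-request WSGI environ.
    report = locale_report()
    body = "\n".join("%s=%s" % item for item in sorted(report.items()))
    payload = body.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]
```

<p>Mounted temporarily at a private URL, it should list the two variables with the values exported in <tt class="docutils literal">/etc/apache2/envvars</tt>; a value of <tt class="docutils literal">None</tt> would suggest the server is still running with the old environment.</p>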
</div>
]]></content:encoded>
    </item>
  </channel>
</rss>

