The case of the missing forum topic

I use the awesome Haystack search framework for my Django-powered website. I have found Haystack to be a huge win. It is easy to set up and configure, and easy to customize when you have to. As someone who doesn't know very much about the world of searching, I'm grateful to have a powerful tool that just works without me having to get too involved in arcane details.

One day one of our users noticed that he could not find a forum topic with the title "Hawaiian" sounding chords. Notice that the word Hawaiian is in quotes. The topic would turn up if you searched for sounding or chords, but no search involving Hawaiian, with or without quotes, would uncover it.

I should mention that I am using the Xapian backend. I know the backend tries to remove punctuation and special characters to create uniform searches, but I could not figure out where the word was getting dropped. After a bit of searching online, I found a few hints which led to the solution.

Safety versus correctness

As suggested in the documentation, I am using templates to build the document used for the search engine. My template for forum topics looked like this:

{{ object.name }}
{{ object.user.username }}
{{ object.user.get_full_name }}

A mailing list post from another user pointed to the problem. By default, Django escapes HTML special characters in template output. Thus the forum topic title:

"Hawaiian" sounding chords

was being turned into this by the Django templating engine:

&quot;Hawaiian&quot; sounding chords

Now what Haystack and/or the Xapian backend were doing with &quot;Hawaiian&quot; I have no idea. I tried searching for this unusual term but it did not turn up any results. Apparently it is just getting dropped.
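The escaping step is easy to reproduce outside of Django. This is only a minimal sketch: it uses html.escape from Python's standard library as a stand-in for the template engine's autoescaping, which is an assumption made for illustration. Both replace a double quote with the same entity.

```python
import html

# The forum topic title exactly as the user typed it.
title = '"Hawaiian" sounding chords'

# html.escape replaces &, < and > with entities, and by default
# also replaces quote characters.
escaped = html.escape(title)

# This is the text that ends up in the search document instead of the title.
print(escaped)
```

The quoted word no longer appears in the output as typed, so whatever the backend does with the entity text, it is not indexing the original title.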

The solution was to modify the template to this:

{{ object.name|safe }}
{{ object.user.username|safe }}
{{ object.user.get_full_name|safe }}

But is it safe?

After changing my template and rebuilding the index, the troublesome topic was found. Hooray! But have I just opened myself up to an XSS attack? Can user-supplied content now show up unescaped in the search results? I can't answer this authoritatively, but I did spend a fair amount of time experimenting. I'm using Haystack's highlight template tag, my users enter their content in Markdown, and I could not inject malicious text into the search results. You should test this yourself on your site.
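If you want to make that kind of test repeatable, one option is to check the rendered search results page for a probe string. The sketch below is hypothetical and is not part of Haystack or Django: the PROBE value and the probe_leaked helper are names made up for this example, and the page fragments are simulated rather than fetched from a real site.

```python
import html

# Hypothetical probe; any markup that should never appear verbatim will do.
PROBE = '<script>alert("xss-probe")</script>'


def probe_leaked(rendered_html, probe=PROBE):
    """Return True if the probe appears unescaped in the rendered page."""
    return probe in rendered_html


# Simulated fragments standing in for real search results HTML.
escaped_page = "<li>" + html.escape(PROBE) + "</li>"
unescaped_page = "<li>" + PROBE + "</li>"

print(probe_leaked(escaped_page))    # False: the probe was escaped
print(probe_leaked(unescaped_page))  # True: the probe came through as markup
```

On a real site you would post the probe as a topic title, rebuild the index, search for it, and pass the HTML of the results page to the helper. A check like this only catches the exact probe you planted, so it complements manual experimenting rather than replacing it.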

Conclusion

This turned out to be a simple fix and I hope it helps someone else. I will make enquiries to see if this should be added to the Haystack documentation.

