I’ve been thinking a lot recently about human readable URLs. In the event that you’ve been living under a rock for the last 7 years or so, human readable URLs are where you set up your web application to have URLs that look more like
http://distinctpixel.org/entries/view/2007/05/pure_love_wines_layer_cake
vs. something more machine readable like:
http://distinctpixel.org/index.aspx?mode=view&entryId=301
There’s an older AdaptivePath article that does a fine job explaining it in some detail, but, in a nutshell: readable URLs are better for SEO, usability, and helping stop global warming.
I’ve really struggled with good URL formatting for a long time. Since my background is ColdFusion/ASP/.NET, most of the web applications I’ve built have run in Windows hosting environments. And since there’s no built-in URL rewriting in IIS, doing this is quite a bit more difficult [1].
In the past, I’ve written solutions using ASP.NET where I would use a 404 page that would parse the original request and route based on that. Unfortunately, this method resulted in at least 2 or 3 HTTP 302 redirects (since that’s what Response.Redirect() does). So while you could use readable URLs, once the request was completed you’d end up at an ugly old URL.
Well, now that I’m playing with Rails, I’ll have to admit that I’m completely tickled that “out of the box” it comes with a readable URL setup, even if it does put the emphasis on the model’s (somewhat unfriendly) ID.
In an effort to extend it some, I’ve recently made some additional changes to this blog engine so that it uses the date and title in the URL as opposed to just the ID.
My initial plan was to make a DB migration that just churned through all the entries and created a new URL column with a copy of the title. Obviously, we’d need to put a unique constraint on it, and the titles would need some manipulation to make them URL-safe (remove spaces, slashes, periods, question marks, etc).
All said and done, an entry with the title “This is a test” would get “/entries/year/month/this_is_a_test” for its URL. Great!
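Roughly, that title-to-URL step could look like the sketch below. It's plain Ruby rather than the actual migration, and `slugify` and `entry_path` are names made up for illustration. It assumes that lowercasing and turning each run of whitespace or punctuation into a single underscore is all the "manipulation" needed, which is only true for simple titles.

```ruby
require "date"

# Hypothetical helper: lowercase the title and collapse each run of
# whitespace or punctuation into one underscore.
def slugify(title)
  title.downcase
       .gsub(/[[:space:][:punct:]]+/, "_")
       .gsub(/\A_+|_+\z/, "") # trim underscores left at either end
end

# Hypothetical helper: build the /entries/year/month/slug path.
def entry_path(title, date)
  format("/entries/%04d/%02d/%s", date.year, date.month, slugify(title))
end

slugify("This is a test")                         # => "this_is_a_test"
entry_path("This is a test", Date.new(2007, 5, 1)) # => "/entries/2007/05/this_is_a_test"
```

Note that this leaves accented letters alone, and it does nothing about two entries in the same month ending up with the same slug; the unique constraint would still have to catch that.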
Except for one hitch.
Looking through the DB, it became clear very quickly that I tend to use a lot of high ASCII characters in my titles (stupid Mac).
Ok. No big deal. We’ll just zap any character in the title that isn’t A-Z, a-z, or 0-9. In fact, when writing a wine review last night I noticed that this is exactly what Cork’d has started doing: just escaping any “wacky” chars to underscores. Problem solved, right?
Well. Kind of.
Being the old cranky guy that I am, seeing something like Ch_teauneuf_du_Pape in the URL really bothered me. Where’s the damned a? I suppose you could rewrite various characters without their accents, but suddenly that becomes a really long list of rules: circumflex alone has 21 possible characters (Â, â, Ê, ê, Î… etc.).
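For what it's worth, here is a small sketch of both approaches in plain Ruby. `zap` is the "replace anything that isn't A-Z, a-z, 0-9" rule; `fold_accents` is a different technique, not something this blog does: it leans on Unicode decomposition (`String#unicode_normalize`, Ruby 2.2 or later) to split an accented letter into its base letter plus a combining mark, then drops the marks. Both method names are made up for illustration.

```ruby
# The "zap" rule: each run of characters outside A-Z, a-z, 0-9 becomes "_".
def zap(title)
  title.gsub(/[^A-Za-z0-9]+/, "_")
end

# Alternative sketch: decompose accented letters (NFD), then delete the
# combining marks so only the base letters remain.
def fold_accents(title)
  title.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

zap("Châteauneuf du Pape")               # => "Ch_teauneuf_du_Pape"
zap(fold_accents("Châteauneuf du Pape")) # => "Chateauneuf_du_Pape"
```

It isn't a complete answer: letters such as ø, æ, or ß have no decomposition into a base letter plus accent, so they would still be zapped to underscores unless you add explicit rules for them.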
So, for now, I’ve decided to make it a manual process. New entries going forward will have edited friendly URLs, but older entries (this goddamn blog dates back to 2001) will continue to use the ID.
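That mixed scheme might be sketched like this. It's plain Ruby with an in-memory list standing in for the entries table, and `Entry`, `url_key`, and `find_entry` are hypothetical names, not the blog engine's real code. It assumes a hand-edited URL is never made up of digits only, so an all-digit key can safely be treated as an ID.

```ruby
# Stand-in for the entries table; slug is nil for old entries.
Entry = Struct.new(:id, :slug, :title)

# The key that goes into an entry's URL: the edited slug if there is one,
# otherwise the numeric ID.
def url_key(entry)
  entry.slug.to_s.empty? ? entry.id.to_s : entry.slug
end

# Look an entry up by whichever key appeared in the URL.
def find_entry(entries, key)
  if key.match?(/\A\d+\z/)
    entries.find { |e| e.id == key.to_i }   # old-style ID URL
  else
    entries.find { |e| e.slug == key }      # new-style friendly URL
  end
end
```

Old links that use the ID keep working, and new entries get the friendly form without a bulk rewrite of six years of titles.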
Anyway, like I said, it’s something I’ve been thinking about lately.
[1] There are a handful of IIS filters out now that allow you to do rewriting, but a) loosely documented 3rd party ISAPI filters make me nervous, and b) unless you’re on a dedicated Windows box, you’re outta luck.