2019-05-25 15:30 — By Erik van Eykelen
You’ve probably noticed that many article titles use a stylistic convention called “title casing”. Recently I wanted to add a titlecase method to the Msgtrail static blog engine. I quickly realized that title casing is harder than I thought!
Let’s begin with a few examples to demonstrate some edge cases:
Notice that:
“at” and “to” are not title-cased;
“Is” is title-cased;
“Of” is title-cased, but only because it’s the last word.
Or take this example:
Notice that:
“/var/run” and “/boot” remain untouched;
“before/after” becomes “Before/After”.
See here for additional test cases.
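The rules noted above can be sketched in a few lines of Ruby. This is only an illustration, not the implementation discussed in this post: the method name titlecase_sketch, the SMALL_WORDS list, and the sample titles are all assumptions made for the example, and a complete rule set covers many more edge cases.

```ruby
# Hypothetical list of small words; a real rule set is likely longer.
SMALL_WORDS = %w[a an and as at but by for if in of on or the to via vs].freeze

# Illustrative sketch only: handles small words, first/last word, and slashes.
def titlecase_sketch(title)
  words = title.split(' ')
  last = words.length - 1
  cased = words.each_with_index.map do |word, i|
    if word.start_with?('/')
      word # leave paths such as /var/run untouched
    elsif i != 0 && i != last && SMALL_WORDS.include?(word.downcase)
      word.downcase # small words stay lowercase unless first or last
    else
      # Upcase the first letter of the word and of each part after a slash,
      # so that before/after becomes Before/After.
      word.gsub(%r{(\A|/)([a-z])}) { "#{$1}#{$2.upcase}" }
    end
  end
  cased.join(' ')
end

puts titlecase_sketch('this is a look at what to dream of')
puts titlecase_sketch('comparing /var/run and /boot before/after reboot')
```

The sketch upcases only the first letter instead of calling String#capitalize, because capitalize would also lowercase the rest of a word and mangle names with mixed case.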
While researching the topic I ran into an article by John Gruber about title casing. His article points to a Perl script which he uses to title-case the articles of his (magnificent) blog. The article also points to implementations in other languages, including Ruby.
I looked at several implementations in order to understand the rule set:
I ended up writing my own implementation which has about 40 lines of code and passes all tests.
My implementation is a fraction faster than “titlecase” and 3x faster than “titleize”, measured in a benchmark run on 10,000 English sentences.
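A comparison like this can be set up with the Benchmark module from Ruby’s standard library. The harness below is a minimal sketch under stated assumptions: compare_titlecasers is a hypothetical helper, the sentences are generated stand-ins, and the two lambdas are placeholders, not the gems or the implementation measured in this post.

```ruby
require 'benchmark'

# Hypothetical helper: runs every candidate over the same sentences and
# returns a hash of candidate name => elapsed seconds.
def compare_titlecasers(sentences, candidates)
  candidates.transform_values do |fn|
    Benchmark.realtime { sentences.each { |s| fn.call(s) } }
  end
end

# Stand-in data; a real run would load actual English sentences.
sentences = Array.new(10_000) { |i| "sentence number #{i} about title casing" }

# Placeholder candidates; swap in the implementations to compare.
candidates = {
  'capitalize each word' => ->(s) { s.split.map(&:capitalize).join(' ') },
  'capitalize first word' => ->(s) { s.capitalize }
}

timings = compare_titlecasers(sentences, candidates)
timings.each { |name, seconds| puts format('%-22s %.4fs', name, seconds) }
```

Timings from a single pass vary between runs, so repeating the measurement a few times gives a fairer comparison.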
It was fun to write this code because it was a challenge to make it fast, readable, and pass all test cases. I have since released a Ruby gem based on this implementation: https://github.com/evaneykelen/nicetitle.