I spent a while this week looking into how best to automatically hyphenate text on websites, to improve Today’s Guardian. I couldn’t find anything recent that summarised the options, so here’s a quick run-down of what I discovered.
Phil Gyford: Hyphenation on the web
In 1997, when I was first hired at New York magazine, Kurt Andersen, now a best-selling novelist and radio-show host, had just been fired as editor. Everybody was grieving about this, though not me, since I wouldn’t have had a job there otherwise. And though it wasn’t until years later that I even met Kurt, he unwittingly left me a gift: tacked to the bulletin board in the office I took over was a single page titled ‘Words We Don’t Say’. It contained, as you might surmise, words and phrases that Kurt found annoying and didn’t want used in his magazine. Just yesterday, I rescued it from a bunch of old office stuff that I was throwing out, and I have to say, 14 years later, it’s still a pretty useful list of phony-baloney vocabulary that editors are well-advised to excise from stories.
The 6th Floor: Words we don’t say
We see it every day on signs, billboards, packaging, in books and magazines; in fact, you are looking at it now: the Latin or Roman alphabet, the world’s most prolific, most widespread abc. Typography is a relatively recent invention, but to unearth the origins of alphabets, we will need to travel much farther back in time, to an era contemporaneous with the emergence of (agricultural) civilisation itself.
I love typography: The origins of abc
Local councils have been warned over a slew of jargon that baffles ordinary people, but why do they love to obfuscate?
The Local Government Association’s list of 100 words that should not be used in communication with the general public makes for alarming reading.
BBC News: Why do councils love jargon?
Google is usually great for helping sort out uses of English, so you can check the difference between a pedaller and a peddler — though that doesn’t stop Guardian journalists getting it wrong, of course. But there are times when the majority of people get things wrong. In today’s Guardian, Patrick Barkham reports that “according to the Oxford English Corpus, a database of a billion words, dozens of traditional phrases are now more commonly misspelled than rendered correctly in written English.”
The Guardian: Watch your language — most of you are wrong
The Porter stemming algorithm (or ‘Porter stemmer’) is a process for removing the commoner morphological and inflexional endings from words in English. Its main use is as part of a term normalisation process that is usually done when setting up Information Retrieval systems.
Porter Stemming Algorithm
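To give a flavour of what "removing the commoner morphological and inflexional endings" means in practice, here is a minimal sketch of just the first plural-stripping step (usually called step 1a) of the Porter stemmer. The function name `step_1a` is made up for this illustration, and the full algorithm has several further steps and conditions that are not shown here, so treat this as a partial illustration rather than a working stemmer.

```python
def step_1a(word: str) -> str:
    """Strip common plural endings, in the style of Porter's step 1a.

    Illustrative only: the real algorithm applies several more steps
    after this one.
    """
    # Rules are tried in order and the first matching suffix wins,
    # so "sses" is checked before the plain "s" rule.
    rules = [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")]
    for suffix, replacement in rules:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


if __name__ == "__main__":
    for w in ["caresses", "ponies", "caress", "cats"]:
        print(w, "->", step_1a(w))
```

Note that the results are stems, not dictionary words: "ponies" becomes "poni", which is fine for term normalisation because "pony" is reduced to the same stem by a later step of the full algorithm.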
This page includes English translations of less common Latin phrases (i.e., not always found in dictionaries), some of which are themselves translations from Greek.
Wikipedia: List of Latin phrases