We were having a conversation recently at work about server naming conventions, and it reminded me of an article where the author used a mnemonic word list to name servers. After a little digging, I was able to track it down.
Index
- Index
- A proper server naming scheme
- Oren Tirosh’s Mnemonic Encoding Project
- The word list
- Save this for future reference
A proper server naming scheme
The original post was “A proper server naming scheme” over at mnx.io. It is worth a read, but some of its links are outdated. The original mnemonic encoding project (from Oren Tirosh) that it referenced appears to have vanished, and that was the project that contained the word list I was looking for.
Oren Tirosh’s Mnemonic Encoding Project
I was able to track down a copy of the mnemonicode project. It looks like Oren compiled a list of 1626 words that could be used to encode or decode information. The words have been chosen to be easy to understand when spoken over the phone.
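To make "encode or decode information" concrete, here is a minimal sketch of a positional word encoding: a number is written in base N, where N is the size of the word list, and each digit becomes a word. This is not Oren's exact algorithm (the real project has its own rules for byte order and leftover bytes); the function names and the eight-word sample list are my own assumptions for illustration. With the full 1626 words, three words cover 1626^3 = 4,298,942,376 values, which is just over the 4,294,967,296 values of a 32-bit number.

```python
def encode(number, words):
    """Encode a non-negative integer as words, least significant digit first."""
    if number < 0:
        raise ValueError("number must be non-negative")
    base = len(words)
    encoded = []
    while True:
        number, remainder = divmod(number, base)
        encoded.append(words[remainder])
        if number == 0:
            break
    return encoded


def decode(encoded, words):
    """Reverse encode(): turn a list of words back into the integer."""
    index = {word: position for position, word in enumerate(words)}
    base = len(words)
    number = 0
    # Walk from the most significant word down to the least significant one.
    for word in reversed(encoded):
        number = number * base + index[word]
    return number


# Hypothetical eight-word list taken from the sample further down the post.
sample = ["lobster", "local", "logic", "logo", "magic", "magnet", "major", "mambo"]
print(encode(100, sample))                  # ['magic', 'magic', 'local']
print(decode(encode(100, sample), sample))  # 100
```

Swapping the sample for the full list changes nothing in the code; only the base grows from 8 to 1626.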
From the readme
Mnemonic tries to be selective about its word list. Its criteria are thus:
Mandatory Criteria:
- The wordlist contains 1626 words.
- All words are between 4 and 7 letters long.
- No word in the list is a prefix of another word (e.g. visit, visitor).
- Five letter prefixes of words are sufficient to be unique.
Less Strict Criteria:
- The words should be usable by people all over the world. The list is far from perfect in that respect. It is heavily biased towards western culture and English in particular. The international vocabulary is simply not big enough. One can argue that even words like “hotel” or “radio” are not truly international. You will find many English words in the list but I have tried to limit them to words that are part of a beginner’s vocabulary or words that have close relatives in other european languages. In some cases a word has a different meaning in another language or is pronounced very differently but for the purpose of the encoding it is still ok - I assume that when the encoding is used for spoken communication both sides speak the same language.
- The words should have more than one syllable. This makes them easier to recognize when spoken, especially over a phone line. Again, you will find many exceptions. For one syllable words I have tried to use words with 3 or more consonants or words with diphthongs, making for a longer and more distinct pronunciation. As a result of this requirement the average word length has increased. I do not consider this to be a problem since my goal in limiting the word length was not to reduce the average length of encoded data but to limit the maximum length to fit in fixed-size fields or a terminal line width.
- No two words on the list should sound too much alike. Soundalikes such as “sweet” and “suite” are ruled out. One of the two is chosen and the other should be accepted by the decoder’s soundalike matching code or using explicit aliases for some words.
- No offensive words. The rule was to avoid words that I would not like to be printed on my business card. I have extended this to words that by themselves are not offensive but are too likely to create combinations that someone may find embarrassing or offensive. This includes words dealing with religion such as “church” or “jewish” and some words with negative meanings like “problem” or “fiasco”. I am sure that a creative mind (or a random number generator) can find plenty of embarrassing or offensive word combinations using only words in the list but I have tried to avoid the more obvious ones. One of my tools for this was simply a generator of random word combinations - the problematic ones stick out like a sore thumb.
- Avoid words with tricky spelling or pronunciation. Even if the receiver of the message can probably spell the word close enough for the soundalike matcher to recognize it correctly I prefer avoiding such words. I believe this will help users feel more comfortable using the system, increase the level of confidence and decrease the overall error rate. Most words in the list can be spelled more or less correctly from hearing, even without knowing the word.
- The word should feel right for the job. I know, this one is very subjective but some words would meet all the criteria and still not feel right for the purpose of mnemonic encoding. The word should feel like one of the words in the radio phonetic alphabets (alpha, bravo, charlie, delta etc).
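The four mandatory criteria from the readme are mechanical enough to check in code, which is handy if you ever copy the list from a secondary source and want to confirm it arrived intact. The sketch below is my own, not part of the mnemonicode project; the function name `check_wordlist` and its return format are assumptions for illustration.

```python
def check_wordlist(words, expected_count=1626):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []

    # Criterion 1: the list has the expected number of words.
    if len(words) != expected_count:
        problems.append(f"expected {expected_count} words, found {len(words)}")

    # Criterion 2: every word is between 4 and 7 letters long.
    for word in words:
        if not 4 <= len(word) <= 7:
            problems.append(f"bad length: {word}")

    # Criterion 3: no word is a prefix of another word.
    # After sorting, a word that prefixes any other word also prefixes its
    # immediate successor, so comparing neighbours is enough.
    ordered = sorted(words)
    for first, second in zip(ordered, ordered[1:]):
        if second.startswith(first):
            problems.append(f"prefix clash: {first} / {second}")

    # Criterion 4: the first five letters identify a word uniquely.
    seen = {}
    for word in words:
        key = word[:5]
        if key in seen:
            problems.append(f"five-letter clash: {seen[key]} / {word}")
        else:
            seen[key] = word

    return problems


# The readme's own counter-example fails two checks at once.
print(check_wordlist(["visit", "visitor"], expected_count=2))
```

The softer criteria (pronunciation, offensiveness, "feels right") are judgment calls and are not something a script like this can verify.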
The word list
Here is a sample from the middle of the list:
lobster local logic logo lola london
lucas lunar machine macro madam madonna
madrid maestro magic magnet magnum mailbox
major mama mambo manager manila marco
marina market mars martin marvin mary
master matrix maximum media medical mega
melody memo mental mentor mercury message
metal meteor method mexico miami micro
milk million minimum minus minute miracle
mirage miranda mister mixer mobile modem
modern modular moment monaco monica monitor
mono monster montana morgan motel motif
motor mozart multi museum mustang natural
neon nepal neptune nerve neutral nevada
news next ninja nirvana normal nova
novel nuclear numeric nylon oasis observe
ocean octopus olivia olympic omega opera
optic optimal orange orbit organic orient
origin orlando oscar oxford oxygen ozone
pablo pacific pagoda palace pamela panama
pancake panda panel panic paradox pardon
paris parker parking parody partner passage
I have reproduced the entire word list here and in a GitHub Gist to make it easy to find. It was extracted from the mn_wordlist.c file in the project.
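Since the conversation that started this was about server names, here is a small sketch of how the list could be used for that: pick a random word that is not already assigned to a machine. This is only one possible approach and is not taken from the mnx.io article; the function name `pick_server_name` and the idea of tracking names in a plain set are assumptions for illustration.

```python
import random


def pick_server_name(words, in_use, rng=random):
    """Return a random word from the list that is not already assigned to a server."""
    available = sorted(set(words) - set(in_use))
    if not available:
        raise RuntimeError("every word in the list is already assigned")
    return rng.choice(available)


# A few words from the sample above; a real setup would load the full list.
words = ["madrid", "magnet", "mercury", "monaco", "mustang", "nevada"]
in_use = {"madrid", "monaco"}

name = pick_server_name(words, in_use)
in_use.add(name)
print(name)
```

Picking at random rather than in alphabetical order keeps the names from implying anything about a server's age or role.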
Craig G’s MnemonicEncodingWordList Project
After I published this post, Craig created a small PowerShell project to work with this list. In that project he keeps the full list in a JSON document.
Save this for future reference
I don’t have an immediate need for this list, but it took me a bit longer to find it than I expected. I know I could use any word list, but this one was carefully crafted and I wanted to preserve it. I hope you find a valuable use for it. I know I am saving this one for later.