Making your own "translator" to turn English text into a fantasy language

  • bentelk
  • 10/31/2010 06:36 PM

For my game Small World I have the player play as a cat. As a cat, you cannot understand what humans are saying, so from your point of view the humans are speaking a foreign language. I saw three options for dealing with this in-game:

1. disallow the player from interacting with humans
2. replace human dialog with a description, such as "*words, words, words*" or "(He says something to you.)"
3. present human dialog in a way that the player cannot understand

I threw out option 1 immediately. Humans and cats interact in real life, and I wanted humans to respond to the player in the game, often verbally. For other games, however, option 1 might be entirely appropriate.

Option 2 I went with initially, but as time wore on I felt it was kind of lazy. I also realized that I wanted players to have some idea of what the humans were saying... was it a question? A loud statement? Was it something short and quick, or something longer? While these things could be described, any descriptions I could come up with seemed awkward. (ex: "The human says something briefly. It seems confused." might have described "Oa goam?" in Small World.)

This led me to option 3: present dialog to the player, but in a way that the player can't understand.

There are a couple of routes that could be taken with this: you could mash out some random gibberish for each NPC, or you could invent your own language. Mashing out random gibberish wouldn't be consistent from NPC to NPC, however, and inventing your own language is very time-consuming! Fortunately, there's another option: create an algorithm that turns English text into a made-up language. With a little work, you can even be sure that this algorithm creates text which is still pronounceable.

Before getting started, I want to note: though this article assumes we are translating English text, it is of course entirely possible to create a translator from any language. However, since this article is in English, and I wrote my game in English, I will refer to the English language throughout the article.

===Benefits and Considerations===

Benefits to this approach:
1. Since you, the author of the game, will write English text for each character, the characters ARE actually saying something. This is a useful tool for the writer (it gives the NPCs some definition), and also gives the dialog a natural look. Some dialog will be short; some long. The presence of question marks, commas, etc, will also give the player a feeling for what that NPC is saying, even if they don't know what exactly they're saying.
2. Consistency, as mentioned earlier. Besides looking nice to the player, this consistency also allows you to create multiple languages for your game, which players will be able to identify upon reading, even though they can't tell what that NPC is saying. This can be used to build story (a human speaking elvish?) or as a puzzle element: perhaps the game "translates" NPC text to English if the party contains a member who knows the language; if the player has access to more characters than the party can hold, then the player who recognizes an NPC as speaking "elvish" knows to recruit the elf character to their party if they want to interact with that NPC.
3. Speed. To get benefit 1 or 2, you'd usually have to create an original language by hand. By making this "translator", you can get dialog quickly.
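
As a rough illustration of benefit 2, here is a minimal sketch of how a game might keep one sound table per language and show plain English only when the party knows the language. The language names, the tiny tables, and the `npcLine` function are all hypothetical, and the sketch uses PHP's built-in `strtr` rather than a custom translator:

```php
<?php
// Hypothetical, deliberately tiny sound tables; a real table would cover every sound.
$languages = [
    'elvish'   => ['th' => 'l', 'e' => 'a', 'r' => 'k'],
    'dwarvish' => ['th' => 'z', 'e' => 'u', 'r' => 'g'],
];

// Returns the line as the player should see it: plain English if someone in
// the party knows the language, otherwise the translated version.
function npcLine(string $english, string $language, array $knownLanguages, array $languages): string
{
    if (in_array($language, $knownLanguages, true)) {
        return $english;
    }
    return strtr($english, $languages[$language]);
}

echo npcLine('there', 'elvish', [], $languages), "\n";         // laka
echo npcLine('there', 'dwarvish', [], $languages), "\n";       // zugu
echo npcLine('there', 'elvish', ['elvish'], $languages), "\n"; // there
```

Because each language has its own table, the same English line looks recognizably different in each one.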

Considerations:
1. This creates some bookkeeping for you, the designer. For each NPC, you need two copies of the dialog. If the copies are both in-game (the NPC uses one or the other, depending on whether you can understand him/her; or the "translator" function is built into the game engine!) this isn't much extra work at all. However, if the player can never understand the NPC (such as in Small World), you should keep a copy of each NPC's dialog somewhere else for future reference.
2. The consistency of the text also means your players may be able to decipher some of the NPC dialog on their own. If you want to be sure that players can NOT know what the NPC is saying, then this approach may not be the best. In the case of Small World, this was not a problem for me: the human NPCs never say anything important, so if a player deciphers the text, game play cannot be ruined.
3. Your invented language will not be as rich as a language invented by hand. Grammar rules, for example, often differ from language to language; some languages pay closer attention to relative social status when one person addresses another; etc. These are all aspects that an algorithm is not likely to capture.

===Designing the Language===

A first approach might be to make a simple letter-to-letter translation - ex: "t" becomes "x", "h" becomes "g", etc. Making sure that the end result is pronounceable, however, and not just a string of random letters, is not possible with this method. Even if you replace vowels with vowels and consonants with consonants, there are enough common words where several vowels - or worse, several consonants - follow each other, making an ugly mess. For example, if "t" becomes "x", and "h" becomes "g", then any word with "th" in it is already in trouble! Fixing this particular issue is likely to create issues with other combinations.
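
To make the problem concrete, here is a small sketch of the letter-to-letter idea. Only "t" -> "x" and "h" -> "g" come from the example above; the other pairs in the table and the `naiveTranslate` name are made up for illustration:

```php
<?php
// Hypothetical letter-for-letter table: vowels map to vowels, consonants to consonants.
$letters = [
    't' => 'x', 'h' => 'g', 'r' => 'k', 's' => 'z', 'n' => 'm',
    'e' => 'a', 'o' => 'i', 'u' => 'o',
];

// With an array argument, strtr() replaces every key it finds in a single pass.
// Letters that are not in the table (such as "g" here) are left unchanged.
function naiveTranslate(string $text, array $letters): string
{
    return strtr($text, $letters);
}

echo naiveTranslate('the', $letters), "\n";      // xga
echo naiveTranslate('strength', $letters), "\n"; // zxkamgxg
```

Every consonant cluster in the English word survives as a cluster of unrelated consonants, which is why "strength" comes out as something nobody could say aloud.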

A better approach is to replace sounds in words. For example, "th", "sh", "ch", "qu", "igh", "oo", "ay"... For Small World, I identified the following "sounds". This list is probably not complete, but it served my purposes (a small, dialog-light game):

'ough', 'igh', 'eau', 'ou', 'oo', 'ee', 'i', 'ay', 'oh', 'eh', 'oa', 'uh', 'u', 'e', 'a', 'o', 'y', 'ow', 'th', 'sh', 'qu', 'ch', 'ck', 'ph', 'ng', 'b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'p', 'r', 's', 't', 'v', 'w', 'x', 'z'

Once you've identified the complete list of sounds you will translate, creating a new "language" is a matter of deciding what each sound will be translated into. This will require some testing - once the translator algorithm is complete - to get right. Vowel sounds - such as "oh" and "igh" - should translate into other vowel sounds, and consonant sounds - such as "th" and "g" - should translate into other consonant sounds, but as you experiment you will find that some of your translations may consistently produce unpronounceable words when combined with other translations. Often this problem is caused because a particular combination of letters that makes a single sound was not accounted for. This can be fixed by creating a new replacement rule and/or adjusting existing rules.
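
One way to speed up this testing is to run a list of sample words through a candidate table and flag results that look hard to pronounce. The sketch below is only a rough aid: the partial table, the function names, and the "three consonants in a row" rule are all assumptions made for illustration, and it uses PHP's built-in `strtr` as a stand-in translator:

```php
<?php
// Hypothetical partial table in strtr() form: sound => replacement.
$table = [
    'th' => 'l', 'ng' => 'um',
    's' => 'n', 't' => 'm', 'r' => 'k',
    'e' => 'a',
];

// Rough heuristic (an assumption, not a linguistic rule): three or more
// consonant letters in a row usually signal a hard-to-pronounce result.
function looksUnpronounceable(string $word): bool
{
    return preg_match('/[bcdfghjklmnpqrstvwxz]{3,}/i', $word) === 1;
}

// Translate each sample word and collect the ones that look suspicious,
// keyed by the original English word.
function findProblemWords(array $samples, array $table): array
{
    $problems = [];
    foreach ($samples as $word) {
        $translated = strtr($word, $table);
        if (looksUnpronounceable($translated)) {
            $problems[$word] = $translated;
        }
    }
    return $problems;
}

print_r(findProblemWords(['there', 'strength'], $table));
// "there" becomes "laka" and passes; "strength" becomes "nmkauml" and is flagged.
```

A flagged word usually points at a missing rule (here, perhaps one for "str") or at a replacement that needs to change.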

===The Algorithm Itself===

The trick in writing the algorithm is to make sure that you only pass over the string to translate once. If you instead loop through all of your replacements, applying them to the string one after another, you will end up replacing your replacements with further replacements! ex: suppose you are translating "there", and the first replacement is "th" -> "k"... now you have "kere", but what happens when you later get to the rule for replacing "k"?

Again, rather than looping through the replacements, applying them to the string, you want to go through the string, identifying replacements and applying them, then moving on to the next part of the string.
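
The difference between the two approaches can be seen with just two rules. This is a minimal sketch using made-up replacements ("th" -> "k" and "k" -> "sh"); it compares PHP's `str_replace`, which applies the rules one after another, with `strtr`, which walks the string once:

```php
<?php
// Two hypothetical rules; the second one matches the output of the first.
$from = ['th', 'k'];
$to   = ['k', 'sh'];

// str_replace() applies each rule to the whole string in turn, so the "k"
// produced by the first rule is replaced again by the second.
$sequential = str_replace($from, $to, 'there');

// strtr() with an array scans the string once and never re-examines
// text it has already replaced.
$singlePass = strtr('there', array_combine($from, $to));

echo $sequential, "\n"; // shere
echo $singlePass, "\n"; // kere
```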

I am most familiar with PHP, and in PHP this algorithm has already been implemented with the built-in function "strtr". Using strtr, however, you cannot preserve capitalization, so I created a custom function which preserves capitalization. While this was not strictly necessary, I found it made my bookkeeping easier.

That function is:

function translate($text)
{
  $from = [ 'ough', 'igh', 'eau', 'ou', 'oo', 'ee', 'i', 'ay', 'oh', 'eh', 'oa', 'uh', 'u', 'e', 'a', 'o', 'y', 'ow', 'th', 'sh', 'qu', 'ch', 'ck', 'ph', 'ng', 'b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'p', 'r', 's', 't', 'v', 'w', 'x', 'z' ];
  $to = [ 'ing', 'eau', 'y', 'ay', 'eh', 'u', 'o', 'ow', 'ee', 'oo', 'ough', 'ou', 'oh', 'a', 'oa', 'i', 'oo', 'uh', 'l', 'j', 'g', 'p', 'p', 's', 'um', 'ck', 'g', 'c', 'm', 'v', 'r', 'p', 'sh', 'w', 'ph', 't', 'z', 'k', 'n', 'm', 'th', 'ch', 'x', 'qu' ];

  for($x = 0; $x < strlen($text); ++$x)
  {
    // the number 4 represents the longest sound we can replace (in this case, "ough")
    // if you identify longer sounds, this number must be changed
    for($y = 4; $y >= 1; --$y)
    {
      // take a piece of the string (it may be shorter than $y near the end)...
      $piece = substr($text, $x, $y);

      // ... and search to see if this string is a sound we replace.
      $i = array_search(strtolower($piece), $from, true);

      // if it is:
      if($i !== false)
      {
        // get the replacement
        $replace = $to[$i];

        // if the original contained a capital letter, capitalize the replacement
        if(strtolower($piece) !== $piece)
          $replace = ucfirst($replace);

        // insert the replacement
        $text = substr($text, 0, $x) . $replace . substr($text, $x + strlen($piece));

        // skip over the replaced text, so we do not replace it!
        $x += strlen($replace) - 1;

        // stop trying shorter pieces at this position
        break;
      }
    }
  }

  return $text;
}


I hope this inspires more use of invented languages in games. They are certainly not appropriate for every game, but when used correctly I believe they can add depth to the game and the game world.


I might use this in "The Hunt". You play as a wolf, and I've been trying to think of how to add people.