Greetings! Here at Allcorrect, we recently worked on a few projects that had procedurally generated text in them. We realized that it still remains a pretty mysterious process for a lot of people, so we decided to create a handy guide. This article aims to explain what procedural generation is, what types of generation exist, and how you can avoid the various pitfalls that are hidden in text generation.
Who Can Benefit from This Article?
Game developers who are looking to implement procedural generation of text in their project. Game publishers who are looking to onboard such a project and are currently assessing the risks and complexities of its localization. And game producers who want to find out how best to prepare for such a project and what kind of coordination between the developer and the localization team is required.
How Procedural Generation is Used in Games
Procedural generation is used in the industry in order to create large amounts of content in a game. Some examples are The Binding of Isaac (2D levels, random placement of monsters and loot), the Civilization series (the world map), the Borderlands series (weapons and upgrades), Star Dynasties (narrative), and Rogue Legacy (random levels, items).
Narrowing It Down: Text Generation
You might ask, “But what does this have to do with localization?” It’s true that a randomly generated level probably won’t drastically affect the localization of a game, but it becomes relevant when you start using procedural generation of the game text itself. We’ll cover two ways of generating in-game text. They differ in complexity, scale, and what they make possible.
The first way is a procedurally generated narrative, which is probably the most difficult thing to localize. The second way is the generation of composed lines for things you can encounter in a game, such as items of different rarity or quality, varying attributes, and so on.
Procedural Text Generation
Procedural generation allows for dynamic and varied narratives, dialogue, quests, and other textual elements in games. By defining a set of narrative structures, rules, and variables, developers can generate unique quests or storylines for each playthrough, offering players a more personalized and diverse experience. This approach provides developers with the opportunity to create a wide variety of content in the game, effectively multiplying the available narrative options by combining different parts.
While procedural generation provides a multitude of variations and scenarios in game narratives, it is crucial to acknowledge the potential challenges it presents. The combinations of generated texts can occasionally appear awkward or unnatural, underscoring the need for a comprehensive pre-production stage to ensure adherence to proper grammar rules and narrative coherence. Moreover, it is imperative to prioritize localization efforts to ensure that all the mentioned generated texts are properly adapted to the target languages, ensuring seamless integration within the game.
Before we proceed to discuss the crucial aspects of ensuring well-crafted procedurally generated texts, it’s important to acknowledge the significance of grammar in the localization process of such texts. Grammar rules in different languages play a vital role not only in the localization of regular texts but also in the localization of procedurally generated content. Various factors, including gender, singular and plural forms, word order, case systems, and agreement rules, greatly influence the adaptation and translation of generated texts. For instance, in languages like Spanish, French, or German, translators must maintain gender agreement between nouns, articles, adjectives, and pronouns in the text. Sentence structures may differ across languages, necessitating the rearrangement of phrases or clauses to preserve proper grammar and meaning. Furthermore, languages with case systems introduce added complexity, requiring translators to ensure that the generated texts adhere to appropriate case forms and declensions for grammatical accuracy in the localized version. Taking these language-specific considerations into account is crucial for achieving a seamless and linguistically precise localization of procedurally generated texts in games.
When setting up rules and variables for generated texts, it is crucial to consider the peculiarities of grammar rules in the target languages, especially for localization purposes. Since developers may not be aware of all the rules in different languages, the system’s rules and variables for narrative generation should be flexible. A fixed set of rules might not work effectively for all languages, necessitating customization for each language’s specific grammar rules.
We localized Star Dynasties into German in 2021. With this game, we worked with the lead linguist and the developers to partially reinvent the variable system for German localization. Here is the list of key considerations for adjustable rules with examples from this game:
- Gender differentiation. In some languages, gender affects the spelling and structure of phrases. Developers should consider implementing rules to differentiate between male and female strings based on the speaker’s gender in the game. Note that gender-neutral translations may not always be possible due to language constraints.
This is an example of how this feature was customized in Star Dynasties.
English: “My magnificent, glorious {Up(title(PlayerCharacter))}, anything for you!” This is rendered as, “My magnificent, glorious Duke, anything for you!”
German: “{mf(Character, ‘Mein großer, glorreicher’, ‘Meine große, glorreiche’)} {Up(title(PlayerCharacter))}, für Euch tu ich doch alles!” This is rendered as “Mein großer, glorreicher Herzog, für Euch tu ich doch alles!” if the player’s character is male or, “Meine große, glorreiche Herzogin, für Euch tu ich doch alles!” if the player uses a female character.
- Grammatical tenses. Languages may have variations in grammatical tenses, affecting how the third-person singular rule works or pronoun conjugations. Developers should create a system that allows for different rules for male and female lines to account for such variations.
Example:
English: “{Dhas(Assignee)} started the assignment.” This is rendered as, “Your father has started the assignment.”
German: “{Euer_hat(Assignee)} einen Auftrag begonnen.” This is rendered as, “Euer Vater hat einen Auftrag begonnen.” But if it’s the player’s character, it becomes, “Ihr habt einen Auftrag begonnen.”
- Pronoun conjugation. Pronouns in different languages may require specific conjugation rules based on gender, number, and case. Developers should ensure that their system incorporates these rules accurately to generate appropriate pronoun forms.
English: “‘So, what do we do with these things?’ {he(Character)} asks,” is rendered as, “‘So, what do we do with these things?’ he asks.”
German: “‘Und was machen wir mit diesen Dingern?’ fragt {de_pron_nom(Character)},” is rendered as, “‘Und was machen wir mit diesen Dingern?’ fragt er.”
Here is also one more example to illustrate how the variables transfer not only the correct pronoun conjugation but also the correct gender of the character.
English: “{Character} cannot have any more children with their spouse anyway,” becomes, “Duke Casper Allen cannot have any more children with their spouse anyway.”
German: “{Character} kann keine weiteren Kinder mit {dat_m(Character)} Ehepartner haben,” becomes, “Herzog Casper Allen kann keine weiteren Kinder mit seinem Ehepartner haben.”
These functions can truly allow for all gender variables. For example, it can be possible to say, “{Character} cannot have any more children with {his(Character)} spouse anyway,” omitting the gender-neutral “their” to give the corresponding possessive pronoun of {Character}. However, German, like many other languages (such as French), has no gender-neutral word for spouse. Therefore, it would make localization into other languages much easier to turn this sentence into, “{Character} cannot have any more children with {his(Character)} {spouse(Character)} anyway,” which would allow for the German pronouns like his/her/its (German: sein/e or ihr/e) and gendered words like “spouse” (German: Ehemann or Ehefrau, i.e. husband or wife).
- Singular and plural nouns grammar. The grammar rules for singular and plural nouns can vary across languages. Developers should consider these distinctions and design their system to generate grammatically correct forms for both singular and plural nouns.
English: “I heard that this is about our {title(RulerOf(System))}s,” (notice the “s” following the string to signify plural) becomes, “I heard that this is about our Dukes.”
German does not have a simple pluralization form like adding an “s.” As you can see from the following examples, there are multiple different ways to pluralize words: duke/dukes are Herzog/Herzöge, aunt/aunts are Tante/Tanten, uncle/uncles are Onkel/Onkels and mother/mothers are Mutter/Mütter. It is therefore difficult yet necessary to consider how the target language pluralizes nouns and their corresponding pronouns and articles, and to develop functions in the target language that allow for as many variations as possible. However, there will be instances where this is impossible, and translators and localizers will then need to find natural workarounds that do not rely on functions in the target language. For the example above, one possible workaround is the following.
German: “Ich habe gehört, dass es um {RulerOf(System)} und {gen_p(RulerOf(System))} Leute geht,” becomes, “Ich habe gehört, dass es um Herzog Casper Allen und seine Leute geht.”
- Noun cases. Another crucial aspect to consider in language-specific rule sets is the concept of noun cases. Different languages employ noun cases to indicate the role or function of nouns within a sentence. Developers should account for these variations and ensure that their procedural text generation system can generate accurate noun case forms based on the grammatical rules of the target language.
The last point doesn’t have any specific example from Star Dynasties as the game doesn’t have such a function, but this point may be relevant for your game and the languages you would like to localize it into.
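To make the considerations above more concrete, here is a minimal Python sketch of what language-specific helper functions could look like. It is not the Star Dynasties implementation: the Character class, the lookup tables, and the names de_title, de_pron, and de_noun are hypothetical, and only mf mirrors a function name from the examples above.

```python
from dataclasses import dataclass


@dataclass
class Character:
    name: str
    title: str   # English title key, e.g. "Duke"
    gender: str  # "m" or "f" in this sketch; a real game may need more values


def mf(character: Character, male: str, female: str) -> str:
    """Return the male or female variant of a phrase for this character."""
    return male if character.gender == "m" else female


# Hypothetical lookup tables; a real lockit would hold far more entries.
DE_TITLES = {"Duke": {"m": "Herzog", "f": "Herzogin"}}
DE_PRONOUNS = {
    "nom": {"m": "er", "f": "sie"},
    "dat": {"m": "ihm", "f": "ihr"},
}
# Article plus noun for each case, since German marks case on both.
DE_NOUN_CASES = {
    "Herzog": {
        "nom": "der Herzog",
        "gen": "des Herzogs",
        "dat": "dem Herzog",
        "acc": "den Herzog",
    }
}


def de_title(character: Character) -> str:
    """Gendered German title for the character."""
    return DE_TITLES[character.title][character.gender]


def de_pron(character: Character, case: str) -> str:
    """Third-person pronoun in the requested case."""
    return DE_PRONOUNS[case][character.gender]


def de_noun(noun: str, case: str) -> str:
    """Declined noun with its definite article."""
    return DE_NOUN_CASES[noun][case]


def greeting_de(player: Character) -> str:
    """German version of the greeting line from the gender example."""
    opener = mf(player, "Mein großer, glorreicher", "Meine große, glorreiche")
    return f"{opener} {de_title(player)}, für Euch tu ich doch alles!"
```

With a male duke, greeting_de produces the German line shown in the gender example, and switching the gender to "f" selects the feminine adjective endings together with Herzogin. The case table illustrates the last point: the same noun needs a different form depending on its role in the sentence.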
Customizing the rule set for each target language ensures optimal performance. However, there are potential pitfalls associated with this approach. Here is a list of pros and cons for customized rule sets:
Pros of Customized Rule Sets:
- Complex localization, including coding of the localization, becomes possible and feasible, allowing for a seamless gaming experience across different languages and cultures.
- The game can embrace gender inclusivity through localized features, creating a more inclusive and representative environment for players, and thus speaking to players who would otherwise be less likely to play and buy the game.
- Pre-translation analysis helps identify and prevent potential issues during the localization process, reducing the likelihood of errors and ensuring smoother integration of language-specific rules.
Cons of Customized Rule Sets:
- Complex localization, including coding of the localization, requires a significant amount of time and budgetary resources, impacting the overall development timeline and cost.
- Linguists involved in the localization process may introduce potential errors in functions, resulting in text that may appear strange or unnatural.
- Automatic quality assurance (QA) and spellcheck tools may not be as effective due to the presence of procedural functions, leading to false alerts and requiring additional manual review.
- Some functions might be complex, and pre-translation analysis (PTA) may not be effective in preventing issues related to them beforehand. Such functions might be adjusted only after the linguists have tried to work with them in context.
In addition, it is crucial to provide means for linguists to test how each line appears in the game. While localizing a project, it’s not always possible to accurately grasp the context solely from the information provided in the localization file. A valuable tool in this regard is the Dynamic Text Tester, which was generously created and provided to Allcorrect during the localization of Star Dynasties. Linguists can simply copy and paste the line they wish to check, and the tester generates variations of the line based on the functions used in the game. This tool proved invaluable in ensuring that the localized text aligned perfectly with the intended context and functionality within the game.
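We can't show the internals of the Dynamic Text Tester here, but the general idea of generating every variation of a line can be sketched in a few lines of Python. Everything in this sketch is an assumption made for illustration: the variations helper, the assignment_line function, and the “Eure Mutter hat” variant, which does not appear in the examples above.

```python
from itertools import product


def variations(render, **options):
    """Render a line once for every combination of the given option values."""
    names = list(options)
    results = []
    for combo in product(*(options[name] for name in names)):
        context = dict(zip(names, combo))
        results.append((context, render(**context)))
    return results


def assignment_line(is_player: bool, gender: str) -> str:
    """Hypothetical stand-in for the German {Euer_hat(Assignee)} line."""
    if is_player:
        subject = "Ihr habt"
    elif gender == "m":
        subject = "Euer Vater hat"
    else:
        subject = "Eure Mutter hat"
    return f"{subject} einen Auftrag begonnen."


if __name__ == "__main__":
    for context, text in variations(
        assignment_line, is_player=[True, False], gender=["m", "f"]
    ):
        print(context, "->", text)
```

A linguist reading this output sees all four contexts side by side, which makes it easy to spot a combination where the verb or possessive no longer agrees.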
Composed Lines
Now let’s jump to something a little different. In Rogue Legacy 2, some of the player’s equipment is randomly generated. A material (an adjective such as Leather, Gilded, or Obsidian) is concatenated with a type of equipment (a noun such as Weapon, Cape, or Helm). It works perfectly in English, in all situations without extra work, but in other languages adjectives and nouns can have different genders, and in some languages the order will be reversed.
You need to concoct a “formula” to tell the game how it should glue these lines together, and know that the formula will differ depending on language.
Here’s a snippet of how it works in Rogue Legacy 2.
Material
English | Italian | Chinese | Portuguese |
Leather | di cuoio | 皮革 | Couro |
Obsidian | di ossidiana | 黑曜石 | Obsidiana |
Gilded | d'oro | 镀金 | Ouro |
Crescent | lunare | 新月 | Crescente |
Drowned | dell'abisso | 淹没 | Submersão |
Equipment
English | Italian | Chinese | Portuguese |
Helm | Elmo | 头盔 | Elmo |
Cape | Mantello | 披风 | Capa |
Formatter/formula
English | Italian | Chinese | Portuguese |
{Material} {Equipment} | {Equipment} {Material} | {Material}{Equipment} | {Equipment} de {Material} |
(EN) Leather Cape turns into (IT) Mantello di cuoio.
Chinese is pretty straightforward: we kept the English order and removed the whitespace (since the Chinese language, just like Japanese, doesn’t use whitespace) in the formula. In Portuguese, we needed to reverse the order and insert the “de” preposition in the formula. Italian was trickier because the preposition could be different, so we reversed the order in the formula and put the preposition into the translation of adjectives.
With such an approach, composed lines SHOULD NOT be used outside of this concatenation. The translation and the formulas are tailored to work in this specific case, and ideally, they shouldn’t be used anywhere else in the game. If, say, you reused the “Leather” string somewhere else in the game, it might not be correct in Italian. In English, it can be either a noun or adjective, and it works fine, but in the example our Italian translation for “Leather” is more like, “made of leather,” and the first letter is in lowercase, so it would only work in the composed line together with the formula.
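The formatter idea can be sketched in Python as follows. This is only an illustration built from the tables above, not the actual Rogue Legacy 2 code: the dictionary layout, the language codes, and the item_name function are assumptions.

```python
# One formatter per language, copied from the formatter table above.
FORMATTERS = {
    "en": "{Material} {Equipment}",
    "it": "{Equipment} {Material}",
    "zh": "{Material}{Equipment}",
    "pt": "{Equipment} de {Material}",
}

# Translations that are only valid inside the composed item name.
MATERIALS = {
    "Leather": {"en": "Leather", "it": "di cuoio", "zh": "皮革", "pt": "Couro"},
    "Gilded": {"en": "Gilded", "it": "d'oro", "zh": "镀金", "pt": "Ouro"},
}
EQUIPMENT = {
    "Helm": {"en": "Helm", "it": "Elmo", "zh": "头盔", "pt": "Elmo"},
    "Cape": {"en": "Cape", "it": "Mantello", "zh": "披风", "pt": "Capa"},
}


def item_name(material: str, equipment: str, lang: str) -> str:
    """Compose an item name with the formatter of the target language."""
    return FORMATTERS[lang].format(
        Material=MATERIALS[material][lang],
        Equipment=EQUIPMENT[equipment][lang],
    )


if __name__ == "__main__":
    for lang in FORMATTERS:
        print(lang, item_name("Leather", "Cape", lang))
```

Because the word order and the glue words live in the formatter rather than in code, adding a language means adding data, not changing the game logic.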
How to Prepare
What you need | What happens if you don't have it |
Complete list of which lines are used in generation, and what terms are combined. Example: Material is taken from rows 2-10, items are rows 11-20, the formatter is row 21. | There is a risk of missing something, or not covering all possible cases of concatenation. |
A line shouldn't serve two purposes. (If in your game, "Leather" can be an adjective in some places and a noun in other places, while it's the same word in English, you will still need to use two source lines for "Leather" if you plan localization.) | The translation is correct in one place, but incorrect somewhere else. |
Ability to customize how the item name is generated depending on the language (formula/formatter). It shouldn't be hard-coded. | Broken names and incorrect grammar in the localized version. |
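Parts of this checklist can be verified automatically before translation starts. The Python sketch below assumes formatters use {Material} and {Equipment} placeholders as in the table above. The function names and data layout are hypothetical. It flags formatters with missing or misspelled placeholders and terms that lack a translation.

```python
from string import Formatter


def placeholders(fmt: str) -> set:
    """Return the set of placeholder names used in a format string."""
    return {field for _, field, _, _ in Formatter().parse(fmt) if field}


def check_formatters(formatters: dict, required: set) -> list:
    """List (language, missing, unexpected) for every faulty formatter."""
    problems = []
    for lang, fmt in formatters.items():
        found = placeholders(fmt)
        missing = sorted(required - found)
        unexpected = sorted(found - required)
        if missing or unexpected:
            problems.append((lang, missing, unexpected))
    return problems


def missing_translations(table: dict, languages: list) -> list:
    """List (term, language) pairs that have no translation yet."""
    return [
        (term, lang)
        for term, row in table.items()
        for lang in languages
        if not row.get(lang)
    ]
```

A check like this doesn't replace a linguist's review, but it catches the mechanical mistakes (a typo in a placeholder, an empty cell) before they turn into broken item names.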
Rogue Legacy 2 Gendered Strings
The main character could be either male or female. We didn’t have any custom tags for the gender, but we could provide two translations for specific lines (and the game would automatically pick either the male or female version based on the character’s gender).
English_M | English_F | Italian_M | Italian_F |
You're a cool dude. | You're a lovely lady. | Sei un figo. | Sei un'adorabile donzella. |
Barbarian | Barbarian | Barbaro | Barbara |
{0} has succumbed to the Black Root Poison | {0} has succumbed to the Black Root Poison | {0} è morto a causa del veleno di radice nera | {0} è morta a causa del veleno di radice nera |
Advantage: without a custom tag system, this needs the least amount of dev time.
Disadvantage: you need a bigger localization budget.
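A lookup of this kind might look like the Python sketch below. The string keys, the column naming, and the fallback to the male variant when a female one is missing are assumptions for illustration. We are not describing how Rogue Legacy 2 implements this internally.

```python
# Each line has one column per language and gender, as in the table above.
STRINGS = {
    "compliment": {
        "en_M": "You're a cool dude.",
        "en_F": "You're a lovely lady.",
        "it_M": "Sei un figo.",
        "it_F": "Sei un'adorabile donzella.",
    },
    "class_barbarian": {
        "en_M": "Barbarian",
        "en_F": "Barbarian",
        "it_M": "Barbaro",
        "it_F": "Barbara",
    },
    "death_poison": {
        "en_M": "{0} has succumbed to the Black Root Poison",
        "en_F": "{0} has succumbed to the Black Root Poison",
        "it_M": "{0} è morto a causa del veleno di radice nera",
        "it_F": "{0} è morta a causa del veleno di radice nera",
    },
}


def localized(key: str, lang: str, gender: str, *args: str) -> str:
    """Pick the variant for the character's gender and fill in {0}, {1}, ..."""
    variants = STRINGS[key]
    # Assumed fallback: use the male column if no female text was provided.
    text = variants.get(f"{lang}_{gender}") or variants[f"{lang}_M"]
    return text.format(*args)
```

The game code only passes the character's gender, and all of the linguistic work stays in the translated columns, which is why this approach is cheap for developers but doubles the text for translators.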
Pre-production
It can’t be stressed enough that pre-production is crucial in the localization of generated texts. A lot of problems related to generated texts could be avoided if enough time was spent preparing for the localization process. Here are some things that would help a lot:
- Documents explaining how different functions work. Such documents would be the best reference for all linguists working on the project, helping them decide how exactly they want to translate a specific line based on the way it could be used in functions.
- Documents that explain how different parts of sentences are connected in case of composed lines. It is absolutely necessary for linguists to understand what options could be connected to a specific line in order to translate it correctly.
- A list of all functions present in the lockit. It would be a great start to do pre-translation analysis of all functions present in a game so we can find a general logic that could be used throughout the project.
- Bestiary (or a similar encyclopedia). An overview of all characters, enemies, NPCs, and locations would be an amazing reference to make sure that every member of the team understands every character.
- A tool for the linguistic team to test how localized lines will look in the game.
We already mentioned pre-translation analysis, but it really has a very important role in projects with any type of generation. During that analysis, a lot of potential issues can be found and resolved. For example, analyzing “if” functions allows us to introduce genders to nouns in languages that require it.
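A first pass of such an analysis can be automated. The Python sketch below counts how often each function appears in a set of source lines, assuming the {function(Argument)} syntax seen in the Star Dynasties examples. The regular expressions and the function_usage name are our own assumptions, and a quoted argument that contains an opening parenthesis could produce a false match.

```python
import re
from collections import Counter

# Text between a pair of curly braces, e.g. "Up(title(PlayerCharacter))".
PLACEHOLDER = re.compile(r"\{([^{}]*)\}")
# An identifier directly followed by "(" is treated as a function name.
FUNCTION_NAME = re.compile(r"([A-Za-z_]\w*)\s*\(")


def function_usage(lines) -> Counter:
    """Count how many times each function is called across all lines."""
    counts = Counter()
    for line in lines:
        for body in PLACEHOLDER.findall(line):
            counts.update(FUNCTION_NAME.findall(body))
    return counts


if __name__ == "__main__":
    sample = [
        "My magnificent, glorious {Up(title(PlayerCharacter))}, anything for you!",
        "{Dhas(Assignee)} started the assignment.",
        "I heard that this is about our {title(RulerOf(System))}s",
        "{Character} cannot have any more children with their spouse anyway",
    ]
    for name, count in function_usage(sample).most_common():
        print(name, count)
```

The resulting list gives linguists a starting point: the most frequent functions are the ones worth documenting and discussing with the developers first.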
Last but not least, selecting one linguist per language to communicate with developers directly helps a lot. Direct communication makes it much easier to solve language-specific problems, such as functions not being fit for language grammar.
LQA
Linguistic quality assurance (LQA) is crucial for making sure that procedurally generated text actually works in the game. It is essential to conduct thorough testing before the game’s release to avoid unexpected surprises. Without proper testing, the game experience may feel unpredictable, like opening a box of chocolates without knowing what you’ll get. To optimize the testing process in terms of budget, time, and effort, a per-function testing approach is recommended.
Since these games offer non-linear gameplay, where different players can encounter varied content based on their choices, thoroughly checking all game content can be time-consuming and costly. Thus, adopting a per-function testing strategy can be more efficient. This approach involves testing a set of functions that share the same logic and usage in the game. Once all variations are tested within that set, the tester can move on to the next set of functions, ensuring comprehensive coverage of all possible combinations in the game while minimizing the time spent on iterations.
By employing per-function testing, developers can streamline the testing process, verify all potential variations, and achieve a higher level of confidence in the game’s generated content.
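One way to prepare a per-function test plan is to group the lines of the lockit by the functions they use, so that each group can be tested as a set. The Python sketch below is a hypothetical illustration: the line IDs, the “greet_2” line, and the group_by_functions helper are invented for the example, and the pattern assumes that an identifier followed by “(” only ever appears inside a function call.

```python
import re
from collections import defaultdict

# An identifier directly followed by "(" is treated as a function call.
CALL = re.compile(r"([A-Za-z_]\w*)\(")


def group_by_functions(lockit: dict) -> dict:
    """Map each combination of functions to the line IDs that use it."""
    groups = defaultdict(list)
    for line_id, text in lockit.items():
        signature = tuple(sorted(set(CALL.findall(text))))
        groups[signature].append(line_id)
    return dict(groups)


if __name__ == "__main__":
    lockit = {
        "greet_1": "My magnificent, glorious {Up(title(PlayerCharacter))}, anything for you!",
        "greet_2": "Welcome back, {Up(title(PlayerCharacter))}!",
        "assign_1": "{Dhas(Assignee)} started the assignment.",
        "plain_1": "Nothing to report.",
    }
    for signature, line_ids in group_by_functions(lockit).items():
        print(signature or "(no functions)", line_ids)
```

Lines in the same group share the same logic, so a tester can check every variation on a few of them and then move on, while lines with no functions at all only need a regular read-through.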
Closing Words
We hope that this article helped you to understand what procedural generation is and shed light on how you can avoid some pitfalls of this technique. The final thought that we want to get across is that projects with procedural generation are often great: they have very high replayability, and you can create very rich and dynamic worlds that keep players entertained for multiple playthroughs. Yes, this feature comes at the price of a more complicated localization that requires more time, budget, and developer involvement, but we think that it’s worth it in the end. Last but not least, if you have any questions left, please be sure to reach out. We would be glad to help :)
We would like to thank the Iceberg Interactive and Cellar Door Games teams for the opportunity to work on Star Dynasties and Rogue Legacy 2 localizations. We would also like to thank the translation teams for making everything happen, and Anna Augustin, the lead linguist on Star Dynasties, for linguistic guidance.
Check out Star Dynasties for yourself:
https://store.steampowered.com/app/1194590/Star_Dynasties/
And Rogue Legacy 2:
https://store.steampowered.com/app/1253920/Rogue_Legacy_2/
https://www.nintendo.com/store/products/rogue-legacy-2-switch/
https://www.xbox.com/games/store/rogue-legacy-2/