by David Ethan Kennerly
April 22, 2003
Players spend millions of man-hours selecting optimum strategies in a massive multiplayer online game (MMOG). They are getting the best return on investment (ROI) from your MMOG. Are you? In this article, I will show you how data mining can improve game design in general and then I will present four practical applications:
1. Balance the economy.
2. Catch cheaters.
3. Cut production costs.
4. Increase customer renewal.
Although this article is written for massive multiplayer online games, you will find that most of these techniques can be adapted to multiplayer and single-player games as well. I will give several examples using fantasy MMORPG terms, since that vocabulary is common knowledge. However, these techniques apply to most MMOG genres; I have even used them to improve an online trivia game show. But before we learn the techniques, let’s understand why data mining is a good tool for these jobs.
Because players lie. Player feedback alone provides a poor diagnosis of game design. The picture their verbal feedback paints is not even an approximate guide; it is a portrait distorted by psychological and social forces. Players do not accurately report their own behavior in surveys or customer feedback. They may say one thing but do another. For example, Dr. William Rathje, an anthropologist, surveyed the amount of beer people drank in a household and then went through their garbage. The garbage revealed twice as much consumption as the surveys had. This method was more insightful than surveys, which had been the traditional method of data collection. As psychological and social creatures, players and developers subconsciously revise their self-reports.
Figure 1. Which gives you the clearest picture of your game? Surveys or logs?
As political creatures, players and developers also revise their reports. Players belong to special interest groups, which bias their reports. Political ganging, a human trait, exists in online communities, too. Wherever an MMOG has guilds, classes, or any social organizations, it has special interest groups. The members of these groups put their own group’s interests before those of the entire community. Each claims that it is the victim of poor game balance. But the players who actually suffer the most from poor game balance are the most silent. The greatest victims are ending their days in your game in quiet desperation.
To many players the time spent online in your game is an investment. They expect their investment to perform well. They become upset if, despite their skill and time commitment, someone who happened to pick the better class, item, or other option in your game surpasses them. Data mining begins with accurate, empirical data. With this the game designer can make informed decisions. He can identify the victims of poor game balance, and he can correct it so that all players have an equal opportunity to achieve maximum performance.
Data mining also builds better theories. It gives the game designer insight into how players use and abuse the game. It broadens perspective, proves or disproves hypotheses, and substitutes facts in place of opinions. With increasing specialization of game development, a game designer no longer sees the big picture. It is all-too-common for any game developer to acquire a skewed view of the nature of his game. Disinformation, best-case scenarios, and a dose of self-hypnosis distort our theories. But if we can see the big picture, we can begin to challenge our own misinformed opinions. Let’s learn how to scan this big picture.
In the beginning there may have been the Design, but let's start the cycle where data mining begins, so we can discover how to recycle old data into new design:
Figure 2. Recycle old data into new design.
1. Live: Scoop up lots of raw data in the live service.
2. Archive: From here, clean it up and store it for safe keeping in an archive.
3. Statistics: Sift through the data to create statistics, which are more informative than the raw data.
4. Analysis: Then apply the actual mining, which yields knowledge about player performance.
5. Hypothesis: Propose hypotheses about how to tune the game.
6. Test: Test each hypothesis and then introduce the new design into the live service.
The final step closes the loop. Each iteration of this cycle evolves game balance. Let’s dive into the details.
A massive multiplayer game has thousands of game assets, or more. Every class, item, monster, quest, skill, zone, or any other game object is a game asset. In the data these game assets are dead; in the live service these assets come to life. It is the players that animate them. Player behavior generates rich information about game balance, so scoop up as much data as possible. Collect a large sample. Like any other statistical data collection, the sample should be random or otherwise representative of the actual proportions of player population. The larger the sample, the clearer the picture becomes. In a perfect game, an infinite number of players would render a perfect portrait of player behavior. On the other extreme, a small or biased sample generates no meaningful statistics. Given that this is a server-based game, collecting data is convenient. The data is already on your server.
When should data be collected? Temporal cycles, such as the season, day of the week, and the time of the day, complicate data collection. The most basic and instructive of these cycles is the weekly cycle. Once you understand the week, you can grasp the effect of a month, season, or holiday. Players cannot play as often as they wish on all days of the week. They have real-world schedules. So their playing volume varies depending on which day of the week it is. A graph depicts when most players participate. For a given player demographic it might be higher on some days of the week and some times of the day. For example, usage might peak on Saturdays, Sundays, and Friday evenings.
Figure 3. Player behavior is a function of the day of the week.
As well as the quantity, the quality of play differs depending on the day of the week. Some players might go on an extended adventure when they have more hours to spend. They might just stop in to keep in touch with friends when they have little time. So, to avoid daily variation, collect player performance data once per week. This provides you with the average behavior for the whole week. Be sure to measure on exactly the same day and time of the week. You should automate this process, such as with “crontab” in the Unix environment, or whatever scheduling tools your database management software supports. When you measure once per week instead of once per day, you achieve three ends simultaneously: you eliminate weekday variation, you cut the data collection workload to one-seventh, and you cut the required archive storage space by the same factor. If you are measuring data other than average player performance, then you may need to collect more often. But that is beyond the scope of this introduction.
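As a minimal sketch of this weekly schedule, assume a small Python snapshot script launched by cron; the script path, archive naming, and helper are illustrative, not a real API:

```python
# Hypothetical weekly snapshot job. Label each snapshot by ISO year and
# week number so the Sunday 00:00 runs sort into a stable archive sequence.
import datetime

def snapshot_label(now):
    """Return a sortable archive name for the week containing `now`."""
    year, week, _ = now.isocalendar()
    return f"accounts-{year}-W{week:02d}"

# Example crontab entry: run at 00:00 every Sunday, always the same
# point in the weekly cycle.
#   0 0 * * 0  /usr/bin/python3 /opt/mmog/snapshot.py
```

Anchoring the label to the ISO week (rather than the calendar date) keeps snapshots comparable even across month and year boundaries.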
After scooping up the raw data, let’s make it easier to analyze. Like processing a raw mineral, there are several steps that will prepare your data for mining. Many alternate methods can do this. Here is a simple method that economizes storage space and reduces mining computation. This preprocess has five general steps:
1. Take a snapshot of the database.
2. Validate that the data is clean and appropriate for analysis.
3. Integrate the data into a central archive.
4. Reduce the data down to just the fields you need.
5. Transform the reduced data into a form that is easy to analyze for player performance.
The details depend on the system’s configuration. This example explains each step in a simple system:
Figure 4. Prepare the raw data for mining.
Suppose you are operating a fantasy MMORPG during its commercial service.
1. Start at the accounts database. This is the economical starting point, since the accounts database has the ID of every record that you want information on. Schedule an automated snapshot of the user data at 00:00 on Sunday morning.
2. Validate which data is relevant and clean. This eliminates garbage as soon as possible, so that you are not storing or analyzing unusable data. Starting at the accounts database, exclude unregistered accounts or administration accounts. For example, exclude test and admin characters that have artificial attributes. For each valid character in an account, query for activity in the log database. If the character has not been active during the previous week, then its record contains no player performance information.
3. Backup valid user, log, and accounts records into an archive database. This will be a useful warehouse that you may return to in the future to mine for data you have not considered yet. Treat this backup as precious; if you were an archaeologist, this would be your find; if you were a detective, this would be your forensic sample.
4. You are now overwhelmed with a deluge of data. There is much more than you need to analyze a particular problem, such as the amount of experience points earned per hour of play. So reduce the data down to the fields you need. In this example, select the character ID, level, class, experience points, and number of hours played. Create a table of these values:
ID, level, class, exp, time
5. Transform this reduced data to make it easier to analyze. Since this archive has weekly versions of the data, use last week’s data to create new information. Get the difference of the experience points and the difference of the time played. Append these columns to the table. If this is the character’s first week, then there will be no information from the previous week. If the character has not played in a while, then search backward through each prior week’s archive.
Δ exp = exp1 – exp0
Δ time = time1 – time0
ID, level, class, exp, time, Δ exp, Δ time
Figure 5. Archive a table of player performance data in terms of EPH.
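The reduce-and-transform steps can be sketched in plain Python; the row dictionaries mirror the table fields above, and the function name is illustrative:

```python
def add_weekly_deltas(this_week, last_week):
    """Append d_exp and d_time to each row of this week's reduced table.

    this_week, last_week: lists of dicts with keys
    'ID', 'level', 'class', 'exp', 'time'. A character with no record
    last week gets None deltas, signalling a search of earlier archives.
    """
    prev = {row["ID"]: row for row in last_week}
    out = []
    for row in this_week:
        p = prev.get(row["ID"])
        row = dict(row)  # copy; leave the archive row untouched
        row["d_exp"] = row["exp"] - p["exp"] if p else None
        row["d_time"] = row["time"] - p["time"] if p else None
        out.append(row)
    return out
```

In a real service these rows would come from SQL queries against the archive database; the in-memory lists here just make the transformation explicit.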
Basic statistics can extract information from this fresh, well-prepared data. Since there is too much raw data to draw conclusions from, categorize or aggregate this data. For a simple example, let's categorize the data by one of four fantasy player classes: fighter, priest, rogue, or wizard.
We will attempt to measure performance. Do not be misled by the popularity of each category. The number of characters that fit into a certain class or choose a strategy in the game depends on many variables irrelevant to optimum performance. Cultural preferences, aesthetics, fads, rumors, and other trends sway players’ choices. Chasing popularity as a measure of performance leads to a vicious circle. Like a cat chasing its own tail, balance would never be achieved.
Measure rates instead of instantaneous values. High performance is not any particular value. It is a measure of change from a low value to a high value in a short period of time. The period of time to measure is the week. As noted earlier, the week is more stable than the day.
Let's take experience points per hour versus level for each class as an example. “Experience points per hour” is such a useful indicator that I will abbreviate it as EPH. Like a car’s MPH (miles per hour), a player’s EPH indicates his speed or rate of progress. Count the “experience points,” which is a performance indicator, instead of the population of a class. Count the change in experience points from one week to the next week. Count the time that the character actually played, instead of the total amount of time that has passed. For example, if the character played twenty hours in a week then use this value, instead of the 168 hours in a week. This gives the following derivative:
Figure 6. Like a car’s MPH indicates speed, a player’s EPH indicates rate of advancement.
EPH = Δ exp / Δ time
Let’s graph the results. On the vertical axis is the EPH. On the horizontal axis is the level range. If there are too few samples per level, then group nearby levels together.
Figure 7. Compare player performance between various strategies in the game.
Then plot each category as a data series. In this example each series is a player class: fighter, priest, rogue, or wizard. Along the horizontal axis we can see the difference between the heights of each class’s performance. A small difference may be statistically insignificant; a large difference is likely significant. Based on the size of the sample and other qualities of the data, statistics defines the minimum gap that indicates significantly low performance. In this example, the most significant gap is between the high-level fighter and the other three high-level classes. So statistics discovered that the high-level fighter segment of the player population suffered from low performance during that week.
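The per-class, per-level-band aggregation behind such a graph can be sketched as follows; the 10-level band width and field names are assumptions for illustration:

```python
def eph_by_segment(rows):
    """Average EPH per (class, level band) segment.

    rows: the prepared table with d_exp and d_time deltas. Bands of 10
    levels group sparse samples together, as the text suggests.
    """
    totals = {}
    for r in rows:
        if not r["d_time"]:  # skip zero or missing play time
            continue
        key = (r["class"], r["level"] // 10 * 10)
        d_exp, d_time = totals.get(key, (0, 0))
        totals[key] = (d_exp + r["d_exp"], d_time + r["d_time"])
    # Divide summed exp by summed time, which weights each character
    # by hours actually played rather than averaging per-character rates.
    return {k: d_exp / d_time for k, (d_exp, d_time) in totals.items()}
```

Each (class, band) key then becomes one point in one data series of the EPH-versus-level graph.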
The core of data mining begins where statistics ends. Here we can extract golden knowledge from the raw mineral that we began with. Several techniques can be applied, most of them particular to the data and the purpose. Here is a simple set of techniques.
Calculate the maximum and minimum performance values. Do this for performance rate and performance growth. In this example EPH is a derivative of the experience points, and the EPH itself can be viewed as a function of class and level:
EPH = f(level)
Calculus provides the derivative:
EPH' = f'(level)
Because of the finite sample size, the precise limit and derivative do not exist. However, the approximate derivative will provide insight into the game balance. At the maximum derivative players rapidly advance. At the minimum derivative players suffer stagnation: they play for hours with little advancement. Each of these will help isolate low-performance segments of the player population.
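A finite-difference approximation of EPH′ can be sketched like this, assuming EPH has already been averaged per level band:

```python
def eph_derivative(eph_by_level):
    """Approximate EPH' = f'(level) with finite differences.

    eph_by_level: {level: EPH} for one class. Returns the slope over
    each adjacent pair of levels; the minimum slope flags stagnation.
    """
    levels = sorted(eph_by_level)
    return {
        (lo, hi): (eph_by_level[hi] - eph_by_level[lo]) / (hi - lo)
        for lo, hi in zip(levels, levels[1:])
    }
```

Scanning the result for the smallest slope points to the level range where that class advances slowest.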
Comparing a previous and subsequent period can identify a trend. In this example, the EPH can be subtracted from its value last week, creating a new function:
Δ EPH = f1(level) – f0(level)
Where the change is significantly positive, that segment of players is performing better than it had been the previous week. This helps isolate an effect of a modification to a game’s design. Players’ adjustment to the modification delays full impact. Usually only early adopters will use the new feature at first. If it outperforms an old substitute, then most players will migrate. After migration the empirical comparison between the two features stabilizes.
Both of the above techniques can be combined to isolate and track specific low-performance. For example, tracking the change in high-level fighters from one week to the next indicates if their performance is improving or not.
Δ EPH = Fighter1(80%) – Fighter0(80%)
Comparing this value to the other class values indicates the relative change. As the values converge, the classes are becoming balanced.
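The week-over-week comparison can be sketched as a per-segment difference; the segment keys here are illustrative:

```python
def delta_eph(this_week, last_week):
    """Δ EPH = f1 - f0 per segment, for segments present in both weeks.

    this_week, last_week: {segment: EPH} mappings, e.g. keyed by
    (class, level band). Positive values mark segments performing
    better than they did the previous week.
    """
    return {
        seg: this_week[seg] - last_week[seg]
        for seg in this_week.keys() & last_week.keys()
    }
```

Watching these deltas shrink across classes is one concrete signal that a balance change is converging.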
Figure 8. Top-down meets bottom-up when you analyze strategies as clusters of game assets.
Data mining can combine top-down analysis techniques with bottom-up analysis techniques. From the bottom-up our game may appear to be a galaxy of game assets with no hierarchical organization. From the top-down the same game may appear to be rigid containers of game assets. Cluster analysis might improve class or strategy design, since it generates clusters from the bottom-up, by mapping differences of individual game assets. This can compare similar assets in different categories. As well, cluster analysis can identify assets that multiple strategies share. If you are interested, the books at the end of this article explain techniques for cluster analysis.
As a game designer, it is dangerous to assume that you know your game. The analysis should inspire the hypothesis, since analyzing player behavior can prove or disprove a good hypothesis about game assets. The kind of hypothesis mentioned here meets two criteria:
1. Explain existing trends of game assets.
2. Predict the result of modifying, inserting, or removing a set of game assets.
Here are two examples of game asset hypotheses:
1. In EverQuest, players prefer pretty races.
2. In Dark Ages, a trap skill will increase mid-level rogue performance.
The domain delimits where the hypothesis applies. In this case the domain is a particular MMORPG, Sony Online Entertainment’s EverQuest or Nexon’s Dark Ages. Define the domain, or scope, that the knowledge that you believe you are discovering applies to.
Figure 9. Is player preference skin deep? (SOE's EverQuest)
Suppose that when you discuss the appearance of races in an MMORPG with artists, the team divides into two camps. One camp argues for an equal number of game assets for gruesome player races as well as beautiful races. The other camp argues that many more players will choose beautiful races, so almost all assets should be devoted to the more beautiful races. Nick Yee provides survey data in his EverQuest research paper “Norrathian Scrolls” that may inspire this hypothesis. EverQuest players prefer Elves, in general, by about 10-to-1 over the two least popular and, arguably, ugliest races, Trolls and Ogres (http://www.nickyee.com/eqt/metachar.html#4). To make the hypothesis rigorous, actual player population and race performance should be analyzed, because, as noted earlier, data mining more accurately depicts player behavior than a survey does.
Figure 10. How can you balance group members but still keep the group together? (Nexon's Dark Ages)
In the second example, suppose you have analyzed player performance in Dark Ages. You note that mid-level, but not high-level, rogues have low performance in terms of measured EPH when compared to the other four classes. In 1999 this was one of the decisions that I faced. I hypothesized that inserting a set of mid-level trap skills would improve performance by improving the rogues’ damage ratio. Then I used the techniques in this article to test my hypothesis. During the transition, some players, especially non-rogues, argued about the performance of rogues. But the experiment succeeded. Within a month mid-level rogues had balanced EPH.
Testing is the most rigorous, sensitive, and critical step in the cycle. Although it feels good to hold a gem of wisdom, it feels bad to realize your treasured hypothesis is a false gem. So it is tempting, and sadly common, to halt the cycle before the testing stage. Test each hypothesis. If it is correct, it will survive with its value proven. If it is incorrect, then please conserve the team’s resources by discarding it.
A good test has two and only two possible outcomes: the hypothesis is true, or the hypothesis is false. A good test rarely yields an inconclusive result; when it does, the test must be repeated or modified to yield a definite true or false. This cycle is an elaboration of a basic idea: trial and error. Since testing detects error, it improves a game’s design.
Figure 11. Measure test results to validate or invalidate the hypothesis.
In the earlier example, high-level fighters suffered from low-EPH. Suppose someone suggests a new game asset, a new skill to increase the fighter's combat effectiveness. You design “Sword Mastery” to do this. After collecting data on the test server you compare the old and new EPH for each class in order to conclude if the skill improved high-level fighter EPH and what other results it may have.
In the test, mirror live conditions as closely as possible. Just like an ideal point, or a limit, identical conditions do not exist, yet you can approximate them. Test with an identical configuration, build version, and feature set, at the same day of week and time of day. Additionally, the test population will be smaller, which means results will be less precise. But the most uncontrollable factor of the test is the players. Your test player population is not going to be a random sample. It will be a self-selected sample whose average motivations and behavior will be biased. So the test contains error. Worse than this, discovering the direction of the bias may be an intractable problem.
Although a perfect test is impossible, a test that contains experimental error may still improve your game's balance tenfold, because this process is iterative. If a single iteration cuts game imbalance in half, two iterations will quarter game imbalance, and so on. This improvement is far better than no improvement or, worse, designing with disinformation, such as feedback motivated by competing special interest groups.
After a new design passes this test, feed the design back into live service. The process is iterative, so for best results, repeat monthly.
We have glossed over the general process. Let's now step back and consider a healthy scope for data mining. Data mining provides answers that other methods of evolutionary game design cannot. However, it is not a panacea.
Data mining takes numbers, processes them, and makes new numbers. These numbers cannot tell you how each player feels. The player may be misinformed or biased about the balance of the game, but she is always right about how she feels. Some players' feelings may be immature, and some players may have contradictory responses. Yet the paradox is that they are all right. Every player’s emotional response is valid. The data also does a poor job of revealing how players feel about each game asset. It does not indicate which asset has beautiful modeling, expressive animation, or a compelling story.
Figure 12. Preemptive data mining employs your staff to harass customers.
A healthy scope excludes preemptive or preventive data mining, which attempts to identify and prevent cheating, harassment, or sabotage. This equates to profiling and an invasion of privacy. Besides being unethical, preemptive data mining is disastrous. Data mining cannot establish culpability. Not only is it prone to random error and false positives, but it also creates a new source of player harassment. This source of harassment is hard to discover, impossible to eliminate, and much more costly: harassment of your own customers by your own staff.
Data mining is also called knowledge discovery. While you can mine knowledge from data, you cannot mine wisdom. You have to prioritize results and decide which game imbalances should be left alone. Data mining automates a process within your overall evolutionary design cycle. It amplifies an efficient design process and multiplies the problems in a poor process.
Now that you have seen the general process, let’s apply it to some common MMOG problems. Here are four practical applications of data mining:
1. Balance the economy.
2. Catch cheaters.
3. Cut production costs.
4. Increase customer renewal.
Each game asset that passes hands between players is a commodity or currency. These tradable game assets define the game’s economy. The commodities and currencies need not be limited to money and property. For example, in Nexon’s Dark Ages, I designed and implemented a labor currency, a political currency, and a religious currency.
Figure 13. Religion, politics, and labor can also become currencies in an MMORPG. (Nexon's Dark Ages)
Be careful when measuring individual character gains and losses. Account for transactions that exchange one commodity or currency for another. For example, a character could have less money after one week but have more wealth. He may have exchanged his money for other commodities of greater value.
Track the game’s macro-economic indicators. See if the supply of currency is increasing or decreasing. Like a real-world money supply this tells you about the inflation rate of the currency. Measure key performance indicators and generate hypotheses of how to improve game balance.
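As a sketch of one such macro-economic indicator, assuming you archive the total currency in circulation each week:

```python
def currency_supply_growth(weekly_totals):
    """Week-over-week growth rate of the total currency in circulation.

    weekly_totals: total gold (or any currency) summed across all valid
    characters, one value per weekly snapshot. Sustained positive growth
    signals inflation pressure on prices.
    """
    return [
        (cur - prev) / prev
        for prev, cur in zip(weekly_totals, weekly_totals[1:])
    ]
```

The same series can be computed per commodity to spot which faucet or sink is driving the trend.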
One simple balance technique you can use is to change the price of a game asset. Players are more receptive to price changes than they are to other attribute changes. For example, in 2002 when Stewart Steel noticed low admittance rate for wizards in Nexon’s Nexus: The Kingdom of the Winds, he increased the rate by increasing the starting items of that class. In effect, this increased the price that an NPC paid to the player for choosing the wizard career.
Figure 14. Players tolerate price adjustments more than other changes. (Nexon's The Kingdom of the Winds)
After testing the hypothesis, repeat the cycle each month. Each modification, although seemingly insignificant, can have a huge ripple effect on the rest of the economy. In the same example, if there were a higher starting value but a poor prospectus for the career of a wizard, then retention rate among wizards might drop.
While balancing the strategies, such as player classes in a fantasy setting, ensure that each strategy remains unique. Keep the clusters in strategic space from converging. Let’s return to the original example. The low-performing high-level fighters have several unique and shared assets. When adding a new asset to balance their performance, it might be better not to give a fighter “Poison Tolerance.” If the Priest class has an ability to cure poison, then this would be redundant. It would reduce the group’s demand for Priests and begin to merge the two classes. Instead it might be better to provide “Sword Mastery” if no other class has this kind of ability. This controls the supply of assets so that each cluster of assets retains its unique niche in the game.
Figure 15. Balance each strategy's performance yet keep each strategy unique.
A cheater in an MMOG does not just cheat himself. He commits an injustice against all honest players. Cheating short-circuits gameplay, so it achieves exceptionally high performance. Players adopt high-performance strategies, whether intended by the designers or not. Cheating also penalizes the relative performance of all non-cheaters. If not corrected quickly, cheating will spread like wildfire. In a matter of weeks or even days a cheat can flood the game’s economy. These techniques can help catch cheating before it ruins the economy.
Start at the table that preprocessing generated. It lists each character ID and its performance. Sort the list by the performance column. Now the most suspicious character ID is at the top of the list. Investigate its exceptional performance.
Figure 16. Investigate suspicious player performance starting at the top.
Let's sort the example table by the EPH column. The character at the top is the most suspicious. Even though he has a lower total experience gain, he has a higher rate, since he accumulated the experience during fewer hours. Investigate the logs to discover how he performed so well. The answer will enlighten you as to how players use and abuse your game.
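The sort-and-investigate step can be sketched like this, reusing the Δ exp and Δ time columns from the prepared table; the function name and field names are illustrative:

```python
def most_suspicious(rows, top_n=5):
    """Rank character IDs by EPH, highest first.

    Exceptional rates, not exceptional totals, sit at the top of the
    cheat-investigation list: a character with modest total gain but
    very few hours played outranks a long-session grinder.
    """
    rated = [
        (r["d_exp"] / r["d_time"], r["ID"])
        for r in rows if r["d_time"]  # ignore zero or missing play time
    ]
    return [cid for eph, cid in sorted(rated, reverse=True)[:top_n]]
```

The returned IDs are starting points for a manual log investigation, not verdicts; as the next paragraph notes, high EPH alone proves nothing about intent.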
The answer does not indicate the player’s intention. The player may have been using a legitimate feature of the game. In fact, a player may argue that unless he modified the software, all of his behavior is a legitimate use. He played the game as it was given to him. Regardless of the motive, deleting a cheater cannot solve the problem. System imbalances breed cheaters, so the design itself can prevent cheating.
Each game asset took some amount of programming, art, design, testing, and customer service to develop and maintain. Yet some of these classes, items, monsters, quests, skills, zones, and other objects in your game are being wasted. This lowers developer morale.
Low-performance ROI -> 0%
Game assets with low performance have a return-on-investment value that approaches zero. Players make decisions and optimize them over time. Communication with other players accelerates migration to an optimal strategy. They quickly adopt the highest-performing game assets available. They discard low-performance assets. In terms of competition these assets are liabilities, so they become obsolete. For an obvious example, if there are two nearly equivalent weapons, except that one has a higher damage rate, the other weapon is obsolete. In a game, this kind of decision, between obsolete and newer assets, creates fat. It means there is some fraction of your game that might as well not exist, because no one uses it. Imagine having to break this news to an artist: "Thank you for the long nights you spent making this new graveyard that we specified, but no one hunts there. Sorry about that."
Figure 17. Are players using all of your game's assets? (Nexon's Dark Ages)
It does not have to be this way. A wise patch can put the assets back into the players’ list of options. Recycle the artists’, programmers’, and testers’ hard work as much as possible. Create new, well-balanced instances. Measure and prove their balance in terms of performance. Do not change the values of existing instances; let them remain as they are. Recycle the art with modest modifications so those man-hours are not lost. But only recycle obsolete assets. Players do not tolerate recycling of assets that they do not consider obsolete. They demand fresh assets.
If the player cannot or does not realize how to improve his performance with the choices he has already made, he is doomed. For example, if all high-level fighters perform worse than average high-level characters, all fighters are doomed. The players’ sense of doom will become the developer’s death knell unless you act fast.
Figure 18. If a player cannot improve her performance you may lose her.
When a player suffers from poor performance in a single-player game, he suffers alone. But in a massive multiplayer game his whole team suffers. Unfortunately, a good choice for the team to increase their performance is to exclude low-performers. When low-performance is not the player's fault, this breeds frustration. Suppose a group of players can increase its EPH 20% by excluding low-performance players. Sadly, many groups will. Suppose the excluded low-performer is unable to alter his EPH liability. Like an endangered species that is unfit to hunt and unable to evolve, this set of players becomes extinct. The character will not only become extinct from the playscape, but the player’s motivation to play will become extinct, too. If she will not play, eventually she will not pay, either.
To prevent this, balance the strategies. Do not edit existing instances of game assets. This will upset other players using other strategies. They will perceive the correction as an injustice, an act of favoritism. Instead of creating a perceived injustice, add new assets.
If your game is commercial, improve player performance instead of worsening it. If some asset is too good, but players love it, let it be. Only when the asset would cause long-term customer losses should it be removed, because removing or degrading an asset decreases customers’ good faith. There are few things that say, “I do not want your money, go away” as quickly as removing a beloved feature in the game. Players paid money in advance and continue to pay a subscription fee each month for a reason. They expect the game to improve each month. Their criterion is simple. The game should improve for their character personally and for the special interest group that their character belongs to.
A major error’s existence costs more than this loss of good faith. In these uncomfortable cases, prove to players that you care through negotiation and diplomacy. Players' feelings are at stake. Imagine a worst-case scenario from the most extreme player’s perspective: this morning your paycheck was suddenly slashed 50%. Your brand of car drove half as fast, required repairs twice as often, and cost twice as much. Because the "gods" said so. Some customers take the game just as seriously.
We have only touched the tip of the iceberg of data mining and game design. Both are elaborate and exciting fields for research, experimentation, and application. For years we, as game designers, have wanted systematic and scientific tools. I hope this tool will help improve your game’s design. If you have questions, comments, or would like to discuss this topic in detail, please contact me at kennerly (at) finegamedesign (dot) com.
I could not have written this article without the support of each of Nexon’s employees and players. They encouraged my experiments.
Han, Jiawei and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers: San Francisco, 2001.
An introductory practical explanation for database programmers.
Hand, David, Heikki Mannila, and Padhraic Smyth. Principles of Data Mining. MIT Press: Cambridge, 2001.
An interdisciplinary explanation of the mathematics and fundamentals of data mining.
Electronic Privacy Information Center. “Total Information Awareness (TIA)” <http://www.epic.org/privacy/profiling/tia/> 20 April 2003.
An ongoing log of preemptive data mining and its danger to society.