Civ4 AI Survivor Season 3: Conclusions

Civ4 AI Survivor Season Three File Downloads

Excel Spreadsheet Download
Starting Savegame Files Download

Season Three of Civ4 AI Survivor concluded in November 2017 with a close ending in the Championship match. If you're still reading or watching through the games from Season Three, it's time to stop reading right now because this page will be full of spoilers as far as what took place throughout the competition. The rest of this conclusions section goes into a detailed breakdown of the data from this year's matches. We'll start by looking at the ranking of the AI leaders based on finishing order as we've done in past years:

We ended up with the identical distribution pattern as in Season Two, with exactly 11 leaders making the Wildcard game along with 18 in the playoffs and 6 in the championship. We had far more eliminations than in either of the past seasons, however, and in fact only six leaders in total ended up surviving the competition. 46 of the 52 leaders were eliminated at one stage or another this year, sheesh! It made for some very entertaining games though. We had a return to a series of early eliminations again this season, with Frederick, Ragnar, and Sitting Bull suffering extremely early exits. Only the Turn 124 death of Louis in Season One of AI Survivor was earlier than what we saw this year. At the other end of the spectrum, Shaka's Turn 377 elimination in the Playoff round was one of the latest to date. The current record holder in that regard is a Turn 404 (!) elimination of Wang Kon, also from Season One in a truly madcap opening round game. Qin similarly had a Turn 393 elimination in Season Two; this result from Shaka was the third-latest to date. All of those eliminations were outlier results since very few games even last that long. One other point worth noting: with Zara Yacob and Catherine both suffering eliminations this year in the opening round, we've now reached a point where no one has survived all three seasons. Every AI has been completely eliminated in at least one of the three years.

It was another terrible year for Germany, with both Frederick and Bismarck finishing near the bottom of the heap. That's the second year in a row for the two of them to suffer horrendous performances, and the evidence is really starting to pile up that both of them are poor AI leaders. Greece also saw both of its leaders bounced from the competition with weak performances, a significant letdown after Pericles made the Championship game in Season Two. For his part, Alexander has been a lead weight since the beginning. Sitting Bull, Gilgamesh, and Tokugawa all continued to show why Protective is the worst trait in the game, while Darius and Willem proved that having the best trait doesn't automatically make you into a winner. Willem in particular has been awful in all three seasons despite the amazing Financial/Creative pairing. Something about his AI personality just doesn't work in practice. Then we had our favorites that were bounced in the opening round: Huayna Capac, Cyrus, Catherine, and Mao all collapsed this year and destroyed innumerable picking contest entries. I tend to think this was due to poor luck or bad starting positions more than anything else, but we'll see how they bounce back in the next iteration of this event.

At the other side of the table, it was a breakout year for several of our AI leaders. Stalin was the biggest star in this regard, winning all three of his games and piling up seven kills en route to the championship belt. That's the best result we've ever seen, equalling Huayna Capac's gold/gold/gold performance from last year while setting a new record for kills in a single season. Julius Caesar almost matched that feat by winning two games and piling up seven kills of his own. It was a rejuvenated performance for Justinian and Mansa Musa as well, who both returned to the Championship again with fine performances in their own right. Justinian was a single oddly-timed war declaration from Caesar away from likely winning the title. Several other leaders emerged out of nowhere with impressive opening games: Hannibal, the Charlemagne/Wang Kon duo, and especially De Gaulle. He had been a punch line for the first two seasons and ended up with a legitimately great performance in the opening round this year, winning outright and picking up three kills in the process. For one game, at least, De Gaulle was a leader to be feared. It was also a good year for Russia (with a championship from Stalin and Peter somewhat undeservedly making the playoffs), India (two leaders in the playoffs), and France (ditto). Rome and Egypt had bifurcated performances, with one strong leader and one weak leader. We also had the top three leaders in Stalin, Justinian, and Caesar combine for an astonishing 20 kills. The three of them feasted on the rest of the field in a way that we've never seen in past years. The three of them had 20 kills... and the other 49 leaders combined for 26 kills. Absolutely incredible.

Here are the updated rankings based on order of finish after three seasons:

Now that I've posted this list based on finish order, I'm going to explain next why I don't think this is a very useful criterion for evaluating the AI leader performances. In other words, I come here not to praise but to bury. What's the problem with the finish ranking system? In a nutshell, while listing AIs by this particular finishing order is a useful way of displaying all of the information about a single season in one place, it's not very useful at all in terms of evaluating which AIs were stronger than others, outside of a very rough sense of who made it further. It's even worse at making sense of AI performance across multiple seasons. Here are some specific criticisms of this ranking system:

* Finish Order Ranking prioritizes survival too highly. OK, so there is some logic to this since the name of the overall competition is AI Survivor. And if we're forced to choose between a whole bunch of AI leaders who accomplished very little, then ranking them by how many turns they managed to last before being eliminated makes as much sense as anything else. However, this creates a false sense of achievement for leaders who do nothing other than sit in the corner and avoid elimination. For example, does Roosevelt really deserve to finish 20 spots higher than Frederick because it took longer for him to die? Neither leader was ever the least bit competitive in their respective games; it simply took Roosevelt longer to be removed from the playing field. This is even worse when factoring in the Wildcard game. Inert, stagnant leaders that do nothing in their opening round games and make the Wildcard game get a ranking in the 20s despite often doing nothing to deserve it. In this year's competition, Washington gets accorded a ranking of 22nd place out of the 52 leaders - Washington! The man launched a single unsuccessful war in his opening round game, followed by doing absolutely nothing, and then sat in the corner of the Wildcard game getting saved from conquest by a series of Apostolic Palace peace resolutions. He was a terrible leader this year, and yet because he lasted for a long time without being eliminated, Finish Order Ranking considers him to be one of the better performers. This is far too much value placed on simply not dying.

* Finish Order Ranking includes no difference between first and second place. This is a real problem because there's often a huge difference between the winner of an individual game and the runner up. While both of them get to move on to the next round, that doesn't mean that they exhibited equally strong performances. Game Four from the opening round was a good example of this, with Justinian crushing everyone else via a Domination victory and Louis tagging along to the playoffs despite being 4000 points behind on the scoreboard. Finish Order Ranking rates these performances exactly the same because both leaders "survived" to the next round. There are a number of leaders who have accrued seemingly impressive rankings via finish order by getting carried along in second place by other, better leaders who took home first place. Does anyone really think Asoka is a top ten leader with his pair of second place finishes and Wildcard game appearance across the three seasons? If we're interested in actual evaluations of skill, we should grant more weight to the leaders who emerge victorious in each individual game. That is the whole point after all.

* Finish Order Ranking has the identical 52-point scale for each spot in the ranking order with no other distinguishing factors. This is a little bit harder to explain so bear with me here. Everything under this traditional ranking system is based on a scale from 1 to 52, with each leader assigned to one of those slots. Under this system, there's no ability to recognize especially strong or especially weak performances, with everything getting averaged towards the center. If you look at the three year averages, virtually all of the AIs wind up between 10 and 40 and there's not enough distinction between them. In Season Two, Huayna Capac won all three of his matches and scored five kills, yet his first place ranking put him only 10 slots above nonentities like Shaka and Tokugawa. In fact, there was as much gap between Huayna Capac and Shaka (10 slots) as there was between Shaka and Roosevelt, another AI who produced a pair of third place finishes and generally did nothing of consequence. Huayna Capac absolutely torched Season Two and he should have finished enormously ahead of those other leaders, not a slight nudge better. The finish order system fails to accurately represent standout performances and keeps dragging everyone back towards the center.

The net result is a ranking system that doesn't work very well and leads to odd, nonsensical results. The biggest culprit of this is Brennus, and I'm calling him out right here as a total fraud. Finish Order Ranking has Brennus rated as the fifth best AI in the whole field! Seriously, Brennus?! He's never won a single game in all three seasons, and yet this ranking order is slotting him in between Huayna Capac (four wins, eight kills) and Caesar (three wins, eleven kills). Finish Order Ranking claims that Brennus is a better leader than Caesar, better than Kublai (two Championship appearances), better than Cyrus (the "true" Season Two Champion), better than Stalin and Mao and Catherine. Brennus has perfectly managed to thread the needle as far as gaming the system, with a pair of second place performances in the first two seasons (carried along by Suryavarman and Mansa Musa) and then an uninspiring Wildcard appearance in Season Three where he managed to avoid dying and therefore ended up in 21st place. Brennus is a total fraud and any system that ranks him this highly needs serious reevaluation. (Shaka, Qin, Asoka, and Peter are also highly overrated here with a grand total of one victory among the four of them.) The finish order ranking system is a bit like using Runs Batted In (RBIs) in baseball, a statistical convention that began at an early date and then kept being used despite not having much predictive or analytic usefulness. Sure, having 100 RBIs is better than having 10 RBIs, but any sabermetrician can tell you that it's a poor way to judge individual baseball players. We can come up with a better metric for AI Survivor.

Several of the community members have been experiencing the same feeling and tinkering on their own with alternate ranking systems for the AI leaders. The one that I'm going to use here is fairly simple: 5 points for a first place finish, 2 points for a second place finish, and 1 point for each kill. This system rewards AI leaders for finishing highly and taking out other competitors as opposed to simply avoiding death. It correctly recognizes that a first place performance carries more weight than a second place performance, and it awards many more points to top performers to separate them from the rest of the field. The Golden Spear winner with the most kills in each season will typically end up with roughly five kills and therefore get the weight of one extra first place victory, which feels about right to me. Stalin's performance this year racks up 15 points for his gold/gold/gold finishes and then another 7 points for his collection of kills, which puts a chasm between his 22 points and the 0 points that the nonentities produced. There's no penalty for being eliminated under this scoring system, but naturally bombing out of the competition means an end to scoring any more points. Similarly, just making the Wildcard game means nothing in itself. The Wildcard game offers another chance to score points, and that's it. This scoring is completely arbitrary, of course, but no more arbitrary than the Finish Order Ranking that we were using in past years. The new system shifts the ranking from passive "not dying" to actively winning games and defeating competitors, making it a more interesting and hopefully more accurate metric.
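For anyone who wants to check the arithmetic, the scoring rule described above can be sketched in a few lines of Python. This is only a minimal illustration: the function and constant names are assumptions made up for this example, not anything taken from the spreadsheet.

```python
# Point values as stated in the text above.
FIRST_PLACE_POINTS = 5
SECOND_PLACE_POINTS = 2
KILL_POINTS = 1


def power_ranking_score(firsts, seconds, kills):
    """Return the Power Ranking points for a given record.

    The function name and signature are hypothetical; only the
    5/2/1 weighting comes from the article.
    """
    return (FIRST_PLACE_POINTS * firsts
            + SECOND_PLACE_POINTS * seconds
            + KILL_POINTS * kills)


# Stalin in Season Three: three first place finishes and seven kills.
stalin_season_three = power_ranking_score(firsts=3, seconds=0, kills=7)
print(stalin_season_three)  # 22

# A leader who never finishes first or second and never scores a kill.
print(power_ranking_score(firsts=0, seconds=0, kills=0))  # 0
```

Note that elimination and Wildcard appearances never enter the calculation at all, which is exactly the point of the new system.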

Here's the new list using this system, which I've dubbed Power Ranking to help distinguish it from the old system:

Under this system more points are good to have, which is helpfully also more logical than the Finish Order Ranking. This list fits much better with what we would expect to see, as Mansa Musa narrowly edges out Justinian and Huayna Capac for the top spots. Mansa is the only leader to have a victory in all three seasons, and he's been almost scarily consistent with exactly two kills in each year. Justinian has had two amazing seasons and one dud, while Huayna Capac piled up an incredible 20 points with his Season Two run before crashing out of Season Three with a disastrous performance. Caesar and Stalin make the top five with ridiculous runs in this year's competition but can't quite catch the leaders at the top due to weaker past seasons. And from there it's pretty much the names we would expect to see, with familiar successful leaders like Kublai, Cyrus, Pacal, Cathy, Zara, Mao, etc. The top 13 leaders on this list have all made the Championship at least once, with Gandhi being the first one not to make it that far, and Gandhi has two victories and two playoff appearances to his credit. Brennus loses his fraudulent top five spot and falls all the way down to a tie for 20th place, grouped with one hit wonders in Lincoln and Wang Kon. Other overly ranked leaders like Qin, Asoka, and Peter are similarly placed in more accurate spots. Peter drops all the way from 14th place down to a tie for 30th place, which again is much truer to what he's actually done in this competition. Peter has a distant second place finish and two Wildcard appearances - hardly the mark of a leader in the top quartile.

At the bottom of the list are a bunch of leaders who have failed to achieve anything at all. Nine leaders have zero points from finishes and zero points from kills. Another five leaders have a single point from a random kill without ever managing to taste first or second place. Given that everyone in the competition has played in at least three games by now, and most of them have taken part in four or five games, this is a rather harsh indictment of their skills. Victoria is actually middle of the table (32nd place) by Finish Order Ranking despite scoring zero points from finishes or kills. That's a good indication of how repeated appearances in the Wildcard game can overinflate a leader's finish ranking. Note that the average score by Power Ranking is 7.7 but the median score is only 4.5. The scoring is heavily weighted towards the top of the table, and most of the AI leaders have never won a game at all. We've had 39 total games but only 20 of the 52 leaders have a victory to their credit. The Mansa Musas and Huayna Capacs of the world have been greedy in taking so many victories for themselves.
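The gap between the average and the median mentioned above is what a top-heavy distribution looks like. Here is a minimal sketch of the effect using Python's standard `statistics` module. The ten scores below are made-up placeholders chosen to show the pattern, not the actual Power Ranking table.

```python
from statistics import mean, median

# Placeholder scores for ten imaginary leaders (NOT the real data):
# many low scores and a handful of very high ones.
placeholder_scores = [0, 0, 1, 2, 4, 5, 7, 20, 22, 29]

average_score = mean(placeholder_scores)
median_score = median(placeholder_scores)

# The few big scores drag the average well above the median.
print("average:", average_score)
print("median:", median_score)
```

When a few leaders hoard most of the points, the average lands far above what the typical leader has actually scored, so the median is the better guide to the middle of the field.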

With a full listing of the kills across three seasons provided here, we can also hand out our Golden Spear award for Season Three. We officially had a tie between Stalin and Caesar, but since Stalin broke the tie in the most dramatic fashion possible by axing Caesar himself in the waning moments of the Championship game, Stalin is our deserving winner. Note that both of them easily exceeded Season One winner Mao's total of four kills, and Season Two winner Huayna Capac's five kills. Justinian also would have taken home the crown in any other season with his own six kills. Caesar holds the overall multiseason crown with eleven kills, with Stalin and Justinian joining Huayna Capac at eight. At the bottom of the list, Montezuma did finally score his first kill this season while Alexander and Gilgamesh still remain holding the bag with nothing at all to show for their efforts. There are a dozen AIs still remaining who have failed to claim any kills and another dozen that have only one. The top of the chart has once again monopolized the lion's share of the 129 total kills.

Let's look at how the leader traits grade out using the same system:

Once again, higher numbers are better here unlike the previous ranking based on finish order. The numbers that appear here are the point value of each leader who holds each trait, with each score appearing twice since each leader has two traits. This also means that there are a whole bunch of zero scores from all of the leaders who have failed to earn any points at all across the three seasons of competition. The trait distribution looks similar to what we had last year, where Imperialistic and Financial were also the top two traits. We know that these are the traits that help AI leaders expand faster and tech better, which are some of the most important factors in winning games. (And yet amazingly Victoria, the Financial/Imperialistic leader, has a total score of zero. How?!) The two biggest movers were Spiritual and Industrious, both of which saw major leaps upward as compared to their old values. Industrious in particular graded out as the worst trait under the previous ranking, and indeed most of the Industrious leaders have fared poorly to date. The Industrious trait is being carried on the backs of Stalin and Huayna Capac, and would otherwise grade out as being worse than Protective. I expect that Industrious will fall back a bit in future rankings since it's unlikely that Stalin will be able to replicate his magnificent run a second time. The Aggressive trait is similarly being propped up by good seasons from Stalin and Kublai Khan which may not be that likely to repeat down the road.
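The trait calculation described above, where each leader's score is counted once under each of their two traits, can be sketched as follows. The leaders and scores here are placeholders invented for illustration, and the function name and data layout are assumptions, not how the spreadsheet is actually organized.

```python
from collections import defaultdict


def trait_averages(leaders):
    """Average Power Ranking score per trait.

    `leaders` maps a leader name to (score, (trait_one, trait_two)).
    Each score is counted under both of that leader's traits.
    """
    scores_by_trait = defaultdict(list)
    for score, traits in leaders.values():
        for trait in traits:
            scores_by_trait[trait].append(score)
    return {trait: sum(scores) / len(scores)
            for trait, scores in scores_by_trait.items()}


# Placeholder leaders and scores (NOT the real table).
sample_leaders = {
    "Leader A": (22, ("Aggressive", "Industrious")),
    "Leader B": (0, ("Financial", "Imperialistic")),
    "Leader C": (10, ("Financial", "Industrious")),
}

averages = trait_averages(sample_leaders)
for trait, value in sorted(averages.items()):
    print(trait, value)
```

With so few leaders per trait, one big score moves the average a long way, which is why a single standout run can prop up an otherwise weak trait.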

The biggest fall came from the Philosophical trait. It graded out as the best trait overall after the first year of competition, which was a bizarre result that didn't make any sense at the time. The AI leaders always get a ton of Great People on Deity difficulty, and adding Philosophical to the mix does little to boost their performance. It was reassuring to see Philosophical fall to the middle of the pack following Season Two, and another year's worth of data (and a better ranking system) seems to have it finally rated in an appropriate location. The Protective trait deservedly takes its place at the bottom of the scale this year with an average score of only 4.11 points. Without Mao at the top of the list it would be even worse off. Protective leaders have managed only two total wins in 39 games to date and for the moment the trait appears to have a lock on its much-deserved status as the worst trait in the game. Even in an AI vs AI competition, this is a virtually useless trait.

Finally, we continue the tradition of looking at the rankings of the individual games to see which ones had the strongest competition. This table uses the Power Ranking numbers as a hopefully more accurate evaluation of individual leaders, with higher numbers again an indication of superior performance. The number next to each leader constitutes the points that they've accumulated via power ranking, with averages tallied on the right hand side. Unlike the prior system, this is no longer based on a median score in a distribution from 1-52 or anything like that. More points simply means that the leaders in that game have piled up more top finishes and more kills.

Two games immediately stand out from the opening round of this season. One of them was Game Two with a pitifully low average score of 3.43 points, among the lowest we've ever seen across all three years. It should have been more obvious that Gandhi would emerge from that contest given that the other AIs had achieved very little in the past. (Joao took first in a Wildcard game and Asoka has two runner-up spots, including one from this game.) Gandhi's other victory came from another opening round game that scored nearly as low, his win in Game Three of Season Two (3.67 points). Maybe we should all hold off on buying tickets on the Gandhi express just yet, since his two wins have both come against weak fields of high peace weight leaders. The worst opening round game of all time continues to be Shaka's win in Game Five of Season Two, which also graded out as the weakest game following Season Two. The competitors were Shaka, Tokugawa, Hannibal, Isabella, Willem, and Frederick. Yeesh. If there ever was a game with the motto of "someone had to win", this was it. That score of 3.33 points graded out more than 1.5 standard deviations below the mean; for you statheads out there, the mean of the opening round games across all three seasons equals 7.67 and the standard deviation is 2.62.

The other game that stood out from this season was Game Six. We wrote in the preview that this game appeared to have an extraordinarily talented field and that was borne out by the stats. The average score of 13 points dwarfs anything else from the opening round of Season Three and graded out higher than many of the playoff games that we've run. Caesar, Cyrus, Zara, Pericles, and Elizabeth have all made the Championship game in at least one season, and even Brennus and Qin are solid AIs with some modest successes under their belts. There wasn't a single dud AI in the field. This game currently has the highest score of any opening round game to date, a full two standard deviations above the mean. This is pretty much the definition of an outlier match.
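For the statheads, the standard deviation comparisons for these two outlier games can be checked with a quick calculation. This is a minimal sketch using the opening round mean (7.67) and standard deviation (2.62) quoted above; the function and constant names are assumptions for illustration only.

```python
# Figures quoted in the text for opening round games across three seasons.
OPENING_ROUND_MEAN = 7.67
OPENING_ROUND_STD = 2.62


def z_score(game_average, mean=OPENING_ROUND_MEAN, std=OPENING_ROUND_STD):
    """How many standard deviations a game's average sits from the mean."""
    return (game_average - mean) / std


# Shaka's Season Two Game Five, average score 3.33.
print(round(z_score(3.33), 2))  # -1.66

# Season Three Game Six, average score 13.
print(round(z_score(13.0), 2))  # 2.03
```

The weakest field sits about 1.66 standard deviations below the mean and the strongest about 2.03 above it, matching the descriptions of both games.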

One interesting aspect about this new ranking setup is how poorly the Wildcard games graded out, with average scores around 4 points for the first two seasons before rising to 7 points for this year. That might seem surprising, but only from the perspective of the old finish ranking system that awarded high marks to avoiding death and having leaders reach an additional match. The new ranking system looks at the leaders in the Wildcard game and sees a bunch of losers who couldn't manage to finish in first or second place, a group of weaklings known for doing the "sit in the corner and build" non-strategy. And to be honest, that's probably a more accurate assessment of their performances. The Wildcard game has been a frequent home to the Dariuses and Washingtons and Joaos of the world, boring mediocrities that have fared poorly against tougher competition. We need to stop thinking about appearing in the Wildcard game as though it's some kind of an achievement; the Wildcard game is something to avoid, a last chance for AI leaders who have failed to get it done in their initial efforts. Once the reader adopts that mindset, these numbers start to make sense. The Wildcard game should typically have a LOWER score than the normal opening round game, not higher, and that's exactly what we see here.

Among the playoff games, this year's matches graded out as a fairly weak lot. The average playoff game score across the three seasons has been 13.06 points and all three of this year's efforts were less than that. Playoff Game Three was especially weak and currently rates as the worst of the nine that have been staged to date. That was the community consensus going into the game and the stats back it up here. The standout playoff game was the first one ever held, with an average score of 17.17 points that actually grades out as being higher than Season One's Championship! That's what happens when Huayna Capac, Justinian, Cathy, and Cyrus are all put together on one map. (Also Augustus - what the heck was he doing there?! It should come as no surprise that he was First to Die in that match.) In contrast to this year's weak playoff games, we ended up with the strongest Championship to date with an average score of 24.50 points. This was one of those situations where the three playoff games served a winnowing function and all the deadweight was pruned away for a fantastic final match. True, the individual game rankings continue to be a bit of a circular process, with the leaders in the Championship rated highly by virtue of making the Championship and so on. Nonetheless, with three seasons now finished we're starting to get a larger sample size of total data, and I think we have enough at this point to make some solid empirical judgments. Mansa Musa and Huayna Capac are simply better at these matches than Sitting Bull, Isabella, and Bismarck.

Stalin, Survivor Season Three Gold Medalist

Here we are again at the conclusion of another season. As I said after Season Two finished, it's been a ton of work running this competition but it's also been a great deal of fun. We had more participants for this season in the picking contest, with roughly 110 entries on average as compared to about 90 entries in Season Two. We also had a better turnout during the Livestream, including hitting a peak of 130 concurrent viewers during the Championship itself. That's among the highest totals that I've ever had while streaming, bettered only by the one random night in 2012 when all of the professional League of Legends players were traveling to a tournament and I found myself with 200+ viewers due to a lack of competition. The communal aspect of running the games on stream continues to be a big winner, and the Google Forms picking contest system makes it smooth and easy to process all of the submissions. We streamlined the setup process this year by generating the maps ahead of time, and we already have some awesome community suggestions on how to improve things further for a potential Season Four (assigning only the real two starting techs to each AI and not the Deity freebies, adding a SECOND AI-controlled observer civ so that the player can adopt the Apostolic Palace religion and see the resolutions without opening up a potential Religious victory). For the moment, we'll be taking an extended break from Civ4 AI Survivor to let everyone's batteries recharge a bit. Until then, stay safe, have fun, and always beware of more trolling from Wang Kon!