Yes, I know you’ve seen about a dozen discussions about the broken EOL ranking over the years, and that none of them ever amounted to anything. Some good ideas were presented, some less good, but sadly none of it ever led to an actual implementation. I believe one of the main causes was that none of those discussions (at least none of those that I’m aware of) were complemented by actual testing and comparison of the proposed ranking formulas.
==========================================================================================================
After Zero’s last topic about this (https://mopolauta.moposite.com/viewtop ... f=4&t=9861), I decided to take the matter into my own hands. I’ve discussed this quite a lot on Discord, so some of you may be aware, but for the others: I wrote a framework that can compute the ranking from existing battle data. I tested it on a basic ELO formula, but I also played around with some parameters and added some modifications. The framework uses the results of the first 129,200 battles (all battles up to the end of February 2018) and computes the ranking of every player after every single battle. This lets us readily implement many different formulas and their variations, compare the resulting rankings directly, and eventually decide which one we think is the most suitable.
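The framework itself isn't shown here, but the replay idea can be sketched roughly as follows. This is a minimal Python sketch, not the actual code: it assumes each battle is available as a list of player names in finishing order (a hypothetical format), takes the ranking formula as a pluggable `update` function, and records a snapshot of every player's rating after each battle. `toy_update` is a made-up placeholder rule, only there to make the example run.

```python
from typing import Callable, Dict, List

Battle = List[str]        # hypothetical format: player names in finishing order
Ratings = Dict[str, float]
# An update rule receives the participants' pre-battle ratings and the finishing
# order, and returns the rating change for each participant.
UpdateFn = Callable[[Ratings, Battle], Ratings]

INITIAL_RATING = 1000.0   # default rating for a player's first battle


def replay(battles: List[Battle], update: UpdateFn) -> List[Ratings]:
    """Apply `update` to every battle in chronological order and return a
    snapshot of the ratings of all players seen so far after each battle."""
    ratings: Ratings = {}
    snapshots: List[Ratings] = []
    for battle in battles:
        for player in battle:
            ratings.setdefault(player, INITIAL_RATING)  # new players enter at the default
        deltas = update({p: ratings[p] for p in battle}, battle)
        for player, delta in deltas.items():
            ratings[player] += delta
        snapshots.append(dict(ratings))  # copy, so later battles don't modify it
    return snapshots


def toy_update(ratings: Ratings, battle: Battle) -> Ratings:
    """Placeholder rule for illustration only: winner +1, last place -1."""
    return {battle[0]: 1.0, battle[-1]: -1.0}


if __name__ == "__main__":
    history = replay([["Zero", "bene"], ["bene", "Spef", "Zero"]], toy_update)
    print(history[-1])
```

Swapping `toy_update` for a real formula is then the only change needed to compare rankings, which is the point of keeping the rule pluggable.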
I’m hoping to spark a constructive discussion here, in order to finally reach the desired outcome – the selection of an algorithm to implement. And a ranking will be implemented, likely on the new version of the elmaonline site that is currently under development by Kopaka.
==========================================================================================================
I’d like to start by stating a premise that we should use to define the ranking. Note that this is not final; it is merely what I believe the ranking should represent, so let it be part of the discussion.
1. The ranking should indicate how likely, statistically, players are to beat each other.
2. The ranking should be as simple as possible, allowing players to understand it easily.
I.e. if player A is more likely to beat player B than the other way around, player A should be higher in the ranking, and the more likely he is to win, the bigger their difference in ranking should be. Simplicity is also crucial. The ranking is for us, and if we don’t understand how it’s computed, it’ll have no value for us.
Note that I didn’t use the word “skill”. I think by now we should have come to the conclusion that it is not possible to accurately determine the skill of a player. What matters in battles, though, is the likelihood of defeating someone. Hence the premise above.
==========================================================================================================
Next, I’m going to present a summary of the past discussions about the ranking.
- We need a battle ranking.
- Gaining points. A player should get more points for defeating a better player. There seems to be a consensus on this. Points for a battle could be the sum of the points given/taken for beating/losing to each of the other players in the battle.
- Losing points. Some said that a player should not be able to lose points by playing a battle, so as not to discourage players from participating. However, no good arguments for this were given, apart from “I don’t want to be punished for playing”. Moreover, this would lead to an ever-escalating ranking, as everybody’s points would only keep increasing. It would also mean that the ranking would favour the most active players. For these reasons, it’s very likely that players will lose points for poor battle performance.
- Ranking escalation. The formula should not allow the ranking to escalate indefinitely. This is a risk when new players enter the ranking at a default value. For example, if losing a battle means losing points, the poor players will have a ranking below the default value (say, 1000). Then, a new player can enter at the default value (1000) while his actual skill is much lower (say, 500), so his ranking will tend towards that. But if the average number of points per player remains constant (i.e. total points won = total points lost), the points he loses on the way down are gained by the other players, whose rankings all increase slightly. And this will happen every time a new player joins. A potential solution to this is provisional ranking, which uses the first few battles to approximate the skill of a player and enters him at that level. However, for some formulas (e.g. ELO) this isn’t that much of an issue, as will become apparent from plots further down.
- Quantity of battles played. A lot of formulas (including the original EOL one) favour playing a lot of battles. E.g., if only a certain number of battles per week/month/year are included, those playing more battles than the threshold will benefit, as only their best battles will be selected. This is not desirable, as the number of battles played says nothing about skill.
- Battle size. Beating a good player in a battle with few players should not matter less than beating him in a battle with many players. Battle size should be somehow included in the ranking, but not by excluding battles with fewer players (which would happen if only a certain number of best battles are included).
- Using all battles. If only a certain number of battles are included, many battles won’t count for the most active players, which may be disheartening.
- Inactivity. Inactive players should not be able to climb up the ranking. Keeping their ranking constant means that a good player who briefly topped the ranking 10 years ago and hasn’t played since then could still be on top, which is undesirable. However, becoming inactive doesn’t mean losing skill, which leads to a dilemma, and also to the next issue:
- Old battles. One way to tackle inactive players staying on top of the ranking is to make old battles contribute less to the ranking. This could be done through a decay factor (an old battle’s contribution is multiplied by a factor that decreases with its age), or periodically (for example, the battles from the last 6 months are multiplied by 1, the ones from 6-12 months ago by 0.5, 12-18 months by 0.3, etc.). This means that taking pauses in the game would negatively contribute to the ranking, which is probably desirable. However, if points are lost by losing, players with a ranking lower than the initial value would see their ranking increase after inactivity. This would only tend towards the initial value and would only affect low-skill players, though, so maybe it isn’t that much of a problem.
- Responsiveness. The ranking should be responsive on a relatively short time scale, so that players can see their progress. However, players should not be able to gain/lose too many points in a short period of time. The ranking could be updated periodically as well (e.g. weekly)?
- Low effort battles. This point appeared quite frequently. Including all battles in the ranking could potentially discourage players from participating if they arrive late or know they could only play for a while. This could be tackled by, for example, excluding players that played a battle for less than a certain amount of time (e.g. less than a minute). However, this could lead to the same problem, as players would potentially be encouraged to quit a battle early if they don’t do well or dislike the level. Also, an exception could be made for when a player does really well (e.g. gains points, or wins) despite a short playtime. However, in that case, players would be encouraged to practise in SL or in the editor. Another idea is to give a player more points if he played for only a short time, but that’s probably not desirable, as it overcomplicates things and possibly encourages players to quit a battle just after they made an OK time.
- Exclusions. For example, 0 apple results or certain battle types could be excluded from the ranking computation. Excluding 0 apple results is good because it excludes situations where a player esced in a megahard lev just to finish higher. But then if somebody does finish, he doesn’t get credit for beating all those who didn’t manage to finish and so had 0 apples. Battle starters could be excluded as well.
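To make the old-battle weighting concrete, here is a minimal Python sketch of both variants: the periodic weights (1, 0.5, 0.3 for 0-6, 6-12 and 12-18 months) and a continuous exponential decay. It assumes, for simplicity, that a player's ranking is just the initial value plus the weighted sum of his per-battle point changes, which holds for additive schemes but not directly for a sequential formula like ELO. The zero weight beyond 18 months and the 12-month half-life are hypothetical choices, not values from the discussion.

```python
from typing import Callable, List, Tuple

# Period weights from the post: battles 0-6 months old count x1, 6-12 months x0.5,
# 12-18 months x0.3. The post leaves older periods open ("etc."), so this sketch
# ASSUMES battles older than 18 months count x0.
PERIOD_MONTHS = 6
PERIOD_WEIGHTS = [1.0, 0.5, 0.3]


def period_weight(age_months: float) -> float:
    index = int(age_months // PERIOD_MONTHS)
    return PERIOD_WEIGHTS[index] if index < len(PERIOD_WEIGHTS) else 0.0


def exp_decay_weight(age_months: float, half_life_months: float = 12.0) -> float:
    """Continuous alternative: the weight halves every `half_life_months`
    (12 is a hypothetical choice, not a value from the discussion)."""
    return 0.5 ** (age_months / half_life_months)


def decayed_ranking(battles: List[Tuple[float, float]],
                    weight: Callable[[float], float] = period_weight,
                    initial: float = 1000.0) -> float:
    """`battles` holds (age in months, points gained or lost) for one player."""
    return initial + sum(weight(age) * points for age, points in battles)


if __name__ == "__main__":
    # The same +10-point battle, played at four different ages.
    print(decayed_ranking([(1, 10.0), (8, 10.0), (14, 10.0), (30, 10.0)]))  # 1000 + 10 + 5 + 3 + 0
```

For a player whose battles mostly cost him points, the result drifts back up towards 1000 as those battles age (a -20 battle counts as -20 at 1 month old but only -6 at 13 months), which is the inactivity side effect mentioned above.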
All of these issues need to be decided. Some leave little doubt, others still need discussion. When you reply, please give your opinions on these issues, but make sure to support those opinions with arguments. “I like this better” will not be taken into consideration, regardless of who it’s coming from. Reasonable arguments, however, will always be taken into consideration, again regardless of who they’re coming from.
==========================================================================================================
Now, let’s get down to actual ranking implementations. Below is a short list of the most reasonable ones mentioned in the past discussions.
- An ELO system is a fairly simple solution that is widely used in many disciplines. Although ELO is normally suited to 1v1 competitions, a battle could be treated as a combination of several 1v1 battles. A player gains points for every player they beat and loses points for every player they were beaten by, gaining more for beating a better player and losing more for losing to a weaker player.
- Mila’s solution for the belma ranking was similar to ELO, but instead of Ra’ = Ra + k*(0 - 1/(1+10^((Rb-Ra)/400))) it’s Ra’ = Ra*(1+k*exp(q*(Rb-Ra))) (and divide instead of multiply for players that defeated him).
- Another option is that every player donates a certain % of his ranking points into the battle pool, and at the end of the battle the players are awarded “points” – e.g. 5, 4, 3, 2, 1. The total amount of points is equal to 15, so the first player would get 5/15 of the pool, the second 4/15 of the pool, etc. This is similar to ELO in that a better player has to beat more players to maintain his ranking – he gives up more to the pool, so has to beat more players to get it all back.
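The pool idea is easy to sketch. Below is a minimal Python version, assuming a donation of 5% of each player's ranking (the post doesn't fix a percentage, so 0.05 is a hypothetical value) and placement points n, n-1, ..., 1 for n players. The player names A to E are placeholders.

```python
from typing import Dict, List


def pool_update(ratings: Dict[str, float], finish_order: List[str],
                fraction: float = 0.05) -> Dict[str, float]:
    """Every player pays `fraction` of his rating into the pool; the pool is then
    split in proportion to placement points n, n-1, ..., 1 (5, 4, 3, 2, 1 for five
    players, 15 points in total)."""
    n = len(finish_order)
    pool = sum(fraction * ratings[p] for p in finish_order)
    total_points = n * (n + 1) / 2
    return {
        player: ratings[player] * (1.0 - fraction) + (n - place) / total_points * pool
        for place, player in enumerate(finish_order)
    }


if __name__ == "__main__":
    ratings = {"A": 2000.0, "B": 1000.0, "C": 1000.0, "D": 1000.0, "E": 1000.0}
    print(pool_update(ratings, ["A", "B", "C", "D", "E"]))  # A wins
    print(pool_update(ratings, ["B", "A", "C", "D", "E"]))  # A finishes second
```

With one player at 2000 and four at 1000, the 2000 player has to win just to keep his 2000 (second place leaves him at 1980), which is the "better player has to beat more players" property described above. Total points are conserved by construction.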
I know these don’t necessarily explain the rankings very well, but it’s just a quick outline before I go into detail.
==========================================================================================================
ELO ranking system
This is the first (and so far only) system that I have implemented. I like it mainly because of its simplicity and popularity.
This implementation assumes 1v1 battles between all players, and works on the basic principle of expected result versus actual result. The actual result (s) is 1 for a win and 0 for a loss. So, in a battle of 10 players, the winner’s actual result would be 9 points (as he beat 9 players), and the last player’s result would be 0 points. The expected result (e) for player A against player B is calculated from:
eA = 1/ (1+ 10^((rB-rA)/B) )
and for player B against player A:
eB = 1/ (1+ 10^((rA-rB)/B) )
Where rA and rB are the rankings of players A and B before this battle, and B is a scaling factor (not to be confused with player B).
Then, the new ranking of the players is calculated from:
rA’ = rA + K*(sA-eA)
rB’ = rB + K*(sB-eB)
Where rA’ and rB’ are the updated rankings, sA and sB are the actual results, and eA and eB are the expected results. Also, the initial ranking value given to everyone at the start is equal to 1000.
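In Python, a single 1v1 update following these formulas could look like the sketch below. The names `elo_1v1`, `s_a`, `k` and `b` are illustrative, not taken from the framework; `b` is the scaling factor B, not player B.

```python
from typing import Tuple


def elo_1v1(r_a: float, r_b: float, s_a: float, k: float, b: float) -> Tuple[float, float]:
    """One 1v1 update. s_a is A's actual result (1 for a win, 0 for a loss)."""
    e_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / b))   # expected result of A against B
    e_b = 1.0 / (1.0 + 10.0 ** ((r_a - r_b) / b))   # expected result of B against A
    s_b = 1.0 - s_a                                  # B's actual result
    return r_a + k * (s_a - e_a), r_b + k * (s_b - e_b)


if __name__ == "__main__":
    # A (1200) loses to B (1000) with K = 1, B = 200.
    print(elo_1v1(1200.0, 1000.0, 0.0, k=1.0, b=200.0))
```

Because eA + eB = 1 and sA + sB = 1, whatever A loses, B gains, so the total number of points is conserved. This zero-sum property is what makes the escalation issue discussed earlier possible.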
To illustrate with an example, say there is a battle with 4 players that are ranked as follows:
- Markku with 2000
- Spef with 1200
- bene with 1000
- Zero with 500
Say they finish the battle as follows:
- 1. Zero
- 2. Markku
- 3. bene
- 4. Spef
In this scenario, Zero was strongly expected to lose, but he won, so he should be rewarded heavily. Markku was strongly expected to beat all the others, but was 2nd, so he will be penalised for losing to Zero. Spef and bene both did rather poorly, both beaten by a really bad player (Zero), so their ranking should go down (Spef’s even more, because he has 0 wins). The actual point gains and losses using the ELO system with K = 1 and B = 200 are as follows:
- Markku loses ~1.000 points
- Spef loses 1.909 points
- bene loses 0.088 points
- Zero gains 2.997 points
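The battle-as-1v1s idea can be sketched by summing the pairwise formulas: a player's actual result is the number of players finishing below him, his expected result is the sum of his expected results against everyone else, and his change is K times the difference. This is a minimal Python sketch, not the framework's actual code. Run on the example above, it prints the same four changes (Zero +2.997, Markku -1.000, bene -0.088, Spef -1.909), and the changes add up to zero.

```python
from typing import Dict, List


def expected_score(r_a: float, r_b: float, b: float) -> float:
    """Expected result of A against B: 1 / (1 + 10^((rB - rA) / B))."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / b))


def elo_battle(ratings: Dict[str, float], finish_order: List[str],
               k: float = 1.0, b: float = 200.0) -> Dict[str, float]:
    """Rating change of every player, treating the battle as a 1v1 between every pair.
    Actual result s = number of players finishing below; expected result e = sum of
    pairwise expectations; change = K * (s - e)."""
    deltas = {}
    for place, player in enumerate(finish_order):
        s = len(finish_order) - 1 - place
        e = sum(expected_score(ratings[player], ratings[other], b)
                for other in finish_order if other != player)
        deltas[player] = k * (s - e)
    return deltas


if __name__ == "__main__":
    ratings = {"Markku": 2000.0, "Spef": 1200.0, "bene": 1000.0, "Zero": 500.0}
    for player, delta in elo_battle(ratings, ["Zero", "Markku", "bene", "Spef"]).items():
        print(f"{player}: {delta:+.3f}")
```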
The K factor sets how many points can be gained/lost: at most K against each opponent in a battle. The B factor defines the spread of the expected result. For example, with B=200, a difference in ranking of 200 means that the better player has a 90.9% chance to win, a difference in ranking of 400 means a 99.0% chance to win, and a difference in ranking of 100 means a 76.0% chance to win. If we increase B, the same skill difference will result in bigger ranking point differences, and if we decrease B, we’ll get smaller ranking point differences.
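A quick way to see what B does is to tabulate the expected result for a few ranking differences. This small Python sketch (the function name is illustrative) covers the B values used in the plots:

```python
def win_probability(diff: float, b: float) -> float:
    """Expected result of the higher-ranked player when he leads by `diff` points."""
    return 1.0 / (1.0 + 10.0 ** (-diff / b))


if __name__ == "__main__":
    for b in (200, 400, 600, 800, 1000):   # the B values used in the plots
        row = ", ".join(f"diff {d}: {win_probability(d, b):.1%}" for d in (100, 200, 400))
        print(f"B = {b}: {row}")
```

With B = 200 this gives 76.0%, 90.9% and 99.0% for differences of 100, 200 and 400. With B = 400, a 400-point lead gives the same 90.9% that a 200-point lead gives with B = 200.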
If any of this is too hard to understand, I recommend you check out the Wikipedia page. Meanwhile, I hear you shout “renaults or riot!!!”. One word: oke.
I varied the factors K and B for testing, and I also implemented a filter that excludes battles with fewer than 5 players, and a low effort battle filter. The low effort filter checks which players played for less than 1 minute and scored negative points, then removes them from that battle and recalculates the scores for that battle.
Here are plots of the ranking for 6 chosen players:
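Both filters can be sketched on top of the pairwise ELO sum. This Python sketch is only an illustration under assumptions: playtimes are given per player in seconds (a hypothetical input format), the 5 player rule is checked before the low effort filter, the low effort filter does a single pass, and K = 1 and B = 400 match the defaults listed with the plots.

```python
from typing import Dict, List

MIN_PLAYERS = 5         # "5 player rule": battles with fewer players are ignored
MIN_PLAYTIME_S = 60.0   # low effort: played less than 1 minute ...


def expected_score(r_a: float, r_b: float, b: float) -> float:
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / b))


def elo_battle(ratings: Dict[str, float], order: List[str],
               k: float = 1.0, b: float = 400.0) -> Dict[str, float]:
    # Change = K * (players beaten - sum of expected results against everyone else).
    return {p: k * ((len(order) - 1 - i)
                    - sum(expected_score(ratings[p], ratings[q], b) for q in order if q != p))
            for i, p in enumerate(order)}


def filtered_battle(ratings: Dict[str, float], order: List[str], playtime_s: Dict[str, float],
                    five_player_rule: bool = True, low_effort_filter: bool = True) -> Dict[str, float]:
    if five_player_rule and len(order) < MIN_PLAYERS:
        return {}  # the whole battle is skipped
    deltas = elo_battle(ratings, order)
    if low_effort_filter:
        # ... and scored negative points: drop them and recompute the battle without them.
        dropped = {p for p in order if playtime_s[p] < MIN_PLAYTIME_S and deltas[p] < 0}
        if dropped:
            order = [p for p in order if p not in dropped]
            deltas = elo_battle(ratings, order)
    return deltas


if __name__ == "__main__":
    ratings = {p: 1000.0 for p in "ABCDE"}
    times = {"A": 600.0, "B": 600.0, "C": 600.0, "D": 600.0, "E": 30.0}
    print(filtered_battle(ratings, list("ABCDE"), times))  # E is removed before scoring
```

A player with a short playtime who still gained points (for example, a late joiner who won) is kept, matching the "and scored negative points" condition.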
K = 1, B = 200
K = 1, B = 400 (default)
K = 1, B = 600
K = 1, B = 800
K = 1, B = 1000
K = 4, B = 400
K = 1, B = 400 (default), 5 player rule included
K = 1, B = 400 (default), low effort filter included
And the top 40 rankings for all the variations of the B factor (I think I had them for the low effort filter and the 5 player rule too, but can't find them now):
==========================================================================================================
Now, if this post looks unfinished, that's because it is. I ran out of effort for tonight but really wanted to get something out there, so there you go. Hopefully it's enough to start a discussion.
I honestly expect 95% of people to not even bother reading half of this post, but I really hope that some of you will. I'm looking especially at the mods, and at the most experienced and active people out there. But as I said in the beginning, anybody's opinion is welcome. Just please, don't waste my time with unnecessary spam.
And yeah, the post is to be continued. I'll try to post more results of the ELO simulations (please say here if there are any particular results (e.g. plots for particular players) that you'd like to see). I'll also try to implement the other two systems and generate similar results. But even before that, it would be great to settle the issues I listed earlier in this post.
Sorry if the writing is too hectic, it's late.

