EOL Ranking Discussion v_final_veryfinal_superdefinitelyfinal

pawq · Post by **pawq** » 16 Jun 2018, 01:56

Hi everyone.

Yes, I know you’ve seen about a dozen discussions about the broken EOL ranking over the years, and that none of them ever amounted to anything. Some good ideas were presented, some less good, but sadly it never led to an actual implementation. I believe one of the main causes was that none of those discussions (at least none of those that I’m aware of) were complemented by actual testing and comparison of the proposed ranking formulas.

==========================================================================================================

After Zero’s last topic about this ( https://mopolauta.moposite.com/viewtop ... f=4&t=9861 ), I decided to take the matter into my own hands. I’ve discussed this quite a lot on Discord so some of you may be aware, but for the others: I wrote a framework that can compute the ranking from existing battle data. I tested it on a basic ELO formula, but I also played around with some parameters and added some modifications. The framework uses the results of the first 129,200 battles (all battles up to the end of February 2018), and it computes the ranking after every single battle, for all of those battles and for all players. This enables us to readily implement many different formulas and their variations, and then compare the resultant rankings directly, eventually deciding which one we think is the most suitable.

I’m hoping to spark a constructive discussion here, in order to finally reach the desired outcome – the selection of an algorithm to implement. And a ranking will be implemented, likely on the new version of the elmaonline site that is currently under development by Kopaka.

==========================================================================================================

I’d like to start with stating a premise that we should use to define the ranking. Note that this is not final, but merely what I believe the ranking should represent, so let it be part of the discussion.
1. The ranking should indicate how likely, statistically, players are to beat each other.
2. The ranking should be as simple as possible, allowing players to understand it easily.
I.e. if player A is more likely to beat player B than the other way around, player A should be higher in the ranking, and the more likely he is to win, the bigger their difference in ranking should be. Simplicity is also crucial. The ranking is for us, and if we don’t understand how it’s computed, it’ll have no value for us.
Note that I didn’t use the word “skill”. I think we should have come to the conclusion by now that it is not possible to accurately determine the skill of a player. I think what matters in battles, though, is the likelihood of defeating someone. Hence the premise above.

==========================================================================================================

Next, I’m going to present a summary of the past discussions about the ranking (list here).
- We need a battle ranking.
- Gaining points. A player should get more points for defeating a better player. This seems to be a consensus. Points for a battle could be the sum of points given/taken for beating/losing to all the other players in the battle.
- Losing points. Some said that a player should not be able to lose points by playing a battle, so as not to discourage players from participating. However, no good arguments for this were given, apart from “I don’t want to be punished for playing”. Moreover, this would lead to an ever-escalating ranking, as everybody’s points would only keep increasing. It would also mean that the ranking would favour the most active players. For these reasons, it’s very likely that players will lose points for poor battle performance.
- Ranking escalation. The formula should not allow the ranking to escalate indefinitely. This is a risk when new players enter the ranking at a default value. For example, if losing a battle means losing points, the poor players will have ranking below the default value (say, 1000). Then, a new player can enter at the default value (1000), but his actual skill is much lower (say, 500), so his ranking will tend towards that. But, if the average number of points per player remains constant (i.e. total points won = total points lost), then all the other players will have their ranking increase slightly. And this will happen every time a new player joins. A potential solution to this is provisional ranking, which uses the first few battles to approximate the skill of a player, and enters him at that level. However, for some formulas (e.g. ELO) this isn’t that much of an issue, as will become apparent from plots further down.
- Quantity of battles played. A lot of formulas (including the original EOL one) favour playing a lot of battles. E.g., if only a certain number of battles per week/month/year are included, those playing more battles than the threshold will benefit, as only their best battles will be selected. This is not desirable, as the number of battles played says nothing about skill.
- Battle size. Beating a good player in a battle with few players should not matter less than beating him in a battle with many players. Battle size should be somehow included in the ranking, but not by excluding battles with fewer players (which would happen if only a certain number of best battles are included).
- Using all battles. If only a certain number of battles are included, many battles won’t count for the most active players, which may be disheartening.
- Inactivity. Inactive players should not be able to climb up the ranking. Keeping their ranking constant means that a good player who briefly topped the ranking 10 years ago and hasn’t played since then could still be on top, which is undesirable. However, becoming inactive doesn’t mean losing skill, which leads to the dilemma, and also to the next issue:
- Old battles. One way to tackle inactive players staying on top of the ranking is to make old battles contribute less to the ranking. This could be done through a decay factor (an old battle’s contribution is multiplied by a factor that decreases with its age), or periodically (for example, the battles from the last 6 months are multiplied by 1, the ones from 6-12 months ago by 0.5, 12-18 months by 0.3, etc.). This means that taking pauses in the game would negatively contribute to the ranking, which is probably desirable. However, if points are lost by losing, players with ranking lower than initial would see their ranking increase after inactivity. This would only tend towards the initial value and would only affect low-skill players though, so maybe it isn’t that much of a problem.
- Responsiveness. The ranking should be responsive on a relatively short time scale, so that players can see their progress. However, players should not be able to gain/lose too many points in a short period of time. The ranking could be updated periodically as well (e.g. weekly)?
- Low effort battles. This point appeared quite frequently. Including all battles in the ranking could potentially discourage players from participating if they arrive late or know they could only play for a while. This could be tackled by, for example, excluding players that played a battle less than a certain amount of time (e.g. less than a minute). However, this could lead to the same problem, as players would potentially be encouraged to quit a battle early if they don’t do well or dislike the level. Also, an exception could be made for when a player does really well (e.g. gains points, or wins) despite a short playtime. However, in that case, players would be encouraged to practise in SL or in editor. Another idea is to give a player more points if he played little time, but that’s probably not desirable, as it overcomplicates things and possibly encourages players to quit a battle just after they made an ok time.
- Exclusions. For example, 0 apple results or certain battle types could be excluded from the ranking computation. Excluding 0 apple results is good because it excludes situations where a player esced in a megahard lev just to finish higher. But then if somebody does finish, he doesn’t get credit for beating all those who didn’t manage to finish and so had 0 apples. Battle starters could be excluded as well.
All of these issues need to be decided. Some leave little doubt, others still need discussion. When you reply, please give your opinions about these issues, but make sure to support those opinions with arguments. “I like this better” will not be taken into consideration, regardless of who it’s coming from. Reasonable arguments, however, will always be taken into consideration, again regardless of who they’re coming from.
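To make the "old battles" point a bit more concrete, here is a minimal Python sketch of the two weighting schemes mentioned there: a periodic step function and a smooth decay factor. The step values 1, 0.5 and 0.3 come from the example in that bullet; the tail weight for anything older than 18 months, the half-life and all function names are assumptions for illustration only.

```python
def step_weight(age_months: float, tail_weight: float = 0.0) -> float:
    """Periodic weighting: 0-6 months -> 1, 6-12 -> 0.5, 12-18 -> 0.3.

    tail_weight (for battles older than 18 months) is an assumption;
    the post only says "etc." for older periods.
    """
    if age_months < 6:
        return 1.0
    if age_months < 12:
        return 0.5
    if age_months < 18:
        return 0.3
    return tail_weight


def decay_weight(age_months: float, half_life_months: float = 6.0) -> float:
    """Smooth alternative: the weight halves every half_life_months (assumed value)."""
    return 0.5 ** (age_months / half_life_months)


def weighted_points(battles: list[tuple[float, float]], weight_fn=step_weight) -> float:
    """Sum the points of (age_months, points) pairs, scaled by their age weight."""
    return sum(points * weight_fn(age) for age, points in battles)


# Hypothetical history: (age in months, points earned in that battle)
history = [(1, 10.0), (7, 10.0), (13, 10.0)]
print(weighted_points(history))                # 10*1 + 10*0.5 + 10*0.3
print(weighted_points(history, decay_weight))  # smooth decay instead of steps
```

Either function could be swapped in without touching the rest of a ranking computation, which makes it easy to compare the two approaches on the same data.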

==========================================================================================================

Now, let’s get down to actual ranking implementations. Below is a short list of the most reasonable ones mentioned in the past discussions.
- An ELO system is a fairly simple solution that is implemented widely in many disciplines. Although ELO is normally suited for 1v1 competitions, battles could be treated as a combination of several 1v1 battles. A player gains points for every player they beat, and loses points for every player they were beaten by. More points gained for beating a better player, and more points lost for losing to a weaker player.
- Mila’s solution for the belma ranking was similar to ELO, but instead of Ra’=Ra+k*(s-1/(1+10^((Rb-Ra)/400))) it’s Ra’=Ra*(1+k*exp(q*(Rb-Ra))) (and divide instead of multiply for players that defeated him).
- Another option is that every player donates a certain % of his ranking points into the battle pool, and at the end of the battle the players are awarded “points” – e.g. 5, 4, 3, 2, 1. The total amount of points is equal to 15, so the first player would get 5/15 of the pool, the second 4/15 of the pool, etc. This is similar to ELO in that a better player has to beat more players to maintain his ranking – he gives up more to the pool, so has to beat more players to get it all back.
I know these don’t necessarily explain the rankings very well, but it’s just a quick outline before I go into detail.
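As a rough illustration of the pool idea in the last bullet, here is a minimal Python sketch. The donation rate of 5% and all names are assumptions; the 5, 4, 3, 2, 1 award scheme is the one from the example in that bullet.

```python
def pool_update(ratings_in_finish_order: list[float], donate_rate: float = 0.05) -> list[float]:
    """Return new ratings after one battle under the pool scheme.

    ratings_in_finish_order: ratings before the battle, winner first.
    donate_rate: share of rating each player puts into the pool (assumed value).
    """
    n = len(ratings_in_finish_order)
    donations = [r * donate_rate for r in ratings_in_finish_order]
    pool = sum(donations)
    # Award "points" n, n-1, ..., 1 by finishing position (5, 4, 3, 2, 1 for 5 players).
    awards = list(range(n, 0, -1))
    total_awards = sum(awards)  # 15 for 5 players
    return [
        r - d + pool * a / total_awards
        for r, d, a in zip(ratings_in_finish_order, donations, awards)
    ]


before = [1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
after = pool_update(before)
print([round(r, 2) for r in after])
```

Because everything that goes into the pool is paid out again, the total number of ranking points stays constant. A higher-rated player donates more, so he needs a better finish just to break even.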

==========================================================================================================

ELO ranking system
This is the first (and so far only) system that I have implemented. I like it mainly because of its simplicity and popularity.
This implementation assumes 1v1 battles between all players, and works on the basic principle of expected result versus actual result. The actual result (s) is 1 for a win and 0 for a loss. So, in a battle of 10 players, the winner’s actual result would be 9 points (as he beat 9 players), and the last player’s result would be 0 points. The expected result (e) for player A against player B is calculated from:
eA = 1/ (1+ 10^((rB-rA)/B) )
and for player B against player A:
eB = 1/ (1+ 10^((rA-rB)/B) )
Where rA and rB are the rankings of players A and B before this battle, and B is some factor.
Then, the new ranking of the players is calculated from:
rA’ = rA + K*(sA-eA)
rB’ = rB + K*(sB-eB)
Where rA’ and rB’ are the updated rankings, sA and sB are the actual results, and eA and eB are the expected results. Also, the initial ranking value given to everyone at the start is equal to 1000.
To illustrate with an example, say there is a battle with 4 players that are ranked as follows:
- Markku with 2000
- Spef with 1200
- bene with 1000
- Zero with 500
Say they finish the battle as follows:
- 1. Zero
- 2. Markku
- 3. bene
- 4. Spef
In this scenario, Zero was strongly expected to lose, but he won, so should be rewarded heavily. Markku was strongly expected to win against all the others, but was 2nd, so he will be penalised slightly. Spef and bene both did rather poorly, both beaten by a really bad player (Zero), so their ranking should go down (Spef’s even more, because he has 0 wins). The actual point gains and losses using the ELO system with default factor values (K = 1, B = 200) are as follows:
- Markku loses ~1.000 points
- Spef loses 1.909 points
- bene loses 0.088 points
- Zero gains 2.997 points
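The example can be checked with a few lines of Python. This is only a sketch of the formulas given above (sum of 1v1 results, K = 1, B = 200), not the actual framework code; the function names are made up for illustration.

```python
def expected_score(r_a: float, r_b: float, b: float) -> float:
    """Expected result of player A against player B (between 0 and 1)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / b))


def battle_deltas(finish_order: list[str], ratings: dict[str, float],
                  k: float = 1.0, b: float = 200.0) -> dict[str, float]:
    """Rating change per player, treating the battle as a set of 1v1 results.

    finish_order lists player names from first to last place.
    """
    deltas = {}
    for pos, name in enumerate(finish_order):
        change = 0.0
        for other_pos, other in enumerate(finish_order):
            if other == name:
                continue
            actual = 1.0 if pos < other_pos else 0.0  # 1 for a win, 0 for a loss
            change += k * (actual - expected_score(ratings[name], ratings[other], b))
        deltas[name] = change
    return deltas


ratings = {"Markku": 2000, "Spef": 1200, "bene": 1000, "Zero": 500}
deltas = battle_deltas(["Zero", "Markku", "bene", "Spef"], ratings)
for name, change in deltas.items():
    print(f"{name}: {change:+.3f}")
```

With K = 1 and B = 200 the printed changes match the list above (Zero +2.997, Markku -1.000, bene -0.088, Spef -1.909), and they sum to zero.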
The K factor defines how many points can be gained/lost from a single battle. The B factor defines the spread of expected result. For example, with B=200, a difference in ranking of 200 means that the better player has a 90.9% chance to win, a difference in ranking of 400 means a 99.0% chance to win, and a difference in ranking of 100 means a 76.0% chance to win. If we increase B, the same skill difference will result in bigger ranking point differences, and if we decrease B, we’ll get smaller ranking point differences.
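The effect of B can be checked by evaluating the expected-result formula for a few rating differences. A minimal Python sketch; the function name and the chosen B values are just for illustration:

```python
def win_probability(rating_diff: float, b: float) -> float:
    """Chance that the higher-rated player wins, given the rating difference."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / b))


for b in (200, 400, 800):
    row = ", ".join(f"diff {d}: {win_probability(d, b):.1%}" for d in (100, 200, 400))
    print(f"B = {b} -> {row}")
```

A larger B flattens the curve: the same rating gap corresponds to a smaller expected advantage.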
If any of this is too hard to understand, I recommend you check out the Wikipedia page. Meanwhile, I hear you shout “renaults or riot!!!”. One word: oke.

I varied the factors K and B for testing, and I also implemented a filter for battles with <5 players, and a low effort battle filter. The low effort filter checks which players played less than 1 minute and scored negative points, then removes them from that battle and recalculates the scores for that battle.
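The two filters could look roughly like this in Python. This is a sketch under assumptions, not the framework's code: the names are invented, the removal is done in a single pass, and the size check is applied before low effort players are removed (the post doesn't say how the two filters interact, and the plots use them separately).

```python
MIN_PLAYERS = 5   # battles with fewer players are skipped
MIN_SECONDS = 60  # "low effort" threshold: less than 1 minute played


def expected_score(r_a, r_b, b=400.0):
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / b))


def battle_deltas(finish_order, ratings, k=1.0, b=400.0):
    """Sum of 1v1 ELO changes per player; finish_order is first to last."""
    deltas = {}
    for pos, name in enumerate(finish_order):
        deltas[name] = sum(
            k * ((1.0 if pos < other_pos else 0.0)
                 - expected_score(ratings[name], ratings[other], b))
            for other_pos, other in enumerate(finish_order) if other != name
        )
    return deltas


def filtered_deltas(finish_order, ratings, seconds_played,
                    use_size_filter=True, use_effort_filter=True):
    """Apply the two filters, then return the rating changes that count."""
    if use_size_filter and len(finish_order) < MIN_PLAYERS:
        return {}  # battle ignored entirely
    deltas = battle_deltas(finish_order, ratings)
    if use_effort_filter:
        low_effort = {p for p in finish_order
                      if seconds_played[p] < MIN_SECONDS and deltas[p] < 0}
        if low_effort:
            remaining = [p for p in finish_order if p not in low_effort]
            deltas = battle_deltas(remaining, ratings)  # recalculated without them
    return deltas


# Hypothetical battle: player E joined for 30 seconds and finished last.
ratings = {"A": 1200, "B": 1100, "C": 1000, "D": 1000, "E": 900}
order = ["A", "B", "C", "D", "E"]
seconds = {"A": 600, "B": 600, "C": 600, "D": 600, "E": 30}
result = filtered_deltas(order, ratings, seconds)
print(result)
```

In this example E is dropped and the other four players are scored as if E had never joined.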

Here are plots of the ranking for 6 chosen players, one plot per setting:
- K = 1, B = 200
- K = 1, B = 400 (default)
- K = 1, B = 600
- K = 1, B = 800
- K = 1, B = 1000
- K = 4, B = 400
- K = 1, B = 400 (default), 5 player rule included
- K = 1, B = 400 (default), low effort filter included

And the top40 rankings for all the variations of B factor (I think I had it for the low effort filter and 5 player rule too, but can't find now):

==========================================================================================================

Now, if this post looks unfinished, that's because it is. I ran out of effort for tonight but really wanted to get something out there, so there you go. Hopefully enough to start a discussion.

I honestly expect 95% of people to not even bother reading half of this post, but I really hope that some of you will. I'm looking especially at the mods, and at the most experienced and active people out there. But as I said in the beginning, anybody's opinion is welcome. Just please, don't waste my time with unnecessary spam.

And yeah, the post is to be continued. I'll try to post more results of the ELO simulations (please say here if there are any particular results (e.g. plots for particular players) that you'd like to see). I'll also try to implement the other two systems and generate similar results. But even before that, it would be great to settle the issues that I bolded out earlier in this post.

Sorry if the writing is too hectic, it's late

Let me know if anything is unclear or badly explained/worded! Hope you enjoyed the read

AndrY · Post by **AndrY** » 16 Jun 2018, 08:28

Very good that you do smth about it!
imo there's not even a need to reprogram the whole ranking, maybe at least fix some errors (e.g. if a man has more played balles in a day than he actually played).

pawq wrote: ↑16 Jun 2018, 01:56 2. The ranking should be as simple as possible

pawq wrote: ↑16 Jun 2018, 01:56 eA = 1/ (1+ 10^((rB-rA)/B) )
and for player B against player A:
eB = 1/ (1+ 10^((rA-rB)/B) )
Where rA and rB are the rankings of players A and B before this battle, and B is some factor.
Then, the new ranking of the players is calculated from:
rA’ = rA + K*(sA-eA)
rB’ = rB + K*(sB-eB)

ohhh

Grace · Post by **Grace** » 16 Jun 2018, 08:57

I don't have a lot of specific knowledge on advanced ranking methodologies as I've always just coded my own simple formulae.

That said, I have some comments which might provide some clarity to a couple of the components that are still up in the air.

Losing points.

Regarding losing points, a lot of the concerns with this are not present in practice. Yes, it seems like it could be a horror solution to have talli or zero lose 50 points if they have a poor battle performance against tej and some other newer players, but in the end, this contributes to removing an endlessly inflating ranking score. Other systems (such as ELO) use this idea with no issue and have for many years. It's normal if you have a high ranking to have to perform well to keep it.

Quantity of battles played.

This is a really interesting question to compare with a standard ELO solution. It's purely arbitrary where you draw the line, but having the ranking algorithm only consider, let's say, your 100 best performances in the past 6 months eliminates the issues of low effort battles and helps combat the issue with inactive players/old battles. Further, it doesn't tend towards promoting a "not gonna play this battle" attitude, because if the algorithm does assess every battle but only awards points if you performed well, there's no reason not to play. Perhaps importantly, a ranking system like this really rewards activity in bigger events such as battle cups for stronger players, in which there are lots of participants to beat and provide some really strong ranking points.

- Using all battles. If only a certain number of battles are included, many battles won’t count for the most active players, which may be disheartening.

Keep in mind that the above paragraph is using a really simple example of "100 best performances in the past six months". Obviously, you can make such a system significantly more complex by including more components and providing more weight to some components:

Heavily weighted 100 best battle performances in past six months
Medium weighted 200 next best performances in past year
Low weighted 200 next best performances in past year
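A tiered scheme like this could be sketched in Python as follows. The tier sizes (100, 200, 200) and time windows come from the list above; the numeric weights are assumptions, since the post only says heavy, medium and low, and the function name is made up.

```python
def tiered_score(battles, weights=(1.0, 0.5, 0.25), sizes=(100, 200, 200)):
    """Score a list of (age_months, points) battles using three weighted tiers.

    weights are assumed values for "heavily", "medium" and "low" weighted.
    """
    indexed = list(enumerate(battles))
    # Tier 1: best performances from the past six months.
    recent = sorted((b for b in indexed if b[1][0] <= 6),
                    key=lambda b: b[1][1], reverse=True)
    tier1 = recent[:sizes[0]]
    used = {i for i, _ in tier1}
    # Tiers 2 and 3: next best performances from the past year.
    rest = sorted((b for b in indexed if b[1][0] <= 12 and b[0] not in used),
                  key=lambda b: b[1][1], reverse=True)
    tier2 = rest[:sizes[1]]
    tier3 = rest[sizes[1]:sizes[1] + sizes[2]]
    return sum(w * sum(points for _, (_, points) in tier)
               for w, tier in zip(weights, (tier1, tier2, tier3)))


# Tiny hypothetical history with tier sizes shrunk to 1 so the effect is visible.
battles = [(1, 10), (2, 8), (8, 9), (10, 4), (14, 20)]
print(tiered_score(battles, sizes=(1, 1, 1)))
```

In the small example the 14-month-old battle is ignored even though it is the best result, and the weakest remaining result simply doesn't count.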

I'm not sure if a solution like this is even desirable to most people, but it does solve a lot of the issues to do with mans being unsure to participate in battles with low time left or shit levels etc.

Decay

In my opinion, it's super important to implement some sort of decay system in a ranking system such as this. If we backwards calculate ranking, GRob will be ranked very highly, for example, but he hasn't played a battle in years (I guess) and so it's hard to consider him one of the best battlers in 2018. Most good ranking systems have some form of rating decay over time, even if it's minor.

danitah · Post by **danitah** » 16 Jun 2018, 15:55

I don't have too much to add; it seems very well thought out and I agree with most of it. Great work!

pawq wrote: ↑16 Jun 2018, 01:56 - Battle size. Beating a good player in a battle with few players should not matter less than beating him in a battle with many players. Battle size should be somehow included in the ranking, but not by excluding battles with fewer players (which would happen if only a certain number of best battles are included).

Yeah, would be nice to get rid of the 5 player limit. Would make night battles more interesting. Feels kinda silly when there are 4 players and the leader asks for one more to join for example.

pawq wrote: ↑16 Jun 2018, 01:56 - Exclusions. For example, 0 apple results or certain battle types could be excluded from the ranking computation. Excluding 0 apple results is good because it excludes situations where a player esced in a megahard lev just to finish higher. But then if somebody does finish, he doesn’t get credit for beating all those who didn’t manage to finish and so had 0 apples. Battle starters could be excluded as well.

Both of these cases should be avoided by the levelmakers. If you make a very hard lev, put enough apples so that people can compete without finishing the lev. Also we have the same problem if first apple is really easy and the rest is really hard, so we don't really solve this problem by excluding 0 apple times. And if you put allow starter, you should have a valid reason, or you would be breaking the rules, and I don't think we should protect against rule breaking in the ranking system. Like if you put that exclusion there it's kinda like saying it's ok to play your own battles because it won't affect rating.

One idea would be to look at the length of runs (i assume that data is available), and sort standings based on that. For example:
Player A enters lev at 14:20:00 - gets 1 apple, but plays for 5 minutes before he dies.
Player B enters lev at 14:21:00 - gets 1 apple in 5 seconds, and instantly escs.
Player B would be the leader with 1 apple after he escs, but after player A dies he will take lead from player B in standings.
Simply put: Sort standings/results by time entered lev instead of time exited lev.
It wouldn't completely solve this issue, for example in a FF everyone starts at the same time and this would be determined by ping I guess. But it should be clearly better overall imo, and it completely gets rid of escing to win.
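A small Python sketch of this tie-break, using the example above. The `Run` record and its fields are hypothetical, and it is assumed that equal apple counts are currently ordered by exit time, as the post implies.

```python
from dataclasses import dataclass


@dataclass
class Run:
    player: str
    apples: int
    entered_at: float  # seconds since battle start when the player entered the lev
    exited_at: float   # seconds since battle start when the run ended


def standings(runs: list[Run], by_entry: bool = True) -> list[str]:
    """Order apple results: more apples first, ties broken by entry or exit time."""
    tie_key = (lambda r: r.entered_at) if by_entry else (lambda r: r.exited_at)
    return [r.player for r in sorted(runs, key=lambda r: (-r.apples, tie_key(r)))]


# Player A enters at 14:20:00 and dies after 5 minutes,
# player B enters at 14:21:00, takes the apple in 5 seconds and escs.
runs = [Run("A", 1, 0.0, 300.0), Run("B", 1, 60.0, 65.0)]
print(standings(runs, by_entry=False))  # ['B', 'A'] - ordering by exit time
print(standings(runs, by_entry=True))   # ['A', 'B'] - ordering by entry time
```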

Spef · Post by **Spef** » 16 Jun 2018, 16:14

- Gaining/losing points.
Yes, losing points is important. You will quickly stabilize around some rating and will maintain it, with little bumps up and down.

- Ranking escalation.
I like having a provisional/some other special ranking for first few battles for new players. I don't understand inflation much though

- Quantity of battles played/- Using all battles.
Every ranked battle should count. Only way you should be at a disadvantage by playing few battles is through decay.

- Battle size
"Battle size should be somehow included in the ranking, but not by excluding battles with fewer players"
Does this mean if the average rating is the same in a battle with 10 and 5 players, winning the one with 10 players gives you more points (and losing it makes you lose more points)? I guess it would make sense, there's more... luck? in a large battle. More chances for someone to make a surprise performance and beat players above. So it's harder to win those.
For example 1st place: +5, 10th place: -5 against 1st place: +2, 5th place: -2. This is without considering rating before battle of course.
It might be a good idea to have a limit for how small a battle can be to count for ranking, I think there has been one at 5 players in the past.

- Exclusions (battle results)/- Low effort battles.
I would put the responsibility on the players here, to check the level with f1+enter or editor and make a decision to battle or not. If you enter a battle, you are participating in the battle and it will count for ranking. No safety net around it allowing you to play without risk, this will always be exploitable. If you care about rankings enough not to play under a disadvantage such as playing for less time than others, that's your choice.
- Exclusions (battle type)
Normal battles should have their own ranking. Special battles could each have their own also, at least FF(+one-life? cos it's similar) and apple battles as they are very popular. The other special battle types could be in a group of one or a couple smaller ones (for example crippled, speed+slowness+survivor, the rest), or not ranked at all.
A lot of this is personal preference, I always looked at normal battles as the most "serious" and play other types more for fun.

- Inactivity/old battles.
"Keeping their ranking constant means that a good player who briefly topped the ranking 10 years ago and hasn’t played since then could still be on top, which is undesirable. However, becoming inactive doesn’t mean losing skill, which leads to the dilemma"
Luckily, rating isn't an accurate representation of skill, so messing with their rating for inactivity is fine. Otherwise, the scenario described in the quote could be a nightmare. I'm all for decay. It would also make it so a high ranked inactive player loses points, but once they return, their first few battles count for their rating more and they will have a chance to climb back fast to where they were before taking a break. I don't have a solution for low ranked inactive players gaining points by not playing, let me know if that helps.

- Responsiveness.
"However, players should not be able to gain/lose too many points in a short period of time."
Why not? If I'm #1 ranked and start intentionally going for last place in every battle, I should lose a ton of points for that (though this kind of behaviour should not be allowed). A live rating would be nice instead of a scheduled update, but whatever is most manageable to implement.

zebra · Post by **zebra** » 17 Jun 2018, 11:01

Hello! Good to have discussion about ranking systems

My opinions here:

I'm strongly against losing points when battling. That doesn't make any sense. Do we really want to drive players away? If I knew that I might lose points by playing battles, I would only pick the best battles, not play any "noob designer"'s battles. And what about distractions? My wife comes to say something when I'm playing a battle and I have to leave the battle. Should I be punished for that?

In Trackmania there was a good ranking system. Some main points:
- everybody starts with 0 points
- you got points only by beating a better player (or equal ranked player)
- at every Sunday, all players' points were scaled down so that the best player had 100000 points.
Simple and good system
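The weekly scaling step could look like this in Python. It covers only the rescaling as described in the list above; how many points a win was worth in Trackmania is not stated here, so that part is left out, and the names are made up.

```python
TOP_POINTS = 100_000


def weekly_rescale(points: dict[str, float]) -> dict[str, float]:
    """Scale all players so that the current leader ends up with TOP_POINTS."""
    best = max(points.values(), default=0)
    if best <= 0:
        return dict(points)  # nobody has points yet, nothing to scale
    factor = TOP_POINTS / best
    return {player: p * factor for player, p in points.items()}


# Hypothetical totals before the Sunday rescale.
print(weekly_rescale({"a": 250_000, "b": 125_000, "c": 0}))
```

The rescale keeps the ratios between players intact, so it limits escalation without changing the order.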

ofta · Post by **ofta** » 17 Jun 2018, 13:14

i agree a bit with everyone here. every battle is not and should not be considered ’competitive’. i don’t think anyone wants a system that encourages players to dodge certain battle types. i think the best system would be to make a ’ranked battle’ type with more serious levs, and also some kind of ’social battles’ that are un-ranked, and can be experimental, low-effort, troll levs or whatever. in ranked ballers i think you should be able to lose points, just to give the battle some risk and weight.

i can also see some problems with ’ranking’. first of all, how do we determine ranking? do we go with existing data? since i joined sol in 2013, no-one really ever cared about battle ranking on a macro level. only on micro level (you only care about the very battle you are playing atm). it’s a bit like in FIFA ranking in soccer, they take data from friendly exhibition games and create a world ranking from that, which can make very weak teams come up high on ranking, while some high-regarded teams are way down in ranking.

so it might seem like the most fair system is to create a clean sheet by resetting all previous data in order to create a new accurate ranking system, but that’s not really fair either, because if you do some kind of qualification contest to determine initial ranking, you will in that process encourage players to play worse than they actually can. because the worse you perform, the more you will eventually win and the less you risk to lose.

i don’t think there’s any good solution to all of this, other than to skip the ranking altogether and create some kind of weekly/monthly or whatever seasons, so that you only measure your performances in relation to other players during a very specific period of time.

FinMan · Post by **FinMan** » 17 Jun 2018, 13:36

Not losing ranking in a lost battle sounds really ridiculous. First of all, not being able to lose points in any circumstances means the more you play, the better your ranking will be which is not competitive at all. Being one of the best in the game would feel pointless if you could just randomly enter battles, be shit in that particular level and not worry about anything.

I don't feel like you should worry about going into a battle even if you might lose a point or two. In the long run it will even out (as spef pointed out). Also, if you play a bit more casually, no one will blame you for not trying to buff the virtual-penisque number that is ranking.

I agree with having the unranked battles being a possibility though, it wouldn't hurt.

I agree with spef's post. Inactivity should be punished at the top, shouldn't be punished below the top, this way the hardcore mans could keep on fighting and not camp a number to look great, at the same time more casual and not so hc-oriented mans could enjoy the battles whenever they want etc.

ps. can not focus on reading everything carefully, hope i understood stuf and if i repeat mans, sry for that, ask me if need be

danitah · Post by **danitah** » 17 Jun 2018, 14:59

zebra wrote: ↑17 Jun 2018, 11:01 Hello! Good to have discussion about ranking systems My opinions here:

I'm strongly against losing points when battling. That doesn't make any sense. Do we really want to drive players away? If I knew that I might lose points by playing battles, I would only pick the best battles, not play any "noob designer"'s battles. And what about distractions? My wife comes to say something when I'm playing a battle and I have to leave the battle. Should I be punished for that?

In Trackmania there was a good ranking system. Some main points:
- everybody starts with 0 points
- you got points only by beating a better player (or equal ranked player)
- at every Sunday, all players' points were scaled down so that the best player had 100000 points.
Simple and good system

The biggest flaw of this kind of system is that it doesn't incentivize good players to beat bad players. Let's say zero is playing against only new players, there is nothing (or close to nothing) to gain and nothing to lose.

Kopaka · Post by **Kopaka** » 17 Jun 2018, 22:15

ofta wrote: ↑17 Jun 2018, 13:14 i can also see some problems with ’ranking’. first of all, how do we determine ranking? do we go with existing data? since i joined sol in 2013, no-one really ever cared about battle ranking on a macro level. only on micro level (you only care about the very battle you are playing atm). it’s a bit like in FIFA ranking in soccer, they take data from friendly exhibition games and create a world ranking from that, which can make very weak teams come up high on ranking, while some high-regarded teams are way down in ranking.

If your ranking is low at the time the ranking is introduced because of not caring about ranking in past battles, you will be able to rectify this quickly by starting to play for the ranking seriously, so it's not a big issue. That being said, something many games do is seasons, as you also alluded to. Imo we should at least have a yearly one as well as the overall one.

Stini · Post by **Stini** » 13 Feb 2020, 17:36

Nice to finally have a ranking in place! Great job pawq and Kopaka!

I very much appreciate the work that has been done to make this possible and I think this is a really cool feature. I've been browsing the ranking lists a bit and I think I have some suggestions that could make the ranking even better.

Please do not consider this as harsh criticism of any kind. I think the current ranking system is already really valuable as it is. I'm just trying to make a case for adjusting the current system a bit and pointing out some cases where the current ranking system has some shortcomings.

The ELO algorithm is based on adjusting the ratings depending on whether or not a player got a better or worse score than expected. This means that it is crucial to be able to predict the expected score of a game accurately, because otherwise it is not possible to really tell how much better or worse a player performed than expected, which also leads to poor rating adjustments. Therefore the main criterion for selecting parameters (such as K-factor or what kind of probability distribution to use) should be based on how well the expected scores can be predicted ("predictive accuracy"). This is standard practice to evaluate any kind of predictive model really.

If the rating of a player does not match their true skill, the score predictions will be inaccurate. Since the ELO system is self-correcting, the rating will eventually converge to the true skill of the player with any sensible parameters. However, the rate at which this happens depends greatly on the K-factor and the faster the convergence happens, the more accurately the ratings match the true skills of the players at all times, which consequently also improves predictive accuracy.

If the K-factor is too small, it will take a long time before a player's rating matches their true skill. For example, consider a hypothetical case where K=1 (the current value used by the ranking list), the player's current rating is 1450 and their true skill is 1550. For simplicity, let's assume that the player plays battles that always have ten other players rated 1500. If the player plays every battle at his true level of 1550, it would take about 218 battles before the player's rating matches their true skill, which could easily take months. Meanwhile everyone else is punished by unfair rating losses, because they are playing against an underrated opponent and their expected scores are higher than they should be.

Currently all the players have an initial ELO of 1000, which makes the situation even more challenging. Assuming the same scenario as above, but the player starts with 1000 ELO and his actual skill is 1900, it would take about 858 battles for the player to reach 1900 ELO. Certainly this is an unusual scenario, but if we look at for example awsj in the rating table, he has played 663 battles and he has reached a nice rating of 1860. However, as the previous experiment shows, it's not really certain whether this rating yet even matches awsj's true, amazingly spectacular skills.

For comparison, with K=10 it would take under 65 battles to get from 1000 to 1900 in this scenario. With K=32, it would take about 16 battles. Also, to get from 1450 to 1550, it would take about 15 battles with K=10 and 4 battles with K=32 in the first example.
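
To make this kind of arithmetic easy to rerun, here is a minimal sketch of the scenario. It rests on several assumptions that are not confirmed for the actual EOL implementation: the standard logistic ELO expectation with a 400-point scale, a rating change of K times (actual minus expected) summed over each opponent, opponents whose ratings stay fixed at 1500, and a player who scores exactly his true expected result in every battle. The `tol` parameter (how close counts as "matching") is also hypothetical, so the battle counts it prints need not reproduce the figures quoted above. The trend across K values is the point.

```python
def expected_score(r_a, r_b):
    """Standard logistic ELO expectation of A scoring against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def battles_to_converge(start, true_skill, k, n_opponents=10,
                        opp_rating=1500.0, tol=5.0, max_battles=100_000):
    """Count battles until the rating is within `tol` of the true skill.

    Assumptions (hypothetical, for illustration only):
    - every battle has `n_opponents` players rated `opp_rating`,
      and their ratings are held fixed;
    - the player always scores his true expected result;
    - the per-battle change is k * (actual - predicted), summed over opponents.
    """
    rating = float(start)
    actual = expected_score(true_skill, opp_rating)  # per-opponent score at true skill
    for n in range(1, max_battles + 1):
        predicted = expected_score(rating, opp_rating)
        rating += k * n_opponents * (actual - predicted)
        if abs(true_skill - rating) <= tol:
            return n
    return None  # did not converge within max_battles


if __name__ == "__main__":
    for k in (1, 10, 32):
        print(k, battles_to_converge(1450, 1550, k),
              battles_to_converge(1000, 1900, k))
```

Changing `tol`, the number of opponents or their ratings shifts the absolute numbers a lot, which is one more reason to treat them as rough orders of magnitude.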

Of course, too large a K-factor is not ideal either, since it makes the ratings too volatile and dependent on the most recent battles, which will affect the predictive accuracy negatively. The most principled way to select the K-factor is to pick the value that has the best predictive accuracy. Essentially this would mean predicting results of the actual battles with different K-factors and picking the value that gives the most accurate predictions. If this analysis seems like too much work, I would guess something like K=10 would be quite reasonable even if it's not absolutely optimal. The typical K-factor range in chess is 10-40 and it has been shown that 24 is probably optimal, but the other values are still being used and they seem "good enough". I think it's dangerous to extrapolate directly from chess to elma though, since there are quite major differences, such as the 1 vs 1 "games" not being independent of each other. Also a single battle is analogous to a whole chess tournament in terms of how the ratings are calculated, and I'd imagine it's much more likely that an average player plays a battle and ends up last than an average chess player losing every single game in a chess tournament. Therefore I'd prefer a bit lower K-factors than in chess, if I had to pick one without proper data analysis.
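
The selection procedure described here could look roughly like the sketch below. It is only an illustration under assumptions: battles are given as lists of distinct player names in finishing order (a made-up input format), each battle is scored as a set of pairwise "games", predictions use the standard logistic ELO expectation, and accuracy is measured as the mean squared error between the predicted and the actual pairwise result. The real EOL data and formula may call for a different error measure. The function names are hypothetical.

```python
from itertools import combinations


def expected_score(r_a, r_b):
    """Standard logistic ELO expectation of A beating B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def mean_squared_error_for_k(battles, k, initial=1000.0):
    """Replay battles in order and score the predictions made before each one.

    `battles` is a list of finishing orders (best first, names unique
    within a battle). Every pair in a battle counts as one 1 vs 1 game.
    """
    ratings = {}
    total_sq_err = 0.0
    n_pairs = 0
    for finish_order in battles:
        for player in finish_order:
            ratings.setdefault(player, initial)
        deltas = {player: 0.0 for player in finish_order}
        # combinations() keeps list order, so `winner` finished ahead of `loser`.
        for winner, loser in combinations(finish_order, 2):
            e_win = expected_score(ratings[winner], ratings[loser])
            total_sq_err += (1.0 - e_win) ** 2  # prediction error before updating
            n_pairs += 1
            deltas[winner] += k * (1.0 - e_win)
            deltas[loser] -= k * (1.0 - e_win)
        # Apply all changes only after the whole battle has been scored.
        for player, delta in deltas.items():
            ratings[player] += delta
    return total_sq_err / n_pairs if n_pairs else float("nan")


def pick_best_k(battles, candidates):
    """Return the candidate K with the lowest prediction error."""
    return min(candidates, key=lambda k: mean_squared_error_for_k(battles, k))
```

With real battle data one would call `pick_best_k(battles, [1, 2.5, 10, 24, 32])` or scan a finer grid. Because each prediction is made from ratings computed before that battle, the error is not flattered by the battle's own result.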

The other suggestion I have is to change how the yearly/monthly/daily ratings are calculated. In theory, the rating differences in these tables should converge to the rating differences in the overall ELO table given enough battles. This is because it's the same population of players, ELO uses rating differences as predictors of the expected scores and ELO is also self-correcting, so the rating differences between players will converge to such values that the expected scores match the data.

I would imagine that a year is long enough period of time that the rating differences of the overall and yearly tables would be quite close to each other. However, if you look at for example the top-2 of 2019, the rating difference between Zero and awsj is 5 points. In the overall stats it's over 300 points. This strongly indicates that the yearly table has not converged, which is not surprising given the low K-factor and that everyone starts with the initial rating of 1000. If you look at the yearly/monthly/daily results, it's quite apparent that they mainly show which pro players have played the most battles, because none of their ratings has converged so they are constantly underrated and therefore they gain points from almost every battle. Since awsj has been playing more battles in 2019 than Zero, he has been gaining more rating points and therefore the rating difference is just 5 points.

With a larger K-factor, convergence would be faster. However, for the daily and weekly tables it's still possible that the convergence is too slow and these tables are biased towards the pros who play the most battles. For the yearly table, it will quite likely be almost identical to the overall table up to that year (in terms of rating differences, not the absolute ELO values). A higher K-factor also emphasizes the latest battles more, so some battles a year ago will have essentially no effect on the overall ELOs today.

Perhaps the simplest solution for these issues would be to simply take a snapshot of the overall rankings, which is also equivalent to using the players' current ELOs instead of resetting them to 1000. The obvious downside with this is that the current year/month/week/day is redundant, because it would be identical to the overall ranking. However, for historical rankings this seems quite sensible.

Another option would be that for daily/weekly/monthly one could use performance ratings instead. Again, this is a common practice in chess, where you often see comments like "Carlsen had 3000 performance in tournament X". Essentially what this means is that if Carlsen had had a 3000 rating, then his results would match his expected score, so he would not lose or gain any rating points. In other words, his results match what you'd expect a 3000 rated player to get.
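
Taken literally, that definition means finding the rating at which the expected score equals the score actually achieved. A minimal sketch, assuming the standard logistic ELO expectation and treating a period's results as a list of pairwise games against known opponent ratings, can solve for it by bisection. The function name, the search bounds and the input format are all assumptions made for illustration.

```python
def expected_score(r_a, r_b):
    """Standard logistic ELO expectation of A scoring against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def performance_rating(opponent_ratings, actual_score,
                       lo=0.0, hi=4000.0, iters=60):
    """Rating at which the total expected score equals `actual_score`.

    `actual_score` is the number of pairwise wins (draws count 0.5).
    The total expected score grows with the rating, so bisection works.
    A perfect or zero score has no finite solution; the result then
    simply ends up at the `hi` or `lo` bound.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        total_expected = sum(expected_score(mid, r) for r in opponent_ratings)
        if total_expected < actual_score:
            lo = mid  # rating too low to explain the result
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For example, `performance_rating([1500] * 10, 7.5)` gives the rating at which a player would be expected to beat 7.5 out of ten 1500-rated opponents. Unlike a table that starts everyone at 1000, this depends only on results and opponent strength within the period, not on how many battles were played.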

The yearly table is a bit more problematic with this approach, because it's unlikely your yearly performance is any different than your rating, because good and bad performances will average out. This might be true for the monthly and maybe for weekly list as well, at least if you play a lot. If anything, yearly performance scores would give more weight to the oldest battles than the overall ratings. So maybe the yearly table is redundant in this case.

TL;DR: EOL ranking is awesome! Let's make it even better by increasing the K-factor and changing the yearly/monthly/weekly/daily lists so that they are not biased towards pros who play the most.

iCS · Post by **iCS** » 13 Feb 2020, 19:35

Nice ranking system

Shouldn't we set up some minimum battle experience requirement though? There are tons of players with 1 played battle, seems pointless to show these.

Kopaka · Post by **Kopaka** » 14 Feb 2020, 16:41

Thanks for the write up stini, always nice with thoughtful feedback.

I don't disagree with what you're saying necessarily, but I think there's a couple extra factors to consider. To me the most important part of having the ranking is to make it more fun to play battles and thus encourage people to play more, especially for people who don't win a lot of battles and can't compete in number of wins. This means if you converge to your correct ranking quickly there will be less reason to play a lot to keep rising in rankings. Of course if you're rising too slow it may also discourage people from playing.

Now of course the reason you'd want the correct ranking quicker may not be for showing it but merely for being able to calculate rating changes more correctly, which would be nice but I don't think we can accomplish both things perfectly at the same time.

When it comes to period rankings, if we do what you suggest we'd also lose the encouragement factor. Someone low in overall rankings couldn't win daily even if he won every single battle that day. And even if we only add those who were active that day, current number one in overall could just play one battle, even doing badly, every day and win daily every day. Actually snapshot ranking you talk about is (sort of) being saved after every battle, but just for those who played the battle. You can see this in the battle results.

So in summary, I believe overall should be something you can work on over a long period of time, that encourages you to keep playing. Period rankings, meanwhile, are something where a talented newcomer can get some top positions while he's rising in overall, but that can't be won by being a super active bad player. It might make sense though to have a higher K factor for the period rankings.

jonsykkel · Post by **jonsykkel** » 14 Feb 2020, 18:13

the thing that separates elma from games like cookie cliker is that its a purely skil based game
if sum nab lets a pro pley on his gomputer, pro wil play sick good times imediately
if pro lets nab pley on his gomputer, the nab will still suk
the only difrence is in brian/finkers, there are no meaningles numbers u need to acumulate over time (like in most other games)
if the db got deleted u didnt just lose al ur elma progress
in cokie cliker etc the reward is similar to what kopa is describing here, the more u pley the biger number will be (another thing is that it dosent make any sense to use a elo formula for that kind of number)
in elma the reward is knowing that if dided a wr or won a batle etc - u did so because u used ur hard earned elma skil to win and thats why u are best
there4, it makes sense that the rating should just be a somwhat acurate reflection of ur skill and nothing else

Stini · Post by **Stini** » 14 Feb 2020, 18:39

Kopaka wrote: ↑14 Feb 2020, 16:41 Thanks for the write up stini, always nice with thoughtful feedback.

I don't disagree with what you're saying necessarily, but I think there's a couple extra factors to consider. To me the most important part of having the ranking is to make it more fun to play battles and thus encourage people to play more, especially for people who don't win a lot of battles and can't compete in number of wins. This means if you converge to your correct ranking quickly there will be less reason to play a lot to keep rising in rankings. Of course if you're rising too slow it may also discourage people from playing.

Yes I agree that the ranking system should encourage playing. I think the discussion is a bit hypothetical right now, since there's no real feedback from the players, especially from new improving players and what they actually find motivating. However, if I reflect on my experience of ELO in chess, I have never heard anyone arguing for lower K-factors so that it would encourage players to play more. I'd rather say that improving players want to see quick progress and they would find it unfair if their rating doesn't match their actual skills. Also, people are definitely annoyed, if the ratings of their opponents do not match their actual skills, because then you end up playing against underrated players and lose rating points unfairly. And that's what it really is, the rating system will be unfair in many ways if it's not sufficiently accurate.

Also in my experience chess players do not usually get unmotivated when their rating plateaus. It is still motivating to get better and improve your rating. You still want to beat that higher rated player in your next game. You will plateau at some point anyway and I don't think people would just stop playing at that point. I see it as a bigger risk that even a mere 100 point rating improvement might require months of playing (or years if you play as little as I do), as I demonstrated in my first post. You keep beating the same players over and over again, but your rating is still worse than theirs. I find it hard to see how this can be motivating. It might even seem that the system is designed to be biased so that it's hard to challenge the established old school pros. I'd rather say it's much more motivating to get immediate feedback that you are improving.

Kopaka wrote: ↑14 Feb 2020, 16:41 When it comes to period rankings, if we do what you suggest we'd also lose the encouragement factor. Someone low in overall rankings couldn't win daily even if he won every single battle that day. And even if we only add those who were active that day, current number one in overall could just play one battle, even doing badly, every day and win daily every day. Actually snapshot ranking you talk about is (sort of) being saved after every battle, but just for those who played the battle. You can see this in the battle results.

Yes I can understand your reasoning here. You could use the performance ratings as I suggested for example, those only take into account your performance and not your rating. Also one way to improve the current implementation would be to increase the K-factor as you pointed out. However, this would still mean that these tables would be rather unfair, because it's obviously wrong that initially a pro like Markku has the same rating as a nab like me. If someone beats Markku (rating 1000), it's considered to be as good of an achievement as beating me (rating 1000). This incentivizes the players to skip battles if there are any pros playing, because they are so severely underrated and you will get punished with excessive rating losses. I find it hard to believe that this kind of an unfair system would be exactly motivating, but who knows. Perhaps in practice people don't mind so much about such issues.

Kopaka wrote: ↑14 Feb 2020, 16:41 So in summary, I believe overall should be something you can work on over a long period of time, that encourages you to keep playing. Period rankings, meanwhile, are something where a talented newcomer can get some top positions while he's rising in overall, but that can't be won by being a super active bad player. It might make sense though to have a higher K factor for the period rankings.

I hope I was able to clarify why I think that this kind of an implementation is inherently unfair. I understand that for many people grinding your way up sounds like an appealing idea, but it doesn't really fit the ELO system very well. This is not only about being mathematically principled: it also has the side-effect of making the rating system unfair, and I'm worried that it could be a bigger problem and discouragement in the end.

zebra · Post by **zebra** » 15 Feb 2020, 22:12

danitah wrote: ↑17 Jun 2018, 14:59
zebra wrote: ↑17 Jun 2018, 11:01 Hello! Good to have discussion about ranking systems My opinions here:

I'm strongly against losing points when battling. That doesn't make any sense. Do we really want to drive players away? If I knew that I might lose points by playing battles, I would only pick the best battles, not play any "noob designer"'s battles. And what about distractions? My wife comes to say something when I'm playing a battle and I have to leave the battle. Should I be punished for that?

In Trackmania there was a good ranking system. Some main points:
- everybody start with 0 points
- you got points only by beating a better player (or equal ranked player)
- at every Sunday, all players' points were scaled down so that the best player had 100000 points.
Simple and good system
The biggest flaw of this kind of system is that it doesn't incentivize good players to beat bad players. Let's say zero is playing against only new players, there is nothing (or close to nothing) to gain and nothing to lose.

Ok. That flaw could be fixed with some small constant number of points that is always given to you if you beat a player (whether he has better or worse ranking than you).
I'm still strongly against losing points in a battle...

Hosp · Post by **Hosp** » 16 Feb 2020, 03:30

i hev to disagree with Mr Jon a bit here, if I hev to play elma, and iti s not on mine computer. It will be bad. But I can still do some shit, not like supe-nab, but like, still very bad compared to what I'm capable of in mine norm computer settink.

jonsykkel · Post by **jonsykkel** » 16 Feb 2020, 11:39

Hosp wrote: ↑16 Feb 2020, 03:30 i hev to disagree with Mr Jon a bit here, if I hev to play elma, and iti s not on mine computer. It will be bad. But I can still do some shit, not like supe-nab, but like, still very bad compared to what I'm capable of in mine norm computer settink.

u are mising my point
pro = played elma for 10 years
nab = played elma for 2 weeks
yes u will play 30% worse on diff pc (temporarily, u will get used to it and reach ur full elma potential after geting used to it) but nab still gona get crushed
the point was that if nekit forgets his elma password he dosent lose uphil level 99 and cant uphil anymore
if zveq make a new elma acc his rating should quickly reach the same rating as old acc

FinMan · Post by **FinMan** » 16 Feb 2020, 15:49

I agree with jon and Stini

I don't think artificial feeling of "gaining rank" is a good idea in any game either, which was mentioned in some discussion about the topic. The idea is good, have a player feel good about themselves when they gain ranks etc. But if you know it's still calibrating your rating after 500 battles, you really are just impatiently waiting for the rating to settle where it belongs and it hits you way harder when it finally stops doing that. Besides, one of the most important aspects of any competitive activity is the balance between winning and losing. Of course winning feels good, gaining ratings feels good etc. But it feels even better to gain those goddamn points back after you lost them, making you question your skills and actually care for every single battle. Which is one of the biggest reasons why losing ranking is really important when you lose.

Some games tried it, placing players lower than they are supposed to be at the start of every season, with the idea of seeing players be happy to be gaining ranks as they play. What ended up happening was people were only upset about placing lower than the previous season ranking and not realizing they always quickly climbed back up. Of course that is a bit different from elma so far, as we don't have any kind of set seasons (yet?) but comparable.

What is a good idea, then, if not artificial gaining of ranks? Well, actually getting better and seeing how your ranking went up past week or two, instead of wondering if it is still just the slow calibration process.

Realistically, the ranking of something like eol is not and won't be something where everyone tries their absolute best in every battle. So we can safely say it's not a 100% accurate representation of a player's battle skills. Rankings fluctuate, people have bad weeks, bad months, they are on fire on some others. We should let the ranking system show it. It's up to you to decide how seriously you take it.

Grace · Post by **Grace** » 17 Feb 2020, 11:44

Grace onion:

K factor @ current (1) is a little limiting and could probably be raised some small amount. 2.5 probably a decent middle ground.

I also think you guys have done a simply amazing job on this and it makes me super happy to see such a well adjusted ranking. Thanks so much for your efforts.

8-ball · Post by **8-ball** » 17 Feb 2020, 16:03

Agree with stini, jon and finman. For those who find an arbitrary number going always up important to keep playing, there can be an XP counter displayed alongside ELO, which never goes down and could be as simple as +n every battle where n is 1 + the number of players you beat, regardless of their skill. This will place high activity medium skill players above low activity high skill players as some people seem to think should happen. Both get what they want.
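
As a rough illustration of how little such a counter would need, here is a sketch of the "+n per battle" rule, where n is 1 plus the number of players you beat. The function names and the (position, player count) input format are made up for the example, and ties are ignored.

```python
def xp_gain(finish_position, n_players):
    """XP for one battle: 1 for taking part, plus 1 per player beaten.

    `finish_position` is 1-based (1 = winner) among `n_players` participants.
    """
    if not 1 <= finish_position <= n_players:
        raise ValueError("finish_position must be between 1 and n_players")
    return 1 + (n_players - finish_position)


def total_xp(results):
    """Sum XP over a list of (finish_position, n_players) tuples."""
    return sum(xp_gain(position, n_players) for position, n_players in results)
```

Since every battle adds at least 1, the total can never go down, which is exactly the property being asked for.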

zebra · Post by **zebra** » 17 Feb 2020, 20:45

If everybody played all battles seriously from start to end, it would make more sense to make ranking system where you could lose points. But in a game like elma, battling should be casual. You should be able to enter a battle with only 2 minutes left and enjoy the battle without the feeling that you are going to lose many points by getting some crap time or by just taking the first apple in a 5-minute-long pipe.

jonsykkel · Post by **jonsykkel** » 17 Feb 2020, 22:29

zebra wrote: ↑17 Feb 2020, 20:45 If everybody played all battles seriously from start to end, it would make more sense to make ranking system where you could lose points. But in a game like elma, battling should be casual. You should be able to enter a battle with only 2 minutes left and enjoy the battle without the feeling that you are going to lose many points by getting some crap time or by just taking the first apple in a 5-minute-long pipe.

im dont understand what the point of having a batle ranking system for people who dont care about batling is

what makes sense is to have a list of al kuskis ordered by sum intresting variable, in this case: aproximated playing skill
you can look at 2 players ratings and determine the chances of one beating the other in balle

if geting bad results in a batle dosent negativly afect ur rating, this makes the ordering of the list be based on sum weird combination of playing skill and how many batles you have played
this kind of thing will favor medium skiled players who play a lot of batles (blaztek) over very skilled players who dont pley very much (sveq)

if blaztek plays 10000 1v1 batles against sveq, he will definitely win less than 50% of them, meaning he should end up below sveq in the ranking
and if the system works this relationship should be true for al combinations of players (in a ideal world)

as finmanik says its up 2 u how seriously u take it, if u dont take batles seriously (which u are stil free 2 do) u shouldnt take ur batling rank seriously either
its not posible to create a system that distignuishes between a bad player and someone who isnt trying

Lousku · Post by **Lousku** » 18 Feb 2020, 02:39

You don't hev to care about your rank.

zebra · Post by **zebra** » 18 Feb 2020, 07:44

jonsykkel wrote: ↑17 Feb 2020, 22:29 if geting bad results in a batle dosent negativly afect ur rating, this makes the ordering of the list be based on sum weird combination of playing skill and how many batles you have played
this kind of thing will favor medium skiled players who play a lot of batles (blaztek) over very skilled players who dont pley very much (sveq)

That's exactly the kind of ranking I would like to have. I would like that the ranking system encourages people to play battles and lure more people to play elma.

If you want to make a ranking system that scares people away (after they have achieved a good ranking compared to their skill level), then so be it. I understand that it has some good aspects too.

Zweq · Post by **Zweq** » 18 Feb 2020, 10:36

in other games i've always liked much more non-increasing ranking system, or however to call it. E.g. back in the WoW days in around 2007 when i played arena and saw a team with 2250 ranking i could instantly get a feel how good they were. If the number was some random ever-increasing value that was based on activity, I think WoW arena would hav been less interesting. Same applies to chess.

ArZeNiK · Post by **ArZeNiK** » 18 Feb 2020, 22:09

idea: what if you could choose if you wished to appear on the ranking leaderboards or not?
the system in itself should be objectively decent but personally it just makes me want to stop playing battles overall because i'm disgusted by my own performance

Grace · Post by **Grace** » 2 Mar 2020, 08:36

I would like to make suggestion:

Currently exists:
EOL Battle ranking
Internal Kinglist
External Packs Kinglist

suggest:

find some formula to make kinglist elo's and make:

Battle Ranking + External Packs Kinglist Ranking + Internal Kinglist ranking = Kuski ranking

could be fun

kuchitsu · Post by **kuchitsu** » 3 Mar 2020, 07:41

ArZeNiK wrote: ↑18 Feb 2020, 22:09 idea: what if you could choose if you wished to appear on the ranking leaderboards or not?

Just to clarify: do you want your score hidden (but still affecting other people's rankings when they play against you) or completely nonexistent?

ArZeNiK · Post by **ArZeNiK** » 3 Mar 2020, 14:05

kuchitsu wrote: ↑3 Mar 2020, 07:41
ArZeNiK wrote: ↑18 Feb 2020, 22:09 idea: what if you could choose if you wished to appear on the ranking leaderboards or not?
Just to clarify: do you want your score hidden (but still affecting other people's rankings when they play against you) or completely nonexistent?

the one which is easier to implement

iltsu · Post by **iltsu** » 3 Mar 2020, 15:51

How are people so sensitive about their rankingstats? Facepalm

ArZeNiK · Post by **ArZeNiK** » 3 Mar 2020, 19:31

iltsu wrote: ↑3 Mar 2020, 15:51 How are people so sensitive about their rankingstats? Facepalm

im not neurotypical

Bjenn · Post by **Bjenn** » 11 May 2020, 16:03

I liked HoNs ranking system.
You start with 1500 MMR. Nabs will get lower MMR like 1300-1400. Mega nabs will get 1100-1200.
Norm good is 1600+, better is 1700+ (where I was most of the time).
Very hard to reach 1800 (I did it once, next game lost and back to 1700+).

This was exciting because for me getting to 1800 was a real hard target which motivated me.
Also when something is very unlikely to happen, it will be more thrilling if or when you finally reach that point.

Other real pros were easily going 1800+ 1900+ and then there was elite 2000+ and like one person at 2100 MMR.

My point: If it were easier to climb MMR it would feel more pointless.

Bjenn · Post by **Bjenn** » 25 Aug 2021, 08:39

It feels weird that people who haven't played battles in six years can still be in the top of the ranking system.
All other games with a ranking system have it decreasing if you are afk.

Also why is Madness above me? Must be a bug. Thanks in advance for fixing))
https://elma.online/ranking

Edit: also, if the ranking counts ALL of one's battles together, that is immediately discouraging for someone who has been getting better and better throughout the years. I don't know how it actually works, it's just a thought that I got.
Especially true if it were Blaztek (playing many many more battles than the average guy or gal).
