Optimal Elma player

dimkadimon · Post by **dimkadimon** » 7 Feb 2009, 07:08

The other day I had an idea: what if we create the optimal Elma player? This is how it works. An Elma player can be modeled as a Reinforcement Learning agent with 5 available actions: throttle, break, left, right and turn. If the agent has collected all the apples he enters the terminal state and receives a reward of 1/(time taken), otherwise in all other states he receives a reward of -1. This way, the agent will aim to minimize the time. We can now do standard TD(lambda) to teach the agent to drive well and hopefully he will learn the optimal policy.

There are several extensions to this. We can change the reward function, such that it grows with each collected apple, ie the first apple gives +1, the second +10, the third +100 and so on - this will be some form of shaping. We can provide "mini-goals" for the agent - these could be checkpoints along a "good" path. We can initialize the agent's driving policy by a good drive through the level (provided by a user) and let him try to make it better. We can also do learning by example, just like its done here: https://research.microsoft.com/en-us/pr ... fault.aspx. The possibilities are endless and if we put our minds together we can achieve great things!

The only problem is the learning agent must live in the Elma environment. How can we access the code of the game?

Here are some more papers related to this:

Learning to fight: https://research.microsoft.com/apps/pub ... x?id=65639
Learning to bike ride: http://www.cs.mcgill.ca/~mgendr12/COMP5 ... arning.pdf
bipedal-walking: http://citeseerx.ist.psu.edu/viewdoc/su ... 1.107.3195
Robot Auto Racing: http://citeseerx.ist.psu.edu/viewdoc/su ... .1.86.9280
Apprenticeship learning: http://citeseerx.ist.psu.edu/viewdoc/su ... 0.1.1.2.92

hex · Post by **hex** » 7 Feb 2009, 13:34

Sla · Post by **Sla** » 7 Feb 2009, 21:38

nooo fuck you. jappe to the max!

zebra · Post by **zebra** » 8 Feb 2009, 00:04

why so stupid replies to such an intelligent post?

I know it should be theoretically possible to make an optimal elma player but really really hard... Those scientific papers you showed... such hard formulas even for such an easy things as driving a bicycle. So for a very complex game like elma, those formulas would be incomprehensible.

I don't see why we should give points for apples. Apples are not 'extra points', they are things that have to be collected.

You can't really get elma code anywhere, only balazs has it. So I think that's the biggest problem, is it?

dimkadimon · Post by **dimkadimon** » 8 Feb 2009, 08:14

why so stupid replies to such an intelligent post?

Yes I was expecting more intelligent replies too...

I know it should be theoretically possible to make an optimal elma player but really really hard... Those scientific papers you showed... such hard formulas even for such an easy things as driving a bicycle. So for a very complex game like elma, those formulas would be incomprehensible.

Don't worry about the formulas, they are not as hard as they look. I did my honours project in Reinforcement Learning and my current PhD supervisor is an expert in Reinforcement Learning, so we should be able to deal with the formulas.

I don't see why we should give points for apples. Apples are not 'extra points', they are things that have to be collected.

Think of a really long level that takes about 3 minutes to complete. If the agent receives a reward only at the very end, then there is nothing to guide him in between. Instead, if you give the agent some small intermediate rewards (eg. points for collecting apples) then it will help in the task. This way, the agent will first learn how to get the first apple quickly, then he will learn how to get the second apple quickly and so on until he learns how to get them all quickly. Remember, one has to learn how to crawl before he can walk.

You can't really get elma code anywhere, only balazs has it. So I think that's the biggest problem, is it?

For me this is the most difficult part. I thought someone might have hacked into the code, or did something clever like that. In the worst case, we can email Balazs and hope that he is kind enough to give us the code. Does he usually reply to emails about Elma?

Grace · Post by **Grace** » 8 Feb 2009, 10:12

he generally ignores you or gives you a cryptic auto email about the release date of elma 2.

oh and the only problem with setting "rewards" at each apple is that sometimes the fastest way of getting to apples 1 will mean that overall it takes longer to get to apple 2 than if you'd taken a different route. for instance it's quicker to run straight into the very top apple in what the heck, but if you do then it will take you longer to get the apples right underneath it.

dimkadimon · Post by **dimkadimon** » 8 Feb 2009, 13:46

he generally ignores you or gives you a cryptic auto email about the release date of elma 2.

Actually we don't need the source. It will be sufficient to simulate keystrokes inside Elma and have Elma provide us with our current state. The state comprises of: current time, number of apples eaten, position of the bike etc.

oh and the only problem with setting "rewards" at each apple is that sometimes the fastest way of getting to apples 1 will mean that overall it takes longer to get to apple 2 than if you'd taken a different route. for instance it's quicker to run straight into the very top apple in what the heck, but if you do then it will take you longer to get the apples right underneath it.

Yep I understand what you are saying. This is why I suggested giving rewards in an incremental order, so the later apples have more influence than the earlier ones. After all, we only care about getting the last apple quickly, but we would also like to get to the earlier apples quickly if possible. The reward for collecting the i-th apple at time t could be something like e^(i+c/t), where c is some constant. So later apples give you much more reward; and apples collected quickly give you more reward.

Grace · Post by **Grace** » 8 Feb 2009, 15:27

Bicycle has less buttons = less complex, also no sexy tricks, ever seen a bicycle do a brutal on uphill battle? how about ramp frenzy brutal or animal farm head compression? nat.

hex · Post by **hex** » 8 Feb 2009, 17:25

dimkadimon, thanks for starting a most interesting topic.

elma may well be a computer game, with simulated physics, but the possibilities is really what makes the game complex. it reminds me of when doom was first released: you could then not only have enemies coming in front or back, but in depth of space too! elma for me is like that. it can be said to be more complex than most other 3d games.

about the concerns in emailing balazs, to me he has in the past been friendly, and cooperative to a certain level (i.e. past the level of ignoring my messages). if any part of your project needs example of well-driven, almost-perfect replays (such as John's 29), just reply here or send me a pm and i'll take a look into it. good luck!

dimkadimon · Post by **dimkadimon** » 9 Feb 2009, 03:17

Bicycle has less buttons = less complex, also no sexy tricks, ever seen a bicycle do a brutal on uphill battle? how about ramp frenzy brutal or animal farm head compression? nat.

I don't agree. A real bicycle has an infinite number of continuous actions: you can lean x degrees left, right, forward, back etc. In all cases, x is a real value. While an elma bicycle has only 5 discreet actions and those actions have only 2 states: on and off. Either you have full throttle or none at all, you can't do half throttle for example.

Grace · Post by **Grace** » 9 Feb 2009, 10:15

sure, the bicycle has many different actions, however as far as i know there are only 2 buttons on an average bicycle, 6 on a mountain bike. Elma has 8.

zebra · Post by **zebra** » 9 Feb 2009, 14:23

You should ask the code from milagros. He has made quite many additions to the game and probably already knows the source code by heart. But I think he has to code it in assembly because balazs hasn't given the original source code.

And btw, there has been a similar kind of topic earlier. It was an april fool though:

http://mopolauta.moposite.com/viewtopic.php?t=3132

John · Post by **John** » 9 Feb 2009, 19:46

SmaXa wrote:sure, the bicycle has many different actions, however as far as i know there are only 2 buttons on an average bicycle, 6 on a mountain bike. Elma has 8.

so driving a 2d motorbike is more complex than riding a real bike in 3d with 2 brakes, gears and variable applied force to the pedals?

FinMan · Post by **FinMan** » 9 Feb 2009, 20:09

John, ez add more different grips, grounds, wind and if the kuski gets tired?

pawq · Post by **pawq** » 10 Feb 2009, 18:15

optimal player would obviously be modified Zweq.exe, with inbuilt winballe.exe, driveWR.exe and pwnJohn.exe. Hard to make even for teh_mila

Chris · Post by **Chris** » 10 Feb 2009, 23:39

AFAIK the collision scripts use type double. So bike is going to act differently each time you execute the same moves sequence. In that case any kind of macro/neural network or howsoever it's called is not going to work. I don't know if you are already aware of this issue.

dimkadimon · Post by **dimkadimon** » 11 Feb 2009, 00:54

You should ask the code from milagros. He has made quite many additions to the game and probably already knows the source code by heart. But I think he has to code it in assembly because balazs hasn't given the original source code.

So milagros reimplemented elma in assembly?!? That sounds hard. How can I contact him?

dimkadimon · Post by **dimkadimon** » 11 Feb 2009, 01:01

alias wrote:AFAIK the collision scripts use type double. So bike is going to act differently each time you execute the same moves sequence. In that case any kind of macro/neural network or howsoever it's called is not going to work. I don't know if you are already aware of this issue.

What collision scripts are you referring to? How can doubles affect the move sequence? - I think you wanted to say that the script is non-deterministic resulting in different move sequence each time you execute it?

dimkadimon · Post by **dimkadimon** » 11 Feb 2009, 06:01

Jappe wrote:apparently dimkadimon isnt an optimal forum user since he doesnt know how to quote properly and makes double posts

Where have I double posted and what is wrong with my quoting?

zebra · Post by **zebra** » 11 Feb 2009, 07:30

dimkadimon wrote:
alias wrote:AFAIK the collision scripts use type double. So bike is going to act differently each time you execute the same moves sequence. In that case any kind of macro/neural network or howsoever it's called is not going to work. I don't know if you are already aware of this issue.
What collision scripts are you referring to? How can doubles affect the move sequence? - I think you wanted to say that the script is non-deterministic resulting in different move sequence each time you execute it?

The finite accuracy of the variables used by the physic engine and your frame rate cause the fact that your ride is non-deterministic even if in so-called 'auto-levels' where you don't have to press any buttons: the bike still acts differently in each ride.

zebra · Post by **zebra** » 11 Feb 2009, 07:31

dimkadimon wrote:
Jappe wrote:apparently dimkadimon isnt an optimal forum user since he doesnt know how to quote properly and makes double posts
Where have I double posted and what is wrong with my quoting?

Just ignore that idiot.

zebra · Post by **zebra** » 11 Feb 2009, 07:33

dimkadimon wrote:
You should ask the code from milagros. He has made quite many additions to the game and probably already knows the source code by heart. But I think he has to code it in assembly because balazs hasn't given the original source code.
So milagros reimplemented elma in assembly?!? That sounds hard. How can I contact him?

No he didn't reimplement elma

But somehow converted the elma.exe into assembly where he can then add his own stuff.

You can contact him in IRCnet channel #across nick teh_mila.

dimkadimon · Post by **dimkadimon** » 11 Feb 2009, 09:08

The finite accuracy of the variables used by the physic engine and your frame rate cause the fact that your ride is non-deterministic even if in so-called 'auto-levels' where you don't have to press any buttons: the bike still acts differently in each ride.

Actually if the variations are small (from run to run) then this might not be such a bad thing. In some sense it will encourage the agent's exploration. This is one of the reasons why TD-Gammon worked so well: http://gammonline.com/articles/td-gammon.html

You can contact him in IRCnet channel #across nick teh_mila.

Does he have an email? - because that would be easier for me.

The_BoneLESS · Post by **The_BoneLESS** » 11 Feb 2009, 15:58

Another thing to take in consideration is that the graphic card configuration has an impact on certain physical properties of the game (such as grip, bounces, etc.). I don't know if your agent can take that in consideration, haven't read the article yet.

Concerning that "collision script", or the "physical engine", or whatever you want to call it, it doesn't seem to react the same everytime with the same combination of button presses so, your agent couldn't rely on a simple button combination, it would have to adapt to the progress of its run. But i guess that's the whole point of the agent for him to be optimal. I'll definitly read this article later today, seems interesting.

For your information, milagros disassembled the elma.exe file if i'm not mistaken.

Memphis · Post by **Memphis** » 11 Feb 2009, 17:27

Damn I wish I knew more about this kind of stuff as it interests me greatly. I think there is no hurt in trying to make the optimal play dimkadimon even if it doesn't suceed it is still a step in the right direction and a learning experience. If there is anything I can do to help you let me know.

Oh ya and don't listen to all the negative comments from noobs like Jappe. they are just pesamistic people.

hex · Post by **hex** » 11 Feb 2009, 18:12

here are some scene fantasies:

(1) april fools 2005:

Code: Select all

A russian Elma player has programmed a software which calculates the theoretical minimum times of the levels. It uses some kind of Euler algorithms which are very efficient in this kind of processing. There is also set a limit value (0.84) for the bug bounces.

(2) interesting stuff starts from linked post: http://mopolauta.moposite.com/viewtopic ... 92#p164592
among other things, has been mentioned the need of an expert on the subject.

zebra · Post by **zebra** » 11 Feb 2009, 21:14

The_BoneLESS wrote:Another thing to take in consideration is that the graphic card configuration has an impact on certain physical properties of the game (such as grip, bounces, etc.). I don't know if your agent can take that in consideration, haven't read the article yet.

Only fps matters. If you are thinking of vsync and stuff, vsync just limits the fps to your screen refresh rate.

dimkadimon · Post by **dimkadimon** » 12 Feb 2009, 03:14

Has anyone played X-Moto? http://xmoto.tuxfamily.org/

It seems similar to Elasto, but slightly easier. All the code is available, so perhaps it will be easier to hack into that...

The_BoneLESS · Post by **The_BoneLESS** » 12 Feb 2009, 03:37

dimkadimon wrote:Has anyone played X-Moto? http://xmoto.tuxfamily.org/
It seems similar to Elasto, but slightly easier. All the code is available, so perhaps it will be easier to hack into that...

thing is, you won't find any xmoto fan here because, the game has horrible physics and is unplayable. But yeah, it is indeed an elma look-alike (clone).

Zweq · Post by **Zweq** » 12 Feb 2009, 06:30

THIS X MOTO GAIM IS SO REDECULOUSLY GOOD ITS HELAROUSE. Yeah xmoto is a swearword around here.

nice topic anyway, although those links you provided seem a bit too far fetched

Grace · Post by **Grace** » 12 Feb 2009, 08:30

X-Moto is pretty much an attempted copy of Elma.

FinMan · Post by **FinMan** » 12 Feb 2009, 15:23

It is a copy of elma, and bad one... Really sux physics etc.

hex · Post by **hex** » 12 Feb 2009, 17:19

i believe milagros could be of help to you. if he did implement real-time multiplayer, he should be capable of implementing the stuff you need. but first, show him your research funds...

zebra wrote:The finite accuracy of the variables used by the physic engine and your frame rate cause the fact that your ride is non-deterministic even if in so-called 'auto-levels' where you don't have to press any buttons: the bike still acts differently in each ride.

but, wouldn't that be countered by an impeccable, constant fps (say 60)? or would just the physics engine and the frame rate somehow need be synched one another? because if the bike gets one degree more inclined say at time zero, at time even four the difference could be a most significant one. i'm just saying that might not be really good for the agent. thanks

milagros · Post by **milagros** » 18 Feb 2009, 17:24

Reinforcement learning or any other gradient approach (neural nets, ..) is completely unsuitable for this problem. It's usually used if you don't know mathematical model exactly (robots in real environment) or if there is some uncertainity (opponents, ..).

Finding the optimal path for given level is very non-convex problems with loads of local minimas (called optimal runs with certain 'style'). You will never get to the optimum using gradient method, at most you can hope for nab style good run. Other problem is that there aren't really any local goals so your player knows what he is supposed to do at the beginning when no weights are trained yet. On the other hand there isn't any problem like this in checkers (get as many points in the next move) or car driving (get infront of the opponent).

This problem is classic optimization problem - given framerate (or set of timeevents), press keys so you touch the flower in minimal time under contraint that you have to take all apples. For this problem you know the mathematical model (set of equations in elma) and there is no uncertainity at all (no need to explain those autolevs now..). Final time (or failed to finish) can be evaluated in P for given set of keys. You don't have to run it through elma, you jsut have to copy the equations and it's going to be faster that way. At the end keys can be sent to elma to drive your final 'solution'.

Some years ago I tried to write some nice code to do that. At that time I was quite nab in asm, so quite many things were not really optimal. Eventhough the algorithm was able to drive better than WR times those times and with some more trying its easy to get some 33tt atleast.

I tried these apporaches:
You set manually local goal (get as fast as you can here, have as high speed as possible or some combination of speed/time) and then you try brute force where keys can change only at low framerate (lets say its running 1000fps and you can check key only 20fps). With that it's possible to count 'optimally' a few secs (with some other branch&bound hacks). From this 'optimal' solution you can improve by doing simple gradient decent by changing times of keypresses with 1ms accuracy. To get better solution you can always remember best X runs for each local goal and then try next local goal from each one of them (similar to some genetic algorithms).
This can be improved a lot using more clever apporach. I can drive some example run and then computer will try to find the optimal way under contraint that some distance meassure between my saveload run and the computer try will be close enough. With this several seconds can be calculated 'optimally' efficiently. Some clever branch&bound algorithm may be useful there.

Using this approach its possible to do perfect hoyla runs (like warmup, ..), impsy tricks where i just show how do i plan to do it and computer optimizes (hard bounce can be done at high fps on some 0.3s brute force), however, it will not find styles for you, you have to tell it what to try.

I haven't really finished coding it and I can't be bothered to do so atm. Some years ago I wanted to do some simple framework so you can code the algorithm in c++ yourself and it would work, however, I found out that most of the ppl here are programming nabs and noone would be interested. The only thing would be that it would help ppl cheat.

Edit:
btw. doing decent real time battle bot is completely different thing. A good one would prolly fail in internals.(some 45tt or smth) I didn't try that one but would be prolly even cooler competition if there were many skilled programmers around here.

welle · Post by **welle** » 18 Feb 2009, 19:03

That's some really interesting stuff mila! While we had different points of view in a little conversation last year, i couldn't agree more with your post. I've been thinking about working on this for quite a while, but I didn't know you already investigated those equations and implemented some optimization algorithms. I'm studying computer sience and mathematics, and consider myself to be a quite skilled programmer. I would be VERY interested in this C++ framework, or even these raw equations, that tell me the time for given keypresses at given points of time for a given level. Keep it up!

dimkadimon · Post by **dimkadimon** » 19 Feb 2009, 08:25

Thank you for that insightful post Milagros! I now see that the problem is easier than I thought originally. You are right - there is no uncertainty and we know the exact model. If for example, we didn't know what level the agent is in then it would be more suitable for RL. This raises the question: is it possible to create a strong RL agent that is reasonably good at any level you give it, without seeding it with any starting styles? I suspect that its only possible if the levels are small (best ride < 20 seconds). However as the level becomes more complex, the goals become further away and without any intermediate rewards the problem probably becomes exponentially harder.

Nevertheless, the problem is still interesting even when you know the exact physics and exact environment. It just becomes an optimization problem, but its still a difficult and interesting optimization problem. I am sure this community would appreciate seeing how close (or maybe even better!) we can get to the current WR times. Perhaps we can learn something from the computer player. Who knows...

I have a background in maths and CS, so I am willing to spend some time on this problem. I am also urging others to join in, the more the merrier

Milagros it would be awesome if you could provide us with those equations for the physical model or any code that you have - just to get us started!

gimp · Post by **gimp** » 19 Feb 2009, 09:05

unfortunately dimkadimon, like milagros stated, most of the people here are programming nabs. the general population only has the ability to encourage you, i think its a great idea though. perhaps flat track would be the ideal level to begin with, seeing as there are no apples, and no unapparent styles that we havent thought of yet. it might be the easiest level to reach the optimal time?

The_BoneLESS · Post by **The_BoneLESS** » 19 Feb 2009, 14:23

I would surely like to help you but, i mostly (and pretty much only) have theoretical knowledge about optimization algorithms at this point.

Well, we could at least start with some algorithm for simplier problems like Warm Up or any level without apples (though i'm sure mila has already done that on his free time).

hex · Post by **hex** » 19 Feb 2009, 14:57

whatever happens you have got the interest of the whole scene.

Ramone · Post by **Ramone** » 19 Feb 2009, 19:29

lika mila said, this will only help ppl cheat. only have i no interest at all in this, I would accually dislike it. if someone could cheat runs better than current wrs and spread would spoil the whole scene and ppl would quit elma.

Fuck that, go play elma for real damnit. or go hack some other game.

FinMan · Post by **FinMan** » 19 Feb 2009, 19:31

8km = this driving all day.

Kopaka · Post by **Kopaka** » 20 Feb 2009, 09:52

I agree with ramone, actually surprised no one said that before, if you make some program that can make perfect runs it would ruin the whole point of the game.

hexeditor wrote:whatever happens you have got the interest of the whole scene.

Talk for yourself please.

hex · Post by **hex** » 20 Feb 2009, 13:45

i thought most hardcore players with time would be more interested in the possibilities of the game than in hoyling the same levels over and over. that being said, i don't mind a computer playing for me and freeing me the task of losing my life at it.

pawq · Post by **pawq** » 20 Feb 2009, 15:57

but unfortunately some probably would like to have better times done by... computer

On the other hand i think that the main reason is just to see what may be done in elma. what are the fastest possible runs in internals ect.

totem · Post by **totem** » 21 Feb 2009, 13:51

having coded a Yam IA last week end, i think programming a game ruins the fun of playing it self, it's verified

On the other hand, mila's post is very interesting to read

teajay · Post by **teajay** » 21 Feb 2009, 14:20

I think it would be a very interesting endeavour, notwithstanding the possibility of cheat. If you guys are really interested in putting time in this, you should co-op with eachother, but only in a closed circle ofcourse. I'm not really impressed by the conservative opinions that find this a threat towards real people's wr's. It's not gonna be a threat.

Mats · Post by **Mats** » 21 Feb 2009, 16:53

here is last computer that tried to take elma wr's by itself

Zweq · Post by **Zweq** » 21 Feb 2009, 17:16

my keyboard was more worn out before it died. i guess that comp just didnt hoyl enough to get wrs

hex · Post by **hex** » 21 Feb 2009, 18:02

i wrote to zweq for his opinion, as he's one of the best players around. here it is, folks:

Zweq wrote:I'm very interested in seeking for limits in elma as long as it is not exploited nor will damage legit tables (styles arent published)

Ramone · Post by **Ramone** » 22 Feb 2009, 15:07

why would zweqs opinion matter more than any elses? I am not at all interested in seeing a list of theoretical times. How could anyone guarantee drives/styles could be kept secret.

This is not intereresting! It sucks.