Replays sharing

Look for replays and levels and ask people's times.

Moderator: Moporators

milagros
Cheatless
Posts: 4443
Joined: 19 May 2002, 17:05

Re: Replays sharing

Post by milagros » 22 Jul 2019, 16:04

Labs wrote:
22 Jul 2019, 15:31
Please show some nice internal recs, maybe computer can show us new styles :D
I guess it works only where you can locally create the gradient
for example "get as far left / right / up / down) as possible"
so no other internal than 02 can be done that way

also I guess it needs to be trained for each level separately (so it actually doesn't really observe stuff)

to do it properly on an arbitrary internal level, you need to have sufficient computing power to achieve the goal "by chance", which is not really feasible unless you have some millions to waste
[carebox]

Stini
39mins club
Posts: 198
Joined: 5 Dec 2002, 22:15
Team: ICE
Location: Helsinki, Finland

Re: Replays sharing

Post by Stini » 22 Jul 2019, 16:54

milagros wrote:
22 Jul 2019, 16:04
Labs wrote:
22 Jul 2019, 15:31
Please show some nice internal recs, maybe computer can show us new styles :D
I guess it works only where you can locally create the gradient
for example "get as far left / right / up / down) as possible"
so no other internal than 02 can be done that way

also I guess it needs to be trained for each level separately (so it actually doesn't really observe stuff)

to do it properly on an arbitrary internal level, you need to have sufficient computing power to achieve the goal "by chance", which is not really feasible unless you have some millions to waste
Yeah, I have a very naive reward function at the moment, which essentially just gives some reward for getting closer to the next closest apple/flower as quickly as possible, so most of the internals are not feasible at the moment. It's really challenging to make it explore properly and come up with new and different styles. The model is indeed trained for each level separately right now, but I will try to make a more general AI in the future, which could play multiple and more challenging levels.
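Roughly, the shaping is something like this (a simplified sketch with made-up names and constants, not the actual code):

[code]
import math

def next_target(pos, apples_left, flower_pos):
    # The flower only becomes the target once every apple has been taken.
    if apples_left:
        return min(apples_left, key=lambda a: math.dist(pos, a))
    return flower_pos

def step_reward(prev_pos, new_pos, apples_left, flower_pos, time_penalty=0.01):
    # Reward = progress made toward the nearest remaining target,
    # minus a small constant per step so faster runs score higher.
    target = next_target(prev_pos, apples_left, flower_pos)
    progress = math.dist(prev_pos, target) - math.dist(new_pos, target)
    return progress - time_penalty
[/code]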

I did some tries on other internals as well, but the results weren't good enough for the video:

Warm up, 15.24
Islands in the Sky, 24.91
Uphill Battle, 25.96 - kinda interesting pop style here
Haircut, unfinished - I had a bug in my code, so it stops too early, but looks like ez finish

Grace
38mins club
Posts: 4721
Joined: 19 Nov 2005, 10:45
Location: Deep in your Imagination, Twirling your Dreams and Weaving your thoughts.

Re: Replays sharing

Post by Grace » 22 Jul 2019, 16:58

Pretty tight haircut stylefinding.

I'm curious about the logic behind the reward function. In, for example, the haircut rec, surely by default the closest rewardable object is the flower. Is there some caveat that the flower is unrewarding until all the apples are taken?
Targets: 6 Legendary, 19 WC, 24 Pro, 5 Good | 37 Australian Records | AvgTT: 40:09:92

Stini
39mins club
Posts: 198
Joined: 5 Dec 2002, 22:15
Team: ICE
Location: Helsinki, Finland

Re: Replays sharing

Post by Stini » 22 Jul 2019, 17:04

Yes, it tries to collect all the apples first.

Labs
38mins club
Posts: 1037
Joined: 2 May 2005, 14:20
Team: SPEED
Location: Hungary

Re: Replays sharing

Post by Labs » 22 Jul 2019, 17:10

Next table spef makes wr on uphill battle :D Nics recs!
Team SPEED


i love everyone :*

milagros
Cheatless
Posts: 4443
Joined: 19 May 2002, 17:05

Re: Replays sharing

Post by milagros » 22 Jul 2019, 17:20

Stini wrote:
22 Jul 2019, 16:54
Yeah, I have a very naive reward function at the moment, which essentially just gives some reward for getting closer to the next closest apple/flower as quickly as possible
i wanted to do something similar at some point and concluded the only way to get really good results is to use a reward function depending on how close you are to a future state in an SL rec (you try to be where the SL rec was 0.1s in the future; once achieved, 0.2s in the future, etc.)
all plans were dropped when I realized that all recs would be optimal if you over-use the bug bounces
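in pseudocode the idea was roughly this (names and numbers made up):

[code]
import math

def rec_chasing_reward(bike_pos, sl_rec, t_now, lookahead, catch_radius=1.0, dt=0.1):
    # sl_rec(t) gives the position the SL replay had at time t.
    # Chase the rec's state `lookahead` seconds ahead of now; once the bot
    # gets close enough, push the lookahead another 0.1 s into the future.
    target = sl_rec(t_now + lookahead)
    dist = math.dist(bike_pos, target)
    if dist < catch_radius:
        lookahead += dt
    return -dist, lookahead
[/code]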
[carebox]

Hosp
38mins club
Posts: 1900
Joined: 30 Aug 2009, 20:55
Team: MiE
Location: Uppsala, Sweden.

Re: Replays sharing

Post by Hosp » 22 Jul 2019, 18:30

that's the best 05 rec i hev ever seen
tomat

ArZeNiK
39mins club
Posts: 455
Joined: 30 Jul 2016, 09:18
Team: Ferrari

Re: Replays sharing

Post by ArZeNiK » 22 Jul 2019, 19:31

actually i think all these AI recs are sik
who know where bot will be in future. maybe bot is first 33tt
I like turtles

Lukazz
36mins club
Posts: 5224
Joined: 4 Jul 2004, 12:10

Re: Replays sharing

Post by Lukazz » 22 Jul 2019, 21:13

Insanely interesting project! Nice 05 style!
TT: 36:59:53 || Avg TT: 38:09:65

Petrenuk
Kuski
Posts: 16
Joined: 12 Jul 2008, 12:54

Re: Replays sharing

Post by Petrenuk » 31 Jul 2019, 02:10

Stini wrote:
22 Jul 2019, 16:54

Yeah, I have a very naive reward function at the moment, which essentially just gives some reward for getting closer to the next closest apple/flower as quickly as possible, so most of the internals are not feasible at the moment. It's really challenging to make it explore properly and come up with new and different styles. The model is indeed trained for each level separately right now, but I will try to make a more general AI in the future, which could play multiple and more challenging levels.
Lol I was waiting for someone to try it out! Very interesting indeed!
Stini, how do you interact with the game? Is there a way to run it much faster than real time (like in most reinforcement learning projects)? How do you grab the screen pixels, if at all? (Or is it training from a low-dimensional representation?)
If you have access to the Elma source code this should be doable; otherwise, I'm curious.
Also, do you run more than one copy of Elma at a time during training?

Other questions: what RL framework do you use? Is it something open-source like OpenAI Baselines, or is this your own implementation?
What kind of RL algorithm do you use? A2C, PPO, DQN?

Having worked with RL myself, I can say this is an extremely challenging problem for contemporary exploration algorithms. For non-trivial levels it basically involves an implicit traveling salesman problem.
I think it might be possible to make progress on harder levels by combining some kind of search over apple sequences and routes with RL to produce the actual actions for every route. This might even have some style finding potential on shorter levels.
Totally agree with Milagros though, if bug bounce is patched this is even more interesting!

Also, very surprised at how human the recs look! Totally looks like a rec from an amateur player who never saw a professional rec.

Stini
39mins club
Posts: 198
Joined: 5 Dec 2002, 22:15
Team: ICE
Location: Helsinki, Finland

Re: Replays sharing

Post by Stini » 31 Jul 2019, 16:31

Petrenuk wrote:
31 Jul 2019, 02:10
Stini, how do you interact with the game? Is there a way to run it much faster than real time (like in most reinforcement learning projects)? How do you grab the screen pixels, if at all? (Or is it training from a low-dimensional representation?)
If you have access to the Elma source code this should be doable; otherwise, I'm curious.
Also, do you run more than one copy of Elma at a time during training?
Smibu was kind enough to give me access to the EOL2 code, so I used its physics code to build an Elma simulator, which is about 1000x faster than real time. Currently I only simulate the physics and don't do any frame rendering, so I don't use any pixel values in my algorithm. I just use some of the internal state of the simulator as features (coordinates, velocities, rotation speeds, etc.). And yeah, it's quite trivial to run multiple simulators concurrently.
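The observation is basically just a small vector of those numbers, something in this spirit (the accessor names here are made up; the real simulator API looks different):

[code]
import numpy as np

def make_observation(sim):
    # Hypothetical accessors; the actual EOL2-based simulator API differs.
    pos = np.array(sim.bike_position())        # x, y
    vel = np.array(sim.bike_velocity())        # vx, vy
    rot = np.array(sim.bike_rotation())        # angle, angular velocity
    targets = sim.remaining_targets()          # apples first, then the flower
    nearest = min(targets, key=lambda t: np.linalg.norm(np.array(t) - pos))
    return np.concatenate([pos, vel, rot, np.array(nearest) - pos])
[/code]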
Petrenuk wrote:
31 Jul 2019, 02:10
Other questions: what RL framework do you use? Is it something open-source like OpenAI Baselines, or is this your own implementation?
What kind of RL algorithm do you use? A2C, PPO, DQN?
I wanted to try something simple first, so I just picked the cross-entropy method since it's very easy to implement. It turned out to work quite well. I was later asked to make some runs for the video, but due to limited time I just kept using and improving my CEM implementation rather than risking something more sophisticated.

I have actually implemented an OpenAI Gym wrapper for the simulator, and I did try some of the OpenAI Baselines algorithms, such as PPO and DQN. However, I wasn't able to get these working as well as my own implementation of CEM. I could take a closer look at these again, but I'm more interested in trying model-based methods, which I use a lot at work, so I'm more familiar with them and I think they might work quite well here.
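For reference, the basic CEM loop really is this simple (a generic sketch over a flat parameter vector, not my actual code):

[code]
import numpy as np

def cem(evaluate, dim, iters=100, pop=200, elite_frac=0.1, init_std=1.0):
    # evaluate(theta) runs one episode with policy parameters theta
    # (e.g. in the simulator) and returns the total reward.
    mean, std = np.zeros(dim), np.full(dim, init_std)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        samples = np.random.randn(pop, dim) * std + mean
        scores = np.array([evaluate(theta) for theta in samples])
        elites = samples[np.argsort(scores)[-n_elite:]]
        # Refit the sampling distribution to the best-scoring parameter vectors.
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mean
[/code]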
Petrenuk wrote:
31 Jul 2019, 02:10
Having worked with RL myself, I can say this is an extremely challenging problem for contemporary exploration algorithms. For non-trivial levels it basically involves an implicit traveling salesman problem.
I think it might be possible to make progress on harder levels by combining some kind of search over apple sequences and routes with RL to produce the actual actions for every route. This might even have some style finding potential on shorter levels.
Totally agree with Milagros though, if bug bounce is patched this is even more interesting!
Yeah, I have also been thinking about some kind of search and other ideas for style finding and exploration. For example, the recent Go-Explore paper by Uber for solving Montezuma's Revenge is essentially just a simple search from the "most promising" states using a lower-dimensional state representation. However, I think the state space might be too complex for this kind of approach to work well in elma.
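The core of Go-Explore is basically just an archive keyed by a coarsened state, roughly like this (cell sizes and field names invented for illustration):

[code]
def cell_of(state, grid=5.0, v_grid=2.0):
    # Coarse discretization of the bike state; Go-Explore keeps one archive
    # entry per cell and restarts exploration from the most promising cells.
    return (round(state.x / grid), round(state.y / grid),
            round(state.vx / v_grid), round(state.vy / v_grid),
            frozenset(state.apples_left))   # apples_left = set of apple ids

# archive: cell -> (best score so far, saved simulator snapshot to resume from)
archive = {}
[/code]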

I will probably try using recs for better style finding. I would like to use recs for training the AI in general, but you could also try learning which apple a human would collect next for example. Maybe you could then build some kind of a hierarchical model using this or something.

Petrenuk
Kuski
Posts: 16
Joined: 12 Jul 2008, 12:54

Re: Replays sharing

Post by Petrenuk » 2 Aug 2019, 01:22

Stini, this is awesome!
I do know the Go-Explore paper, and indeed there might be some potential in using it for style finding. But as you said, the Elma state space is a lot more complex than Montezuma's, because things like velocity and rotation matter a lot. Even if you discretize everything, it might be either too coarse to find anything useful or too huge to store and process.

Maybe initially something like a hand-crafted route defined by a set of waypoints could work: similar to placing a bunch of apples along the "desired" path through the level, and then letting the RL algorithm find the low-level actions to drive through the waypoints.
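Something along these lines (illustrative only):

[code]
import math

def waypoint_reward(pos, waypoints, idx, radius=3.0):
    # Treat hand-placed waypoints like extra apples: reward progress toward
    # the current one, and move on to the next once it is reached.
    if math.dist(pos, waypoints[idx]) < radius and idx + 1 < len(waypoints):
        idx += 1
    return -math.dist(pos, waypoints[idx]), idx
[/code]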

Wild thought:
given that you have access to all the physics calculations, you could actually use differentiable physics :D
https://arxiv.org/abs/1905.10706
