Stini wrote: ↑22 Jul 2019, 16:54
Yeah, I have a very naive reward function at the moment, which essentially just gives some reward for getting closer to the next closest apple/flower as quickly as possible, so most of the internals are not feasible at the moment. It's really challenging to make it explore properly and come up with new and different styles. The model is indeed trained for each level separately right now, but I will try to make a more general AI in the future, which could play multiple and more challenging levels.
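For anyone curious what such a "naive" shaped reward might look like, here's a minimal sketch: reward the agent whenever it reduces its distance to the nearest uncollected apple/flower. All names and the distance metric here are my own illustration, not Stini's actual code.

```python
import math

def shaped_reward(bike_pos, targets, prev_dist):
    """Dense shaping reward: positive when the bike moves toward the
    nearest remaining apple/flower, negative when it moves away.
    bike_pos: (x, y); targets: list of (x, y) for uncollected items;
    prev_dist: distance to the nearest target on the previous step."""
    if not targets:
        return 0.0, prev_dist
    # Distance to the closest remaining target (straight-line, crude)
    dist = min(math.hypot(bike_pos[0] - tx, bike_pos[1] - ty)
               for tx, ty in targets)
    reward = prev_dist - dist  # progress made since the last step
    return reward, dist
```

This kind of shaping explains the behaviour described above: it pushes the agent along the shortest greedy path, so it never has an incentive to explore detours or alternative styles.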
Lol I was waiting for someone to try it out! Very interesting indeed!
Stini, how do you interact with the game? Is there a way to run it much faster than realtime (like in most reinforcement learning projects)? Do you grab the screen pixels at all, or does it train from a low-dimensional representation?
If you have access to the Elma source code this should be doable; otherwise, I'm curious how you manage it.
Also, do you run more than one copy of Elma at a time during training?
Other questions: what RL framework do you use? Is it something open-source like OpenAI Baselines, or is this your own implementation?
What kind of RL algorithm do you use? A2C, PPO, DQN?
Having worked with RL myself, I can say this is an extremely challenging problem for contemporary exploration algorithms. For non-trivial levels it basically amounts to an implicit traveling salesman problem.
I think it might be possible to make progress on harder levels by combining some kind of search over apple sequences and routes with RL producing the actual actions for each route. This might even have some style-finding potential on shorter levels.
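To make the search idea concrete, here's a toy sketch of the outer loop: brute-force over apple orderings (feasible only for small apple counts), scored by straight-line route length as a crude proxy. In a real system each candidate order would instead be scored by letting the RL policy attempt it; everything here is hypothetical illustration.

```python
import itertools
import math

def route_length(start, order):
    """Total straight-line length of visiting points in the given order.
    A crude proxy for route cost; a real system would score each order
    by actually letting the RL policy try to execute it."""
    total, pos = 0.0, start
    for point in order:
        total += math.dist(pos, point)
        pos = point
    return total

def best_apple_order(start, apples, flower):
    """Brute-force search over apple sequences, ending at the flower.
    The RL policy would then be trained to execute the chosen route."""
    best = min(itertools.permutations(apples),
               key=lambda order: route_length(start, list(order) + [flower]))
    return list(best)
```

For many apples you'd obviously need something smarter than permutations (TSP heuristics, beam search over partial routes), but the division of labour is the point: a discrete planner picks the apple sequence, and RL only has to learn the low-level driving for each leg.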
Totally agree with Milagros though, if bug bounce is patched this is even more interesting!
Also, very surprised at how human the recs look! They totally look like recs from an amateur player who has never seen a professional rec.