Pong project #18
The blueprint wrappers and setup are available in this branch: https://github.com/getnamo/tensorflow-ue4-examples/tree/qlearn. You should be able to play the game, speed up time, switch to the if-statement AI, switch to the Q-learning AI, and more. I never got around to fully training the network, however. Play around with it and see if you can get the AI to a good state.
Yeah, I've currently got the pong game working in a Spyder environment. Anyway, I'm going to try all the tools soon and update when I've finished with the pong. Thanks!
In the example above, instead of learning on pixels of the game, it's playing the game in UE4 and sending the ball and player locations as inputs. This should drastically reduce the network size needed (more plain Q-learning than deep Q). If you want to train on pixels while still using UE4 to simulate the environment, consider using scene capture (https://docs.unrealengine.com/en-us/Resources/ContentExamples/Reflections/1_7) to render to a texture at the desired size and then sending that input through.
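To make the difference concrete, here is a small sketch of what the two kinds of per-tick state could look like before being handed to Python; the field names and capture size are assumptions for illustration, not the exact structs used by the example project.

```python
import json

# Illustrative only: these field names and the capture size are assumptions,
# not the exact structs used by the example project.

# Position-based state (tiny input; plain Q-learning is usually enough):
position_state = {'ballX': 0.42, 'ballY': -0.17, 'paddleY': 0.05}

# Pixel-based state: a scene capture rendered to a small texture and
# converted to a flat float array before being sent to Python:
pixel_state = {'pixels': [0.0] * (64 * 64)}  # assumed 64x64 greyscale capture

# Either struct would be JSON-encoded on the blueprint side and handed
# to the Python layer, roughly like:
payload = json.dumps(position_state)
```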
Yeah, well, I've got a task to do a different game, with enemies spawning in random locations and moving towards you, and the idea is to move the pole up (rotate it up) so it won't bump into the enemy. So sending the location of the ball/enemy would probably be insufficient here, though I'm not quite sure. I'll use the Scene Capture 2D.
getnamo, I'm trying to get a crucial thing to work and I'm not sure how to do it: I want to feed it to a neural network.
https://github.com/getnamo/tensorflow-ue4#any-ustruct-example is the key part. You want to encode your image as an array of floats, append that array along with any variables you want into a struct, encode that struct as JSON, and send it to the Python layer. There are helper functions available which convert textures to float arrays (https://github.com/getnamo/tensorflow-ue4/blob/master/Source/TensorFlow/Public/TensorFlowBlueprintLibrary.h#L20); these are blueprint-accessible from anywhere. The greyscale version is used for MNIST recognition inside the TensorflowComponent (it's a blueprint component, so you can inspect it). For reference here's the
Then in a simple example, only the 'pixels' property is used, e.g. for mapping input in MNIST: https://github.com/getnamo/tensorflow-ue4-examples/blob/master/Content/Scripts/mnistTutorial.py#L26
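As a rough illustration of that flow on the Python side, the sketch below unpacks such a JSON struct; apart from the 'pixels' key, the function name, image size, and extra 'score' field are assumptions and not part of the actual example scripts.

```python
import numpy as np

def on_json_input(json_input):
    """Sketch of unpacking the struct sent from blueprint.

    Only the 'pixels' key mirrors the example scripts; the function name,
    image size, and the extra 'score' field are placeholders.
    """
    # flat float array produced by the texture-conversion helpers
    pixels = np.array(json_input['pixels'], dtype=np.float32)

    # reshape back into an image, assuming a 28x28 greyscale capture
    image = pixels.reshape(28, 28, 1)

    # any extra struct members ride along in the same dict (hypothetical field)
    score = float(json_input.get('score', 0.0))

    # ...feed `image` (and any other features) into your network here...
    return image, score
```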
Read the directions in the release thoroughly, e.g. https://github.com/getnamo/tensorflow-ue4-examples/releases/tag/0.4.1; you need to copy the matching plugin (https://github.com/getnamo/tensorflow-ue4/releases/tag/0.10.1) into your project root. I've added a section to the readme under troubleshooting to help clarify this for future users: https://github.com/getnamo/tensorflow-ue4-examples/blob/master/README.md#startup-error
Please read the instructions: https://github.com/getnamo/tensorflow-ue4#installation--setup — you need to wait until the dependencies have installed.
Yeah, I used the CPU version now and it works; it trains and then predicts successfully. Thanks! I'm actually able to do almost everything needed to complete the task; just one thing is left to accomplish. I'm still not sure how to handle the rewards: when there's, for example, a collision/overlap between the player and an enemy, send a reward of -1; when there's a collision with the floor, send a reward of +1; and so on.
Hello. Thank you for the awesome TF plugin for UE4. I'm trying to use it to provide plausible route planning for pedestrian agents. I'm still searching for the right model, but RL looks promising. How would you train the network in this PongAI example?
Keep in mind that I'm not a machine learning expert, and you may need to look up more recent resources for best practices.

A typical reinforcement learning method is deep Q-learning (DQN). The pong example uses it here: https://github.com/getnamo/tensorflow-ue4-examples/blob/qlearn/Content/Scripts/PongAI.py (NB: for this particular problem it's overkill, but it's meant to serve as an example). There are also a random-input AI and a basic if-statement AI in the same folder to train against and to validate performance.

Inside PongAI.py you will notice the runJsonInput function, which is called from blueprint with the Unreal game state as input. For the pong game the state is the ball X and Y position and the paddle position (height): https://github.com/getnamo/tensorflow-ue4-examples/blob/f32b9e27589f6cbdeccafb3cf890fb3d5bb7511e/Content/Scripts/PongAI.py#L72. This is stacked for the last 200 frames so that the AI can learn some temporal features. The current reward, along with the last action executed, is dequeued so that we can correlate the reward with the observation (game state); this is sent through a training step. We then take the newest observations and request the AI's next action: https://github.com/getnamo/tensorflow-ue4-examples/blob/f32b9e27589f6cbdeccafb3cf890fb3d5bb7511e/Content/Scripts/PongAI.py#L94. The selected action is then fed back to blueprint as a float value inside a JSON object: https://github.com/getnamo/tensorflow-ue4-examples/blob/f32b9e27589f6cbdeccafb3cf890fb3d5bb7511e/Content/Scripts/PongAI.py#L105, where the game executes it on the next game tick.

That's largely it. Run this loop: tick the game, send the state, do a training step on the last state and reward, get the next action, send that action to blueprint to act out, and repeat. After a long time it should train, provided the reward is well chosen and we have picked a good enough observation structure and other DQN parameters. Note that traditional DQN inputs are pixels, like simplified Atari game screens, and the network has to learn by itself what those pixels mean; in this example the state is much, much smaller, so a DQN is overkill and a regular Q-learning setup should likely work.

What remains is to select a good reward system and to speed up training. You can likely reshape the loop to take in the results from many instances of the game running in parallel (e.g. a bunch of pong games in the same level) and update all their observations, rewards, training steps, and next actions at the same time. This should drastically increase the speed of training.

Hopefully that gives you an idea of how this is set up. I'd point you to other general resources such as https://medium.freecodecamp.org/an-introduction-to-deep-q-learning-lets-play-doom-54d02d8017d8 and recommend you adapt the network to your problem following appropriate modern guides to get good results. I never spent the time to get this example fully trained, so YMMV.
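A very rough sketch of that per-tick loop is below. This is not the actual PongAI.py: the class name, the state field names, and the random-action placeholder (standing in for a real Q-network) are all assumptions for illustration.

```python
import random
from collections import deque

FRAME_STACK = 200  # the example stacks the last 200 frames of state


class PongAgentSketch:
    """Minimal sketch of the observe -> train -> act loop described above."""

    def __init__(self, num_actions=3):
        self.num_actions = num_actions
        self.observations = deque(maxlen=FRAME_STACK)
        self.last_action = 0

    def run_json_input(self, state):
        # `state` is the per-tick game state from blueprint, e.g.
        # {'ballX': ..., 'ballY': ..., 'paddleY': ..., 'reward': ...}
        observation = (state['ballX'], state['ballY'], state['paddleY'])
        reward = state.get('reward', 0.0)

        # correlate the reward with the previous observation/action pair
        if self.observations:
            self.train_step(self.observations[-1], self.last_action,
                            reward, observation)

        self.observations.append(observation)

        # pick the next action (placeholder: random instead of a real Q-network)
        self.last_action = random.randrange(self.num_actions)

        # return the action to blueprint as a float inside a JSON-friendly dict
        return {'action': float(self.last_action)}

    def train_step(self, prev_obs, action, reward, next_obs):
        # a real agent would update its Q-table or Q-network here
        pass
```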
Thanks again for the answer. I managed to hook the DQN up to my project and implemented the sensor and reward; now I need to start training it. The save/load model functionality would be very useful, so I tried to use mnistSaveLoad.py as an example. You can see the full .py script here: https://yadi.sk/d/On6_nRs6WerrtA
It seems that I don't understand how to initialize the value W (and the other values in the DQN/CNN TensorFlow graph).
Saving and loading are largely vanilla TensorFlow functions; please see e.g. https://www.tensorflow.org/tutorials/keras/save_and_restore_models for guides. See also this related Stack Overflow question: https://stackoverflow.com/questions/46514138/tensorflow-attempting-to-use-uninitialized-value-w-in-a-class
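For reference, here is a minimal TF1 graph/session sketch of the save/restore pattern those links describe, with the uninitialized-variable error in mind; the variable, its shape, and the checkpoint path are placeholders, not taken from the project's scripts.

```python
import tensorflow as tf

# TF1-style sketch; the variable, shape, and checkpoint path are placeholders.
W = tf.get_variable('W', shape=[4, 3])
saver = tf.train.Saver()

with tf.Session() as sess:
    # The "Attempting to use uninitialized value W" error usually means this
    # initializer (or a restore) never ran before the variable was used.
    sess.run(tf.global_variables_initializer())
    # ...training steps would go here...
    save_path = saver.save(sess, './dqn-sketch.ckpt')

with tf.Session() as sess:
    # Restoring assigns the saved values to the already-declared variables,
    # so no separate initialization of W is needed after this.
    saver.restore(sess, save_path)
```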
Have you managed to create a pong project yet?