NOTES.txt

PR idea notes for tensorforce:
 had to convert my input to float32 from int.
 had to write batchnorm and permute
 can't control ordering for conv2d
 hard to visualize the network architecture and understand it / inputs / outputs of each layer
 have to reshape in order to get my data to work with conv2d (adding channel)
 when there's a problem, hard to know which spot in the network spec was the issue
 why are you not using keras or tf?

ConvDQN learning issues

TODO: Find better ways to debug the network to see what is happening.
TODO: add an environment without auction - just draft.

use self play and win / loss rewards only rather than in game rewards for winning a bid

Maybe the problem is that it cannot see the relationship between its action and the reward because of the
simultaneous action selection, done by the random opponents.

Next target - find an environment it CAN learn. Make it harder and harder from there. Be very gradual.
If it stops being learnable, try to go in between the last one it could learn.