diff --git a/LICENSE b/LICENSE new file mode 100644 index 0000000..d645695 --- /dev/null +++ b/LICENSE @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. 
Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. 
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..7c2d299
--- /dev/null
+++ b/README.md
@@ -0,0 +1,76 @@
+# PyTorch NEAT
+
+## Background
+NEAT (NeuroEvolution of Augmenting Topologies) is a popular neuroevolution algorithm, one of the few such algorithms that evolve the architectures of their networks in addition to the weights.
+
+HyperNEAT is an extension to NEAT that indirectly encodes the weights of the network (called the substrate) with a separate network (called a CPPN, for compositional pattern-producing network).
+
+Adaptive HyperNEAT is an extension to HyperNEAT which indirectly encodes both the initial weights and an update rule for the weights, such that some learning can occur during a network's "lifetime."
+
+## About
+PyTorch NEAT builds upon [NEAT-Python](https://github.com/CodeReclaimers/neat-python) by providing functions that can turn a NEAT-Python genome into either a recurrent PyTorch network or a PyTorch CPPN for use in HyperNEAT or Adaptive HyperNEAT.
+We also provide some environments in which to test NEAT and Adaptive HyperNEAT, and a more involved example using the CPPN infrastructure with Adaptive HyperNEAT on a T-maze.
+
+## Examples
+The following snippet turns a NEAT-Python genome into a recurrent PyTorch network:
+```
+from pytorch_neat.recurrent_net import RecurrentNet
+
+net = RecurrentNet.create(genome, config, bs)
+outputs = net.activate(some_array)
+```
+
+You can also turn a NEAT-Python genome into a CPPN; `leaf_names` and `node_names` label the CPPN's input and output nodes:
+```
+from pytorch_neat.cppn import create_cppn
+
+cppn_nodes = create_cppn(genome, config, leaf_names, node_names)
+```
+
+A CPPN is represented as a graph structure. Named input and output nodes can then be evaluated directly with keyword arguments:
+```
+from pytorch_neat.cppn import create_cppn
+
+[delta_w_node] = create_cppn(
+    genome,
+    config,
+    ["x_in", "y_in", "x_out", "y_out", "pre", "post", "w"],
+    ["delta_w"],
+)
+
+delta_w = delta_w_node(x_in=some_array, y_in=other_array, ...)
+```
+
+We also provide some infrastructure for running networks in Gym environments:
+```
+import gym
+
+from pytorch_neat.multi_env_eval import MultiEnvEvaluator
+from pytorch_neat.recurrent_net import RecurrentNet
+
+
+def make_net(genome, config, batch_size):
+    return RecurrentNet.create(genome, config, batch_size)
+
+
+def activate_net(net, states):
+    outputs = net.activate(states).numpy()
+    return outputs[:, 0] > 0.5
+
+
+def make_env():
+    return gym.make("CartPole-v0")
+
+
+evaluator = MultiEnvEvaluator(
+    make_net, activate_net, make_env=make_env, max_env_steps=max_env_steps, batch_size=batch_size,
+)
+
+fitness = evaluator.eval_genome(genome, config)
+```
+This allows multiple environments to run in parallel for efficiency.
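+
+Putting it together, evolution itself runs through NEAT-Python as usual. Here is a condensed sketch of what `examples/simple/main.py` (runnable below) does, reusing the `evaluator` from the previous snippet; the `neat.cfg` path and the generation count are placeholders:
+```
+import neat
+
+config = neat.Config(
+    neat.DefaultGenome,
+    neat.DefaultReproduction,
+    neat.DefaultSpeciesSet,
+    neat.DefaultStagnation,
+    "neat.cfg",  # placeholder: path to a NEAT-Python config file
+)
+
+
+def eval_genomes(genomes, config):
+    # NEAT-Python fitness function: score every genome in the population
+    for _, genome in genomes:
+        genome.fitness = evaluator.eval_genome(genome, config)
+
+
+pop = neat.Population(config)
+winner = pop.run(eval_genomes, 100)  # evolve for at most 100 generations
+```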
+ +A simple example using NEAT to solve the Cartpole can be run like this: +``` +python3 -m examples.simple.main +``` + +And a simple example using Adaptive HyperNEAT to partially solve a T-maze can be run like this: +``` +python3 -m examples.adaptive.main +``` diff --git a/examples/adaptive/main.py b/examples/adaptive/main.py new file mode 100644 index 0000000..fba6e04 --- /dev/null +++ b/examples/adaptive/main.py @@ -0,0 +1,125 @@ +# Copyright (c) 2018 Uber Technologies, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import multiprocessing +import os + +import click +import neat + +# import torch +import numpy as np + +from pytorch_neat import t_maze +from pytorch_neat.activations import tanh_activation +from pytorch_neat.adaptive_linear_net import AdaptiveLinearNet +from pytorch_neat.multi_env_eval import MultiEnvEvaluator +from pytorch_neat.neat_reporter import LogReporter + +batch_size = 4 +DEBUG = True + + +def make_net(genome, config, _batch_size): + input_coords = [[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, -1.0]] + output_coords = [[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]] + return AdaptiveLinearNet.create( + genome, + config, + input_coords=input_coords, + output_coords=output_coords, + weight_threshold=0.4, + batch_size=batch_size, + activation=tanh_activation, + output_activation=tanh_activation, + device="cpu", + ) + + +def activate_net(net, states, debug=False, step_num=0): + if debug and step_num == 1: + print("\n" + "=" * 20 + " DEBUG " + "=" * 20) + print(net.delta_w_node) + print("W init: ", net.input_to_output[0]) + outputs = net.activate(states).numpy() + if debug and (step_num - 1) % 100 == 0: + print("\nStep {}".format(step_num - 1)) + print("Outputs: ", outputs[0]) + print("Delta W: ", net.delta_w[0]) + print("W: ", net.input_to_output[0]) + return np.argmax(outputs, axis=1) + + +@click.command() +@click.option("--n_generations", type=int, default=10000) +@click.option("--n_processes", type=int, default=1) +def run(n_generations, n_processes): + # Load the config file, which is assumed to live in + # the same directory as this script. 
+ config_path = os.path.join(os.path.dirname(__file__), "neat.cfg") + config = neat.Config( + neat.DefaultGenome, + neat.DefaultReproduction, + neat.DefaultSpeciesSet, + neat.DefaultStagnation, + config_path, + ) + + envs = [t_maze.TMazeEnv(init_reward_side=i, n_trials=100) for i in [1, 0, 1, 0]] + + evaluator = MultiEnvEvaluator( + make_net, activate_net, envs=envs, batch_size=batch_size, max_env_steps=1000 + ) + + if n_processes > 1: + pool = multiprocessing.Pool(processes=n_processes) + + def eval_genomes(genomes, config): + fitnesses = pool.starmap( + evaluator.eval_genome, ((genome, config) for _, genome in genomes) + ) + for (_, genome), fitness in zip(genomes, fitnesses): + genome.fitness = fitness + + else: + + def eval_genomes(genomes, config): + for i, (_, genome) in enumerate(genomes): + try: + genome.fitness = evaluator.eval_genome( + genome, config, debug=DEBUG and i % 100 == 0 + ) + except Exception as e: + print(genome) + raise e + + pop = neat.Population(config) + stats = neat.StatisticsReporter() + pop.add_reporter(stats) + reporter = neat.StdOutReporter(True) + pop.add_reporter(reporter) + logger = LogReporter("log.json", evaluator.eval_genome) + pop.add_reporter(logger) + + winner = pop.run(eval_genomes, n_generations) + + print(winner) + final_performance = evaluator.eval_genome(winner, config) + print("Final performance: {}".format(final_performance)) + generations = reporter.generation + 1 + return generations + + +if __name__ == "__main__": + run() # pylint: disable=no-value-for-parameter diff --git a/examples/adaptive/neat.cfg b/examples/adaptive/neat.cfg new file mode 100644 index 0000000..0285ae4 --- /dev/null +++ b/examples/adaptive/neat.cfg @@ -0,0 +1,64 @@ +[NEAT] +pop_size = 10 +# Note: the fitness threshold will never be reached because +# we are controlling the termination ourselves based on simulation performance. +fitness_criterion = mean +fitness_threshold = 200.0 +reset_on_extinction = 0 + +[DefaultGenome] +num_inputs = 7 +num_hidden = 0 +num_outputs = 6 +initial_connection = partial_nodirect 0.5 +feed_forward = True +compatibility_disjoint_coefficient = 1.0 +compatibility_weight_coefficient = 3.0 +conn_add_prob = 0.03 +conn_delete_prob = 0.005 +node_add_prob = 0.02 +node_delete_prob = 0.005 +activation_default = random +activation_options = sigmoid abs gauss sin identity +activation_mutate_rate = 0.1 +aggregation_default = sum +aggregation_options = sum +aggregation_mutate_rate = 0.0 +bias_init_mean = 0.0 +bias_init_stdev = 0.1 +bias_replace_rate = 0.005 +bias_mutate_rate = 0.4 +bias_mutate_power = 0.01 +bias_max_value = 30.0 +bias_min_value = -30.0 +response_init_mean = 1.0 +response_init_stdev = 0.0 +response_replace_rate = 0.0 +response_mutate_rate = 0.1 +response_mutate_power = 0.01 +response_max_value = 30.0 +response_min_value = -30.0 + +weight_max_value = 30 +weight_min_value = -30 +weight_init_mean = 0.0 +weight_init_stdev = 1.0 +weight_mutate_rate = 0.94 +weight_replace_rate = 0.005 +weight_mutate_power = 0.1 +enabled_default = True +enabled_mutate_rate = 0.01 + +single_structural_mutation = True + +[DefaultSpeciesSet] +compatibility_threshold = 4.0 + +[DefaultStagnation] +species_fitness_func = max +max_stagnation = 15 +species_elitism = 1 + +[DefaultReproduction] +elitism = 1 +survival_threshold = 0.2 diff --git a/examples/simple/main.py b/examples/simple/main.py new file mode 100644 index 0000000..264b06d --- /dev/null +++ b/examples/simple/main.py @@ -0,0 +1,75 @@ +# Copyright (c) 2018 Uber Technologies, Inc. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os + +import click +import gym +import neat + +from pytorch_neat.multi_env_eval import MultiEnvEvaluator +from pytorch_neat.neat_reporter import LogReporter +from pytorch_neat.recurrent_net import RecurrentNet + +max_env_steps = 200 + + +def make_env(): + return gym.make("CartPole-v0") + + +def make_net(genome, config, bs): + return RecurrentNet.create(genome, config, bs) + + +def activate_net(net, states): + outputs = net.activate(states).numpy() + return outputs[:, 0] > 0.5 + + +@click.command() +@click.option("--n_generations", type=int, default=100) +def run(n_generations): + # Load the config file, which is assumed to live in + # the same directory as this script. + config_path = os.path.join(os.path.dirname(__file__), "neat.cfg") + config = neat.Config( + neat.DefaultGenome, + neat.DefaultReproduction, + neat.DefaultSpeciesSet, + neat.DefaultStagnation, + config_path, + ) + + evaluator = MultiEnvEvaluator( + make_net, activate_net, make_env=make_env, max_env_steps=max_env_steps + ) + + def eval_genomes(genomes, config): + for _, genome in genomes: + genome.fitness = evaluator.eval_genome(genome, config) + + pop = neat.Population(config) + stats = neat.StatisticsReporter() + pop.add_reporter(stats) + reporter = neat.StdOutReporter(True) + pop.add_reporter(reporter) + logger = LogReporter("neat.log", evaluator.eval_genome) + pop.add_reporter(logger) + + pop.run(eval_genomes, n_generations) + + +if __name__ == "__main__": + run() # pylint: disable=no-value-for-parameter diff --git a/examples/simple/neat.cfg b/examples/simple/neat.cfg new file mode 100644 index 0000000..f1f937b --- /dev/null +++ b/examples/simple/neat.cfg @@ -0,0 +1,61 @@ +# The `NEAT` section specifies parameters particular to the NEAT algorithm +# or the experiment itself. This is the only required section. 
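+#
+# fitness_threshold = 200 corresponds to the maximum CartPole-v0 episode
+# reward (one per balanced step, with max_env_steps = 200 in
+# examples/simple/main.py); NEAT-Python halts evolution if the best
+# population fitness reaches it.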
+[NEAT] +fitness_criterion = max +fitness_threshold = 200 +pop_size = 250 +reset_on_extinction = 0 + +[DefaultGenome] +num_inputs = 4 +num_hidden = 1 +num_outputs = 1 +initial_connection = partial_direct 0.5 +feed_forward = True +compatibility_disjoint_coefficient = 1.0 +compatibility_weight_coefficient = 0.6 +conn_add_prob = 0.2 +conn_delete_prob = 0.2 +node_add_prob = 0.2 +node_delete_prob = 0.2 +activation_default = sigmoid +activation_options = sigmoid +activation_mutate_rate = 0.0 +aggregation_default = sum +aggregation_options = sum +aggregation_mutate_rate = 0.0 +bias_init_mean = 0.0 +bias_init_stdev = 1.0 +bias_replace_rate = 0.1 +bias_mutate_rate = 0.7 +bias_mutate_power = 0.5 +bias_max_value = 30.0 +bias_min_value = -30.0 +response_init_mean = 1.0 +response_init_stdev = 0.0 +response_replace_rate = 0.0 +response_mutate_rate = 0.0 +response_mutate_power = 0.0 +response_max_value = 30.0 +response_min_value = -30.0 + +weight_max_value = 30 +weight_min_value = -30 +weight_init_mean = 0.0 +weight_init_stdev = 1.0 +weight_mutate_rate = 0.8 +weight_replace_rate = 0.1 +weight_mutate_power = 0.5 +enabled_default = True +enabled_mutate_rate = 0.01 + +[DefaultSpeciesSet] +compatibility_threshold = 3.0 + +[DefaultStagnation] +species_fitness_func = max +max_stagnation = 20 + +[DefaultReproduction] +elitism = 2 +survival_threshold = 0.2 diff --git a/pytorch_neat/__init__.py b/pytorch_neat/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/pytorch_neat/activations.py b/pytorch_neat/activations.py new file mode 100644 index 0000000..4d96227 --- /dev/null +++ b/pytorch_neat/activations.py @@ -0,0 +1,55 @@ +# Copyright (c) 2018 Uber Technologies, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import torch +import torch.nn.functional as F + + +def sigmoid_activation(x): + return torch.sigmoid(5 * x) + + +def tanh_activation(x): + return torch.tanh(2.5 * x) + + +def abs_activation(x): + return torch.abs(x) + + +def gauss_activation(x): + return torch.exp(-5.0 * x**2) + + +def identity_activation(x): + return x + + +def sin_activation(x): + return torch.sin(x) + + +def relu_activation(x): + return F.relu(x) + + +str_to_activation = { + 'sigmoid': sigmoid_activation, + 'tanh': tanh_activation, + 'abs': abs_activation, + 'gauss': gauss_activation, + 'identity': identity_activation, + 'sin': sin_activation, + 'relu': relu_activation, +} diff --git a/pytorch_neat/adaptive_linear_net.py b/pytorch_neat/adaptive_linear_net.py new file mode 100644 index 0000000..0d38203 --- /dev/null +++ b/pytorch_neat/adaptive_linear_net.py @@ -0,0 +1,174 @@ +# Copyright (c) 2018 Uber Technologies, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import torch + +from .activations import identity_activation, tanh_activation +from .cppn import clamp_weights_, create_cppn, get_coord_inputs + + +class AdaptiveLinearNet: + def __init__( + self, + delta_w_node, + input_coords, + output_coords, + weight_threshold=0.2, + weight_max=3.0, + activation=tanh_activation, + cppn_activation=identity_activation, + batch_size=1, + device="cuda:0", + ): + + self.delta_w_node = delta_w_node + + self.n_inputs = len(input_coords) + self.input_coords = torch.tensor( + input_coords, dtype=torch.float32, device=device + ) + + self.n_outputs = len(output_coords) + self.output_coords = torch.tensor( + output_coords, dtype=torch.float32, device=device + ) + + self.weight_threshold = weight_threshold + self.weight_max = weight_max + + self.activation = activation + self.cppn_activation = cppn_activation + + self.batch_size = batch_size + self.device = device + self.reset() + + def get_init_weights(self, in_coords, out_coords, w_node): + (x_out, y_out), (x_in, y_in) = get_coord_inputs(in_coords, out_coords) + + n_in = len(in_coords) + n_out = len(out_coords) + + zeros = torch.zeros((n_out, n_in), dtype=torch.float32, device=self.device) + + weights = self.cppn_activation( + w_node( + x_out=x_out, + y_out=y_out, + x_in=x_in, + y_in=y_in, + pre=zeros, + post=zeros, + w=zeros, + ) + ) + clamp_weights_(weights, self.weight_threshold, self.weight_max) + + return weights + + def reset(self): + with torch.no_grad(): + self.input_to_output = ( + self.get_init_weights( + self.input_coords, self.output_coords, self.delta_w_node + ) + .unsqueeze(0) + .expand(self.batch_size, self.n_outputs, self.n_inputs) + ) + + self.w_expressed = self.input_to_output != 0 + + self.batched_coords = get_coord_inputs( + self.input_coords, self.output_coords, batch_size=self.batch_size + ) + + def activate(self, inputs): + """ + inputs: (batch_size, n_inputs) + + returns: (batch_size, n_outputs) + """ + with torch.no_grad(): + inputs = torch.tensor( + inputs, dtype=torch.float32, device=self.device + ).unsqueeze(2) + + outputs = self.activation(self.input_to_output.matmul(inputs)) + + input_activs = inputs.transpose(1, 2).expand( + self.batch_size, self.n_outputs, self.n_inputs + ) + output_activs = outputs.expand( + self.batch_size, self.n_outputs, self.n_inputs + ) + + (x_out, y_out), (x_in, y_in) = self.batched_coords + + delta_w = self.cppn_activation( + self.delta_w_node( + x_out=x_out, + y_out=y_out, + x_in=x_in, + y_in=y_in, + pre=input_activs, + post=output_activs, + w=self.input_to_output, + ) + ) + + self.delta_w = delta_w + + self.input_to_output[self.w_expressed] += delta_w[self.w_expressed] + clamp_weights_( + self.input_to_output, weight_threshold=0.0, weight_max=self.weight_max + ) + + return outputs.squeeze(2) + + @staticmethod + def create( + genome, + config, + input_coords, + output_coords, + weight_threshold=0.2, + weight_max=3.0, + output_activation=None, + activation=tanh_activation, + cppn_activation=identity_activation, + batch_size=1, + device="cuda:0", + ): + + nodes = create_cppn( + genome, + config, + ["x_in", "y_in", "x_out", "y_out", "pre", "post", 
"w"], + ["delta_w"], + output_activation=output_activation, + ) + + delta_w_node = nodes[0] + + return AdaptiveLinearNet( + delta_w_node, + input_coords, + output_coords, + weight_threshold=weight_threshold, + weight_max=weight_max, + activation=activation, + cppn_activation=cppn_activation, + batch_size=batch_size, + device=device, + ) diff --git a/pytorch_neat/adaptive_net.py b/pytorch_neat/adaptive_net.py new file mode 100644 index 0000000..e7868bd --- /dev/null +++ b/pytorch_neat/adaptive_net.py @@ -0,0 +1,189 @@ +# Copyright (c) 2018 Uber Technologies, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import torch +from .activations import tanh_activation +from .cppn import create_cppn, clamp_weights_, get_coord_inputs + + +class AdaptiveNet: + def __init__(self, + + w_ih_node, + b_h_node, + w_hh_node, + b_o_node, + w_ho_node, + delta_w_node, + # stateful_node, + + input_coords, + hidden_coords, + output_coords, + + weight_threshold=0.2, + activation=tanh_activation, + + batch_size=1, + device='cuda:0'): + + self.w_ih_node = w_ih_node + + self.b_h_node = b_h_node + self.w_hh_node = w_hh_node + + self.b_o_node = b_o_node + self.w_ho_node = w_ho_node + + self.delta_w_node = delta_w_node + # self.stateful_node = stateful_node + + self.n_inputs = len(input_coords) + self.input_coords = torch.tensor( + input_coords, dtype=torch.float32, device=device) + + self.n_hidden = len(hidden_coords) + self.hidden_coords = torch.tensor( + hidden_coords, dtype=torch.float32, device=device) + + self.n_outputs = len(output_coords) + self.output_coords = torch.tensor( + output_coords, dtype=torch.float32, device=device) + + self.weight_threshold = weight_threshold + + self.activation = activation + + self.batch_size = batch_size + self.device = device + self.reset() + + def get_init_weights(self, in_coords, out_coords, w_node): + (x_out, y_out), (x_in, y_in) = get_coord_inputs(in_coords, out_coords) + + n_in = len(in_coords) + n_out = len(out_coords) + + zeros = torch.zeros( + (n_out, n_in), dtype=torch.float32, device=self.device) + + weights = w_node(x_out=x_out, y_out=y_out, x_in=x_in, y_in=y_in, + pre=zeros, post=zeros, w=zeros) + clamp_weights_(weights, self.weight_threshold) + + return weights + + def reset(self): + with torch.no_grad(): + self.input_to_hidden = self.get_init_weights( + self.input_coords, self.hidden_coords, self.w_ih_node) + + bias_coords = torch.zeros( + (1, 2), dtype=torch.float32, device=self.device) + self.bias_hidden = self.get_init_weights( + bias_coords, self.hidden_coords, self.b_h_node).unsqueeze(0).expand( + self.batch_size, self.n_hidden, 1) + + self.hidden_to_hidden = self.get_init_weights( + self.hidden_coords, self.hidden_coords, self.w_hh_node).unsqueeze(0).expand( + self.batch_size, self.n_hidden, self.n_hidden) + + bias_coords = torch.zeros( + (1, 2), dtype=torch.float32, device=self.device) + self.bias_output = self.get_init_weights( + bias_coords, self.output_coords, self.b_o_node) + + self.hidden_to_output = self.get_init_weights( + self.hidden_coords, 
self.output_coords, self.w_ho_node) + + self.hidden = torch.zeros((self.batch_size, self.n_hidden, 1), + dtype=torch.float32) + + self.batched_hidden_coords = get_coord_inputs( + self.hidden_coords, self.hidden_coords, batch_size=self.batch_size) + # self.cppn_state = torch.zeros( + # (self.batch_size, self.n_hidden, self.n_hidden)) + + def activate(self, inputs): + ''' + inputs: (batch_size, n_inputs) + + returns: (batch_size, n_outputs) + ''' + with torch.no_grad(): + inputs = torch.tensor( + inputs, dtype=torch.float32, device=self.device).unsqueeze(2) + + self.hidden = self.activation(self.input_to_hidden.matmul(inputs) + + self.hidden_to_hidden.matmul(self.hidden) + + self.bias_hidden) + + outputs = self.activation( + self.hidden_to_output.matmul(self.hidden) + + self.bias_output) + + hidden_outputs = self.hidden.expand( + self.batch_size, self.n_hidden, self.n_hidden) + hidden_inputs = hidden_outputs.transpose(1, 2) + + (x_out, y_out), (x_in, y_in) = self.batched_hidden_coords + + self.hidden_to_hidden += self.delta_w_node( + x_out=x_out, y_out=y_out, x_in=x_in, y_in=y_in, + pre=hidden_inputs, post=hidden_outputs, + w=self.hidden_to_hidden) + # self.cppn_state = self.stateful_node.get_activs() + + return outputs.squeeze(2) + + @staticmethod + def create(genome, + config, + + input_coords, + hidden_coords, + output_coords, + + weight_threshold=0.2, + activation=tanh_activation, + batch_size=1, + device='cuda:0'): + + nodes = create_cppn( + genome, config, + ['x_in', 'y_in', 'x_out', 'y_out', 'pre', 'post', 'w'], + ['w_ih', 'b_h', 'w_hh', 'b_o', 'w_ho', 'delta_w']) + + w_ih_node = nodes[0] + b_h_node = nodes[1] + w_hh_node = nodes[2] + b_o_node = nodes[3] + w_ho_node = nodes[4] + delta_w_node = nodes[5] + + return AdaptiveNet(w_ih_node, + b_h_node, + w_hh_node, + b_o_node, + w_ho_node, + delta_w_node, + + input_coords, + hidden_coords, + output_coords, + + weight_threshold=weight_threshold, + activation=activation, + batch_size=batch_size, + device=device) diff --git a/pytorch_neat/aggregations.py b/pytorch_neat/aggregations.py new file mode 100644 index 0000000..d9660d9 --- /dev/null +++ b/pytorch_neat/aggregations.py @@ -0,0 +1,30 @@ +# Copyright (c) 2018 Uber Technologies, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from functools import reduce +from operator import mul + + +def sum_aggregation(inputs): + return sum(inputs) + + +def prod_aggregation(inputs): + return reduce(mul, inputs, 1) + + +str_to_aggregation = { + 'sum': sum_aggregation, + 'prod': prod_aggregation, +} diff --git a/pytorch_neat/cppn.py b/pytorch_neat/cppn.py new file mode 100644 index 0000000..b9986ad --- /dev/null +++ b/pytorch_neat/cppn.py @@ -0,0 +1,266 @@ +# Copyright (c) 2018 Uber Technologies, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import torch +from neat.graphs import required_for_output + +from .activations import str_to_activation +from .aggregations import str_to_aggregation + + +class Node: + def __init__( + self, + children, + weights, + response, + bias, + activation, + aggregation, + name=None, + leaves=None, + ): + """ + children: list of Nodes + weights: list of floats + response: float + bias: float + activation: torch function from .activations + aggregation: torch function from .aggregations + name: str + leaves: dict of Leaves + """ + self.children = children + self.leaves = leaves + self.weights = weights + self.response = response + self.bias = bias + self.activation = activation + self.activation_name = activation + self.aggregation = aggregation + self.aggregation_name = aggregation + self.name = name + if leaves is not None: + assert isinstance(leaves, dict) + self.leaves = leaves + self.activs = None + self.is_reset = None + + def __repr__(self): + header = "Node({}, response={}, bias={}, activation={}, aggregation={})".format( + self.name, + self.response, + self.bias, + self.activation_name, + self.aggregation_name, + ) + child_reprs = [] + for w, child in zip(self.weights, self.children): + child_reprs.append( + " <- {} * ".format(w) + repr(child).replace("\n", "\n ") + ) + return header + "\n" + "\n".join(child_reprs) + + def activate(self, xs, shape): + """ + xs: list of torch tensors + """ + if not xs: + return torch.full(shape, self.bias) + inputs = [w * x for w, x in zip(self.weights, xs)] + try: + pre_activs = self.aggregation(inputs) + activs = self.activation(self.response * pre_activs + self.bias) + assert activs.shape == shape, "Wrong shape for node {}".format(self.name) + except Exception: + raise Exception("Failed to activate node {}".format(self.name)) + return activs + + def get_activs(self, shape): + if self.activs is None: + xs = [child.get_activs(shape) for child in self.children] + self.activs = self.activate(xs, shape) + return self.activs + + def __call__(self, **inputs): + assert self.leaves is not None + assert inputs + shape = list(inputs.values())[0].shape + self.reset() + for name in self.leaves.keys(): + assert ( + inputs[name].shape == shape + ), "Wrong activs shape for leaf {}, {} != {}".format( + name, inputs[name].shape, shape + ) + self.leaves[name].set_activs(inputs[name]) + return self.get_activs(shape) + + def _prereset(self): + if self.is_reset is None: + self.is_reset = False + for child in self.children: + child._prereset() # pylint: disable=protected-access + + def _postreset(self): + if self.is_reset is not None: + self.is_reset = None + for child in self.children: + child._postreset() # pylint: disable=protected-access + + def _reset(self): + if not self.is_reset: + self.is_reset = True + self.activs = None + for child in self.children: + child._reset() # pylint: disable=protected-access + + def reset(self): + self._prereset() # pylint: disable=protected-access + self._reset() # pylint: disable=protected-access + self._postreset() # pylint: disable=protected-access + + +class Leaf: + def __init__(self, name=None): + self.activs = None + 
self.name = name + + def __repr__(self): + return "Leaf({})".format(self.name) + + def set_activs(self, activs): + self.activs = activs + + def get_activs(self, shape): + assert self.activs is not None, "Missing activs for leaf {}".format(self.name) + assert ( + self.activs.shape == shape + ), "Wrong activs shape for leaf {}, {} != {}".format( + self.name, self.activs.shape, shape + ) + return self.activs + + def _prereset(self): + pass + + def _postreset(self): + pass + + def _reset(self): + self.activs = None + + def reset(self): + self._reset() + + +def create_cppn(genome, config, leaf_names, node_names, output_activation=None): + + genome_config = config.genome_config + required = required_for_output( + genome_config.input_keys, genome_config.output_keys, genome.connections + ) + + # Gather inputs and expressed connections. + node_inputs = {i: [] for i in genome_config.output_keys} + for cg in genome.connections.values(): + if not cg.enabled: + continue + + i, o = cg.key + if o not in required and i not in required: + continue + + if i in genome_config.output_keys: + continue + + if o not in node_inputs: + node_inputs[o] = [(i, cg.weight)] + else: + node_inputs[o].append((i, cg.weight)) + + if i not in node_inputs: + node_inputs[i] = [] + + nodes = {i: Leaf() for i in genome_config.input_keys} + + assert len(leaf_names) == len(genome_config.input_keys) + leaves = {name: nodes[i] for name, i in zip(leaf_names, genome_config.input_keys)} + + def build_node(idx): + if idx in nodes: + return nodes[idx] + node = genome.nodes[idx] + conns = node_inputs[idx] + children = [build_node(i) for i, w in conns] + weights = [w for i, w in conns] + if idx in genome_config.output_keys and output_activation is not None: + activation = output_activation + else: + activation = str_to_activation[node.activation] + aggregation = str_to_aggregation[node.aggregation] + nodes[idx] = Node( + children, + weights, + node.response, + node.bias, + activation, + aggregation, + leaves=leaves, + ) + return nodes[idx] + + for idx in genome_config.output_keys: + build_node(idx) + + outputs = [nodes[i] for i in genome_config.output_keys] + + for name in leaf_names: + leaves[name].name = name + + for i, name in zip(genome_config.output_keys, node_names): + nodes[i].name = name + + return outputs + + +def clamp_weights_(weights, weight_threshold=0.2, weight_max=3.0): + # TODO: also try LEO + low_idxs = weights.abs() < weight_threshold + weights[low_idxs] = 0 + weights[weights > 0] -= weight_threshold + weights[weights < 0] += weight_threshold + weights[weights > weight_max] = weight_max + weights[weights < -weight_max] = -weight_max + + +def get_coord_inputs(in_coords, out_coords, batch_size=None): + n_in = len(in_coords) + n_out = len(out_coords) + + if batch_size is not None: + in_coords = in_coords.unsqueeze(0).expand(batch_size, n_in, 2) + out_coords = out_coords.unsqueeze(0).expand(batch_size, n_out, 2) + + x_out = out_coords[:, :, 0].unsqueeze(2).expand(batch_size, n_out, n_in) + y_out = out_coords[:, :, 1].unsqueeze(2).expand(batch_size, n_out, n_in) + x_in = in_coords[:, :, 0].unsqueeze(1).expand(batch_size, n_out, n_in) + y_in = in_coords[:, :, 1].unsqueeze(1).expand(batch_size, n_out, n_in) + else: + x_out = out_coords[:, 0].unsqueeze(1).expand(n_out, n_in) + y_out = out_coords[:, 1].unsqueeze(1).expand(n_out, n_in) + x_in = in_coords[:, 0].unsqueeze(0).expand(n_out, n_in) + y_in = in_coords[:, 1].unsqueeze(0).expand(n_out, n_in) + + return (x_out, y_out), (x_in, y_in) diff --git a/pytorch_neat/dask_helpers.py 
b/pytorch_neat/dask_helpers.py new file mode 100644 index 0000000..3ed0560 --- /dev/null +++ b/pytorch_neat/dask_helpers.py @@ -0,0 +1,37 @@ +# Copyright (c) 2018 Uber Technologies, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import time + +from dask.distributed import Client + + +def setup_dask(scheduler, retries=-1): + if scheduler is None or scheduler == "{scheduler}": + print("Setting up local cluster...") + return Client() + succeeded = False + try_num = 0 + while not succeeded: + try_num += 1 + if try_num == retries: + raise Exception("Failed to connect to Dask client") + try: + client = Client(scheduler, timeout=60) + succeeded = True + except Exception as e: # pylint: disable=broad-except + print(e) + time.sleep(15) + + return client diff --git a/pytorch_neat/maze.py b/pytorch_neat/maze.py new file mode 100644 index 0000000..fea7714 --- /dev/null +++ b/pytorch_neat/maze.py @@ -0,0 +1,157 @@ +# Copyright (c) 2018 Uber Technologies, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
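+
+# Gym-style maze environments for testing adaptive networks. MetaMazeEnv
+# gives the agent a receptive_size x receptive_size window onto the maze
+# plus its last reward (and, with extra_inputs, a bias and the step count).
+# Bumping into a wall costs wall_penalty; reaching the goal yields +10 and
+# teleports the agent to a random free cell, while the goal location stays
+# fixed within an episode and is re-randomized on reset(), so a high-scoring
+# agent must rediscover and then exploit the goal each episode.
+# SimpleMazeEnv has the same dynamics in an open room with no interior walls.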
+ +import logging + +import gym +import numpy as np + +logger = logging.getLogger(__name__) + + +class MetaMazeEnv(gym.Env): + def __init__( + self, + size=7, + receptive_size=3, + episode_len=250, + wall_penalty=0.1, + extra_inputs=True, + ): + self.size = size + self.receptive_size = receptive_size + self.center = size // 2 + self.episode_len = episode_len + self.wall_penalty = wall_penalty + self.extra_inputs = extra_inputs + + self.reward = 0.0 + self.step_num = self.episode_len + self.reward_row_pos = self.center + self.reward_col_pos = self.center + self.row_pos = self.center + self.col_pos = self.center + + self.make_maze() + + def make_maze(self): + self.maze = np.ones((self.size, self.size)) # ones are walls + self.maze[1 : self.size - 1, 1 : self.size - 1].fill(0) + for row in range(1, self.size - 1): + for col in range(1, self.size - 1): + if row % 2 == 0 and col % 2 == 0: + self.maze[row, col] = 1 + self.maze[self.center, self.center] = 0 + + def render(self, mode="human"): + raise NotImplementedError() + + def state(self): + if self.extra_inputs: + state = np.zeros(self.receptive_size ** 2 + 3) + else: + state = np.zeros(self.receptive_size ** 2 + 1) + state[: self.receptive_size ** 2] = self.maze[ + self.row_pos + - self.receptive_size // 2 : self.row_pos + + self.receptive_size // 2 + + 1, + self.col_pos + - self.receptive_size // 2 : self.col_pos + + self.receptive_size // 2 + + 1, + ].flatten() + state[-1] = self.reward + if self.extra_inputs: + state[-2] = self.step_num + state[-3] = 1 # bias + return state + + def step(self, action): + assert action in {0, 1, 2, 3} + self.step_num += 1 + assert self.step_num <= self.episode_len + self.reward = 0.0 + + target_row = self.row_pos + target_col = self.col_pos + if action == 0: + target_row -= 1 + elif action == 1: + target_row += 1 + elif action == 2: + target_col -= 1 + elif action == 3: + target_col += 1 + + if self.maze[target_row, target_col] == 1: + self.reward = -self.wall_penalty + else: + self.row_pos = target_row + self.col_pos = target_col + + if self.row_pos == self.reward_row_pos and self.col_pos == self.reward_col_pos: + self.reward += 10.0 + self.row_pos = np.random.randint(1, self.size - 1) + self.col_pos = np.random.randint(1, self.size - 1) + while self.maze[self.row_pos, self.col_pos] == 1: + self.row_pos = np.random.randint(1, self.size - 1) + self.col_pos = np.random.randint(1, self.size - 1) + + return self.state(), self.reward, self.step_num == self.episode_len, {} + + def reset(self): + self.step_num = 0 + self.reward = 0 + self.row_pos = self.center + self.col_pos = self.center + self.reward_row_pos = self.reward_col_pos = 0 + while self.maze[self.reward_row_pos, self.reward_col_pos] == 1: + self.reward_row_pos = np.random.randint(1, self.size - 1) + self.reward_col_pos = np.random.randint(1, self.size - 1) + + return self.state() + + def __repr__(self): + return "MetaMazeEnv({}, step_num={}, pos={}, reward_pos={})".format( + self.maze, + self.step_num, + (self.row_pos, self.col_pos), + (self.reward_row_pos, self.reward_col_pos), + ) + + +class SimpleMazeEnv(MetaMazeEnv): + def __init__(self, size=4, receptive_size=3, episode_len=250, wall_penalty=0.0): + super().__init__( + size=size, + receptive_size=receptive_size, + episode_len=episode_len, + wall_penalty=wall_penalty, + ) + + def make_maze(self): + self.maze = np.ones((self.size, self.size)) # ones are walls + self.maze[1 : self.size - 1, 1 : self.size - 1].fill(0) + + def render(self, mode="human"): + raise NotImplementedError() + + def 
__str__(self): + return "SimpleMazeEnv({}, step_num={}, pos={}, reward_pos={})".format( + self.maze, + self.step_num, + (self.row_pos, self.col_pos), + (self.reward_row_pos, self.reward_col_pos), + ) diff --git a/pytorch_neat/multi_env_eval.py b/pytorch_neat/multi_env_eval.py new file mode 100644 index 0000000..3133e37 --- /dev/null +++ b/pytorch_neat/multi_env_eval.py @@ -0,0 +1,57 @@ +# Copyright (c) 2018 Uber Technologies, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import numpy as np + + +class MultiEnvEvaluator: + def __init__(self, make_net, activate_net, batch_size=1, max_env_steps=None, make_env=None, envs=None): + if envs is None: + self.envs = [make_env() for _ in range(batch_size)] + else: + self.envs = envs + self.make_net = make_net + self.activate_net = activate_net + self.batch_size = batch_size + self.max_env_steps = max_env_steps + + def eval_genome(self, genome, config, debug=False): + net = self.make_net(genome, config, self.batch_size) + + fitnesses = np.zeros(self.batch_size) + states = [env.reset() for env in self.envs] + dones = [False] * self.batch_size + + step_num = 0 + while True: + step_num += 1 + if self.max_env_steps is not None and step_num == self.max_env_steps: + break + if debug: + actions = self.activate_net( + net, states, debug=True, step_num=step_num) + else: + actions = self.activate_net(net, states) + assert len(actions) == len(self.envs) + for i, (env, action, done) in enumerate(zip(self.envs, actions, dones)): + if not done: + state, reward, done, _ = env.step(action) + fitnesses[i] += reward + if not done: + states[i] = state + dones[i] = done + if all(dones): + break + + return sum(fitnesses) / len(fitnesses) diff --git a/pytorch_neat/neat_reporter.py b/pytorch_neat/neat_reporter.py new file mode 100644 index 0000000..06f2b88 --- /dev/null +++ b/pytorch_neat/neat_reporter.py @@ -0,0 +1,89 @@ +# Copyright (c) 2018 Uber Technologies, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
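+
+# LogReporter is a NEAT-Python BaseReporter that records, per generation,
+# population size, species count, timing (with a moving average over the
+# last 10 generations), extinction count, and fitness mean/std/best; it also
+# re-evaluates the best genome with the supplied eval function as a
+# validation score and appends each generation's record to the log file as
+# one JSON line.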
+ +import json +import time +from pprint import pprint + +import numpy as np +from neat.reporting import BaseReporter + + +class LogReporter(BaseReporter): + def __init__(self, fnm, eval_best, eval_with_debug=False): + self.log = open(fnm, "a") + self.generation = None + self.generation_start_time = None + self.generation_times = [] + self.num_extinctions = 0 + self.eval_best = eval_best + self.eval_with_debug = eval_with_debug + self.log_dict = {} + + def start_generation(self, generation): + self.log_dict["generation"] = generation + self.generation_start_time = time.time() + + def end_generation(self, config, population, species_set): + ng = len(population) + self.log_dict["pop_size"] = ng + + ns = len(species_set.species) + self.log_dict["n_species"] = ns + + elapsed = time.time() - self.generation_start_time + self.log_dict["time_elapsed"] = elapsed + + self.generation_times.append(elapsed) + self.generation_times = self.generation_times[-10:] + average = np.mean(self.generation_times) + self.log_dict["time_elapsed_avg"] = average + + self.log_dict["n_extinctions"] = self.num_extinctions + + pprint(self.log_dict) + self.log.write(json.dumps(self.log_dict) + "\n") + + def post_evaluate(self, config, population, species, best_genome): + # pylint: disable=no-self-use + fitnesses = [c.fitness for c in population.values()] + fit_mean = np.mean(fitnesses) + fit_std = np.std(fitnesses) + + self.log_dict["fitness_avg"] = fit_mean + self.log_dict["fitness_std"] = fit_std + + self.log_dict["fitness_best"] = best_genome.fitness + + print("=" * 50 + " Best Genome: " + "=" * 50) + if self.eval_with_debug: + print(best_genome) + + best_fitness_val = self.eval_best( + best_genome, config, debug=self.eval_with_debug + ) + self.log_dict["fitness_best_val"] = best_fitness_val + + n_neurons_best, n_conns_best = best_genome.size() + self.log_dict["n_neurons_best"] = n_neurons_best + self.log_dict["n_conns_best"] = n_conns_best + + def complete_extinction(self): + self.num_extinctions += 1 + + def found_solution(self, config, generation, best): + pass + + def species_stagnant(self, sid, species): + pass diff --git a/pytorch_neat/recurrent_net.py b/pytorch_neat/recurrent_net.py new file mode 100644 index 0000000..1bf811b --- /dev/null +++ b/pytorch_neat/recurrent_net.py @@ -0,0 +1,215 @@ +# Copyright (c) 2018 Uber Technologies, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
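+
+# RecurrentNet evaluates a NEAT genome as a batched recurrent network.
+# Enabled connections are bucketed by source/destination layer
+# (input/hidden/output), materialized as dense matrices from COO-style
+# (index, weight) lists, and the hidden state is updated n_internal_steps
+# times per activate() call before the outputs are computed (optionally
+# from the current rather than previous hidden activations via
+# use_current_activs).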
+ +import torch +import numpy as np +from .activations import sigmoid_activation + + +# def sparse_mat(shape, conns): +# idxs, weights = conns +# if len(idxs) > 0: +# idxs = torch.LongTensor(idxs).t() +# weights = torch.FloatTensor(weights) +# mat = torch.sparse.FloatTensor(idxs, weights, shape) +# else: +# mat = torch.sparse.FloatTensor(shape[0], shape[1]) +# return mat + + +def dense_from_coo(shape, conns, dtype=torch.float64): + mat = torch.zeros(shape, dtype=dtype) + idxs, weights = conns + if len(idxs) == 0: + return mat + rows, cols = np.array(idxs).transpose() + mat[torch.tensor(rows), torch.tensor(cols)] = torch.tensor( + weights, dtype=dtype) + return mat + + +class RecurrentNet(): + def __init__(self, n_inputs, n_hidden, n_outputs, + input_to_hidden, hidden_to_hidden, output_to_hidden, + input_to_output, hidden_to_output, output_to_output, + hidden_responses, output_responses, + hidden_biases, output_biases, + batch_size=1, + use_current_activs=False, + activation=sigmoid_activation, + n_internal_steps=1, + dtype=torch.float64): + + self.use_current_activs = use_current_activs + self.activation = activation + self.n_internal_steps = n_internal_steps + self.dtype = dtype + + self.n_inputs = n_inputs + self.n_hidden = n_hidden + self.n_outputs = n_outputs + + if n_hidden > 0: + self.input_to_hidden = dense_from_coo( + (n_hidden, n_inputs), input_to_hidden, dtype=dtype) + self.hidden_to_hidden = dense_from_coo( + (n_hidden, n_hidden), hidden_to_hidden, dtype=dtype) + self.output_to_hidden = dense_from_coo( + (n_hidden, n_outputs), output_to_hidden, dtype=dtype) + self.hidden_to_output = dense_from_coo( + (n_outputs, n_hidden), hidden_to_output, dtype=dtype) + self.input_to_output = dense_from_coo( + (n_outputs, n_inputs), input_to_output, dtype=dtype) + self.output_to_output = dense_from_coo( + (n_outputs, n_outputs), output_to_output, dtype=dtype) + + if n_hidden > 0: + self.hidden_responses = torch.tensor(hidden_responses, dtype=dtype) + self.hidden_biases = torch.tensor(hidden_biases, dtype=dtype) + + self.output_responses = torch.tensor( + output_responses, dtype=dtype) + self.output_biases = torch.tensor(output_biases, dtype=dtype) + + self.reset(batch_size) + + def reset(self, batch_size=1): + if self.n_hidden > 0: + self.activs = torch.zeros( + batch_size, self.n_hidden, dtype=self.dtype) + else: + self.activs = None + self.outputs = torch.zeros( + batch_size, self.n_outputs, dtype=self.dtype) + + def activate(self, inputs): + ''' + inputs: (batch_size, n_inputs) + + returns: (batch_size, n_outputs) + ''' + with torch.no_grad(): + inputs = torch.tensor(inputs, dtype=self.dtype) + activs_for_output = self.activs + if self.n_hidden > 0: + for _ in range(self.n_internal_steps): + self.activs = self.activation(self.hidden_responses * ( + self.input_to_hidden.mm(inputs.t()).t() + + self.hidden_to_hidden.mm(self.activs.t()).t() + + self.output_to_hidden.mm(self.outputs.t()).t()) + + self.hidden_biases) + if self.use_current_activs: + activs_for_output = self.activs + output_inputs = (self.input_to_output.mm(inputs.t()).t() + + self.output_to_output.mm(self.outputs.t()).t()) + if self.n_hidden > 0: + output_inputs += self.hidden_to_output.mm( + activs_for_output.t()).t() + self.outputs = self.activation( + self.output_responses * output_inputs + self.output_biases) + return self.outputs + + @staticmethod + def create(genome, config, batch_size=1, activation=sigmoid_activation, + prune_empty=False, use_current_activs=False, n_internal_steps=1): + from neat.graphs import 
required_for_output + + genome_config = config.genome_config + required = required_for_output( + genome_config.input_keys, genome_config.output_keys, genome.connections) + if prune_empty: + nonempty = {conn.key[1] for conn in genome.connections.values() if conn.enabled}.union( + set(genome_config.input_keys)) + + input_keys = list(genome_config.input_keys) + hidden_keys = [k for k in genome.nodes.keys() + if k not in genome_config.output_keys] + output_keys = list(genome_config.output_keys) + + hidden_responses = [genome.nodes[k].response for k in hidden_keys] + output_responses = [genome.nodes[k].response for k in output_keys] + + hidden_biases = [genome.nodes[k].bias for k in hidden_keys] + output_biases = [genome.nodes[k].bias for k in output_keys] + + if prune_empty: + for i, key in enumerate(output_keys): + if key not in nonempty: + output_biases[i] = 0.0 + + n_inputs = len(input_keys) + n_hidden = len(hidden_keys) + n_outputs = len(output_keys) + + input_key_to_idx = {k: i for i, k in enumerate(input_keys)} + hidden_key_to_idx = {k: i for i, k in enumerate(hidden_keys)} + output_key_to_idx = {k: i for i, k in enumerate(output_keys)} + + def key_to_idx(key): + if key in input_keys: + return input_key_to_idx[key] + elif key in hidden_keys: + return hidden_key_to_idx[key] + elif key in output_keys: + return output_key_to_idx[key] + + input_to_hidden = ([], []) + hidden_to_hidden = ([], []) + output_to_hidden = ([], []) + input_to_output = ([], []) + hidden_to_output = ([], []) + output_to_output = ([], []) + + for conn in genome.connections.values(): + if not conn.enabled: + continue + + i_key, o_key = conn.key + if o_key not in required and i_key not in required: + continue + if prune_empty and i_key not in nonempty: + print('Pruned {}'.format(conn.key)) + continue + + i_idx = key_to_idx(i_key) + o_idx = key_to_idx(o_key) + + if i_key in input_keys and o_key in hidden_keys: + idxs, vals = input_to_hidden + elif i_key in hidden_keys and o_key in hidden_keys: + idxs, vals = hidden_to_hidden + elif i_key in output_keys and o_key in hidden_keys: + idxs, vals = output_to_hidden + elif i_key in input_keys and o_key in output_keys: + idxs, vals = input_to_output + elif i_key in hidden_keys and o_key in output_keys: + idxs, vals = hidden_to_output + elif i_key in output_keys and o_key in output_keys: + idxs, vals = output_to_output + else: + raise ValueError( + 'Invalid connection from key {} to key {}'.format(i_key, o_key)) + + idxs.append((o_idx, i_idx)) # to, from + vals.append(conn.weight) + + return RecurrentNet(n_inputs, n_hidden, n_outputs, + input_to_hidden, hidden_to_hidden, output_to_hidden, + input_to_output, hidden_to_output, output_to_output, + hidden_responses, output_responses, + hidden_biases, output_biases, + batch_size=batch_size, + activation=activation, + use_current_activs=use_current_activs, + n_internal_steps=n_internal_steps) diff --git a/pytorch_neat/strict_t_maze.py b/pytorch_neat/strict_t_maze.py new file mode 100644 index 0000000..b9aa725 --- /dev/null +++ b/pytorch_neat/strict_t_maze.py @@ -0,0 +1,178 @@ +# Copyright (c) 2018 Uber Technologies, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import random + +import gym +import numpy as np + + +class StrictTMazeEnv(gym.Env): + def __init__( + self, + hall_len=3, + n_trials=100, + wall_penalty=0.4, + init_reward_side=1, + reward_flip_mean=50, + reward_flip_range=15, + high_reward=1.0, + low_reward=0.2, + ): + self.hall_len = hall_len + self.n_trials = n_trials + self.trial_num = n_trials + self.init_reward_side = init_reward_side + self.reward_side = init_reward_side + self.reward_flip_mean = reward_flip_mean + self.reward_flip_range = reward_flip_range + self.wall_penalty = wall_penalty + self.high_reward = high_reward + self.low_reward = low_reward + + self.color = 0.0 + self.row_pos = self.hall_len + 1 + self.col_pos = hall_len + 1 + self.direction = 0 + self.reward_flip = reward_flip_mean + self.reset_trial_on_step = False + self.trial_num = self.n_trials + + self.make_maze() + + def make_maze(self): + self.maze = np.ones( + (self.hall_len + 3, 2 * self.hall_len + 3) + ) # ones are walls + self.maze[1:-1, self.hall_len + 1].fill(0) + self.maze[1, 1:-1].fill(0) + + def render(self, mode="human"): + raise NotImplementedError() + + def state(self): + state = np.zeros(4) + assert self.direction in {0, 1, 2, 3} + if self.direction == 0: # up + state[0] = self.maze[self.row_pos, self.col_pos - 1] + state[1] = self.maze[self.row_pos - 1, self.col_pos] + state[2] = self.maze[self.row_pos, self.col_pos + 1] + elif self.direction == 1: # right + state[0] = self.maze[self.row_pos - 1, self.col_pos] + state[1] = self.maze[self.row_pos, self.col_pos + 1] + state[2] = self.maze[self.row_pos + 1, self.col_pos] + elif self.direction == 2: # down + state[0] = self.maze[self.row_pos, self.col_pos + 1] + state[1] = self.maze[self.row_pos + 1, self.col_pos] + state[2] = self.maze[self.row_pos, self.col_pos - 1] + elif self.direction == 3: # left + state[0] = self.maze[self.row_pos + 1, self.col_pos] + state[1] = self.maze[self.row_pos, self.col_pos - 1] + state[2] = self.maze[self.row_pos - 1, self.col_pos] + state[3] = self.color + return state + + def reset_trial(self): + self.color = 0.0 + self.row_pos = self.hall_len + 1 + self.col_pos = self.hall_len + 1 + self.direction = 0 + if self.trial_num == self.reward_flip: + self.reward_side = 1 - self.reward_side + + def step(self, action): # pylint: disable=too-many-branches + assert action in {0, 1, 2} + + if self.reset_trial_on_step: + self.trial_num += 1 + self.reset_trial() + self.reset_trial_on_step = False + return self.state(), 0.0, self.trial_num == self.n_trials, {} + + assert self.trial_num < self.n_trials + + reward = 0 + self.color = 0 + + if action in {0, 2}: + if self.row_pos > 1: + reward -= self.wall_penalty + self.reset_trial_on_step = True + elif ( + self.row_pos == 1 + and self.col_pos == self.hall_len + 1 + and self.direction != 0 + ): # already turned at turning point, don't turn again + reward -= self.wall_penalty + self.reset_trial_on_step = True + elif ( + self.row_pos == 1 and self.col_pos != self.hall_len + 1 + ): # in cross of T, shouldn't be turning + reward -= self.wall_penalty + self.reset_trial_on_step = True + + if action == 0: + self.direction = 
(self.direction - 1) % 4
+        elif action == 2:
+            self.direction = (self.direction + 1) % 4
+
+        if action == 1:
+            target_row = self.row_pos
+            target_col = self.col_pos
+
+            if self.direction == 0:  # up
+                target_row -= 1
+            elif self.direction == 1:  # right
+                target_col += 1
+            elif self.direction == 2:  # down
+                target_row += 1
+            elif self.direction == 3:  # left
+                target_col -= 1
+
+            if self.maze[target_row, target_col] == 1:
+                reward -= self.wall_penalty
+                self.reset_trial_on_step = True
+            else:
+                self.row_pos = target_row
+                self.col_pos = target_col
+
+        if self.row_pos == 1 and self.col_pos == 1:
+            self.color = self.high_reward if self.reward_side == 0 else self.low_reward
+            reward += self.color
+            self.reset_trial_on_step = True
+        elif self.row_pos == 1 and self.col_pos == 2 * self.hall_len + 1:
+            self.color = self.high_reward if self.reward_side == 1 else self.low_reward
+            reward += self.color
+            self.reset_trial_on_step = True
+
+        return self.state(), reward, False, {}
+
+    def reset(self):
+        self.trial_num = 0
+        self.reset_trial_on_step = False
+        self.reward_flip = self.reward_flip_mean + random.randint(
+            -self.reward_flip_range, self.reward_flip_range
+        )
+        self.reward_side = self.init_reward_side
+        self.reset_trial()
+        return self.state()
+
+    def __repr__(self):
+        return "StrictTMazeEnv({}, step_num={}, pos={}, direction={}, reward_side={})".format(
+            self.maze,
+            self.trial_num,
+            (self.row_pos, self.col_pos),
+            self.direction,
+            self.reward_side,
+        )
diff --git a/pytorch_neat/t_maze.py b/pytorch_neat/t_maze.py
new file mode 100644
index 0000000..82efa95
--- /dev/null
+++ b/pytorch_neat/t_maze.py
@@ -0,0 +1,132 @@
+# Copyright (c) 2018 Uber Technologies, Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
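+
+# Interaction sketch (a hypothetical rollout against the gym-style API below):
+#
+#     env = TMazeEnv(hall_len=3, n_trials=100)
+#     obs = env.reset()                   # [left wall, wall ahead, right wall, color]
+#     obs, reward, done, _ = env.step(1)  # 0 = move left, 1 = move up, 2 = move right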
+ +import random + +import gym +import numpy as np + + +class TMazeEnv(gym.Env): + def __init__( + self, + hall_len=3, + n_trials=100, + wall_penalty=0.4, + init_reward_side=1, + reward_flip_mean=50, + reward_flip_range=15, + high_reward=1.0, + low_reward=0.2, + ): + self.hall_len = hall_len + self.n_trials = n_trials + self.trial_num = n_trials + self.init_reward_side = init_reward_side + self.reward_side = init_reward_side + self.reward_flip_mean = reward_flip_mean + self.reward_flip_range = reward_flip_range + self.wall_penalty = wall_penalty + self.high_reward = high_reward + self.low_reward = low_reward + + self.color = 0.0 + self.row_pos = self.hall_len + 1 + self.col_pos = hall_len + 1 + self.reward_flip = reward_flip_mean + self.reset_trial_on_step = False + self.trial_num = self.n_trials + + self.make_maze() + + def make_maze(self): + self.maze = np.ones( + (self.hall_len + 3, 2 * self.hall_len + 3) + ) # ones are walls + self.maze[1:-1, self.hall_len + 1].fill(0) + self.maze[1, 1:-1].fill(0) + + def render(self, mode="human"): + raise NotImplementedError() + + def state(self): + state = np.zeros(4) + state[0] = self.maze[self.row_pos, self.col_pos - 1] + state[1] = self.maze[self.row_pos - 1, self.col_pos] + state[2] = self.maze[self.row_pos, self.col_pos + 1] + state[3] = self.color + return state + + def reset_trial(self): + self.color = 0.0 + self.row_pos = self.hall_len + 1 + self.col_pos = self.hall_len + 1 + if self.trial_num == self.reward_flip: + self.reward_side = 1 - self.reward_side + + def step(self, action): + assert action in {0, 1, 2} + + if self.reset_trial_on_step: + self.trial_num += 1 + self.reset_trial() + self.reset_trial_on_step = False + return self.state(), 0.0, self.trial_num == self.n_trials, {} + + assert self.trial_num < self.n_trials + + target_row = self.row_pos + target_col = self.col_pos + if action == 0: + target_col -= 1 + elif action == 1: + target_row -= 1 + elif action == 2: + target_col += 1 + + reward = 0 + self.color = 0 + + if self.maze[target_row, target_col] == 1: + reward -= self.wall_penalty + self.reset_trial_on_step = True + else: + self.row_pos = target_row + self.col_pos = target_col + + if self.row_pos == 1 and self.col_pos == 1: + self.color = self.high_reward if self.reward_side == 0 else self.low_reward + reward += self.color + self.reset_trial_on_step = True + elif self.row_pos == 1 and self.col_pos == 2 * self.hall_len + 1: + self.color = self.high_reward if self.reward_side == 1 else self.low_reward + reward += self.color + self.reset_trial_on_step = True + + return self.state(), reward, False, {} + + def reset(self): + self.trial_num = 0 + self.reset_trial_on_step = False + self.reward_flip = self.reward_flip_mean + random.randint( + -self.reward_flip_range, self.reward_flip_range + ) + self.reward_side = self.init_reward_side + self.reset_trial() + return self.state() + + def __repr__(self): + return "TMazeEnv({}, step_num={}, pos={}, reward_side={})".format( + self.maze, self.trial_num, (self.row_pos, self.col_pos), self.reward_side + ) diff --git a/pytorch_neat/turning_t_maze.py b/pytorch_neat/turning_t_maze.py new file mode 100644 index 0000000..b982ed0 --- /dev/null +++ b/pytorch_neat/turning_t_maze.py @@ -0,0 +1,160 @@ +# Copyright (c) 2018 Uber Technologies, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import random + +import gym +import numpy as np + + +class TurningTMazeEnv(gym.Env): + def __init__( + self, + hall_len=3, + n_trials=100, + wall_penalty=0.4, + init_reward_side=1, + reward_flip_mean=50, + reward_flip_range=15, + high_reward=1.0, + low_reward=0.2, + ): + self.hall_len = hall_len + self.n_trials = n_trials + self.trial_num = n_trials + self.init_reward_side = init_reward_side + self.reward_side = init_reward_side + self.reward_flip_mean = reward_flip_mean + self.reward_flip_range = reward_flip_range + self.wall_penalty = wall_penalty + self.high_reward = high_reward + self.low_reward = low_reward + + self.color = 0.0 + self.row_pos = self.hall_len + 1 + self.col_pos = hall_len + 1 + self.direction = 0 + self.reward_flip = reward_flip_mean + self.reset_trial_on_step = False + self.trial_num = self.n_trials + + self.make_maze() + + def make_maze(self): + self.maze = np.ones( + (self.hall_len + 3, 2 * self.hall_len + 3) + ) # ones are walls + self.maze[1:-1, self.hall_len + 1].fill(0) + self.maze[1, 1:-1].fill(0) + + def render(self, mode="human"): + raise NotImplementedError() + + def state(self): + state = np.zeros(4) + assert self.direction in {0, 1, 2, 3} + if self.direction == 0: # up + state[0] = self.maze[self.row_pos, self.col_pos - 1] + state[1] = self.maze[self.row_pos - 1, self.col_pos] + state[2] = self.maze[self.row_pos, self.col_pos + 1] + elif self.direction == 1: # right + state[0] = self.maze[self.row_pos - 1, self.col_pos] + state[1] = self.maze[self.row_pos, self.col_pos + 1] + state[2] = self.maze[self.row_pos + 1, self.col_pos] + elif self.direction == 2: # down + state[0] = self.maze[self.row_pos, self.col_pos + 1] + state[1] = self.maze[self.row_pos + 1, self.col_pos] + state[2] = self.maze[self.row_pos, self.col_pos - 1] + elif self.direction == 3: # left + state[0] = self.maze[self.row_pos + 1, self.col_pos] + state[1] = self.maze[self.row_pos, self.col_pos - 1] + state[2] = self.maze[self.row_pos - 1, self.col_pos] + state[3] = self.color + return state + + def reset_trial(self): + self.color = 0.0 + self.row_pos = self.hall_len + 1 + self.col_pos = self.hall_len + 1 + self.direction = 0 + if self.trial_num == self.reward_flip: + self.reward_side = 1 - self.reward_side + + def step(self, action): + assert action in {0, 1, 2} + + if self.reset_trial_on_step: + self.trial_num += 1 + self.reset_trial() + self.reset_trial_on_step = False + return self.state(), 0.0, self.trial_num == self.n_trials, {} + + assert self.trial_num < self.n_trials + + reward = 0 + self.color = 0 + + if action == 0: + self.direction = (self.direction - 1) % 4 + elif action == 2: + self.direction = (self.direction + 1) % 4 + elif action == 1: + target_row = self.row_pos + target_col = self.col_pos + + if self.direction == 0: # up + target_row -= 1 + elif self.direction == 1: # right + target_col += 1 + elif self.direction == 2: # down + target_row += 1 + elif self.direction == 3: # left + target_col -= 1 + + if self.maze[target_row, target_col] == 1: + reward -= self.wall_penalty + self.reset_trial_on_step = True + else: + self.row_pos = target_row + self.col_pos = 
target_col + + if self.row_pos == 1 and self.col_pos == 1: + self.color = self.high_reward if self.reward_side == 0 else self.low_reward + reward += self.color + self.reset_trial_on_step = True + elif self.row_pos == 1 and self.col_pos == 2 * self.hall_len + 1: + self.color = self.high_reward if self.reward_side == 1 else self.low_reward + reward += self.color + self.reset_trial_on_step = True + + return self.state(), reward, False, {} + + def reset(self): + self.trial_num = 0 + self.reset_trial_on_step = False + self.reward_flip = self.reward_flip_mean + random.randint( + -self.reward_flip_range, self.reward_flip_range + ) + self.reward_side = self.init_reward_side + self.reset_trial() + return self.state() + + def __repr__(self): + return "TurningTMazeEnv({}, step_num={}, pos={}, direction={}, reward_side={})".format( + self.maze, + self.trial_num, + (self.row_pos, self.col_pos), + self.direction, + self.reward_side, + ) diff --git a/requirements.txt b/requirements.txt new file mode 100644 index 0000000..3e2685c --- /dev/null +++ b/requirements.txt @@ -0,0 +1,5 @@ +neat-python==0.92 +numpy==1.14.3 +gym==0.10.5 +click==6.7 +torch==0.4.0 diff --git a/tests/__init__.py b/tests/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/tests/test-config.cfg b/tests/test-config.cfg new file mode 100644 index 0000000..edfd5d7 --- /dev/null +++ b/tests/test-config.cfg @@ -0,0 +1,62 @@ +[NEAT] +pop_size = 150 +# Note: the fitness threshold will never be reached because +# we are controlling the termination ourselves based on simulation performance. +fitness_criterion = mean +fitness_threshold = 400.0 +reset_on_extinction = 0 + +[DefaultGenome] +num_inputs = 12 +num_hidden = 0 +num_outputs = 4 +initial_connection = partial_nodirect 0.5 +feed_forward = False +compatibility_disjoint_coefficient = 1.0 +compatibility_weight_coefficient = 1.0 +conn_add_prob = 0.15 +conn_delete_prob = 0.1 +node_add_prob = 0.15 +node_delete_prob = 0.1 +activation_default = clamped +activation_options = clamped +activation_mutate_rate = 0.0 +aggregation_default = sum +aggregation_options = sum +aggregation_mutate_rate = 0.0 +bias_init_mean = 0.0 +bias_init_stdev = 1.0 +bias_replace_rate = 0.02 +bias_mutate_rate = 0.8 +bias_mutate_power = 0.4 +bias_max_value = 30.0 +bias_min_value = -30.0 +response_init_mean = 1.0 +response_init_stdev = 0.0 +response_replace_rate = 0.0 +response_mutate_rate = 0.1 +response_mutate_power = 0.01 +response_max_value = 30.0 +response_min_value = -30.0 + +weight_max_value = 30 +weight_min_value = -30 +weight_init_mean = 0.0 +weight_init_stdev = 1.0 +weight_mutate_rate = 0.8 +weight_replace_rate = 0.02 +weight_mutate_power = 0.4 +enabled_default = True +enabled_mutate_rate = 0.01 + +[DefaultSpeciesSet] +compatibility_threshold = 3.0 + +[DefaultStagnation] +species_fitness_func = mean +max_stagnation = 15 +species_elitism = 4 + +[DefaultReproduction] +elitism = 2 +survival_threshold = 0.2 \ No newline at end of file diff --git a/tests/test-genome.pkl b/tests/test-genome.pkl new file mode 100644 index 0000000..8e7fd52 Binary files /dev/null and b/tests/test-genome.pkl differ diff --git a/tests/test_adaptive_linear.py b/tests/test_adaptive_linear.py new file mode 100644 index 0000000..b5a038f --- /dev/null +++ b/tests/test_adaptive_linear.py @@ -0,0 +1,242 @@ +# Copyright (c) 2018 Uber Technologies, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import numpy as np +import torch + +from pytorch_neat.activations import identity_activation as identity +from pytorch_neat.adaptive_linear_net import AdaptiveLinearNet +from pytorch_neat.aggregations import sum_aggregation as sum_ag +from pytorch_neat.cppn import Leaf, Node + + +def slow_tanh(x): + return torch.tanh(0.5 * x) + + +def np_tanh(x): + return torch.tanh(0.5 * torch.tensor(x)).numpy() + + +def test_pre(): + leaves = { + name: Leaf(name=name) + for name in ["x_in", "y_in", "x_out", "y_out", "pre", "post", "w"] + } + + delta_w_node = Node( + [leaves["x_in"], leaves["x_out"], leaves["pre"]], + [1.0, 2.0, 3.0], + 1.0, + 0.0, + identity, + sum_ag, + name="delta_w", + leaves=leaves, + ) + + input_coords = [[-1.0, 0.0], [1.0, 0.0], [0.0, -1.0]] + output_coords = [[-1.0, 0.0], [1.0, 0.0]] + + net = AdaptiveLinearNet( + delta_w_node, + input_coords, + output_coords, + activation=slow_tanh, + cppn_activation=slow_tanh, + device="cpu", + ) + + w = np_tanh( + np.array( + [ + [-1.0 + 2 * -1.0, 1.0 + 2 * -1.0, 2 * -1.0], + [-1.0 + 2 * 1.0, 1.0 + 2 * 1.0, 2 * 1.0], + ], + dtype=np.float32, + ) + ) + w[np.abs(w) < 0.2] = 0 + w[w < 0] += 0.2 + w[w > 0] -= 0.2 + w[w > 3.0] = 3.0 + w[w < -3.0] = -3.0 + w_expressed = w != 0 + assert np.allclose(net.input_to_output.numpy(), w) + + for _ in range(3): + inputs = np.array([[-1.0, 2.0, 3.0]], dtype=np.float32) + outputs = net.activate(inputs) + activs = np.tanh(0.5 * w.dot(inputs[0])) + assert np.allclose(outputs, activs) + + delta_w = np_tanh( + np.array( + [ + [-1.0 + 2 * -1.0, 1.0 + 2 * -1.0, 2 * -1.0], + [-1.0 + 2 * 1.0, 1.0 + 2 * 1.0, 2 * 1.0], + ], + dtype=np.float32, + ) + + 3 * inputs + ) + # delta_w[np.abs(delta_w) < 0.2] = 0 + # delta_w[delta_w < 0] += 0.2 + # delta_w[delta_w > 0] -= 0.2 + w[w_expressed] += delta_w[w_expressed] + w[w > 3.0] = 3.0 + w[w < -3.0] = -3.0 + assert np.allclose(net.input_to_output.numpy(), w) + + +def test_w(): + leaves = { + name: Leaf(name=name) + for name in ["x_in", "y_in", "x_out", "y_out", "pre", "post", "w"] + } + + delta_w_node = Node( + [leaves["x_in"], leaves["x_out"], leaves["w"]], + [1.0, 2.0, 3.0], + 1.0, + 0.0, + identity, + sum_ag, + name="delta_w", + leaves=leaves, + ) + + input_coords = [[-1.0, 0.0], [1.0, 0.0], [0.0, -1.0]] + output_coords = [[-1.0, 0.0], [1.0, 0.0]] + + net = AdaptiveLinearNet( + delta_w_node, + input_coords, + output_coords, + activation=slow_tanh, + cppn_activation=slow_tanh, + device="cpu", + ) + + w = np_tanh( + np.array( + [ + [-1.0 + 2 * -1.0, 1.0 + 2 * -1.0, 2 * -1.0], + [-1.0 + 2 * 1.0, 1.0 + 2 * 1.0, 2 * 1.0], + ], + dtype=np.float32, + ) + ) + w[np.abs(w) < 0.2] = 0 + w[w < 0] += 0.2 + w[w > 0] -= 0.2 + w[w > 3.0] = 3.0 + w[w < -3.0] = -3.0 + w_expressed = w != 0 + assert np.allclose(net.input_to_output.numpy(), w) + + for _ in range(3): + inputs = np.array([[-1.0, 2.0, 3.0]]) + outputs = net.activate(inputs)[0] + activs = np.tanh(0.5 * w.dot(inputs[0])) + assert np.allclose(outputs, activs) + + delta_w = np_tanh( + np.array( + [ + [-1.0 + 2 * -1.0, 1.0 + 2 * -1.0, 2 * -1.0], + [-1.0 + 2 * 1.0, 1.0 + 2 * 1.0, 2 * 1.0], + ], + dtype=np.float32, + ) + + 3 * w + 
) + # delta_w[np.abs(delta_w) < 0.2] = 0 + # delta_w[delta_w < 0] += 0.2 + # delta_w[delta_w > 0] -= 0.2 + w[w_expressed] += delta_w[w_expressed] + w[w > 3.0] = 3.0 + w[w < -3.0] = -3.0 + assert np.allclose(net.input_to_output.numpy(), w) + + +def test_post(): + leaves = { + name: Leaf(name=name) + for name in ["x_in", "y_in", "x_out", "y_out", "pre", "post", "w"] + } + + delta_w_node = Node( + [leaves["x_in"], leaves["x_out"], leaves["post"]], + [1.0, 2.0, 3.0], + 1.0, + 0.0, + identity, + sum_ag, + name="delta_w", + leaves=leaves, + ) + + input_coords = [[-1.0, 0.0], [1.0, 0.0], [0.0, -1.0]] + output_coords = [[-1.0, 0.0], [1.0, 0.0]] + + net = AdaptiveLinearNet( + delta_w_node, + input_coords, + output_coords, + activation=slow_tanh, + cppn_activation=slow_tanh, + device="cpu", + ) + + w = np_tanh( + np.array( + [ + [-1.0 + 2 * -1.0, 1.0 + 2 * -1.0, 2 * -1.0], + [-1.0 + 2 * 1.0, 1.0 + 2 * 1.0, 2 * 1.0], + ], + dtype=np.float32, + ) + ) + w[np.abs(w) < 0.2] = 0 + w[w < 0] += 0.2 + w[w > 0] -= 0.2 + w[w > 3.0] = 3.0 + w[w < -3.0] = -3.0 + w_expressed = w != 0 + assert np.allclose(net.input_to_output.numpy(), w) + + for _ in range(3): + inputs = np.array([[-1.0, 2.0, 3.0]]) + outputs = net.activate(inputs)[0] + activs = np.tanh(0.5 * w.dot(inputs[0])) + assert np.allclose(outputs, activs) + + delta_w = np_tanh( + np.array( + [ + [-1.0 + 2 * -1.0, 1.0 + 2 * -1.0, 2 * -1.0], + [-1.0 + 2 * 1.0, 1.0 + 2 * 1.0, 2 * 1.0], + ], + dtype=np.float32, + ) + + 3 * np.expand_dims(activs, 1) + ) + # delta_w[np.abs(delta_w) < 0.2] = 0 + # delta_w[delta_w < 0] += 0.2 + # delta_w[delta_w > 0] -= 0.2 + w[w_expressed] += delta_w[w_expressed] + w[w > 3.0] = 3.0 + w[w < -3.0] = -3.0 + assert np.allclose(net.input_to_output.numpy(), w) diff --git a/tests/test_cppn.py b/tests/test_cppn.py new file mode 100644 index 0000000..53ba56c --- /dev/null +++ b/tests/test_cppn.py @@ -0,0 +1,92 @@ +# Copyright (c) 2018 Uber Technologies, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
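+
+# The tests below assemble tiny CPPN graphs by hand. As exercised here, the
+# constructor convention is
+#
+#     Node(children, weights, response, bias, activation, aggregation,
+#          name=None, leaves=None)
+#
+# and a node outputs activation(response * aggregation(weight_i * child_i) + bias),
+# so the unconnected node with bias 0.5 below yields a constant 0.5.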
+ +import numpy as np +import torch + +from pytorch_neat.activations import identity_activation as identity +from pytorch_neat.aggregations import sum_aggregation as sum_ag +from pytorch_neat.cppn import Leaf, Node + + +def assert_almost_equal(x, y, tol): + assert abs(x - y) < tol, "{!r} !~= {!r}".format(x, y) + + +def test_cppn_simple(): + shape = (2, 2) + x = Leaf(name="x") + y = Node([x], [1.0], 1.0, 0.0, identity, sum_ag, name="y") + z = Node([x], [1.0], 1.0, 0.0, identity, sum_ag, name="z") + x_activs = torch.full(shape, 3) + x.set_activs(x_activs) + assert np.allclose(x_activs, y.get_activs(shape).numpy()) + assert np.allclose(x_activs, z.get_activs(shape).numpy()) + + +def test_cppn_unconnected(): + shape = (2, 2) + x = Leaf(name="x") + y = Node([], [1.0], 1.0, 0.5, identity, sum_ag, name="y") + x_activs = torch.full(shape, 3) + x.set_activs(x_activs) + assert np.allclose(y.get_activs(shape).numpy(), np.full(shape, 0.5)) + + +def test_cppn_call(): + leaves = {"x": Leaf(name="x"), "y": Leaf(name="y")} + a = Node([leaves["x"]], [1.0], 1.0, 0.0, identity, sum_ag, name="a", leaves=leaves) + b = Node( + [leaves["x"], leaves["y"]], + [1.0, 1.0], + 1.0, + 0.0, + identity, + sum_ag, + name="b", + leaves=leaves, + ) + c = Node([a], [1.0], 1.0, 0.0, identity, sum_ag, leaves=leaves) + + shape = (2, 2) + a_activs = a(x=torch.full(shape, 0.5), y=torch.full(shape, 2.0)).numpy() + assert np.allclose(a_activs, np.full(shape, 0.5)) + b_activs = b(x=torch.full(shape, 1.5), y=torch.full(shape, 2.0)) + assert np.allclose(b_activs, np.full(shape, 3.5)) + c_activs = c(x=torch.full(shape, 5.5), y=torch.full(shape, 3.0)) + assert np.allclose(c_activs, np.full(shape, 5.5)) + + +def test_cppn_deep_call(): + leaves = {"x": Leaf(name="x"), "y": Leaf(name="y")} + a = Node([leaves["y"]], [1.0], 1.0, 0.0, identity, sum_ag, name="a", leaves=leaves) + b = Node( + [leaves["x"], a], + [1.0, 1.0], + 1.0, + 0.0, + identity, + sum_ag, + name="b", + leaves=leaves, + ) + c = Node([a], [1.0], 1.0, 0.0, identity, sum_ag, leaves=leaves) + + shape = (2, 2) + b_activs = b(x=torch.full(shape, 1.5), y=torch.full(shape, 2.0)) + assert np.allclose(b_activs, np.full(shape, 3.5)) + c_activs = c(x=torch.full(shape, 5.5), y=torch.full(shape, 3.0)) + assert np.allclose(c_activs, np.full(shape, 3.0)) + b_activs = b(x=torch.full(shape, 1.5), y=torch.full(shape, 2.0)) + assert np.allclose(b_activs, np.full(shape, 3.5)) diff --git a/tests/test_maze.py b/tests/test_maze.py new file mode 100644 index 0000000..0776326 --- /dev/null +++ b/tests/test_maze.py @@ -0,0 +1,132 @@ +# Copyright (c) 2018 Uber Technologies, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
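+
+# Observation layout asserted below: the first 9 entries are the agent's 3x3
+# receptive field; with extra inputs enabled, the tail appears to be a
+# constant bias of 1, the step counter, and the previous reward (compare
+# test_no_extra, where only the previous reward is appended).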
+ +import pytest + +from pytorch_neat.maze import MetaMazeEnv + + +def test_default_initialization(): + env = MetaMazeEnv() + assert env.size == 7 + assert env.maze.shape == (7, 7) + assert env.receptive_size == 3 + assert env.episode_len == 250 + assert ( + env.maze + == [ + [1, 1, 1, 1, 1, 1, 1], + [1, 0, 0, 0, 0, 0, 1], + [1, 0, 1, 0, 1, 0, 1], + [1, 0, 0, 0, 0, 0, 1], + [1, 0, 1, 0, 1, 0, 1], + [1, 0, 0, 0, 0, 0, 1], + [1, 1, 1, 1, 1, 1, 1], + ] + ).all() + assert env.maze[3, 3] == 0 + assert env.center == 3 + + +def test_step_without_reset(): + env = MetaMazeEnv() + with pytest.raises(AssertionError): + env.step(3) + + +def test_render(): + env = MetaMazeEnv() + with pytest.raises(NotImplementedError): + env.render() + + +def test_step_with_reset(): + env = MetaMazeEnv() + obs = env.reset() + assert obs.shape == (12,) + assert env.row_pos == env.col_pos == 3 + obs, reward, done, _ = env.step(3) + assert obs.shape == (12,) + assert isinstance(reward, float) + assert isinstance(done, bool) + + +def test_step_reward(): + env = MetaMazeEnv() + obs = env.reset() + assert (obs == [1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0.0]).all() + env.reward_row_pos = env.reward_col_pos = 1 + assert env.row_pos == env.col_pos == 3 + + obs, reward, done, _ = env.step(2) + assert (obs == [0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0.0]).all() + assert not done + assert reward == 0 + assert env.row_pos == 3 + assert env.col_pos == 2 + + obs, reward, done, _ = env.step(1) + assert (obs == [0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 2, -0.1]).all() + assert not done + assert reward == -0.1 + assert env.row_pos == 3 + assert env.col_pos == 2 + + obs, reward, done, _ = env.step(2) + assert env.row_pos == 3 + assert env.col_pos == 1 + + obs, reward, done, _ = env.step(0) + assert env.row_pos == 2 + assert env.col_pos == 1 + + obs, reward, done, _ = env.step(0) + assert reward == 10 + assert obs[-1] == 10 + assert obs[-2] == 5 + assert obs[-3] == 1 + + +def test_no_extra(): + env = MetaMazeEnv(extra_inputs=False) + obs = env.reset() + assert (obs == [1, 0, 1, 0, 0, 0, 1, 0, 1, 0.0]).all() + env.reward_row_pos = env.reward_col_pos = 1 + assert env.row_pos == env.col_pos == 3 + + obs, reward, done, _ = env.step(2) + assert (obs == [0, 1, 0, 0, 0, 0, 0, 1, 0, 0.0]).all() + assert not done + assert reward == 0 + assert env.row_pos == 3 + assert env.col_pos == 2 + + obs, reward, done, _ = env.step(1) + assert (obs == [0, 1, 0, 0, 0, 0, 0, 1, 0, -0.1]).all() + assert not done + assert reward == -0.1 + assert env.row_pos == 3 + assert env.col_pos == 2 + + obs, reward, done, _ = env.step(2) + assert env.row_pos == 3 + assert env.col_pos == 1 + + obs, reward, done, _ = env.step(0) + assert env.row_pos == 2 + assert env.col_pos == 1 + + obs, reward, done, _ = env.step(0) + assert reward == 10 + assert obs[-1] == 10 diff --git a/tests/test_multi_env_eval.py b/tests/test_multi_env_eval.py new file mode 100644 index 0000000..ccbfd22 --- /dev/null +++ b/tests/test_multi_env_eval.py @@ -0,0 +1,101 @@ +# Copyright (c) 2018 Uber Technologies, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +from pytorch_neat.multi_env_eval import MultiEnvEvaluator + + +class DummyEnv: + def __init__(self, ep_len=2, reward_mag=1): + self.ep_len = ep_len + self.reward_mag = reward_mag + self.reset() + + def step(self, action): + self.step_num += 1 + if action == 0: + reward = self.reward_mag + else: + reward = -self.reward_mag + return self.step_num, reward, self.step_num == self.ep_len, {} + + def reset(self): + self.step_num = 0 + return self.step_num + + +class EndlessEnv: + def step(self, _action): + assert self.step_num < 10 + self.step_num += 1 + return 0, 0, False, {} + + def reset(self): + self.step_num = 0 + return 0 + + +class DummyNet: + def __init__(self, actions): + self.actions = actions + + def activate(self, states): + return [actions[state] for actions, state in zip(self.actions, states)] + + +env_num = 1 + + +def make_env(): + global env_num + env = DummyEnv(1 + env_num, env_num) + env_num += 1 + return env + + +def make_endless_env(): + return EndlessEnv() + + +def make_net(_genome, _config, _batch_size): + return DummyNet( + [ + [0, 0], # r=2*1 + [0, 1, 0], # r=1*2 + [1, 0, 0, 0], # r=2*3 + [1, 1, 1, 0, 1], # r=-3*4 + ] + ) + + +def activate_net(net, states): + return net.activate(states) + + +def test_multi(): + evaluator = MultiEnvEvaluator( + make_net, activate_net, batch_size=4, make_env=make_env + ) + returns = evaluator.eval_genome(None, None) + assert returns == (2 + 2 + 6 - 12) / 4 + + +def test_endless(): + evaluator = MultiEnvEvaluator( + make_net, + activate_net, + batch_size=4, + make_env=make_endless_env, + max_env_steps=10, + ) + evaluator.eval_genome(None, None) diff --git a/tests/test_recurrent.py b/tests/test_recurrent.py new file mode 100644 index 0000000..b00a5f4 --- /dev/null +++ b/tests/test_recurrent.py @@ -0,0 +1,225 @@ +# Copyright (c) 2018 Uber Technologies, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
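+
+# Convention used by the hand-built nets below: every *_to_* argument is a
+# pair (indices, weights), where each index is a (target, source) tuple;
+# e.g. input_to_output=([(0, 0)], [1.0]) wires input 0 to output 0 with
+# weight 1.0, matching the "to, from" ordering used in recurrent_net.py.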
+ +import pickle + +import neat +import numpy as np +import torch + +from pytorch_neat.activations import tanh_activation +from pytorch_neat.recurrent_net import RecurrentNet + + +def assert_almost_equal(x, y, tol): + assert abs(x - y) < tol, "{!r} !~= {!r}".format(x, y) + + +def test_unconnected(): + net = RecurrentNet( + n_inputs=1, + n_hidden=0, + n_outputs=1, + input_to_hidden=([], []), + hidden_to_hidden=([], []), + output_to_hidden=([], []), + input_to_output=([], []), + hidden_to_output=([], []), + output_to_output=([], []), + hidden_responses=[], + output_responses=[1.0], + hidden_biases=[], + output_biases=[0], + ) + + result = net.activate([[0.2]]) + assert result.shape == (1, 1) + assert_almost_equal(net.outputs[0, 0], 0.5, 0.001) + assert result[0, 0] == net.outputs[0, 0] + + result = net.activate([[0.4]]) + assert result.shape == (1, 1) + assert_almost_equal(net.outputs[0, 0], 0.5, 0.001) + assert result[0, 0] == net.outputs[0, 0] + + +def test_simple(): + net = RecurrentNet( + n_inputs=1, + n_hidden=0, + n_outputs=1, + input_to_hidden=([], []), + hidden_to_hidden=([], []), + output_to_hidden=([], []), + input_to_output=([(0, 0)], [1.0]), + hidden_to_output=([], []), + output_to_output=([], []), + hidden_responses=[], + output_responses=[1.0], + hidden_biases=[], + output_biases=[0], + ) + + result = net.activate([[0.2]]) + assert result.shape == (1, 1) + assert_almost_equal(net.outputs[0, 0], 0.731, 0.001) + assert result[0, 0] == net.outputs[0, 0] + + result = net.activate([[0.4]]) + assert result.shape == (1, 1) + assert_almost_equal(net.outputs[0, 0], 0.881, 0.001) + assert result[0, 0] == net.outputs[0, 0] + + +def test_hidden(): + net = RecurrentNet( + n_inputs=1, + n_hidden=1, + n_outputs=1, + input_to_hidden=([(0, 0)], [1.0]), + hidden_to_hidden=([], []), + output_to_hidden=([], []), + input_to_output=([], []), + hidden_to_output=([(0, 0)], [1.0]), + output_to_output=([], []), + hidden_responses=[1.0], + output_responses=[1.0], + hidden_biases=[0], + output_biases=[0], + use_current_activs=True, + ) + + result = net.activate([[0.2]]) + assert result.shape == (1, 1) + assert_almost_equal(net.activs[0, 0], 0.731, 0.001) + assert_almost_equal(net.outputs[0, 0], 0.975, 0.001) + assert result[0, 0] == net.outputs[0, 0] + + result = net.activate([[0.4]]) + assert result.shape == (1, 1) + assert_almost_equal(net.activs[0, 0], 0.881, 0.001) + assert_almost_equal(net.outputs[0, 0], 0.988, 0.001) + assert result[0, 0] == net.outputs[0, 0] + + +def test_recurrent(): + net = RecurrentNet( + n_inputs=1, + n_hidden=1, + n_outputs=1, + input_to_hidden=([(0, 0)], [1.0]), + hidden_to_hidden=([(0, 0)], [2.0]), + output_to_hidden=([], []), + input_to_output=([], []), + hidden_to_output=([(0, 0)], [1.0]), + output_to_output=([], []), + hidden_responses=[1.0], + output_responses=[1.0], + hidden_biases=[0], + output_biases=[0], + use_current_activs=True, + ) + + result = net.activate([[0.2]]) + assert result.shape == (1, 1) + assert_almost_equal(net.activs[0, 0], 0.731, 0.001) + assert_almost_equal(net.outputs[0, 0], 0.975, 0.001) + assert result[0, 0] == net.outputs[0, 0] + + result = net.activate([[-1.4]]) + assert result.shape == (1, 1) + assert_almost_equal(net.activs[0, 0], 0.577, 0.001) + assert_almost_equal(net.outputs[0, 0], 0.947, 0.001) + assert result[0, 0] == net.outputs[0, 0] + + +def test_dtype(): + net = RecurrentNet( + n_inputs=1, + n_hidden=1, + n_outputs=1, + input_to_hidden=([(0, 0)], [1.0]), + hidden_to_hidden=([(0, 0)], [2.0]), + output_to_hidden=([], []), + 
input_to_output=([], []), + hidden_to_output=([(0, 0)], [1.0]), + output_to_output=([], []), + hidden_responses=[1.0], + output_responses=[1.0], + hidden_biases=[0], + output_biases=[0], + use_current_activs=True, + dtype=torch.float32, + ) + + result = net.activate([[0.2]]) + assert result.shape == (1, 1) + assert_almost_equal(net.activs[0, 0], 0.731, 0.001) + assert_almost_equal(net.outputs[0, 0], 0.975, 0.001) + assert result[0, 0] == net.outputs[0, 0] + + result = net.activate([[-1.4]]) + assert result.shape == (1, 1) + assert_almost_equal(net.activs[0, 0], 0.577, 0.001) + assert_almost_equal(net.outputs[0, 0], 0.947, 0.001) + assert result[0, 0] == net.outputs[0, 0] + + +def test_match_neat(): + with open("tests/test-genome.pkl", "rb") as f: + genome = pickle.load(f) + + # use tanh since neat sets output nodes with no inputs to 0 + # (sigmoid would output 0.5 for us) + def neat_tanh_activation(z): + return float(torch.tanh(2.5 * torch.tensor(z, dtype=torch.float64))) + + for node in genome.nodes.values(): + node.response = 0.5 + + config = neat.Config( + neat.DefaultGenome, + neat.DefaultReproduction, + neat.DefaultSpeciesSet, + neat.DefaultStagnation, + "tests/test-config.cfg", + ) + + for _ in range(500): + genome.mutate(config.genome_config) + # print(genome) + + neat_net = neat.nn.RecurrentNetwork.create(genome, config) + for i, (node, _activation, aggregation, bias, response, links) in enumerate( + neat_net.node_evals + ): + neat_net.node_evals[i] = ( + node, + neat_tanh_activation, + aggregation, + bias, + response, + links, + ) + + torch_net = RecurrentNet.create( + genome, config, activation=tanh_activation, prune_empty=True + ) + + for _ in range(5): + inputs = np.random.randn(12) + # print(inputs) + neat_result = neat_net.activate(inputs) + torch_result = torch_net.activate([inputs])[0].numpy() + assert np.allclose(neat_result, torch_result, atol=1e-8) diff --git a/tests/test_strict_t_maze.py b/tests/test_strict_t_maze.py new file mode 100644 index 0000000..8eb94d6 --- /dev/null +++ b/tests/test_strict_t_maze.py @@ -0,0 +1,361 @@ +# Copyright (c) 2018 Uber Technologies, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
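+
+# StrictTMazeEnv (see pytorch_neat/strict_t_maze.py) differs from the plain
+# turning maze in that it also charges the wall penalty for turning anywhere
+# other than the junction, and for turning a second time at the junction;
+# the tests below exercise both penalty paths, plus an optimal hand-coded
+# policy whose expected fitness is asserted exactly.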
+ +import pytest + +from pytorch_neat.multi_env_eval import MultiEnvEvaluator +from pytorch_neat.strict_t_maze import StrictTMazeEnv + + +def test_default_initialization(): + env = StrictTMazeEnv() + assert env.hall_len == 3 + assert env.n_trials == 100 + assert env.maze.shape == (6, 9) + assert ( + env.maze + == [ + [1, 1, 1, 1, 1, 1, 1, 1, 1], + [1, 0, 0, 0, 0, 0, 0, 0, 1], + [1, 1, 1, 1, 0, 1, 1, 1, 1], + [1, 1, 1, 1, 0, 1, 1, 1, 1], + [1, 1, 1, 1, 0, 1, 1, 1, 1], + [1, 1, 1, 1, 1, 1, 1, 1, 1], + ] + ).all() + + +def test_step_without_reset(): + env = StrictTMazeEnv() + with pytest.raises(AssertionError): + env.step(1) + + +def test_render(): + env = StrictTMazeEnv() + with pytest.raises(NotImplementedError): + env.render() + + +def test_step_with_reset(): + env = StrictTMazeEnv() + obs = env.reset() + assert obs.shape == (4,) + assert env.row_pos == env.col_pos == 4 + assert (obs == [1, 0, 1, 0]).all() + + obs, reward, done, _ = env.step(0) + assert (obs == [1, 1, 0, 0]).all() + assert reward == -0.4 + assert not done + + obs, reward, done, _ = env.step(0) + assert (obs == [1, 0, 1, 0]).all() + assert reward == 0.0 + assert not done + + +def test_full_trial(): + env = StrictTMazeEnv() + obs = env.reset() + for _ in range(3): + assert (obs == [1, 0, 1, 0]).all() + assert env.direction == 0 + obs, reward, done, _ = env.step(1) + assert not done + assert reward == 0 + assert (obs == [0, 1, 0, 0]).all() + assert env.direction == 0 + assert reward == 0 + obs, reward, done, _ = env.step(2) + assert env.direction == 1 + assert (obs == [1, 0, 0, 0]).all() + assert reward == 0 + assert not done + for _ in range(2): + obs, reward, done, _ = env.step(1) + assert env.direction == 1 + assert (obs == [1, 0, 1, 0]).all() + assert reward == 0 + assert not done + obs, reward, done, _ = env.step(1) + assert (obs == [1, 1, 1, 1]).all() + assert reward == 1 + assert env.direction == 1 + assert not done + obs, reward, done, _ = env.step(2) + assert reward == 0 + assert (obs == [1, 0, 1, 0]).all() + assert env.direction == 0 + assert env.row_pos == env.col_pos == 4 + assert not done + + +def test_repeat_turn_penalty(): + env = StrictTMazeEnv() + obs = env.reset() + for _ in range(3): + assert (obs == [1, 0, 1, 0]).all() + assert env.direction == 0 + obs, reward, done, _ = env.step(1) + assert not done + assert reward == 0 + assert (obs == [0, 1, 0, 0]).all() + assert env.direction == 0 + assert reward == 0 + obs, reward, done, _ = env.step(2) + assert env.direction == 1 + assert (obs == [1, 0, 0, 0]).all() + assert reward == 0 + assert not done + obs, reward, done, _ = env.step(2) + assert env.direction == 2 + assert (obs == [0, 0, 0, 0]).all() + assert reward == -0.4 + assert not done + obs, reward, done, _ = env.step(1) + assert (obs == [1, 0, 1, 0]).all() + assert env.row_pos == env.col_pos == 4 + assert env.direction == 0 + + +def test_cross_turn_penalty(): + env = StrictTMazeEnv() + obs = env.reset() + for _ in range(3): + assert (obs == [1, 0, 1, 0]).all() + assert env.direction == 0 + obs, reward, done, _ = env.step(1) + assert not done + assert reward == 0 + assert (obs == [0, 1, 0, 0]).all() + assert env.direction == 0 + assert reward == 0 + obs, reward, done, _ = env.step(2) + assert env.direction == 1 + assert (obs == [1, 0, 0, 0]).all() + assert reward == 0 + assert not done + obs, reward, done, _ = env.step(1) + assert (obs == [1, 0, 1, 0]).all() + assert env.direction == 1 + assert reward == 0 + assert not done + obs, reward, done, _ = env.step(2) + assert env.direction == 2 + assert (obs == 
[0, 1, 0, 0]).all() + assert reward == -0.4 + assert not done + obs, reward, done, _ = env.step(1) + assert (obs == [1, 0, 1, 0]).all() + assert env.row_pos == env.col_pos == 4 + assert env.direction == 0 + + +def test_init_reward_side(): + env = StrictTMazeEnv(init_reward_side=0) + obs = env.reset() + for _ in range(3): + assert (obs == [1, 0, 1, 0]).all() + assert env.direction == 0 + obs, reward, done, _ = env.step(1) + assert not done + assert reward == 0 + assert (obs == [0, 1, 0, 0]).all() + assert env.direction == 0 + assert reward == 0 + obs, reward, done, _ = env.step(0) + assert env.direction == 3 + assert (obs == [0, 0, 1, 0]).all() + assert reward == 0 + assert not done + for _ in range(2): + obs, reward, done, _ = env.step(1) + assert env.direction == 3 + assert (obs == [1, 0, 1, 0]).all() + assert reward == 0 + assert not done + obs, reward, done, _ = env.step(1) + assert (obs == [1, 1, 1, 1]).all() + assert reward == 1 + assert env.direction == 3 + assert not done + obs, reward, done, _ = env.step(2) + assert reward == 0 + assert (obs == [1, 0, 1, 0]).all() + assert env.direction == 0 + assert env.row_pos == env.col_pos == 4 + assert not done + + +def test_low_reward(): + env = StrictTMazeEnv() + obs = env.reset() + for _ in range(3): + assert (obs == [1, 0, 1, 0]).all() + assert env.direction == 0 + obs, reward, done, _ = env.step(1) + assert not done + assert reward == 0 + assert (obs == [0, 1, 0, 0]).all() + assert env.direction == 0 + assert reward == 0 + obs, reward, done, _ = env.step(0) + assert env.direction == 3 + assert (obs == [0, 0, 1, 0]).all() + assert reward == 0 + assert not done + for _ in range(2): + obs, reward, done, _ = env.step(1) + assert env.direction == 3 + assert (obs == [1, 0, 1, 0]).all() + assert reward == 0 + assert not done + obs, reward, done, _ = env.step(1) + assert (obs == [1, 1, 1, 0.2]).all() + assert reward == 0.2 + assert env.direction == 3 + assert not done + obs, reward, done, _ = env.step(2) + assert reward == 0 + assert (obs == [1, 0, 1, 0]).all() + assert env.direction == 0 + assert env.row_pos == env.col_pos == 4 + assert not done + + +def test_deployment(): + env = StrictTMazeEnv(n_trials=3) + for _ in range(5): + obs = env.reset() + for _ in range(3): + for _ in range(3): + assert (obs == [1, 0, 1, 0]).all() + assert env.direction == 0 + obs, reward, done, _ = env.step(1) + assert not done + assert reward == 0 + assert (obs == [0, 1, 0, 0]).all() + assert env.direction == 0 + assert reward == 0 + obs, reward, done, _ = env.step(2) + assert env.direction == 1 + assert (obs == [1, 0, 0, 0]).all() + assert reward == 0 + assert not done + for _ in range(2): + obs, reward, done, _ = env.step(1) + assert env.direction == 1 + assert (obs == [1, 0, 1, 0]).all() + assert reward == 0 + assert not done + obs, reward, done, _ = env.step(1) + assert (obs == [1, 1, 1, 1]).all() + assert reward == 1 + assert env.direction == 1 + assert not done + obs, reward, done, _ = env.step(2) + assert reward == 0 + assert (obs == [1, 0, 1, 0]).all() + assert env.direction == 0 + assert env.row_pos == env.col_pos == 4 + assert done + + +def test_reward_flip(): + env = StrictTMazeEnv(n_trials=10, reward_flip_mean=5, reward_flip_range=3) + for _ in range(5): + obs = env.reset() + for i in range(10): + for _ in range(3): + assert (obs == [1, 0, 1, 0]).all() + assert env.direction == 0 + obs, reward, done, _ = env.step(1) + assert not done + assert reward == 0 + assert (obs == [0, 1, 0, 0]).all() + assert env.direction == 0 + assert reward == 0 + obs, reward, 
done, _ = env.step(2) + assert env.direction == 1 + assert (obs == [1, 0, 0, 0]).all() + assert reward == 0 + assert not done + for _ in range(2): + obs, reward, done, _ = env.step(1) + assert env.direction == 1 + assert (obs == [1, 0, 1, 0]).all() + assert reward == 0 + assert not done + obs, reward, done, _ = env.step(1) + assert (obs[:-1] == [1, 1, 1]).all() + assert reward == obs[-1] + assert reward in {0.2, 1.0} + if i < 2: + assert reward == 1.0 + elif i > 8: + assert reward == 0.2 + assert env.direction == 1 + assert not done + obs, reward, done, _ = env.step(2) + assert reward == 0 + assert (obs == [1, 0, 1, 0]).all() + assert env.direction == 0 + assert env.row_pos == env.col_pos == 4 + assert done + + +class OptimalNet: + def __init__(self, n_envs): + self.n_envs = n_envs + self.sides = [0] * n_envs + + def act(self, states): + actions = [] + for i, state in enumerate(states): + if all(state == [1, 0, 1, 0]): + actions.append(1) + elif all(state == [0, 1, 0, 0]): + actions.append(0 if self.sides[i] == 0 else 2) + elif all(state == [1, 0, 0, 0]): + actions.append(1) + elif all(state == [0, 0, 1, 0]): + actions.append(1) + elif all(state[:-1] == [1, 1, 1]): + actions.append(1) + assert state[-1] in {0.2, 1.0} + if state[-1] == 0.2: + self.sides[i] = 1 - self.sides[i] + else: + raise ValueError("Invalid state") + return actions + + +def make_net(_genome, _config, n_envs): + return OptimalNet(n_envs) + + +def activate_net(net, states): + return net.act(states) + + +def test_optimal(): + envs = [StrictTMazeEnv(init_reward_side=i, n_trials=100) for i in [1, 0, 1, 0]] + + evaluator = MultiEnvEvaluator( + make_net, activate_net, envs=envs, batch_size=4, max_env_steps=1600 + ) + + fitness = evaluator.eval_genome(None, None) + assert fitness == 98.8 diff --git a/tests/test_t_maze.py b/tests/test_t_maze.py new file mode 100644 index 0000000..4305c3f --- /dev/null +++ b/tests/test_t_maze.py @@ -0,0 +1,179 @@ +# Copyright (c) 2018 Uber Technologies, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
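+
+# Reading the assertions below: an observation of [1, 0, 1, 0] means walls to
+# the left and right, a free cell ahead, and no reward color; reaching a maze
+# end sets the color channel to the collected reward (1.0 high / 0.2 low).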
+ +import pytest + +from pytorch_neat.t_maze import TMazeEnv + + +def test_default_initialization(): + env = TMazeEnv() + assert env.hall_len == 3 + assert env.n_trials == 100 + assert env.maze.shape == (6, 9) + print(env.maze) + assert ( + env.maze + == [ + [1, 1, 1, 1, 1, 1, 1, 1, 1], + [1, 0, 0, 0, 0, 0, 0, 0, 1], + [1, 1, 1, 1, 0, 1, 1, 1, 1], + [1, 1, 1, 1, 0, 1, 1, 1, 1], + [1, 1, 1, 1, 0, 1, 1, 1, 1], + [1, 1, 1, 1, 1, 1, 1, 1, 1], + ] + ).all() + + +def test_step_without_reset(): + env = TMazeEnv() + with pytest.raises(AssertionError): + env.step(1) + + +def test_render(): + env = TMazeEnv() + with pytest.raises(NotImplementedError): + env.render() + + +def test_step_with_reset(): + env = TMazeEnv() + obs = env.reset() + assert obs.shape == (4,) + assert env.row_pos == env.col_pos == 4 + assert (obs == [1, 0, 1, 0]).all() + obs, reward, done, _ = env.step(0) + assert (obs == [1, 0, 1, 0]).all() + assert reward == -0.4 + assert not done + + +def test_full_trial(): + env = TMazeEnv() + obs = env.reset() + for _ in range(3): + assert (obs == [1, 0, 1, 0]).all() + obs, reward, done, _ = env.step(1) + assert not done + assert reward == 0 + for _ in range(3): + assert (obs == [0, 1, 0, 0]).all() + assert reward == 0 + obs, reward, done, _ = env.step(2) + assert not done + assert (obs == [0, 1, 1, 1]).all() + assert reward == 1 + obs, reward, done, _ = env.step(2) + assert reward == 0 + assert (obs == [1, 0, 1, 0]).all() + assert env.row_pos == env.col_pos == 4 + assert not done + + +def test_init_reward_side(): + env = TMazeEnv(init_reward_side=0) + obs = env.reset() + for _ in range(3): + assert (obs == [1, 0, 1, 0]).all() + obs, reward, done, _ = env.step(1) + assert not done + assert reward == 0 + for _ in range(3): + assert (obs == [0, 1, 0, 0]).all() + assert reward == 0 + obs, reward, done, _ = env.step(0) + assert not done + assert (obs == [1, 1, 0, 1]).all() + assert reward == 1 + obs, reward, done, _ = env.step(1) + assert reward == 0 + assert (obs == [1, 0, 1, 0]).all() + assert env.row_pos == env.col_pos == 4 + assert not done + + +def test_low_reward(): + env = TMazeEnv() + obs = env.reset() + for _ in range(3): + assert (obs == [1, 0, 1, 0]).all() + obs, reward, done, _ = env.step(1) + assert not done + assert reward == 0 + for _ in range(3): + assert (obs == [0, 1, 0, 0]).all() + assert reward == 0 + obs, reward, done, _ = env.step(0) + assert not done + assert (obs == [1, 1, 0, 0.2]).all() + assert reward == 0.2 + obs, reward, done, _ = env.step(1) + assert reward == 0 + assert (obs == [1, 0, 1, 0]).all() + assert env.row_pos == env.col_pos == 4 + assert not done + + +def test_deployment(): + env = TMazeEnv(n_trials=3) + for _ in range(3): + obs = env.reset() + for _ in range(3): + for _ in range(3): + assert (obs == [1, 0, 1, 0]).all() + obs, reward, done, _ = env.step(1) + assert not done + assert reward == 0 + for _ in range(3): + assert (obs == [0, 1, 0, 0]).all() + assert reward == 0 + obs, reward, done, _ = env.step(2) + assert not done + assert (obs == [0, 1, 1, 1]).all() + assert reward == 1 + obs, reward, done, _ = env.step(2) + assert reward == 0 + assert (obs == [1, 0, 1, 0]).all() + assert env.row_pos == env.col_pos == 4 + assert done + + +def test_reward_flip(): + env = TMazeEnv(n_trials=10, reward_flip_mean=5, reward_flip_range=3) + for _ in range(10): + obs = env.reset() + for i in range(10): + for _ in range(3): + assert (obs == [1, 0, 1, 0]).all() + obs, reward, done, _ = env.step(1) + assert not done + assert reward == 0 + for _ in range(3): + assert (obs 
== [0, 1, 0, 0]).all() + assert reward == 0 + obs, reward, done, _ = env.step(2) + assert not done + assert (obs[:-1] == [0, 1, 1]).all() + assert reward == obs[-1] + assert reward in {0.2, 1.0} + if i < 2: + assert reward == 1.0 + elif i > 8: + assert reward == 0.2 + obs, reward, done, _ = env.step(2) + assert reward == 0 + assert (obs == [1, 0, 1, 0]).all() + assert env.row_pos == env.col_pos == 4 + assert done diff --git a/tests/test_turning_t_maze.py b/tests/test_turning_t_maze.py new file mode 100644 index 0000000..6a47fec --- /dev/null +++ b/tests/test_turning_t_maze.py @@ -0,0 +1,254 @@ +# Copyright (c) 2018 Uber Technologies, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import pytest + +from pytorch_neat.turning_t_maze import TurningTMazeEnv + + +def test_default_initialization(): + env = TurningTMazeEnv() + assert env.hall_len == 3 + assert env.n_trials == 100 + assert env.maze.shape == (6, 9) + assert ( + env.maze + == [ + [1, 1, 1, 1, 1, 1, 1, 1, 1], + [1, 0, 0, 0, 0, 0, 0, 0, 1], + [1, 1, 1, 1, 0, 1, 1, 1, 1], + [1, 1, 1, 1, 0, 1, 1, 1, 1], + [1, 1, 1, 1, 0, 1, 1, 1, 1], + [1, 1, 1, 1, 1, 1, 1, 1, 1], + ] + ).all() + + +def test_step_without_reset(): + env = TurningTMazeEnv() + with pytest.raises(AssertionError): + env.step(1) + + +def test_render(): + env = TurningTMazeEnv() + with pytest.raises(NotImplementedError): + env.render() + + +def test_step_with_reset(): + env = TurningTMazeEnv() + obs = env.reset() + assert obs.shape == (4,) + assert env.row_pos == env.col_pos == 4 + assert (obs == [1, 0, 1, 0]).all() + + obs, reward, done, _ = env.step(0) + assert (obs == [1, 1, 0, 0]).all() + assert reward == 0.0 + assert not done + + obs, reward, done, _ = env.step(1) + assert (obs == [1, 1, 0, 0]).all() + assert reward == -0.4 + assert not done + + +def test_full_trial(): + env = TurningTMazeEnv() + obs = env.reset() + for _ in range(3): + assert (obs == [1, 0, 1, 0]).all() + assert env.direction == 0 + obs, reward, done, _ = env.step(1) + assert not done + assert reward == 0 + assert (obs == [0, 1, 0, 0]).all() + assert env.direction == 0 + assert reward == 0 + obs, reward, done, _ = env.step(2) + assert env.direction == 1 + assert (obs == [1, 0, 0, 0]).all() + assert reward == 0 + assert not done + for _ in range(2): + obs, reward, done, _ = env.step(1) + assert env.direction == 1 + assert (obs == [1, 0, 1, 0]).all() + assert reward == 0 + assert not done + obs, reward, done, _ = env.step(1) + assert (obs == [1, 1, 1, 1]).all() + assert reward == 1 + assert env.direction == 1 + assert not done + obs, reward, done, _ = env.step(2) + assert reward == 0 + assert (obs == [1, 0, 1, 0]).all() + assert env.direction == 0 + assert env.row_pos == env.col_pos == 4 + assert not done + + +def test_init_reward_side(): + env = TurningTMazeEnv(init_reward_side=0) + obs = env.reset() + for _ in range(3): + assert (obs == [1, 0, 1, 0]).all() + assert env.direction == 0 + obs, reward, done, _ = env.step(1) + assert not done + assert reward == 0 + assert (obs == [0, 1, 0, 0]).all() + 
+def test_full_trial():
+    env = TurningTMazeEnv()
+    obs = env.reset()
+    for _ in range(3):
+        assert (obs == [1, 0, 1, 0]).all()
+        assert env.direction == 0
+        obs, reward, done, _ = env.step(1)
+        assert not done
+        assert reward == 0
+    assert (obs == [0, 1, 0, 0]).all()
+    assert env.direction == 0
+    assert reward == 0
+    obs, reward, done, _ = env.step(2)
+    assert env.direction == 1
+    assert (obs == [1, 0, 0, 0]).all()
+    assert reward == 0
+    assert not done
+    for _ in range(2):
+        obs, reward, done, _ = env.step(1)
+        assert env.direction == 1
+        assert (obs == [1, 0, 1, 0]).all()
+        assert reward == 0
+        assert not done
+    obs, reward, done, _ = env.step(1)
+    assert (obs == [1, 1, 1, 1]).all()
+    assert reward == 1
+    assert env.direction == 1
+    assert not done
+    obs, reward, done, _ = env.step(2)
+    assert reward == 0
+    assert (obs == [1, 0, 1, 0]).all()
+    assert env.direction == 0
+    assert env.row_pos == env.col_pos == 4
+    assert not done
+
+
+def test_init_reward_side():
+    env = TurningTMazeEnv(init_reward_side=0)
+    obs = env.reset()
+    for _ in range(3):
+        assert (obs == [1, 0, 1, 0]).all()
+        assert env.direction == 0
+        obs, reward, done, _ = env.step(1)
+        assert not done
+        assert reward == 0
+    assert (obs == [0, 1, 0, 0]).all()
+    assert env.direction == 0
+    assert reward == 0
+    obs, reward, done, _ = env.step(0)
+    assert env.direction == 3
+    assert (obs == [0, 0, 1, 0]).all()
+    assert reward == 0
+    assert not done
+    for _ in range(2):
+        obs, reward, done, _ = env.step(1)
+        assert env.direction == 3
+        assert (obs == [1, 0, 1, 0]).all()
+        assert reward == 0
+        assert not done
+    obs, reward, done, _ = env.step(1)
+    assert (obs == [1, 1, 1, 1]).all()
+    assert reward == 1
+    assert env.direction == 3
+    assert not done
+    obs, reward, done, _ = env.step(2)
+    assert reward == 0
+    assert (obs == [1, 0, 1, 0]).all()
+    assert env.direction == 0
+    assert env.row_pos == env.col_pos == 4
+    assert not done
+
+
+def test_low_reward():
+    env = TurningTMazeEnv()
+    obs = env.reset()
+    for _ in range(3):
+        assert (obs == [1, 0, 1, 0]).all()
+        assert env.direction == 0
+        obs, reward, done, _ = env.step(1)
+        assert not done
+        assert reward == 0
+    assert (obs == [0, 1, 0, 0]).all()
+    assert env.direction == 0
+    assert reward == 0
+    obs, reward, done, _ = env.step(0)
+    assert env.direction == 3
+    assert (obs == [0, 0, 1, 0]).all()
+    assert reward == 0
+    assert not done
+    for _ in range(2):
+        obs, reward, done, _ = env.step(1)
+        assert env.direction == 3
+        assert (obs == [1, 0, 1, 0]).all()
+        assert reward == 0
+        assert not done
+    obs, reward, done, _ = env.step(1)
+    assert (obs == [1, 1, 1, 0.2]).all()
+    assert reward == 0.2
+    assert env.direction == 3
+    assert not done
+    obs, reward, done, _ = env.step(2)
+    assert reward == 0
+    assert (obs == [1, 0, 1, 0]).all()
+    assert env.direction == 0
+    assert env.row_pos == env.col_pos == 4
+    assert not done
+
+
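+# A deployment spans n_trials consecutive trials: done stays False across the
+# per-trial resets and only becomes True after the final trial's reset step.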
+def test_deployment():
+    env = TurningTMazeEnv(n_trials=3)
+    for _ in range(5):
+        obs = env.reset()
+        for _ in range(3):
+            for _ in range(3):
+                assert (obs == [1, 0, 1, 0]).all()
+                assert env.direction == 0
+                obs, reward, done, _ = env.step(1)
+                assert not done
+                assert reward == 0
+            assert (obs == [0, 1, 0, 0]).all()
+            assert env.direction == 0
+            assert reward == 0
+            obs, reward, done, _ = env.step(2)
+            assert env.direction == 1
+            assert (obs == [1, 0, 0, 0]).all()
+            assert reward == 0
+            assert not done
+            for _ in range(2):
+                obs, reward, done, _ = env.step(1)
+                assert env.direction == 1
+                assert (obs == [1, 0, 1, 0]).all()
+                assert reward == 0
+                assert not done
+            obs, reward, done, _ = env.step(1)
+            assert (obs == [1, 1, 1, 1]).all()
+            assert reward == 1
+            assert env.direction == 1
+            assert not done
+            obs, reward, done, _ = env.step(2)
+            assert reward == 0
+            assert (obs == [1, 0, 1, 0]).all()
+            assert env.direction == 0
+            assert env.row_pos == env.col_pos == 4
+        assert done
+
+
+def test_reward_flip():
+    env = TurningTMazeEnv(n_trials=10, reward_flip_mean=5, reward_flip_range=3)
+    for _ in range(5):
+        obs = env.reset()
+        for i in range(10):
+            for _ in range(3):
+                assert (obs == [1, 0, 1, 0]).all()
+                assert env.direction == 0
+                obs, reward, done, _ = env.step(1)
+                assert not done
+                assert reward == 0
+            assert (obs == [0, 1, 0, 0]).all()
+            assert env.direction == 0
+            assert reward == 0
+            obs, reward, done, _ = env.step(2)
+            assert env.direction == 1
+            assert (obs == [1, 0, 0, 0]).all()
+            assert reward == 0
+            assert not done
+            for _ in range(2):
+                obs, reward, done, _ = env.step(1)
+                assert env.direction == 1
+                assert (obs == [1, 0, 1, 0]).all()
+                assert reward == 0
+                assert not done
+            obs, reward, done, _ = env.step(1)
+            assert (obs[:-1] == [1, 1, 1]).all()
+            assert reward == obs[-1]
+            assert reward in {0.2, 1.0}
+            if i < 2:
+                assert reward == 1.0
+            elif i > 8:
+                assert reward == 0.2
+            assert env.direction == 1
+            assert not done
+            obs, reward, done, _ = env.step(2)
+            assert reward == 0
+            assert (obs == [1, 0, 1, 0]).all()
+            assert env.direction == 0
+            assert env.row_pos == env.col_pos == 4
+        assert done