- Improved supervised learning training; the batch_size and sequence_length now differ from those used in reinforcement learning, which enhances the learning effect;
- Reinforcement learning now uses use_action_type_mask, which improves learning efficiency;
- Added options to turn off entity_encoder and autoregressive_embedding for testing the impact of these modules;
- Fixed some known issues;
- A trained RL model is provided, which was first trained with the SL method and then fine-tuned with the RL method;
- Use mimic_forward to replace forward in "rl_unroll", which increases training accuracy;
- RL training now supports multi-GPU;
- RL training now supports multi-process training on top of multi-GPU;
- Use a new architecture for the RL loss, which reduces GPU memory usage by 86%;
- Use a new architecture for RL that makes sampling 6x faster;
- Validate UPGO and V-trace loss again;
- By a "multi-process plus multi-thread" training, increase the sampling speed more by 197%;
- Fix the GPU memory leak and reduce the CPU memory leak;
- Increase the RL training win rate (without the units loss) against level-2 to 0.57!
- First success in selecting Probes to build Pylons in the correct positions;
- Increase the selection accuracy in SL and in the initial stage of RL;
- Improve the win rate against the built-in AI;
- Increase the win rate against the built-in AI to 0.8 and the killed points to 5900;
- Add result videos;
- Improve the baseline (i.e., the state-value estimate in RL) in both accuracy and speed;
- Improve the reproducibility of the RL training results (using a fixed random seed and a single thread);
- Greatly refactor the RL loss code and add the entity mask;
- Fix the abnormally large loss values in the RL loss calculation with the outlier_remove function;
- Reduce the lines of code in rl_loss by 58.2%;
- Improve the log_prob, KL, and entropy code;
- Reduce the lines of code in rl_algo by 31.4%;
- Refactor most of the code, from replay transformation to reinforcement learning;
- With the new architecture, the transformed replay tensor files now take 70% less disk space;
- Memory usage is also optimized: SL can now process 3x or more replays than before;
- With the new tensor replay files, SL training is about 10x faster than before;
- Using a weighted loss significantly improves SL results: action accuracy rises above 90%;
- With a new SL training scheme, the accuracy of action arguments such as units improves from 0.05 to 0.95;
- The RL loss calculations are refactored, improving readability;
- Thanks to the optimized algorithm, RL sampling is now 10x faster than before.
- Add mask for supervised learning loss calculation;
- Add camera and non-camera accuracy for evaluation;
- Refactor most code for multi-GPU training;
- Add support for multi-GPU supervised training;
- Add a guide on how to run RL in USAGE.MD;
- Fix several bugs, including one in SL training;
- Add USAGE.MD;
- Fix a bug by splitting map_channels and scatter_channels;
- Fix the game version needed to watch the AlphaStar Final Terran replays;
- Change z_2 = F.relu(self.fc_2(z_1)) to z_2 = self.fc_2(F.relu(z_1)) in selected_units_head (see the first sketch after this list);
- Add a ReLU after each downsampling conv2d in the spatial encoder;
- Change the bias of most conv and convtranspose layers from False to True (unless the layer is followed by a BN or lives in a third-party lib);
- Add the "masked by the missing entries" step in entity_encoder;
- Formalize a library function for reuse: unit_tpye_to_unit_type_index;
- Fix a TODO in calculate_unit_counts_bow() using unit_tpye_to_unit_type_index();
- Change back to using an All_Units_Size equal to the number of all unit_type_index values;
- Add scattering of entities onto the map in spatial_encoder;
- Change winloss_baseline to use the win-loss reward (as AlphaStar is supposed to do);
- Refine the pseudo reward, scale the Levenshtein reward, and correct the log prob, which should be the negative cross-entropy (see the second sketch after this list);
- Fix some problems caused by errors in the original AlphaStar code;
- Add analysis of the move-camera count in the AlphaStar replays;
- Make some improvements to SL training;
- Fix some bugs in SL training;
- Fix an RL training bug related to the GPU's cuDNN and reorganize the directory structure;
- Add a time-decay scale for the unit count reward (Hamming distance);
- Add the code for analyzing the statistics of replays;
- Implement RL training with the z reward (from the experts' statistics in replays);
- Complete entropy_loss_for_all_arguments;
- Properly implement teacher_logits in human_policy_kl_loss and test all of the above RL training on the server;
- Decouple sl_loss from the agent, and pass the SL training test on the server;
- Fix some warnings raised by PyTorch 1.5 and fill in many TODOs;
- Fix the clipped_rhos calculation in the V-trace policy gradient loss (see the third sketch after this list);
- Fix the filter_by function for the other 5 action arguments in vtrace_pg_loss;
- Implement the vtrace_advantages method for all 6 action arguments;
- Complete the implementation of split_vtrace_pg_loss for all arguments;
- Complete the correct processing flow for the baselines of build_order, built_units, upgrades, and effects;
- Fix the loss calculation for the 5 other action arguments in UPGO;
- Add the implementation of unit_counts_bow;
- Add the implementation of build_order;
- Implement the upgrade, effect, last-action, and available-actions encodings, plus the time encoding;
- Add reward calculation for build order and unit counts;
- Add the calculation of td_lambda_loss based on the Levenshtein and Hamming rewards;
- Fix the eval problem in SL, and add support for using an SL model to fine-tune RL training;
- Fix the overly slow SL training speed;
- Fix the RL training problem and prepare the code for testing against the built-in AI;
- Add code for the Levenshtein and Hamming distances (see the last sketch after this list);
- Add the function for playing against the built-in AI;
- Fix the SL training loss problem for select units;
- We release the mini-AlphaStar project (v_0.7), a mini-scale source version of the original AlphaStar program by DeepMind.
- "v_0.7" means we think we have implemented more than 70 percent of its code.
- "mini" means we make the original AlphaStar hyperparameters adjustable so that it can run on a small scale.