This repository contains the source code for training DeepCAM and CosmoFlow with multi-resolution data samples. The goal of this approach is to reduce model training time while maintaining the same model accuracy. Details of multi-resolution data training can be found in the CCGrid 2022 paper listed below.
- DeepCAM is a parallel deep learning climate segmentation benchmark. Its source code is available on both GitHub and MLPerf. The input training data files are available through Globus.
- CosmoFlow is a parallel deep learning application developed for studying data generated from cosmological N-body dark matter simulations. Its source code is available on both GitHub and MLPerf. The CosmoFlow source code in this repo has been updated to incorporate the LBANN model and is parallelized using Horovod. The training data files are available from NERSC.
- Kewei Wang, Sunwoo Lee, Jan Balewski, Alex Sim, Peter Nugent, Ankit Agrawal, Alok Choudhary, Kesheng Wu, and Wei-keng Liao. Using Multi-Resolution Data to Accelerate Neural Network Training in Scientific Applications. In the 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid), May 2022.
- Northwestern University
  - Kewei Wang <[email protected]>
  - Sunwoo Lee <[email protected]>
  - Wei-keng Liao <[email protected]> (point of contact)
- Lawrence Berkeley National Laboratory
  - Alex Sim <[email protected]>
  - Jan Balewski <[email protected]>
  - Peter Nugent <[email protected]>
  - John Wu <[email protected]>
This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program. This project is a joint effort between Northwestern University and Lawrence Berkeley National Laboratory, supported by the RAPIDS Institute. This work is also supported in part by U.S. DOE awards DE-SC0014330 and DE-SC0019358.