mlperf-deepcam

Short Description
The DeepCAM training benchmark from the MLPerf HPC v0.5 suite, based on the paper "Exascale Deep Learning for Climate Analytics", which shared the 2018 Gordon Bell Prize. The application trains a deep learning segmentation model to identify extreme weather phenomena in climate simulation data.
Institution
Lawrence Berkeley National Laboratory
Sponsors
DOE/ASCR
Parent Application/Code
None
Keywords
climate, segmentation, machine learning, deep learning
Programming Languages/Paradigms
Python, PyTorch
Release/Version Number
d9636a321eaa7b35f48557648866c54f3e93a103
Detailed Description
This application is an updated version of the model and training code from the paper "Exascale Deep Learning for Climate Analytics", adopted as a benchmark in the MLPerf HPC v0.5 suite. It trains a deep neural network for semantic segmentation on CAM5 climate simulation data, predicting a per-pixel mask with three classes: atmospheric river, tropical cyclone, or background. The MLPerf HPC reference implementation is written in PyTorch and uses PyTorch's native distributed library for data-parallel training. The CAM5 dataset is stored in HDF5 format, totals 8.8 TB, and is hosted at NERSC. Each image is 768 x 1152 pixels with 16 feature channels. The target objective in MLPerf HPC v0.5 is to train the model to a validation IoU (intersection over union) greater than 0.82. For a smaller-scale or shorter-running benchmark, the problem size can be scaled down and training throughput used as the primary objective instead.
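To make the IoU objective concrete, the following is a minimal, framework-independent sketch of a mean intersection-over-union score over three segmentation classes. It is an illustration only, not the reference implementation's metric code: the function names (mean_iou, meets_target), the class index order, and the choice to average only over classes that actually occur are assumptions made for this example.

```python
# Illustrative sketch only; not taken from the DeepCAM reference code.
# The class index order below is an assumption for this example.
NUM_CLASSES = 3  # 0 = background, 1 = tropical cyclone, 2 = atmospheric river
TARGET_IOU = 0.82  # validation target stated for MLPerf HPC v0.5


def mean_iou(pred, target, num_classes=NUM_CLASSES):
    """Return the mean IoU over classes for two flat sequences of pixel labels.

    Classes absent from both prediction and target are skipped so that they
    do not contribute a 0/0 term to the average.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")
    if len(pred) == 0:
        raise ValueError("at least one pixel is required")

    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both masks.
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in at least one mask.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious)


def meets_target(iou, threshold=TARGET_IOU):
    """True if the score strictly exceeds the threshold (IoU > 0.82)."""
    return iou > threshold


# Tiny flattened example with six pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 0, 1, 2, 2, 2]
score = mean_iou(pred, target)
print(f"mean IoU = {score:.3f}, target met: {meets_target(score)}")
```

In practice the masks would be 768 x 1152 label arrays evaluated on tensors across the whole validation set. The pure-Python loops here are meant only to show the arithmetic.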