mlperf-deepcam

Short Description

The DeepCAM training benchmark from the MLPerf HPC v0.5 suite, based on the paper "Exascale Deep Learning for Climate Analytics", which shared the 2018 Gordon Bell Prize. The application trains a deep learning segmentation model to identify extreme weather phenomena in climate simulation data.

Institution

Lawrence Berkeley National Laboratory

Sponsors

DOE/ASCR

Parent Application/Code

None

Keywords

climate, segmentation, machine learning, deep learning

Programming Languages/Paradigms

Python, PyTorch

Website URL

https://github.com/azrael417/mlperf-deepcam/blob/master/README.md

Git/SVN Repository URL

https://github.com/azrael417/mlperf-deepcam

Release/Version Number

d9636a321eaa7b35f48557648866c54f3e93a103

Detailed description

This application is an updated version of the model and training code from the paper "Exascale Deep Learning for Climate Analytics", adopted as a benchmark in the MLPerf HPC v0.5 suite. It trains a deep neural network for semantic segmentation on CAM5 climate simulation data, predicting a per-pixel segmentation mask with three classes: atmospheric river, tropical cyclone, or background. The MLPerf HPC reference implementation is written in PyTorch and uses PyTorch's native distributed library for data-parallel training. The CAM5 dataset is stored in HDF5 format, totals 8.8 TB, and is hosted at NERSC. Each image is 768x1152 pixels with 16 feature channels. The target objective in MLPerf HPC v0.5 is to train the model to a validation IoU (intersection over union) above 0.82. For a smaller-scale or shorter-running benchmark, the problem size can be scaled down and training throughput used as the primary objective instead.
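To make the validation target concrete, the following is a minimal sketch of a mean intersection-over-union computation for a three-class mask, written in plain Python. It is not taken from the reference implementation: the function name `mean_iou`, the class index order, the averaging over classes, and the handling of classes absent from both masks are all assumptions made for illustration, and the reference code may define or aggregate the metric differently.

```python
from typing import Sequence

NUM_CLASSES = 3     # background, tropical cyclone, atmospheric river (index order is an assumption)
TARGET_IOU = 0.82   # validation target quoted for MLPerf HPC v0.5


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int = NUM_CLASSES) -> float:
    """Mean per-class IoU over flattened masks of integer class labels.

    Hypothetical helper: classes that appear in neither mask are skipped,
    which is one common convention but not necessarily the benchmark's.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")

    ious = []
    for c in range(num_classes):
        # Pixels where both masks agree on class c.
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either mask contains class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(intersection / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example with six pixels (a real image would have 768 * 1152 pixels).
pred = [0, 0, 1, 2, 2, 0]
target = [0, 0, 1, 2, 0, 0]
score = mean_iou(pred, target)
print(f"mean IoU = {score:.2f}, target reached: {score > TARGET_IOU}")
```

In this toy case the per-class IoU values are 0.75, 1.0, and 0.5, giving a mean of 0.75, which falls short of the 0.82 target. A practical implementation would compute the same quantities on PyTorch tensors and reduce them across data-parallel workers.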