CCF-1919113
SPX: Cross-stack Memory Optimizations for Boosting I/O Performance of Deep Learning HPC Applications

PIs: Ali R. Butt, Kirk W. Cameron, Yue Cheng, and Xun Jian

Distributed Systems & Storage Lab.

Table of Contents:

  1. Project Overview
  2. Motivation
  3. Team Members
  4. Major Outcomes
  5. Summary of Significant Results
  6. Completed Thesis
  7. Broader Impact
  8. Publications

Project Overview:

Metis is a memory-assisted, efficient, high-performance storage subsystem built on holistic, end-to-end, hardware-supported memory and storage abstractions attuned to the demands of deep learning (DL) HPC applications. The research is organized into four tightly coupled thrusts: (1) adopting novel main memory compression architectures; (2) using the freed-up physical memory to build a cooperative in-memory I/O cache; (3) architecting a high-performance NVMe burst buffer as a backend for the cache; and (4) exploring comprehensive power models that capture the impact of the I/O redesign.


Metis architecture overview.


Motivation:


Metis Team:

PIs:

  1. Ali R. Butt
  2. Kirk W. Cameron
  3. Yue Cheng
  4. Xun Jian

PhD:

  1. Arnab K. Paul
  2. Jingoo Han
  3. Redwan Ibne Seraj Khan

MSc:

  1. Subil Abraham

REU Scholars:

  1. ?

The PIs are committed to supporting women in systems research. The Metis project involves multiple women in various roles (PhD, MSc, and REU).


Major Outcomes:


Summary of Significant Results:


Completed Thesis and student supervision:


Broader Impact:


Publications:

  • Jingoo Han, Luna Xu, M. Mustafa Rafique, Ali R. Butt, and Seung-Hwan Lim. A Quantitative Study of Deep Learning Training on Heterogeneous Supercomputers. In Proceedings of the IEEE International Conference on Cluster Computing (Cluster), Albuquerque, NM, 12 pages, September 2019.