Streamlining Scientific Workflows: Computational Storage Strategies for HPC

Webinar

Author(s)/Presenter(s):

Dominic Manno

Library Content Type

Presentation

Library Release Date

Focus Areas

Computational Storage

Abstract

Scientific simulation in HPC data centers generates and analyzes large datasets to gain insight and test hypotheses. Exploring magnetic reconnection, simulating plasmas flowing over one another, determining particle interactions and trajectories, and simulating an asteroid impact large enough to cause dinosaur extinction are some examples of scientific simulation. At large scales, these workflows are time-consuming and resource-intensive. Single-timestep datasets are tens of terabytes and can easily grow to petabyte scale, demanding highly performant storage systems. It often isn’t enough to simply provide high-throughput file systems and storage media; instead, we must work with domain scientists to understand their workflows and then push for efficiency throughout the layers of I/O. Computational storage plays a key role in improving this efficiency by providing the capability for function pushdown of storage server operations, data filtering, data indexing, and more. Pairing this with popular industry file formats, open-source file systems and protocols, data analytics engines, open scientific datasets, and great industry partners, we have produced multiple proofs of concept that show how impactful computational storage can be in scientific simulation. These same principles can be extended to other computing verticals. In this talk we will cover these scientific workflows and how they can benefit from different computational storage configurations.
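A minimal sketch of the data-movement argument behind function pushdown. This is not code from the talk; the record size, record count, and filter selectivity are illustrative assumptions. It compares bytes crossing the storage interconnect when filtering on the host versus filtering on the storage device:

```python
# Hedged sketch (not from the presentation): why pushing a filter down to
# computational storage reduces data movement. All numbers are assumptions.

RECORD_SIZE = 64           # bytes per particle record (assumed)
TOTAL_RECORDS = 1_000_000  # records in one timestep slice (assumed)
SELECTIVITY = 0.02         # fraction of records matching the filter (assumed)

def bytes_moved_host_filter() -> int:
    # Conventional path: storage ships every record to the host,
    # which then discards the ~98% that fail the predicate.
    return TOTAL_RECORDS * RECORD_SIZE

def bytes_moved_pushdown() -> int:
    # Computational-storage path: the filter runs on the storage
    # device, so only matching records cross the interconnect.
    return int(TOTAL_RECORDS * SELECTIVITY) * RECORD_SIZE

host = bytes_moved_host_filter()
push = bytes_moved_pushdown()
print(f"host-side filter: {host / 1e6:.1f} MB moved")
print(f"pushdown filter : {push / 1e6:.1f} MB moved")
print(f"reduction       : {host / push:.0f}x")
```

With these assumed parameters the pushdown path moves 50x less data; real gains depend on workload selectivity and record layout, which is why the talk stresses understanding scientists' workflows first.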