Abstract
ML (Machine Learning) has fueled innovation across a broad range of applications, including autonomous vehicles, speech and facial recognition, genomics, manufacturing, fraud detection, financial analytics, and more. Current GPU advancements allow data scientists to build increasingly complex models at multi-petabyte scale, drawing on data lakes approaching exabytes in total capacity. GPU-accelerated servers (and dedicated GPU computation engines such as the NVIDIA DGX platform) are stressing datacenter infrastructure, requiring a new approach to data access to avoid GPU “IO starvation”.
Teams of researchers and data scientists should be freed to focus on their work rather than waste their valuable time on file system housekeeping chores and wrestling with IO bottlenecks. Using real-world examples and extensive benchmark results, this talk will illustrate these emerging data infrastructure challenges and discuss a compelling alternative solution from WekaIO.