Zoned Block Device Support in Hadoop HDFS

webinar

Author(s)/Presenter(s):

Shin'ichiro Kawasaki

Library Content Type

Presentation

Library Release Date

Focus Areas

Abstract

Zoned storage devices are a class of block devices whose address space is divided into zones that, unlike regular storage devices, can only be written sequentially. The most common form of zoned storage today is the Shingled Magnetic Recording (SMR) HDD. This type of disk allows higher capacities without a significant increase in device manufacturing cost, thereby reducing overall storage cost. Support for zoned block devices (ZBD) was introduced in Linux with kernel version 4.10. This support provides an interface for user applications to manipulate the zones of a zoned device and also guarantees that writes issued sequentially are delivered to the disk in the same order, thereby meeting the device's sequential write constraint.

Hadoop HDFS is a well-known, highly scalable distributed file system, making it an ideal choice for big data computing applications. HDFS is designed for large data sets written mostly sequentially with a streaming-like access pattern. This characteristic is well suited to zoned device support: it allows HDFS to access the device directly rather than relying on an underlying local file system with ZBD support, an approach that potentially has higher overhead due to file system garbage collection activity.

This talk introduces a candidate implementation of ZBD support in Hadoop HDFS based on the simple Linux zonefs file system. This file system exposes the zones of a zoned device as files. HDFS data blocks are themselves stored in zonefs files. Symbolic links reference the zonefs files in the HDFS block file directory structure. File I/O operations unique to zonefs files are encapsulated in a new I/O provider. The presentation will give an overview of this implementation and discuss performance results, comparing unmodified HDFS running on a ZBD-compliant local file system (btrfs) with the direct-access zonefs approach. The benefits of the latter approach in terms of lower software complexity will also be addressed.
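
The symbolic-link arrangement described in the abstract can be illustrated with a minimal sketch. This is not the presented implementation: the directory names, the block name blk_1001, and the helper functions are hypothetical, and ordinary temporary files stand in for a mounted zonefs volume. The only zonefs detail assumed is that sequential zones appear as numbered files under a seq directory.

```python
import os
import tempfile


def link_block_to_zone(block_dir, block_name, zonefs_seq_dir, zone_no):
    """Create block_dir/block_name as a symlink to zonefs_seq_dir/<zone_no>.

    Hypothetical helper: the real HDFS block directory layout is deeper.
    """
    zone_file = os.path.join(zonefs_seq_dir, str(zone_no))
    link_path = os.path.join(block_dir, block_name)
    os.symlink(zone_file, link_path)
    return link_path


def append_to_block(link_path, data):
    """Append data through the symlink and return the resulting file size.

    Append-only mode is a stand-in for the sequential write constraint.
    """
    with open(link_path, "ab") as f:
        f.write(data)
    return os.path.getsize(link_path)  # follows the symlink to the zone file


def demo():
    root = tempfile.mkdtemp()
    seq_dir = os.path.join(root, "zonefs", "seq")      # stand-in for a zonefs mount
    block_dir = os.path.join(root, "hdfs", "finalized")  # simplified block directory
    os.makedirs(seq_dir)
    os.makedirs(block_dir)
    # On a real zonefs mount the zone files already exist; here one is created.
    open(os.path.join(seq_dir, "0"), "wb").close()

    link = link_block_to_zone(block_dir, "blk_1001", seq_dir, 0)
    append_to_block(link, b"hello")
    size = append_to_block(link, b" world")
    return link, size


if __name__ == "__main__":
    link, size = demo()
    print(link, "->", os.readlink(link), "size:", size)
```

On a real zonefs mount, writes to sequential zone files are more restricted than this sketch suggests (the kernel documentation describes them as append-only and requiring direct I/O), which is the kind of zonefs-specific behavior the abstract says is encapsulated in a dedicated I/O provider.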

Learning Objectives

Understand zoned storage devices
Learn SMR HDD benefits in Hadoop HDFS systems
Learn zonefs use case