Note: This agenda is a work in progress. Check back for updates on additional sessions as well as the agenda schedule.
Big Data is a buzzword. There are lots of systems and software products that claim to be doing Big Data. How do you separate the noise from the facts?
In this talk we aim to give you comprehensive coverage of the modern Big Data landscape. We will talk about the various components and open source projects that make up the Big Data ‘barn’. The talk will also cover a few Big Data use cases and recommended designs for those use cases.
Intended Audience
Developers / Managers / Architects / Directors
Level
Beginner / Intermediate
Today, if events change the decision model, we wait until the next batch model build for new insights. By extending fast “time-to-decision” into the world of Big Data analytics to get fast “time-to-insight”, applications will get what used to be batch insights in near real time. Enabling this are technologies such as smart in-memory data storage, new storage class memory, and products designed to do one or more parts of an analysis pipeline very well. In this talk we describe how Ampool is building on Apache Geode so that Big Data analysis solutions can work together with a scalable, smart storage class memory layer, allowing fast and complex end-to-end pipelines to be built, closing the loop and dramatically lowering the time to critical insights.
Learning Objectives
Moving big data in and out of the cloud has long been a formidable challenge for organizations looking to leverage the cloud for big data applications. Typical file transfer acceleration "gateways" upload data to cloud object storage in two phases, which introduces significant delays, limits the size of the data that can be transferred, and increases local storage costs as well as compute time and costs. This session will describe direct-to-cloud capabilities that achieve maximum end-to-end transfer speeds and storage scale-out through direct integration with the underlying object storage interfaces, so that transferred data is written directly to object storage and is available immediately when the transfer completes. It will explore how organizations across different industries are using direct-to-cloud technology for applications that require moving gigabytes, terabytes or petabytes of data in, out and across the cloud.
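To make the two-phase vs. direct-to-cloud distinction concrete, here is a minimal sketch in Python using Amazon S3 via boto3 as a stand-in object store; the session's actual transfer product and protocol stack are not named here, and the bucket and paths are hypothetical.

```python
# Illustrative sketch only: contrasts a two-phase "gateway" upload with a
# direct-to-object-storage upload, using S3 via boto3 as a stand-in.
import shutil
import boto3

s3 = boto3.client("s3")
BUCKET = "example-bucket"  # hypothetical

def two_phase_upload(src_path: str, key: str, staging_path: str) -> None:
    """Phase 1: land the file on local staging storage; phase 2: copy it to
    object storage. The object is not visible until phase 2 finishes, and
    staging capacity limits the transfer size."""
    shutil.copyfile(src_path, staging_path)    # phase 1: extra local I/O
    s3.upload_file(staging_path, BUCKET, key)  # phase 2: copy to the cloud

def direct_to_cloud_upload(src_path: str, key: str) -> None:
    """Stream the source straight into object storage; the object is
    available as soon as the transfer completes, with no staging copy."""
    with open(src_path, "rb") as f:
        s3.upload_fileobj(f, BUCKET, key)      # multipart streaming upload
```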
Learning Objectives
As datasets continue to grow, storage has increasingly become the critical bottleneck for enterprises leveraging Big Data frameworks like Spark, MapReduce, Flink, etc. The frameworks themselves are driving much of the exciting innovation in Big Data, but the complexity of the underlying storage systems is slowing the pace at which these frameworks can leverage data assets. Traditional storage architectures are inadequate for distributed computing and the size of today’s datasets.
In this talk, Haoyuan Li, co-creator of Tachyon (and a founding committer of Spark) and CEO of Tachyon Nexus will explain how the next wave of innovation in storage will be driven by separating the functional layer from the persistent storage layer, and how memory-centric architecture through Tachyon is making this possible. Li will describe the future of distributed file storage and highlight how Tachyon supports specific use cases.
Li will share the vision of the Tachyon project, highlight exciting new capabilities, and give a preview of upcoming features. The project is one of the fastest-growing big data open source projects. It is deployed at many companies, including Alibaba, Baidu and Barclays. Tachyon manages hundreds of machines in some production deployments and brings orders-of-magnitude performance improvements. In addition, Tachyon has attracted more than 200 contributors from over 50 institutions, including Alibaba, Red Hat, Baidu, Intel, and IBM.
The FBI publicly demanded that Apple help the FBI unlock a dead terrorist’s cell phone by providing a special proprietary “back door”. Apple refused, noting that such a tool would invariably escape into the wild and jeopardize the security and privacy of the entire cell phone community, consumer and business alike. An anonymous third party broke the impasse by providing such a backdoor. But the issue remains: privacy/security controls versus selected data recovery demands. What is your opinion? Come to this BoF and help us find a win/win solution.
At this Birds of a Feather session, we’ll discuss how open source is enabling new storage and datacenter architectures. All are welcome who have an interest in open source, scale-up and scale-out storage, hyperconvergence and exploring open solutions for the datacenter.
Today there is great excitement about increasing the immersion of virtual reality productions using real-time computer-generated content, offline rendered content, captured content, and their combinations. Camera vendors are striving to rapidly enable film makers to capture more immersive reality. Headset vendors are working to allow viewers to experience a greater sense of presence and actually experience virtual environments in a life-like manner. Consequently, these emerging forms of entertainment are expanding across multiple dimensions, integrating data from multiple cameras to create 360 degree and 360 degree stereoscopic video. The resulting separate frames are stitched together to form a 360 degree by 180 degree view from a single point in space. Adding more cameras enables a synthesized stereoscopic 3D view from a single point in all directions. This situation is further complicated by the emerging promise of light field cameras, which dramatically increase the compute, storage and bandwidth requirements over conventional feature films and interactive 3D game applications.
This BOF will start with a short presentation that overviews the various emerging entertainment forms and their implications. Next, a single visual effects shot will be dissected to illustrate the technical issues involved. Finally, a group discussion will be facilitated to discuss how emerging storage technologies will impact these emerging entertainment forms.
Since arriving over a decade ago, the adoption of data deduplication has become widespread throughout the storage and data protection communities. This tutorial assumes a basic understanding of deduplication and covers topics that attendees will find helpful in understanding today’s expanded use of this technology. Topics will include trends in vendor deduplication design and practical use cases, e.g., primary storage, data protection, replication, etc., and will also cover other data reduction technologies, e.g., compression, etc.
Learning Objectives
There is growing interest in object storage as backup and version storage for massive primary-storage data, thanks to its intuitive interfaces and relatively low cost of ownership. However, object storage is still in an early stage and does not yet handle capacity optimization very well (especially open source implementations such as OpenStack Swift and Ceph). This presentation introduces data reduction techniques from the viewpoint of object storage; we will cover deduplication, compression, and other interesting techniques for capacity optimization on object storage.
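As a concrete illustration of the kind of data reduction discussed above, here is a minimal sketch of fixed-size chunk deduplication plus compression; the chunk size, in-memory index and "recipe" format are illustrative assumptions, not details of any particular object store (production systems typically use content-defined chunking and a persistent fingerprint index).

```python
# Minimal fixed-size chunk deduplication + compression sketch.
import hashlib
import zlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB chunks (assumed)
chunk_store = {}              # fingerprint -> compressed chunk bytes

def put_object(data: bytes) -> list[str]:
    """Split an object into chunks, store only unseen chunks (compressed),
    and return the fingerprint list ("recipe") describing the object."""
    recipe = []
    for off in range(0, len(data), CHUNK_SIZE):
        chunk = data[off:off + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in chunk_store:                   # deduplication
            chunk_store[fp] = zlib.compress(chunk)  # plus compression
        recipe.append(fp)
    return recipe

def get_object(recipe: list[str]) -> bytes:
    """Rebuild the object from its chunk recipe."""
    return b"".join(zlib.decompress(chunk_store[fp]) for fp in recipe)
```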
Learning Objectives
When it comes to cloud computing, the ability to turn massive amounts of compute cores on and off on demand can be very attractive to IT departments, who need to manage peaks and valleys in user activity. With cloud bursting, the majority of the data can stay on premises while tapping into compute from public cloud providers, reducing risk and minimizing the need to move large files.
Hear from Jim Thompson, Senior Systems Engineer at Avere Systems, on the IT and business benefits that cloud bursting provides, including increased compute capacity, lower IT investment, financial agility, and, ultimately, faster time to market.
Learning Objectives
According to the Ponemon Institute, 30 percent of business information is stored in the cloud. Like any relationship, both sides are wide-eyed about the limitless possibilities, attentive and full of promise. What promises, you might ask? Higher IT control, centralized management and delivery efficiencies are just a few.
But not all relationships last forever. Maybe a cloud storage provider isn’t living up to expectations, or has decided to change the terms and conditions of the agreement. Perhaps they’ve had repeated outages. Or it might be as simple as coming to the end of the service agreement or contract. Whatever the reason, one of the biggest mistakes a company can make is not plotting out every single step of its exit plan well before entering the cloud. That mistake could triple the likelihood of losing customer trust, loyalty and long-term business. And when you factor in the hefty legal fines and repercussions, it’s the kind of damage that’s nearly impossible to bounce back from.
In this session, leading IT, cloud infrastructure and tech experts from Blancco Technology Group and other firms will outline the step-by-step process of developing a written exit plan for data stored in the cloud and how to plot their data exit plan against key regulatory criteria so that they can best minimize the likelihood of data being accessed or stolen by cyber thieves.
Learning Objectives
It is increasingly common to combine as-a-service and as-a-product consumption models for elements of an organization's data infrastructure, including applications; development platforms; databases; and networking, processing, and storage resources. Some refer to this as "hybrid" architecture.
Using technical (not marketing) language, and without naming specific vendors or products, this presentation covers some improved storage capabilities becoming available in service and product offerings, and some scenarios for integrating these kinds of offerings with other data infrastructure services and products.
Learning Objectives
As flash storage becomes mainstream, storage pros are frequently bombarded by vendors and the press about the importance of latency when considering the performance of storage systems.
Simultaneously, the public cloud has emerged as a remote computing resource that is disrupting the way businesses use IT. In a world of geographically dispersed islands of compute, however, the latency problem takes on a different complexion: System designers need to consider the impact of physical distance and the speed of light more carefully than the latency of storage media.
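A rough worked example of that trade-off, using commonly cited ballpark figures rather than numbers from this session:

```python
# Back-of-the-envelope comparison of media latency vs. distance-induced
# latency. The figures are rough, commonly cited assumptions.
SPEED_IN_FIBER_KM_PER_MS = 200.0   # light in fiber ~ 2/3 c ~ 200 km per ms

def round_trip_ms(distance_km: float) -> float:
    """Propagation delay alone for one round trip, ignoring switching/queuing."""
    return 2 * distance_km / SPEED_IN_FIBER_KM_PER_MS

flash_read_ms = 0.1  # ~100 microseconds for a flash media read (assumed)
for km in (100, 1000, 4000):  # metro, regional, cross-continent distances
    rtt = round_trip_ms(km)
    print(f"{km:>5} km round trip: {rtt:6.1f} ms "
          f"(~{rtt / flash_read_ms:.0f}x a {flash_read_ms} ms flash read)")
```

Even a modest 1,000 km separation adds about 10 ms of round-trip propagation delay, two orders of magnitude more than a single flash media read.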
In this discussion, we will cover some of the implications of latency on the performance of distributed systems, and in particular storage systems, in the context of the public cloud. We’ll detail:
Learning Objectives
Cloud platforms that provide a scalable, virtualized infrastructure are becoming ubiquitous. As the underlying storage can meet this platform’s extreme demands for scalability, running storage analytics applications in the cloud is gaining momentum. Gartner estimates that 85% of Fortune 500 companies do not reap the full benefit of their data analytics, causing them to lose potential opportunities. Cloud providers do supply various metrics, but these tend to be non-uniform and sometimes inadequate. This calls for a cloud storage analytics solution that follows a scientific process of transforming storage data metrics into insight for making better decisions.
Learning Objectives
Cinder is the block storage management service for OpenStack. Cinder allows provisioning iSCSI, Fibre Channel, and remote storage services to attach to your cloud instances. LVM, Ceph, and other external storage devices can be managed and consumed through the use of configurable backend storage drivers.
Led by a core member of Cinder, this session will provide an introduction to the block storage services in OpenStack as well as give an overview of the Cinder project itself.
Whether you are looking for more information on how to use block storage in OpenStack, are looking to get involved in an open source project, or are just curious about how storage fits into the cloud, this session will provide a starting point to get going.
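As a taste of what consuming Cinder looks like from code, here is a minimal sketch using the openstacksdk Python client; this is only one of several ways to drive the service (CLI, Horizon, other SDKs), and the cloud name, volume size and volume name below are placeholders.

```python
# Minimal sketch: provision and list Cinder volumes via openstacksdk
# (assumed client library; names and sizes are hypothetical).
import openstack

# 'mycloud' refers to an entry in clouds.yaml with auth details (assumed).
conn = openstack.connect(cloud="mycloud")

# Ask Cinder for a 10 GiB block volume; the configured backend driver
# (LVM, Ceph, an external array, ...) decides where it actually lands.
volume = conn.block_storage.create_volume(name="demo-volume", size=10)

# List the volumes the project owns.
for vol in conn.block_storage.volumes():
    print(vol.name, vol.size, vol.status)
```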
Learning Objectives
With the rise of cloud systems, IT spending on storage systems is increasing. In order to minimize costs, architects must optimize system capacities and characteristics. Current capacity planning is mostly based on trial and error and rough resource estimates. With increasing hardware diversity and software stack complexity, this approach is no longer efficient enough. Meeting both storage capacity and SLA/SLO requirements involves trade-offs.
If you are planning to deploy a storage cluster, growth is what you should be concerned with and prepared for. So how exactly can you architect a storage system, without breaking the bank, while sustaining a sufficient capacity and performance across the scaling spectrum?
The session presents a novel simulation approach, flexible and highly accurate, that can be used for cluster capacity planning, performance evaluation and optimization before system provisioning. We will focus specifically on storage capacity planning and provide criteria for finding the best price-performance configuration by setting the memory, SSD and magnetic disk ratio. We will also highlight performance optimization by evaluating different OS parameters (e.g. log flush and write barrier), software configurations (e.g. proxy and object worker counts) and hardware setups (e.g. CPU, cluster size, the ratio of proxy servers to storage servers, and network topology selection such as Clos vs. fat tree).
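For intuition only, the toy calculation below shows how a memory/SSD/disk ratio trades cost against aggregate bandwidth; every price and throughput figure is a placeholder assumption, not output from the simulator described in this session.

```python
# Toy price/performance trade-off for a memory:SSD:disk capacity split.
TIERS = {              # $/GB and MB/s per GB served (illustrative only)
    "dram": {"usd_per_gb": 5.00, "mbps_per_gb": 100.0},
    "ssd":  {"usd_per_gb": 0.30, "mbps_per_gb": 5.0},
    "hdd":  {"usd_per_gb": 0.03, "mbps_per_gb": 0.2},
}

def evaluate(capacity_gb: float, ratio: dict[str, float]) -> tuple[float, float]:
    """Return (total cost in USD, aggregate bandwidth in MB/s) for a capacity
    split across tiers according to `ratio` (fractions summing to 1)."""
    cost = bandwidth = 0.0
    for tier, frac in ratio.items():
        gb = capacity_gb * frac
        cost += gb * TIERS[tier]["usd_per_gb"]
        bandwidth += gb * TIERS[tier]["mbps_per_gb"]
    return cost, bandwidth

for ratio in ({"dram": 0.02, "ssd": 0.18, "hdd": 0.80},
              {"dram": 0.05, "ssd": 0.35, "hdd": 0.60}):
    cost, bw = evaluate(100_000, ratio)  # a 100 TB cluster
    print(ratio, f"cost ${cost:,.0f}", f"bandwidth {bw:,.0f} MB/s")
```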
Learning Objectives
Swift is a highly available, distributed, scalable, eventually consistent object/blob store available as open source. It is designed to handle non-relational (that is, not just simple row-column data) or unstructured data at large scale with high availability and durability. For example, it can be used to store files, videos, documents, analytics results, Web content, drawings, voice recordings, images, maps, musical scores, pictures, or multimedia. Organizations can use Swift to store large amounts of data efficiently, safely, and cheaply. It scales horizontally without any single point of failure. It offers a single multi-tenant storage system for all applications, the ability to use low-cost industry-standard servers and drives, and a rich ecosystem of tools and libraries. It can serve the needs of any service provider or enterprise working in a cloud environment, regardless of whether the installation is using other OpenStack components. Use cases illustrate the wide applicability of Swift.
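For example, here is a minimal sketch of storing and retrieving an object with the python-swiftclient library; the endpoint and credentials below are placeholders, and Swift can also be driven directly over its HTTP API.

```python
# Minimal Swift put/get sketch using python-swiftclient (assumed client).
from swiftclient import client as swift

conn = swift.Connection(
    authurl="https://swift.example.com/auth/v1.0",  # hypothetical endpoint
    user="account:user",
    key="secret",
)

conn.put_container("documents")
conn.put_object("documents", "notes.txt",
                contents=b"unstructured data at large scale",
                content_type="text/plain")

headers, body = conn.get_object("documents", "notes.txt")
print(headers.get("etag"), body.decode())
```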
The storage industry is being transformed by the adoption of cloud storage. Challenges that were overlooked during the initial stages of cloud storage industry growth are now becoming core issues of today and the future. In this session we discuss the major challenges that corporations will face in availing themselves of the best services from multiple cloud providers, or in moving from one cloud provider to another in a seamless manner.
The SNIA CDMI standard addresses these challenges by enabling interoperability between cloud storage implementations. SNIA and Tata Consultancy Services (TCS) have partnered to create a SNIA CDMI Conformance Test Program to help cloud storage companies achieve conformance to CDMI and ensure interoperability between clouds. The TCS CDMI Conformance Assurance Solution (CAS) provides cloud storage product testing and detailed reports for conformance to the CDMI specification.
As interoperability becomes critical, end user companies should include the CDMI standard in their RFPs and demand conformance to CDMI from vendors.
Learning Objectives
As cloud and storage projections continue to rise, the number of organizations moving to the Cloud is escalating and it is clear cloud storage is here to stay. However, is it secure? Data is the lifeblood for government entities, countries, cloud service providers and enterprises alike and losing or exposing that data can have disastrous results. There are new concepts for data storage on the horizon that will deliver secure solutions for storing and moving sensitive data around the world. In this session, attendees will learn about new best practices to bypass the Internet.
Learning Objectives
Containers are being called the next big wave in application delivery, offering better resource utilization, agility and performance than traditional virtualization techniques. But once enterprises start running databases and applications with persistent storage needs, a new challenge appears with this new paradigm. This session will discuss how Veritas uses Software Defined Storage solutions to provide efficient and agile persistent storage for containers, offering enterprise capabilities like resilience, snapshots, I/O acceleration and disaster recovery. A reference architecture using commodity servers and server-side storage will be presented. Finally, future challenges around quality of service, manageability and visibility will be covered.
Learning Objectives
Storage has historically stayed in its “box.” Any developments in the industry were contained to limited dimensions such as costs, feeds or speeds. But in today’s digital world, it’s not enough to just deliver data – all parts of your infrastructure must be able to answer ever-present data security questions, extract valuable insights, identify sensitive information and help protect critical data from incoming threats.
As you go through your next technology refresh, what benefits should your storage deliver to your business, and by which dimensions should you measure them? In this session, DataGravity Chief Technology Officer David Siles will address the new standards for aligning your storage platform with your data’s needs, and share tips for highlighting the consequences of outdated storage to your CEO. Your company’s critical data wasn’t created to live and die in a system that can’t maximize its value or protect it from risks.
Learning Objectives
The increasing need for data storage capacity, driven by the enormous amounts of data newly created year after year, is an endless story. However, budgets for data storage are not increasing at nearly the same rate as data capacity is growing, while retaining the data remains very important. Consequently, an inexpensive but reliable storage solution is paramount. Fortunately, most data generated is "cold data", which is rarely accessed but still needs to be retained for quite a long period of time. Tape storage, which has a long proven history with applications in various industries, is well suited to retaining such cold data because of its low TCO (total cost of ownership), solid performance, high reliability and promising future outlook compared to other candidate technologies for cold storage (e.g. HDD, optical discs).
In this presentation, we will go through the reasons why tape storage is suitable for retaining cold data and will present the latest tape technologies and future outlook.
Learning Objectives
The tightly coupled architecture used by the vast majority of enterprises today is archaic and a new approach is needed to manage the explosion of data in a world of shrinking IT budgets. Enter: hyper-scale IT architecture.
Sudhakar Mungamoori will explain why a modern, software-driven and loosely coupled architecture is required for hyper-scale IT. He will highlight how this innovative architecture approach mitigates complexity, improves agility and reliability through on demand IT resources and reduces costs by as much as 10X.
Mungamoori will highlight how enterprises can learn from companies like Google and Facebook who built their own loosely coupled IT architectures to capitalize on its advantages. He will discuss use cases and best practices for IT departments that cannot similarly build their own, but are strategically looking to adopt loosely coupled architectures in order to remain competitive without blowing their budget in the face of today’s data deluge.
Learning Objectives
Extending the enterprise backup paradigm with disk-based technologies allows users to significantly shrink or eliminate the backup window. This tutorial focuses on various methodologies that can deliver efficient and cost-effective solutions, including approaches to storage pooling inside modern backup applications, using disk and file systems within these pools, and how and when to utilize continuous data protection, deduplication, virtual tape libraries (VTLs) and the cloud.
Learning Objectives
Many disk technologies, both old and new, are being used to augment tried-and-true backup and data protection methodologies to deliver better information and application restoration performance; these technologies work in parallel with the existing backup paradigm. This session will discuss many of these technologies in detail. Important considerations for data protection include performance, scale, regulatory compliance, recovery objectives and cost. Technologies include contemporary backup, disk-based backups, snapshots, continuous data protection and capacity-optimized storage, as well as cloud services. This tutorial will cover how these technologies interoperate, along with best-practice recommendations for deployment in today's heterogeneous data centers.
Learning Objectives
Data growth is in an explosive state, and these "Big Data" repositories need to be protected. In addition, new regulations are mandating longer data retention, and the job of protecting these ever-growing data repositories is becoming even more daunting. This presentation will outline the challenges, methodologies, and best practices to protect the massive scale "Big Data" repositories.
Learning Objectives
After reviewing the diverging data protection legislation in the EU member states, the European Commission (EC) decided that this situation would impede the free flow of data within the EU zone. The EC response was to undertake an effort to "harmonize" the data protection regulations, and it started the process by proposing a new data protection framework. This proposal includes some significant changes, such as defining a data breach to include data destruction, adding the right to be forgotten, and adopting the U.S. practice of breach notifications, among many other new elements. Another major change is a shift from a directive to a regulation, which means the protections are the same for all 27 member states, and it includes significant financial penalties for infractions. This tutorial explores the new EU data protection legislation and highlights the elements that could have significant impacts on data handling practices.
Learning Objectives
Aberdeen delivers extreme computing power with our ultra-dense, extreme-performance storage devices, packing over 3/4 of a petabyte into only 4U of rack space. In that compact footprint, Aberdeen’s ultra-dense storage devices suit a wide range of capacity-hungry applications, including big data analytics and massive block or object storage.
Aberdeen custom configures your server or storage products to your exact specifications. The ease of our online configurator lets you choose the storage device that fits precisely what you want. We will explore the features of Aberdeen’s NAS line, including our N49: a 4U, 78-bay, ultra-dense, high-performance 12Gb/s SAS storage device.
Where are today’s storage performance bottlenecks, how do you find them and how does adding flash storage affect them? Demartek will report the results (IOPS, throughput and latency) of vendor-neutral performance tests run on database and virtualization workloads typical of those found in today’s data centers. The tests cover both hybrid and all-flash solutions from several manufacturers and using a variety of form factors and interfaces. You will come away with reasonable estimates of what to expect in practice, observe how different workloads affect storage system performance and notice the difference in performance results depending on where the measurements were taken. Technologies discussed include server-side flash, hybrid storage arrays, all-flash arrays and various interfaces including NVMe.
Learning Objectives
Using NVMe drives in a centralized manner introduces the need for high availability. Without it, a simple failure in the NVMe enclosure will result in loss of access to a big group of disks. Loss of a single NVMe disk will impact all hosts mapped to this disk. We will review the state of the industry in approaching these problems, the challenges in performing HA and RAID at the speeds and latency of NVMe, and introduce new products in this space.
Panelists: Jorge Campello, Global Director of Systems and Solutions, Western Digital; Mark Carlson, Principal Engineer, Industry Standards, Toshiba, Chair, SNIA Technical Council; Josh Bingaman, Firmware Engineering Manager, Seagate Technology
The unyielding growth of digital data continues to drive demand for higher capacity, lower-cost storage. With the advent of Shingled Magnetic Recording (SMR), which overlaps HDD tracks to provide a 25 percent capacity increase versus conventional magnetic recording technology, storage vendors are able to offer extraordinary drive capacities within existing physical footprints. That said, IT decision makers and storage system architects need to be cognizant of the different data management techniques that come with SMR technology, namely Drive Managed, Host Managed and Host Aware. This panel session will offer an enterprise HDD market overview from prominent storage analyst Tom Coughlin as well as presentations on SMR data management methods from leading SMR HDD manufacturers (Seagate, Toshiba and Western Digital).
Learning Objectives
Windows and POSIX are different, and bridging the gap between the two—particularly with Network File Systems—can be a daunting endeavor... and an annoying one, too. This tutorial will provide an overview of the SMB3 network file protocol (the heart and soul of Windows interoperability) and describe some of the unique and powerful features that SMB3 provides. We will also point out and discuss some of the other protocols and services that are integrated with SMB3 (such as PeerDist), and show how the different pieces are stapled together and made to fly. The tutorial will also cover the general structure of Microsoft's protocol documentation, the best available cartography for those lost in the Interoperability Jungle. Simple code examples will be used sparingly, wherever it seems clever and useful to do so.
Learning Objectives
A number of scale out storage solutions, as part of open source and other projects, are architected to scale out by incrementally adding and removing storage nodes. Example projects include:
Hadoop’s HDFS
Ceph
Swift (OpenStack object storage)
The typical storage node architecture includes inexpensive enclosures with IP networking, CPU, memory and direct-attached storage (DAS). While inexpensive to deploy, these solutions become harder to manage over time, and the power and space requirements of data centers are difficult to meet with this type of solution. Object drives further partition these object systems, allowing storage to scale up and down in single-drive increments.
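One reason such systems can scale in single-drive or single-node increments is consistent-hash-style data placement; below is a minimal, generic sketch of a hash ring in Python. The virtual-node count and hash function are illustrative assumptions, not the placement algorithm of any specific project listed above.

```python
# Minimal consistent-hash ring: adding or removing one drive moves only a
# small fraction of the objects.
import bisect
import hashlib

class Ring:
    def __init__(self, drives, vnodes=64):
        self._points = []                 # sorted (hash, drive) pairs
        for drive in drives:
            for v in range(vnodes):       # virtual nodes smooth the balance
                h = self._hash(f"{drive}#{v}")
                bisect.insort(self._points, (h, drive))

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def locate(self, obj_name: str) -> str:
        """Return the drive responsible for an object."""
        h = self._hash(obj_name)
        i = bisect.bisect(self._points, (h, ""))
        return self._points[i % len(self._points)][1]  # wrap around the ring

ring = Ring([f"drive-{n}" for n in range(8)])
print(ring.locate("photos/2016/vault.jpg"))  # e.g. 'drive-3'
```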
Learning Objectives
The Internet changed the world and continues to revolutionize how people connect, exchange data and do business. This radical change is one of the causes of the rapid explosion in data volume that has required a new approach to data storage design. One common element is that unstructured data rules the IT world. How can the famous Internet services we all use every day support and scale to thousands of new users and hundreds of TB added daily while continuing to deliver an enterprise-class SLA? What are the technologies behind a cloud storage service that supports hundreds of millions of users? This tutorial covers technologies introduced by famous papers such as the Google File System, BigTable, Amazon Dynamo and Apache Hadoop. In addition, parallel, scale-out, distributed and P2P approaches, both open source and proprietary, are illustrated. The tutorial also covers some key features essential at large scale, to help understand and differentiate industry vendor and open source offerings.
Learning Objectives
Joy's Law, named after Sun Microsystems co-founder Bill Joy, is proving painfully true: “The best people work for someone else.” The explosive growth of open source now dictates that you are always outnumbered by skilled developers you could never hire or afford. This is not, however, a bad thing. Collaboration always beats competition when building community infrastructure, and the OpenZFS project is taking Sun's next-generation file system to unforeseen and unimaginable levels. OpenZFS-powered projects like FreeNAS have inverted the conversation from "I wish I could have this enterprise technology at home" to "Why aren't we using this at work?"
Learning Objectives
Few, if any, enterprise organizations will be willing to consume an upstream version of Ceph. This session will cover general guidelines for implementing Ceph for the enterprise and cover available reference architectures from SUSE.
Learning Objectives
The unique challenges in the field of nuclear and high energy physics are already pushing the limits of storage solutions today. However, the projects planned for the next ten years call for storage capacities, performance and access patterns that exceed the limits of many of today's solutions.
This talk will present the limitations in network and storage and explain the architecture chosen for tomorrow's storage implementations in this field. Tests of various file systems (Lustre, NFS, block/object storage, GPFS, etc.) have been performed, and the results of performance measurements for different hardware solutions and access patterns will be presented.
Learning Objectives
Many computing sites need long-term retention of mostly cold data, often called “data lakes”. The main function of this storage tier is capacity, but non-trivial bandwidth and access requirements exist as well. For many years, tape was the most economical solution. However, data sets have grown larger more quickly than tape bandwidth has improved, and access demands have increased, so disk can now be more economical for this storage tier. The cloud community has moved towards erasure-based object stores to gain scalability and durability using commodity hardware. The object interface works for new applications, but legacy applications rely on POSIX interfaces. MarFS is a near-POSIX file system that uses cloud-style object storage for data and many POSIX file systems for metadata. MarFS will scale the POSIX namespace metadata to trillions of files, and billions of files in a single directory, while storing the data efficiently and massively in parallel in industry-standard, erasure-protected, cloud-style object stores.
Learning Objectives
Rozo Systems develops a new generation of scale-out NAS with a radically new design that delivers a new level of performance. RozoFS is a highly scalable, high-performance and highly resilient file storage product, fully hardware agnostic, that relies on a unique patented erasure coding technology developed at the University of Nantes in France. This new philosophy in file serving extends what is possible and available on the market today with fast, seamless data protection techniques. RozoFS is thus a strong companion for demanding environments such as HPC, life sciences, media and entertainment, and oil and gas.
Learning Objectives
We all know about ENERGY STAR labels on refrigerators and other household appliances. In an effort to drive energy efficiency in data centers, storage systems can now earn ENERGY STAR labels through the EPA's ENERGY STAR Data Center Storage program. This program uses the taxonomies and test methods described in the SNIA Emerald Power Efficiency Measurement specification, which is part of the SNIA Green Storage Initiative. In this session, Dennis Martin, President of Demartek, the first SNIA Emerald Recognized Tester company, will discuss the similarities and differences between power supplies used in computers you build yourself and those used in data center storage equipment, 80 PLUS ratings, and why it is more efficient to run your storage systems at 230V or 240V rather than 115V or 120V. Dennis will share his experiences running the EPA ENERGY STAR Data Center Storage tests for storage systems and why vendors want to get approved.
Learning Objectives
Everyone wants to save energy in one form or another and energy efficiency is right at the top of lists of data center owner/architect pain points and key concerns. As worldwide data grows at an exponential rate, data storage solutions are creating an ever-increasing footprint for power in the data center. Understanding the key factors of power utilization for storage solutions is critical to optimizing that power footprint whether it be for purposes of system design or application in a data center or elsewhere. This talk will provide a high-level overview of storage technologies and compare/contrast them from a power perspective. More importantly, it will identify the best and simplest opportunities for reducing overall energy usage. Electrical engineering and/or power technical knowledge is not required as this is targeted for both technologists and facilities/business decision-makers.
Learning Objectives
With a design started in 2006, OpenIO is a new flavor in the dynamic object storage market segment. Beyond Ceph and OpenStack Swift, OpenIO is the latest player in that space. The product relies on an open source core object storage engine with several object APIs, file sharing protocols and application extensions. The inventors of the solution took a radically new approach to address the challenges of large-scale environments. Among other things, the product avoids the rebalancing that consistent-hashing-based systems always trigger; the impact is immediate, as new machines contribute right away without any extra tasks that affect the platform service. OpenIO also introduces the Conscience, an intelligent data placement service that optimizes the location of data based on various criteria such as node workload and storage space. OpenIO is fully hardware agnostic, running on commodity x86 servers, promoting total independence.
Learning Objectives
Businesses are extracting value from more data, from more sources and at increasingly real-time rates. Spark and HANA are just the beginning. This presentation details existing and emerging in-memory computing solutions that address this market trend, and the disruptions that happen when combining big data (petabytes) with in-memory/real-time requirements. We will also provide use cases and survey results from users who have implemented in-memory computing applications. The presentation gives an overview and the trade-offs of key solutions (Hadoop/Spark, Tachyon, HANA, in-memory NoSQL, etc.) and related infrastructure (DRAM, NAND, 3D XPoint, NVDIMMs, high-speed networking), and the disruption to infrastructure design and operations when "tiered memory" replaces "tiered storage". It also includes real customer data on how organizations are addressing and planning for this transition with this architectural framework in mind. The audience will leave with a framework to evaluate and plan for their adoption of in-memory computing.
Learning Objectives
In a modern data center with thousands of servers, thousands of switches and storage devices, and millions of cables, failures can arise anywhere in the compute, network or storage layer. The infrastructure provides multiple sources of huge volumes of data: time series of events, alarms, statistics, IPC, system-wide data structures, traces and logs. Interestingly, data is gathered in different formats and at different rates by different subsystems. With this heterogeneous data representation, the ability to blend and ingest the data to discover hidden correlations and patterns is important. Robust data architecture and machine learning techniques are required to predict impending functional or performance issues and to propose actions that can mitigate an unwanted situation before it happens. This presentation will outline the challenges and describe machine learning based solutions to address them.
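As a hedged illustration of one building block such a solution might use, the sketch below applies unsupervised anomaly detection to per-device metrics with scikit-learn's IsolationForest; the feature set, thresholds and numbers are hypothetical, and a real pipeline would blend many more sources (events, logs, traces) as described above.

```python
# Unsupervised anomaly detection over per-device time-series metrics.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Columns: [latency_ms, iops, media_errors_per_hour] per interval (assumed).
normal = rng.normal(loc=[2.0, 50_000, 0.1],
                    scale=[0.3, 4_000, 0.05],
                    size=(5_000, 3))
model = IsolationForest(contamination=0.01, random_state=0).fit(normal)

# Score fresh samples; -1 flags intervals that look anomalous and may warrant
# proactive action (e.g. draining a suspect drive) before a hard failure.
fresh = np.array([[2.1, 49_500, 0.08],    # healthy-looking interval
                  [9.5, 12_000, 4.0]])    # degrading device
print(model.predict(fresh))               # e.g. [ 1 -1]
```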