Training Deep Learning Models Q&A

Erin Farr

May 19, 2023

The estimated impact of Deep Learning (DL) across all industries cannot be overstated. In fact, analysts predict deep learning will account for the majority of cloud workloads, and training of deep learning models will represent the majority of server applications in the next few years. It's the topic the SNIA Cloud Storage Technologies Initiative (CSTI) discussed at our webinar "Training Deep Learning Models in the Cloud." If you missed the live event, it's available on-demand at the SNIA Educational Library, where you can also download the presentation slides. The audience asked our expert presenters, Milind Pandit from Habana Labs (an Intel company) and Seetharami Seelam from IBM, several interesting questions. Here are their answers:

Q. Where do you think most of the AI will run, especially training? Will it be in the public cloud, on-premises, or both?

[Milind]: It's probably going to be a mix. There are advantages to using the public cloud, especially because it's pay as you go. So, when experimenting with new models, new innovations, new uses of AI, and when scaling deployments, it makes a lot of sense. But there are still a lot of data privacy concerns. There are increasing numbers of regulations regarding where data needs to reside physically and in which geographies. Because of that, many organizations are deciding to build out their own data centers, and once they have large-scale training or inference successfully underway, they often find it cost effective to migrate their public cloud deployment into a data center where they can control the cost and other aspects of data management.

[Seelam]: I concur with Milind. We are seeing a pattern of dual approaches. There are some small companies that don't have the capital, expertise, or teams necessary to acquire GPU-based servers and deploy them; they are increasingly adopting public cloud, and we are seeing some decent-sized companies adopting the same approach as well. Keep in mind these GPU servers tend to be very power hungry, so you need the right floor plan, power, cooling, and so forth. Public cloud definitely gives you easy access and lets you pay for only what you consume. We are also seeing trends where certain organizations have constraints that restrict moving certain data outside their walls. In those scenarios, we are seeing customers deploy GPU systems on-premises. I don't think it's going to be one or the other; it is going to be a combination of both, but adopting a common platform technology will help unify the usage model across public cloud and on-premises.

Q. What is GDR? You mentioned using it with RoCE.

[Seelam]: GDR stands for GPUDirect RDMA. There are several ways a GPU on one node can communicate with a GPU on another node; here are three (at least): The GPU can use TCP, where GPU data is copied back into the CPU, which orchestrates the communication to the CPU and GPU on the other node. That obviously adds a lot of latency going through the whole TCP protocol stack. Another way is RoCEv2 or RDMA, where CPUs, FPGAs and/or GPUs actually talk to each other through industry-standard RDMA channels, so you send and receive data without the added latency of traditional networking software layers. A third method is GDR, where a GPU on one node can talk to a GPU on another node directly. This is done through network interfaces where the GPUs are essentially talking to each other, again bypassing traditional networking software layers.
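To make the GDR discussion a bit more concrete, here is a minimal sketch of an inter-node GPU all-reduce using PyTorch's NCCL backend. NCCL is not named in the webinar, so treat the choice of library as an assumption: when the NICs, drivers, and topology allow it, NCCL can move GPU buffers over RoCEv2 with GPUDirect RDMA transparently, and the environment variables below are commonly used NCCL knobs shown with illustrative values, not settings prescribed by the presenters.

```python
# Minimal sketch (assumption: PyTorch with the NCCL backend; not from the webinar).
# Launch with e.g. `torchrun --nnodes=2 --nproc_per_node=1 this_script.py`.
import os
import torch
import torch.distributed as dist

def main():
    # NCCL picks the fastest transport available: TCP sockets, RoCEv2/InfiniBand
    # verbs, and GPUDirect RDMA (GDR) when the NIC-to-GPU topology allows it.
    # These variables are commonly used to steer that choice (illustrative values):
    os.environ.setdefault("NCCL_IB_DISABLE", "0")       # allow RDMA transports
    os.environ.setdefault("NCCL_NET_GDR_LEVEL", "PHB")  # permit GDR through the PCIe host bridge

    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank % torch.cuda.device_count())

    # Each rank contributes a GPU-resident tensor; all_reduce sums it across nodes.
    # With GDR active, the data does not need a staging copy through host memory.
    grad = torch.ones(1024, device="cuda") * rank
    dist.all_reduce(grad, op=dist.ReduceOp.SUM)
    print(f"rank {rank}: first element after all_reduce = {grad[0].item()}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```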
Q. When you are talking about RoCE, do you mean RoCEv2?

[Seelam]: That is correct; I'm talking only about RoCEv2. Thank you for the clarification.

Q. Can you comment on storage needs for DL training? Have you considered the use of scale-out cloud storage services for deep learning training? If so, what are the challenges and issues?

[Milind]: The storage needs are 1) massive and 2) dependent on the kind of training you're doing (data parallel versus model parallel). With different optimizations, you will need parts of your data to be local in many circumstances. It's not always possible to do efficient training when data is physically remote and there's a large latency in accessing it. Some sort of caching infrastructure will be required for your training to proceed efficiently. Seelam may have other thoughts on scale-out approaches for training data.

[Seelam]: Yes, absolutely, I agree 100%. Unfortunately, there is no silver bullet to address the data problem with large-scale training. We take a three-pronged approach. Predominantly, we recommend users put their data in object storage, and that becomes the source where all the data lives. Many training jobs, especially those that deal with text data, don't tend to be huge in size because these are all characters, so we use the object store directly as the source to read the data and feed the GPUs for training. That's one model of training, but it only works for relatively smaller data sets; the data gets cached after the first access because it is sharded nicely, so you don't have to go back to the data source many times. There are other data sets where the data volume is larger. If you're dealing with pictures, video, or similar training domains, we adopt a two-pronged approach. In one scenario we have a distributed cache mechanism where the end users have a copy of the data in the file system, and that becomes the source for AI training. In another scenario, we deployed the system with sufficient local storage and asked users to copy the data into that local storage to use it as a local cache. As the AI training continues, once the data is accessed it is cached on the local drive, and subsequent iterations of the data come from that cache. This is much bigger than the local memory; it's about 12 terabytes of local cache storage, with the 1.5 terabytes of data, so we can serve data sets in the 10-terabyte range per node just from the local storage. If they exceed that, we go to the distributed cache, and if the data sets are small enough, we just use object storage. So, there are at least three different ways, depending on the use case and the model you are trying to train.

Q. In a fully sharded data parallel model, there are three communication calls when compared to DDP (distributed data parallel). Does that mean it needs about three times more bandwidth?

[Seelam]: Not necessarily three times more, but you will use the network a lot more than you would in DDP. In a DDP, or distributed data parallel, model you will not use the network at all in the forward pass, whereas in an FSDP (fully sharded data parallel) model you use the network in both the forward pass and the backward pass. In that sense you use the network more, and because you don't have all parts of the model within your system, you need to fetch the model shards from your neighbors, which means you will be using more bandwidth. I cannot give you the 3x number; I haven't seen 3x, but it's certainly more than DDP.
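To make the DDP/FSDP contrast concrete, here is a minimal PyTorch sketch, offered as an illustration of the general technique rather than code from the presenters: DDP keeps a full model replica per GPU and only all-reduces gradients in the backward pass, while FSDP shards parameters and must gather them over the network during the forward pass as well.

```python
# Minimal sketch, assuming PyTorch >= 1.12 with the NCCL backend,
# launched via torchrun so each rank is a separate process.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096), torch.nn.ReLU(), torch.nn.Linear(4096, 4096)
).cuda()

use_fsdp = True
if use_fsdp:
    # FSDP: parameters are sharded across ranks; each forward AND backward pass
    # all-gathers the shards it needs, so the network is used in both directions.
    model = FSDP(model)
else:
    # DDP: every rank holds the full model; the network is used only to
    # all-reduce gradients during the backward pass.
    model = DDP(model, device_ids=[local_rank])

x = torch.randn(8, 4096, device="cuda")
loss = model(x).sum()
loss.backward()  # gradient communication happens here for both DDP and FSDP
dist.destroy_process_group()
```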
The SNIA CSTI has an active schedule of webinars to help educate on cloud technologies. Follow us on Twitter @sniacloud_com and sign up for the SNIA Matters Newsletter so that you don't miss any.


Web 3.0 – The Future of Decentralized Storage

Joseph White

May 8, 2023

Decentralized storage is bridging the gap between Web 2.0 and Web 3.0, and its impact on enterprise storage is significant. The topic of decentralized storage and Web 3.0 will be the focus of an expert panel discussion the SNIA Networking Storage Forum is hosting on June 1, 2023, "Why Web 3.0 is Important to Enterprise Storage." In this webinar, we will provide an overview of enterprise decentralized storage and explain why it is more relevant now than ever before. We will delve into the benefits and demands of decentralized storage and discuss the evolution from on-premises, to cloud, to decentralized storage (cloud 2.0). We will also explore various use cases of decentralized storage, including its role in data privacy and security and the potential for decentralized applications (dApps) and blockchain technology.

As part of this webinar, we will introduce you to the Decentralized Storage Alliance, a group of like-minded individuals and organizations committed to advancing the adoption of decentralized storage. We will provide insights into the members of the Alliance and the working groups that are driving innovation and progress in this exciting field, and answer questions such as:
  • Why is enterprise decentralized storage important?
  • What are the benefits, the demand, and why now?
  • How will on-premises, to cloud, to decentralized storage evolve?
  • What are the use cases for decentralized storage?
  • Who are the members and working groups of the Decentralized Storage Alliance?
Join us on June 1st to gain valuable insights into the future of decentralized storage and discover how you can be part of this game-changing technology.


Storage Threat Detection Q&A

Michael Hoard

Apr 28, 2023

Stealing data, compromising data, and holding data hostage have always been the main goals of cybercriminals. Threat detection and response methods continue to evolve as the bad guys become increasingly sophisticated, but for the most part, storage has been missing from the conversation. Enter "Cyberstorage," a topic the SNIA Cloud Storage Technologies Initiative recently covered in our live webinar, "Cyberstorage and XDR: Threat Detection with a Storage Lens." It was a fascinating look at enhancing threat detection at the storage layer. If you missed the live event, it's available on-demand along with the presentation slides. We had some great questions from the live event, as well as interesting results from our audience poll questions, that we wanted to share here.

Q. You mentioned antivirus scanning is redundant for threat detection in storage, but could provide value during recovery. Could you elaborate on that?

A. Yes, antivirus can have a high value during recovery, but it's not always intuitive why this is the case. If malware makes it to your snapshots or your backups, it's because it was unknown and was not detected. Then, at some point, that malware gets activated on your live system and your files get encrypted. Suddenly, you know something happened, either because you can't use the files or because there's a ransomware banner note. Next, the incident responders come in and a signature for that malware is identified; the malware becomes known. The antivirus/EDR vendors quickly add a patch to their signature-scanning software for you to use. Since malware can dwell on your systems without being activated for days or weeks, you want to use that updated signature scan and/or a file malware scanner to validate that you're not reintroducing malware that was sitting dormant in your snapshots or backups. This way you can ensure that as you restore data, you are not reintroducing dormant malware.

Audience Poll Results

Here's how our live audience responded to our poll questions. Let us know what you think by leaving us a comment on this blog.

Q. What are other possible factors to consider when assessing Cyberstorage solutions?

A. Folks generally tend to look at CPU usage for any solution, and looking at that for threat detection capabilities also makes sense. However, you might want to look at this in the context of where the threat detection is occurring across the data life cycle. For example, if the threat detection software runs on your live system, you'll want lower CPU usage. But if the detection is occurring against a snapshot outside your production workloads, or against secondary storage, higher CPU usage may not matter as much.
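As a purely illustrative sketch of the recovery-time check described in the first answer above (not a tool mentioned in the webinar), the snippet below compares files in a mounted snapshot against an updated list of known-bad SHA-256 hashes before anything is restored. The snapshot path, IOC file name, and list format are all assumptions made for the example.

```python
# Minimal sketch, assuming responders have published SHA-256 IOCs (one hex hash
# per line) for the newly identified malware. Paths and file names are hypothetical.
import hashlib
import pathlib

def sha256_of(path: pathlib.Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def scan_snapshot(snapshot_dir: str, ioc_file: str) -> list[pathlib.Path]:
    """Return files in the snapshot whose hashes match known-bad IOCs."""
    known_bad = {line.strip().lower() for line in open(ioc_file) if line.strip()}
    hits = []
    for path in pathlib.Path(snapshot_dir).rglob("*"):
        if path.is_file() and sha256_of(path) in known_bad:
            hits.append(path)
    return hits

if __name__ == "__main__":
    suspicious = scan_snapshot("/mnt/snapshots/2023-04-01", "updated_iocs.txt")
    if suspicious:
        print("Do not restore yet; dormant malware found in:")
        for p in suspicious:
            print(f"  {p}")
    else:
        print("Snapshot is clean against the current IOC list; safe to restore.")
```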


Survey Says…Here are Data & Cloud Storage Trends Worth Noting

Michael Hoard

Apr 7, 2023

With the move to cloud continuing, application modernization, and related challenges such as hybrid and multi-cloud adoption and regulatory compliance requirements, enterprises must ensure they understand the current data and storage landscape. The SODA Foundation's annual comprehensive global survey on data and storage trends does just that, providing a comprehensive look at the intersection of cloud computing, data and storage management, the configuration of environments that end-user organizations are gravitating to, and priorities for selected capabilities over the next several years.

On April 13, 2023, the SNIA Cloud Storage Technologies Initiative (CSTI) is pleased to host SODA in a live webcast, "Top 12 Trends in Data and Cloud Storage," where the SODA members who led this research will share key findings. I hope you will join us for a live discussion and in-depth look at this important research to hear the trends that are driving data and storage decisions, including:
  • The top 12 trends in data and storage
  • Data security challenges facing container deployments
  • Approaches to public cloud deployment
  • Challenges for storage observability
  • Focus on hybrid and multi-cloud deployments
  • Top use cases for cloud storage services
  • The impact of open source working with data and storage
Register here and bring your questions. You will also have the opportunity to download the full 42-page report. We look forward to having you join us!


Kubernetes Trials & Tribulations Q&A: Cloud, Data Center, Edge

Michael Hoard

Jan 5, 2023

Kubernetes cloud orchestration platforms offer all the flexibility, elasticity, and ease of use — on premises, in a private or public cloud, even at the edge. The flexibility of turning on services when you want them, and turning them off when you don't, is an enticing prospect for developers as well as application deployment teams, but it has not been without its challenges. At our recent SNIA Cloud Storage Technologies Initiative webcast "Kubernetes Trials & Tribulations: Cloud, Data Center, Edge" our experts, Michael St-Jean and Pete Brey, debated both the challenges and advantages of Kubernetes. If you missed the session, it is available on-demand along with the presentation slides. The live audience raised several interesting questions. Here are answers to them from our presenters.

Q: Are all these trends coming together? Where will Kubernetes be in the next 1-3 years?

A: Adoption rates for workloads like databases, artificial intelligence & machine learning, and data analytics in a container environment are on the rise. These applications are stateful and diverse, so a multi-protocol persistent storage layer built with Kubernetes services is essential. Additionally, Kubernetes-based platforms pave the way for application modernization, but when, and which, applications should you move… and how do you do it? There are companies that still have virtual machines in their environment; maybe they're deploying Kubernetes on top of VMs, while some are trying to move to a bare-metal implementation to avoid VMs altogether. Virtual machines are really good in a lot of instances, say, for running your existing applications. But there's a Kubernetes add-on called KubeVirt that allows you to run those applications in VMs on top of containers, instead of the other way around. This offers a lot of flexibility to those who are adopting a modern application development approach while still maintaining existing apps. First, you can rehost traditional apps within VMs on top of Kubernetes. You can even refactor existing applications; for example, you can run Windows applications on Windows VMs within the environment, taking advantage of the container infrastructure. Then, while you are building new apps and microservices, you can begin to rearchitect your integration points across your application workflows. When the time is right, you can rebuild that functionality and retire the old application. Taking this approach is a lot less painful than rearchitecting entire workloads for cloud-native.

Q: Is cloud repatriation really a thing?

A: There are a lot of perspectives on repatriation from the cloud. Some hardware value-added resellers are of the opinion that it is happening quite a bit. Many of their customers had an initiative to move everything to the cloud. Then the company was merged or acquired and someone looked at the costs, and sure, they moved expenses from CapEx to OpEx, but there were runaway projects with little accountability and expanding costs. So, they started moving everything back from the cloud to the core datacenter. I think those situations do exist, but I also think the perspective is skewed a bit. I believe the reality is that where applications run is really more workload dependent. We continue to see workloads moving to public clouds, and at the same time, some workloads are being repatriated. Let's take, for example, a workload that may need processor accelerators like GPUs or Deep Learning accelerators for a short period of time. It would make perfect sense to offload some of that work to a public cloud deployment, because the analyst or data scientist could run the majority of their model on less expensive hardware and then burst to the cloud for the resources they need, when they need them. In this way, the organization saves money by not making capital purchases for resources that would largely remain idle. At the same time, a lot of data is restricted or governed and cannot live outside of a corporate firewall. Many countries around the world even restrict companies within their borders from housing data on servers outside of the country domain. These workloads are clearly being repatriated to a datacenter. Many other factors, such as costs and data gravity, will also contribute to some workloads being repatriated. Another big trend we see is the proliferation of workloads to the edge. In some cases, these edge deployments are connected and can interact with cloud resources, and in others they are disconnected, either because they don't have access to a network or due to security restrictions. The positive thing to note with this ongoing transformation, which includes hybrid and multi-cloud deployments as well as edge computing, is that Kubernetes can offer a common experience across all of these underlying infrastructures.

Q: How are traditional hardware vendors reinventing themselves to compete?

A: This is something we will continue to see unfold over time, but certainly, as we see Kubernetes platforms starting to take the place of virtual machines, there is a lot of interest in building architectures to support them. Right now, hardware vendors are starting to make their bets on which segments to go after. For example, there is a compact-mode deployment built on servers targeted at public sector deployments. There is also an AI accelerator product built with GPUs. There are specific designs for Telco and multi-access edge computing, and validated platforms and designs for Kubernetes that incorporate AI and Deep Learning accelerators, all running on Kubernetes. While the platform architectures and the target workloads or market segments are really interesting to follow, another emerging trend is for hardware companies to offer a fully managed service to customers built on Kubernetes. Full-scale hardware providers have amassed quite a bit of expertise with Kubernetes, and they have a complete services arm that can provide managed services, not just for the infrastructure, but for the Kubernetes-based platform as well. What's more, the sophisticated hardware manufacturers have redesigned their financing options so that customers can purchase the service as a utility, regardless of where the hardware is deployed. I don't remember where I heard it, but some time ago someone said, "Cloud is not a 'where,' cloud is a 'how.'" Now, with these service offerings and the cloud-like experience afforded by Kubernetes, organizations can operationalize their expenses regardless of whether the infrastructure is in a public cloud, on-site, at a remote location, or even at the edge.

Q: Where does the data live and how is the data accessed? Could you help parse the meaning of "hybrid cloud" versus "distributed cloud," particularly as it relates to current industry trends?

A: Organizations have applications running everywhere today: in the cloud, on-premises, on bare metal servers, and in virtual machines. Many are already using multiple clouds in addition to a private cloud or datacenter. Also, a lot of folks are used to running VMs, and they are trying to figure out whether they should just run containers on top of existing virtual machines or move to bare metal. They wonder if they can move more of their processing to the edge. Really, there's rarely an either-or scenario. There's a huge mix and match of technologies and methodologies taking place, which is why we term this the hybrid cloud. It is really hybrid in many ways, and the goal is to get to a development and delivery mechanism that provides a cloud-like experience. The term "distributed cloud computing" generally just encompasses the typical cloud infrastructure categories of public, private, hybrid, and multi-cloud.

Q: What workloads are emerging? How are edge computing architectures taking advantage of data in Kubernetes?

A: For many organizations, being able to gather and process data closer to data sources, in combination with new technologies like Artificial Intelligence/Machine Learning or new immersive applications, can help build differentiation. By doing so, organizations can react faster, connect everything, anywhere, and deliver better experiences and business outcomes. They are able to use data derived from sensors, video, and other edge devices to make faster data-driven decisions, deploy latency-sensitive applications with the experience users expect no matter where they are, and keep data within geographical boundaries to meet regulatory requirements on data storage and processing. Alongside these business drivers, many organizations also benefit from edge computing because it helps limit the data that needs to be sent to the cloud for processing, decreasing bandwidth usage and costs. It creates resilient sites that can continue to operate even if the connection to the core datacenter or cloud is lost. And you can optimize resource usage and costs, since only the necessary services and functionality are deployed to address a given use case or problem.

Q: How and why will Kubernetes succeed? What challenges still need to be addressed?

A: Looking at the application modernization options, you can venture to guess the breakdown of what organizations are doing, i.e., how many are doing rehost, refactor, rearchitect, etc., and what drives those decisions. When we look at the current state of application delivery, most enterprises today have a mix of modern cloud-native apps and legacy apps. A lot of large enterprises have a huge portfolio of existing apps built with traditional architectures and traditional languages (Java, .NET, or maybe C++), or even mainframe apps. These support both stateful and stateless workloads. In addition, many are building new apps or modernizing some of those existing apps on new architectures (microservices, APIs) with newer languages and frameworks (Spring, Quarkus, Node.js, etc.). We're also seeing more interest in building in added intelligence through analytics and AI/ML, and even automating workflows through distributed event-driven architectures, serverless, and functions. So, as folks modernize their applications, a lot of questions come up around when and how to transition existing applications, how they integrate with business processes, and what development processes and methodologies teams are adopting. Are they using an agile or waterfall methodology? Are they ready to adopt CI/CD pipelines and GitOps to operationalize their workflows and create a continuous application lifecycle?

Q: Based on slide #12 from this presentation, should we assume that the 76% for databases and data cache are larger, stateful container use cases?

A: In most cases, it is safe to assume they will be stateful applications that use databases, but they don't necessarily have to be large applications. The beauty of cloud-native deployments is that code doesn't have to be one huge monolithic application. It can be a set of microservices composed together, each piece of code addressing a certain part of the overall workflow for a particular use case. As such, many pieces of code can be small in nature but use an underlying database to store relational data. Even services like a container registry or logging and metrics will use an underlying database. For example, a registry service may keep an object store of container images, but then have a database that keeps an index and catalog of those images. If you're looking for more educational information on Kubernetes, please check out the other webcasts we've done on this topic in the SNIA Educational Library.
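To make the registry example in that last answer concrete, here is a minimal sketch of the "object store of blobs plus catalog database" pattern. It is not any particular product's implementation: a local directory stands in for the object store and SQLite stands in for the stateful catalog database, purely for illustration.

```python
# Minimal sketch of the "object store + catalog database" pattern described above.
# A local directory stands in for the object store; SQLite stands in for the catalog.
import hashlib
import pathlib
import sqlite3

BLOB_DIR = pathlib.Path("blobs")           # stand-in for the object store
BLOB_DIR.mkdir(exist_ok=True)
db = sqlite3.connect("registry.db")        # stand-in for the catalog database
db.execute("CREATE TABLE IF NOT EXISTS images "
           "(name TEXT, tag TEXT, digest TEXT, PRIMARY KEY (name, tag))")

def push(name: str, tag: str, layer_bytes: bytes) -> str:
    """Store the blob by content digest and record it in the catalog."""
    digest = "sha256:" + hashlib.sha256(layer_bytes).hexdigest()
    (BLOB_DIR / digest.replace(":", "_")).write_bytes(layer_bytes)
    db.execute("INSERT OR REPLACE INTO images VALUES (?, ?, ?)", (name, tag, digest))
    db.commit()
    return digest

def pull(name: str, tag: str) -> bytes:
    """Look up the digest in the catalog, then fetch the blob it points to."""
    row = db.execute("SELECT digest FROM images WHERE name=? AND tag=?",
                     (name, tag)).fetchone()
    if row is None:
        raise KeyError(f"{name}:{tag} not found in catalog")
    return (BLOB_DIR / row[0].replace(":", "_")).read_bytes()

digest = push("myapp", "v1", b"fake image layer")
print("pushed", digest)
print("pulled", len(pull("myapp", "v1")), "bytes")
```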


Storage Implications of Doing More at the Edge

Alex McDonald

May 10, 2022


In our SNIA Networking Storage Forum webcast series, "Storage Life on the Edge," we've been examining the many ways the edge is impacting how data is processed, analyzed and stored. I encourage you to check out the sessions we've done to date.

On July 12, 2022, we continue the series with “Storage Life on the Edge: Accelerated Performance Strategies” where our SNIA experts will discuss the need for faster computing, access to storage, and movement of data at the edge as well as between the edge and the data center, covering:

  • The rise of intelligent edge locations
  • Different solutions that provide faster processing or data movement at the edge
  • How computational storage can speed up data processing and transmission at the edge
  • Security considerations for edge processing

We look forward to having you join us to cover all this and more. We promise to keep you on the edge of your virtual seat! Register today.
