Author: J Metz, Rockport Networks

Virtualization and Storage Networking Best Practices from the Experts

J Metz

Nov 26, 2018

Ever make a mistake configuring a storage array, or wonder if you're maximizing the value of your virtualized environment? With all the different storage arrays and connectivity protocols available today, knowing best practices can help improve operational efficiency and ensure resilient operations. That's why the SNIA Networking Storage Forum is kicking off 2019 with a live webcast, "Virtualization and Storage Networking Best Practices." In this webcast, Jason Massae from VMware and Cody Hosterman from Pure Storage will share insights and lessons learned, as reported by VMware's global storage services team, by discussing:
  • Common mistakes when setting up storage arrays
  • Why iSCSI is the number one storage configuration problem
  • Configuring adapters for iSCSI or iSER
  • How to verify your PSP matches your array requirements
  • NFS best practices
  • How to maximize the value of your array and virtualization
  • Troubleshooting recommendations
Register today to join us on January 17th. Whether you've been configuring storage for VMs for years or just getting started, we think you will pick up some useful tips to optimize your storage networking infrastructure.
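As a rough illustration of the PSP check mentioned above, the sketch below parses the general shape of per-device output from `esxcli storage nmp device list` and flags devices whose Path Selection Policy doesn't match what the array vendor recommends. The device identifiers, policies, and output format here are illustrative assumptions, not taken from the webcast:

```python
# Hypothetical sketch: verify each device's Path Selection Policy (PSP)
# against the policy the array vendor recommends. The sample text below
# only mimics the rough shape of `esxcli storage nmp device list` output;
# field names and device identifiers are illustrative.

def check_psp(esxcli_output: str, required_psp: str) -> list[str]:
    """Return device names whose PSP does not match required_psp."""
    mismatches = []
    device = None
    for line in esxcli_output.splitlines():
        line = line.strip()
        if line.startswith("naa."):          # start of a device block
            device = line
        elif line.startswith("Path Selection Policy:"):
            psp = line.split(":", 1)[1].strip()
            if device and psp != required_psp:
                mismatches.append(device)
    return mismatches

sample = """\
naa.600a098038303634
   Path Selection Policy: VMW_PSP_FIXED
naa.600a098038303635
   Path Selection Policy: VMW_PSP_RR
"""

# Many all-flash arrays recommend round robin (VMW_PSP_RR).
print(check_psp(sample, "VMW_PSP_RR"))  # → ['naa.600a098038303634']
```

A real check would run on the ESXi host itself and compare against the vendor's documented SATP/PSP rules rather than a hard-coded string.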

Olivia Rhye

Product Manager, SNIA



Centralized vs. Distributed Storage FAQ

J Metz

Oct 2, 2018

To date, thousands have watched our "Great Storage Debate" webcast series. Our most recent installment of this friendly debate (where no technology actually emerges as a "winner") was Centralized vs. Distributed Storage. If you missed it, it's now available on-demand. The live event generated several excellent questions, which our expert presenters have thoughtfully answered here:

Q. Which performs faster, centralized or distributed storage?

A. The answer depends on the type of storage, the type of connections to the storage, and whether the compute is distributed or centralized. The stereotype is that centralized storage performs faster if the compute is local, that is, in the same data center as the centralized storage. Distributed storage often uses different (less expensive) storage media and is designed for slower WAN connections, but it doesn't have to be so. Distributed storage can be built with the fastest storage and connected with the fastest networking, but it is rarely used that way. It can also outperform centralized storage if the compute is distributed in a similar way to the distributed storage, letting each compute node access the data from a local node of the distributed storage.

Q. What about facilities costs in either environment? Ultimately the data has to physically "land" somewhere and use power/cooling/floor space. There is an economy of scale in centralized data centers; how does that compare with distributed?

A. One big difference is in the cost of power between various data centers. Typically, data centers tend to be in the places where businesses have had traditional office space and accommodation for staff. Unfortunately, these are also areas of power scarcity and are consequently expensive to run. Distributed data centers can be in much cheaper locations; there are a number, for instance, in Iceland, where geothermally generated electricity is very cheap and environmental cooling is effectively free. Plus, the thermal cost per byte can be substantially lower in distributed data centers by efficiently packing drives to near capacity with compressed data. Learn more about data centers in Iceland here. Another difference is that distributed storage might consume less space if its data protection method (such as erasure coding) is more efficient than the data protection method used by centralized storage (typically RAID or triple replication). While centralized storage can also use erasure coding, compression, and deduplication, it's sometimes easier to apply these storage efficiency technologies to distributed storage.

Q. What is sharding?

A. Sharding is the process of breaking up a dataset, typically a database, into a number of partitions, and then putting these pieces or shards on separate storage devices or systems. The partitioning is normally horizontal; that is, the rows of the database remain complete in a shard, and some criterion (often a key range) is used to make each shard. Sharding is often used to improve performance, as the data is spread across multiple devices, which can be accessed in parallel. Sharding should not be confused with erasure coding used for data protection. Although erasure coding also breaks data into smaller pieces and spreads it across multiple devices, each piece is encoded and can only be understood once a minimum number of the fragments have been read and the data has been reconstituted on the system that requested it.

Q. What is the preferred or recommended choice of NVMe over Fabrics (NVMe-oF) for centralized vs. distributed storage systems, for prioritized use-case scenarios such as data integrity, latency, number of retries for read/write, or resource utilization?

A. This is a straightforward cost vs. performance question. This kind of solution only makes sense if the compute is very close to the data, so either a centralized SAN or a (well-defined) distributed system in one location with co-located compute would make sense. Geographically dispersed data centers or compute on remote data adds too much latency, and often bandwidth issues can add to the cost.

Q. Is there a document that has catalogued the impact of latency on the many data types? When designing storage, I would start with how much latency an application can withstand.

A. We are not aware of any single document that has done so, but many applications (along with their vendors, integrators, and users) have documented their storage bandwidth and latency needs. Other documents show the impact of differing storage latencies on application performance. Generally speaking, one could say the following about latency requirements, though exceptions exist to each one:
  • Block storage wants lower latency than file storage, which wants lower latency than object storage
  • Large I/O and sequential workloads tolerate latency better than small I/O and random workloads
  • One-way streaming media, backup, monitoring and asynchronous replication care more about bandwidth than latency. Two-way streaming (e.g. videoconferencing or IP telephony), database updates, interactive monitoring, and synchronous replication care more about latency than bandwidth.
  • Real-time applications (remote control surgery, multi-person gaming, remote AR/VR, self-driving cars, etc.) require lower latency than non-real-time ones, especially if the real-time interaction goes both ways on the link.
One thing to note is that many factors affect the performance of a storage system. You may want to take a look at our excellent Performance Benchmark webinar series to find out more.

Q. Computation faces an analogous debate between distributed compute and centralized compute. Please comment on how the computation debate relates to the storage debate. Typically, distributed computation will work best with distributed storage; ditto for centralized computation and storage. Are there important applications where a user would go for centralized compute and distributed storage? Or distributed compute and centralized storage?

A. That's a very good question, to which there is a range of not so very good answers! Here are some application scenarios that require different thinking about centralized vs. distributed storage. Video surveillance is best with distributed storage (and perhaps a little local compute to do things like motion detection or object recognition) and centralized compute (for doing object identification or consolidation of multiple feeds). Robotics requires lots of distributed compute; think of self-driving cars, where the analysis of a scene and the motion of the vehicle need to be done locally, but where all the data on traffic volumes and road conditions needs multiple data sources to be processed centrally. There are lots of other (often less exciting but just as important) applications with similar requirements: retail food sales with smart checkouts (that part is all local) and stock management and shipping systems (that part is heavily centralized). In essence, sometimes it's easier to process the data where it's born, rather than move it somewhere else. Data is "sticky", and that sometimes dictates that the compute should be where the data lies. Equally, it's also true that sometimes the only way of making sense of distributed data is to centralize it; weather stations can't do weather forecasting, so the data needs to be unstuck, collected up and transmitted, and then computed centrally.

We hope you enjoyed this unbiased, vendor-neutral debate. You can check out the others in this series below. Follow us @SNIAESF for more upcoming webcasts.
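The sharding answer above can be sketched in a few lines. This hypothetical example hash-partitions complete rows across a fixed number of shards by key; real systems add routing, rebalancing, and replication, none of which is shown here:

```python
# Minimal horizontal-sharding sketch: each row stays whole, and a stable
# hash of its key decides which shard (storage device or system) receives
# it. The table contents and shard count are illustrative.
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a row key to a shard index with a stable hash."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

def shard_table(rows, num_shards):
    """Distribute complete rows (horizontal partitioning) across shards."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(row["id"], num_shards)].append(row)
    return shards

rows = [{"id": f"user{i}", "name": f"User {i}"} for i in range(6)]
shards = shard_table(rows, num_shards=3)
for i, shard in enumerate(shards):
    print(i, [r["id"] for r in shard])
```

Because each shard lives on its own device, the shards can be read and written in parallel, which is where the performance benefit described above comes from.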



We’re Debating Again: Centralized vs. Distributed Storage

J Metz

Sep 4, 2018

We hope you’ve been following the SNIA Ethernet Storage Forum (ESF) “Great Storage Debates” webcast series. We’ve done four so far and they have been incredibly popular with 4,000 live and on-demand views to date and counting. Check out the links to all of them at the end of this blog. Although we have “versus” in the title of these presentations, the goal of this series is not to have a winner emerge, but rather provide a “compare and contrast” that educates attendees on how the technologies work, the advantages of each, and to explore common use cases. That’s exactly what we plan to do on September 11, 2018 when we host “Centralized vs. Distributed Storage.” In the history of enterprise storage there has been a trend to move from local storage to centralized, networked storage. Customers found that networked storage provided higher utilization, centralized and hence cheaper management, easier failover, and simplified data protection amongst many advantages, which drove the move to FC-SAN, iSCSI, NAS and object storage. Recently, however, distributed storage has become more popular where storage lives in multiple locations, but can still be shared over a LAN (Local Area Network) and/or WAN (Wide Area Network). The advantages of distributed storage include the ability to scale out capacity. Conversely, in the hyperconverged use case, enterprises can use each node for both compute and storage, and scale-up as more resources are needed. What does this all mean? Register for this live webcast to find out, where my ESF colleagues and I will discuss:
  • Pros and cons of centralized vs. distributed storage
  • Typical use cases for centralized and distributed storage
  • How SAN, NAS, parallel file systems, and object storage fit in these different environments
  • How hyperconverged has introduced a new way of consuming storage
It’s sure to be another unbiased, vendor-neutral look at a storage topic many are debating within their own organizations. I hope you’ll join us on September 11th. In the meantime, I encourage you to watch our on-demand debates. Learn about the work SNIA is doing to lead the storage industry worldwide in developing and promoting vendor-neutral architectures, standards, and educational services that facilitate the efficient management, movement, and security of information by visiting snia.org.


Storage Controllers – Your Questions Answered

J Metz

Jun 4, 2018

The term controller is used constantly, but often has very different meanings. When you have a controller that manages hardware, there are very different requirements than for a controller that manages an entire system-wide control plane. You can even have controllers managing other controllers. It can all get pretty confusing very quickly. That's why the SNIA Ethernet Storage Forum (ESF) hosted our 9th "Too Proud to Ask" webcast. This time it was "Everything You Wanted to Know about Storage but were Too Proud to Ask: Part Aqua – Storage Controllers." Our experts from Microsemi, Cavium, Mellanox and Cisco did a great job explaining the differences between the many types of controllers, but of course there were still questions. Here are answers to all that we received during the live event, which you can now view on-demand.

Q. Is there a standard for things such as NVMe over TCP/IP?

A. NVMe™ is in the process of standardizing a TCP transport. It will be called NVMe over TCP (NVMe™/TCP), and the technical proposal should be completed and public later in 2018.

Q. What are the length limits on NVMe over Fibre Channel?

A. There are no length limits. Multiple Fibre Channel frames can be combined to create any length of transfer needed. The Fibre Channel Industry Association has a very good presentation on Long-Distance Fibre Channel, which you can view here.

Q. What does the term "Fabrics" mean in the storage context?

A. Fabrics typically refers to the switch or switches interconnecting the hosts and storage devices. Specifically, a storage "fabric" maintains some knowledge about itself and the devices that are connected to it, but some people use it to mean any networked devices that provide storage. In this context, "Fabrics" is also shorthand for "NVMe over Fabrics," which refers to the ability to run the NVMe protocol over an agnostic networking transport, such as RDMA-based Ethernet, Fibre Channel, and InfiniBand (TCP/IP coming soon).

Q. How does DMA result in lower power consumption?

A. DMA is typically done using a hardware DMA engine on the controller. This offloads the transfer from the host CPU, which typically consumes more power than the logic of the DMA engine.

Q. How does the latency of NVMe over Fibre Channel compare to NVMe over PCIe?

A. The overall goal of having NVMe transported over any fabric is not to exceed 20µs of latency above and beyond a PCIe-based NVMe solution. Having said that, there are many aspects of networked storage that can affect latency, including number of hops, topology size, oversubscription ratios, and cut-through vs. store-and-forward switching. Individual latency metrics are published by specific vendors. We recommend you contact your favorite Fibre Channel vendor for their numbers.

Q. Which of these technologies will grow and prevail over the next 5-10 years?

A. That is the $64,000 question, isn't it? The basic premise of this presentation was to help illuminate what controllers are, and the different types that exist within a storage environment. No matter which specific flavor becomes the most popular, these basic tenets will remain in effect for the foreseeable future.

Q. I am new to storage matters, but I have been an IT tech for almost 10 years. Can you explain block vs. file I/O?

A. We're glad you asked! We highly recommend you take a look at another one of our webinars, Block vs. File vs. Object Storage, which covers that very subject! If you have an idea for another topic you're "Too Proud to Ask" about, let us know by commenting on this blog.
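On that last question, a tiny sketch may help make the distinction concrete. This hypothetical example (not from the webcast) writes data through the ordinary file API, then reads it back as fixed-size, offset-addressed blocks, which is essentially how block storage is consumed; a real block device would be accessed the same way, just against a device path rather than a regular file:

```python
# Illustrative contrast between file I/O (named files, byte streams
# managed by a filesystem) and block-style I/O (fixed-size blocks
# addressed by offset). Paths and sizes are made up for the demo.
import os
import tempfile

BLOCK_SIZE = 512  # classic block/sector size

# File I/O: the filesystem tracks names, sizes, and allocation for us.
path = os.path.join(tempfile.mkdtemp(), "demo.dat")
with open(path, "wb") as f:
    f.write(b"hello block world".ljust(BLOCK_SIZE * 2, b"\0"))

# Block-style I/O: address storage as numbered, fixed-size blocks.
fd = os.open(path, os.O_RDONLY)
block0 = os.pread(fd, BLOCK_SIZE, 0 * BLOCK_SIZE)   # read block 0
block1 = os.pread(fd, BLOCK_SIZE, 1 * BLOCK_SIZE)   # read block 1
os.close(fd)

print(block0[:17])   # → b'hello block world'
print(len(block1))   # → 512
```

Note that `os.pread` is POSIX-only; the point is the addressing model, not the specific call.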



Storage Controllers – Are You Too Proud to Ask?

J Metz

Apr 5, 2018

Are you a control freak? Have you ever wondered what the difference was between a storage controller, a RAID controller, a PCIe Controller, or a metadata controller? What about an NVMe controller? Aren't they all the same thing? On May 15, 2018, the SNIA Ethernet Storage Forum will tackle these questions and more in "Everything You Wanted To Know About Storage But Were Too Proud To Ask – Part Aqua: Storage Controllers."  In this live webcast, our experts will take an unusual step of focusing on a term that is used constantly, but often has different meanings. When you have a controller that manages hardware, there are very different requirements than a controller that manages an entire system-wide control plane. From the outside looking in, it may be easy to get confused. You can even have controllers managing other controllers! In Part Aqua we'll be revisiting some of the pieces we talked about in Part Chartreuse, where we covered the basics, but with a bit more focus on the variety we have to play with:
  • What do we mean when we say "controller?"
  • How are the systems managed differently?
  • How are controllers used in various storage entities: drives, SSDs, storage networks, software-defined
  • How do controller systems work, and what are the trade-offs?
  • How do storage controllers protect against Spectre and Meltdown?
I hope you will register today and join us on May 15th to learn more about the workhorse behind your favorite storage systems.


Storage Controllers – Are You Too Proud to Ask?

J Metz

Apr 5, 2018

title of post
Are you a control freak? Have you ever wondered what the difference was between a storage controller, a RAID controller, a PCIe Controller, or a metadata controller? What about an NVMe controller? Aren’t they all the same thing? On May 15, 2018, the SNIA Ethernet Storage Forum will tackle these questions and more in “Everything You Wanted To Know About Storage But Were Too Proud To Ask – Part Aqua: Storage Controllers.” In this live webcast, our experts will take an unusual step of focusing on a term that is used constantly, but often has different meanings. When you have a controller that manages hardware, there are very different requirements than a controller that manages an entire system-wide control plane. From the outside looking in, it may be easy to get confused. You can even have controllers managing other controllers! In Part Aqua we’ll be revisiting some of the pieces we talked about in Part Chartreuse, where we covered the basics, but with a bit more focus on the variety we have to play with:
  • What do we mean when we say “controller?”
  • How are the systems managed differently?
  • How are controllers used in various storage entities: drives, SSDs, storage networks, software-defined
  • How do controller systems work, and what are the trade-offs?
  • How do storage controllers protect against Spectre and Meltdown?
I hope you will register today and join us on May 15th to learn more about the workhorse behind your favorite storage systems.
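The idea of “controllers managing other controllers” can be sketched in a few lines of Python. Everything here is hypothetical and invented for illustration (the class names, the 512-byte block size, the mirroring policy); it is a RAID-1-style toy, not any real controller’s interface, but it shows how a higher-level controller presents the same read/write interface while delegating to the device controllers beneath it:

```python
class DriveController:
    """Hypothetical per-device controller: it owns the hardware details
    (here, a list of 512-byte blocks) and exposes simple read/write."""
    def __init__(self, capacity_blocks):
        self.blocks = [b"\x00" * 512] * capacity_blocks

    def read(self, lba):
        return self.blocks[lba]

    def write(self, lba, data):
        self.blocks[lba] = data


class MirrorController:
    """A 'controller of controllers': a RAID-1-style controller that
    manages two drive controllers and mirrors every write to both."""
    def __init__(self, a, b):
        self.a, self.b = a, b

    def write(self, lba, data):
        self.a.write(lba, data)  # write goes to both mirror members
        self.b.write(lba, data)

    def read(self, lba):
        # Either member could satisfy the read; prefer the first here.
        return self.a.read(lba)


# The caller sees one device; the mirroring is invisible from above.
raid = MirrorController(DriveController(8), DriveController(8))
raid.write(0, b"hello")
print(raid.read(0))  # b'hello'
```

Note that each layer has a different job: the drive controller manages media, while the mirror controller manages policy, which is exactly why the single word “controller” keeps causing confusion.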


Why is Blockchain Storage Different?

J Metz

Nov 19, 2017

The SNIA Ethernet Storage Forum (ESF), specifically ESF Vice Chair Alex McDonald, spent Halloween explaining storage requirements for modern transactions in our webcast, “Transactional Models & Their Storage Requirements.” Starting with the fascinating history of the first transactional system in a bakery in 1951 (really!), through to a discussion of Bitcoin, it was an insightful look at the changing role of storage amid modern transactions. If you missed it, you can watch it on-demand at your convenience. We received some great questions during the live event. Here are answers to them all:

Q. How many nodes are typical in the blockchain ledger?
A. As many as are required to ensure that a single node, or a small group of nodes, can’t crack the hard problem that grants the right to add the next block to the chain. There are estimated to be more than 11,000 Bitcoin nodes right now (see https://bitnodes.earn.com/), but not all blockchain systems have as many nodes (or such a hard problem to crack!).

Q. A traditional DB like Oracle…how does it fit into CAP? What two features would it have?
A. A traditional database doesn’t have an unreliable, geographically spread network connecting parts of the database (or at least, it shouldn’t). The CAP theorem (Consistency, Availability and Partition tolerance – pick any two of the three) doesn’t apply to traditional databases like Oracle. These kinds of systems provide ACIDity; Wikipedia has an excellent article on the subject.

Q. Is a block the same as a node? I thought each node has a copy of each block.
A. That’s correct. Ledger transactions and blocks are distributed to all the nodes, and every node holds a copy of the entire blockchain.

Q. Hey! Bitcoin isn’t just limited to criminals! C’mon!
A. Blockchain is more than cryptocurrencies, and is increasingly important for legitimate businesses too. For instance, just this week American Express Is Getting Into Blockchain-Based Payments With Ripple. Other applications across a broad range are being introduced; in healthcare, for instance, for digital identity, smart contracts and digital voting, to name a few.

Q. Why is blockchain storage different from the more conventional storage we have today?
A. Let’s take the example of a cryptocurrency like Bitcoin. Blockchain ledgers are transactional in nature, so if I have 200 transactions in my wallet, they are spread across all the blocks where they are recorded; there isn’t a central record of what’s in my wallet. To get that, I need to add up all the wallet transactions recorded in every block, and blockchains can be pretty big: the Bitcoin blockchain is more than 140GB as of November 2017, and growing. That demands fast (both high-bandwidth and low-latency) storage: in memory as much as possible, with flash-based SSDs for persistence. Other blockchain technologies will have equivalent or even more demanding requirements.

If you have questions about this topic, please comment on this blog below. And if you’re interested in more vendor-neutral education on storage topics, check out the full library of SNIA ESF webcasts, where we cover all things related to Ethernet storage, including block, file and object storage, RDMA and NVMe over Fabrics, storage performance benchmarking, containers, and our very popular 101 series “Everything You Wanted To Know About Storage But Were Too Proud To Ask.” Happy viewing!
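The point about wallet balances deserves a small illustration. The Python sketch below is a toy ledger, not Bitcoin’s actual data structures: the block layout, field names, and the `wallet_balance` helper are all invented for this example. It shows two things the answers above describe: blocks are chained by hashing each block’s contents into its successor, and a wallet’s balance is not stored anywhere, so it must be recomputed by scanning every transaction in every block:

```python
import hashlib
import json


def block_hash(contents):
    """Deterministic hash over a block's contents (toy layout)."""
    return hashlib.sha256(
        json.dumps(contents, sort_keys=True).encode()
    ).hexdigest()


def make_block(prev_hash, transactions):
    """Each block records its predecessor's hash, forming the chain."""
    contents = {"prev_hash": prev_hash, "transactions": transactions}
    return {**contents, "hash": block_hash(contents)}


def wallet_balance(chain, wallet):
    """No central balance exists: sum every matching transaction
    in every block of the ledger to derive it."""
    balance = 0
    for block in chain:
        for tx in block["transactions"]:
            if tx["to"] == wallet:
                balance += tx["amount"]
            if tx["from"] == wallet:
                balance -= tx["amount"]
    return balance


# A toy two-block chain: mint 50 to alice, then alice pays bob 20.
genesis = make_block("0" * 64, [{"from": "mint", "to": "alice", "amount": 50}])
block1 = make_block(genesis["hash"], [{"from": "alice", "to": "bob", "amount": 20}])
chain = [genesis, block1]
print(wallet_balance(chain, "alice"))  # 30
```

The full-scan in `wallet_balance` is the storage story in miniature: as the chain grows (140GB and counting for Bitcoin in 2017), deriving state this way is exactly what drives the demand for high-bandwidth, low-latency storage.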

