To fully unlock the potential of NVMe® IP-based SANs, we first need to address the manual, error-prone process currently used to establish connectivity between NVMe hosts and NVM subsystems. Several leading companies in the industry have joined together through NVM Express to collaborate on innovations that simplify and automate this discovery process.
This was the topic of discussion at our recent SNIA Networking
Storage Forum webcast “NVMe-oF:
Discovery Automation for IP-based SANs” where our experts, Erik
Smith and Curtis Ballard, took a deep dive into the work being done to
address these issues. If you missed the live event, you can watch it on
demand here and get a copy of the slides. Erik and Curtis did not
have time to answer all the questions during the live presentation. As
promised, here are answers to them all.
Q. Is the Centralized Discovery Controller (CDC)
highly available, and is this visible to the hosts? Do they see a pair of CDCs on the network and
retry requests to a secondary if a primary is not available?
A. Each CDC instance is intended to be highly available. How this is accomplished will be specific to the vendor and deployment type. For example, a CDC running inside an ESX-based VM can leverage VMware’s high availability (HA) and fault tolerance (FT) functionality. For most implementations, the HA functionality for a specific CDC is expected to be implemented using methods that are not visible to the hosts. In addition, each host will be able to access multiple CDC instances (e.g., one per “IP-Based SAN”). This ensures that any problems encountered with a single CDC instance will not impact all paths between the host and storage. One point to note: it is not expected that there will be multiple CDC instances visible to each host via a single host interface. Although this is allowed per the specification, it makes it much harder for administrators to effectively manage connectivity.
Q. First: Isn’t the CDC the perfect Denial-of-Service
(DoS) attack target? Being the ‘name server’ of NVMe-oF, when the CDC is
compromised, no storage is available anymore. Second: the CDC should run as a multi-instance cluster to realize high availability, or even better, be distributed like the name server in Fibre Channel (FC).
A. With regard to denial-of-service attacks: both FC’s Name Server and NVMe-oF’s CDC are susceptible to this type of attack, and both have the ability to mitigate these concerns. FC can fence or shut a port that has a misbehaving end device attached to it. The same can be done with Ethernet, especially when the “underlay config service” mentioned during the presentation is in use. In addition, the CDC’s role is slightly different from that of FC’s Name Server. If a denial-of-service attack were successfully executed against a CDC instance, existing host-to-storage connections would remain intact. Hosts that are rebooted, or disconnected and reconnected, could have a problem connecting to the CDC and, as a result, reconnecting to storage via the IP SAN that is experiencing the DoS attack.
For the second concern, it’s all about the implementation. Nothing in the
standard prevents the CDC from running in an HA or FT mode. When the CDC is
deployed as a VM, hypervisor-resident tools can be leveraged to provide HA and
FT functionality. When the CDC is deployed as a collection of microservices
that are running on the switches in an IP SAN, the services will be distributed.
One implementation available today uses distributed microservices to
enable scaling to meet or exceed what is possible with an FC SAN today.
Q. Will the Dell/SFSS CDC be offered as generic open source in the public domain so that other third parties don’t have to develop their own CDC? If third parties do not use Dell’s open CDC and develop their own, how would multi-CDC control work for a customer with multi-vendor storage arrays and multiple CDCs?
A. The Dell/SFSS CDC will not be open source. However, the reason Dell worked with HPE and other storage vendors on the 8009 and 8010 specifications is to ensure that whichever CDC instance the customer chooses to deploy, all storage vendors will be able to discover and interoperate with it. Dell’s goal is to create an NVMe IP-Based SAN ecosystem. As a result, Dell will work to make its CDC implementation (SFSS) interoperate with every NVMe IP-Based product, regardless of the vendor. The last thing anyone wants is for customers to have to worry about basic compatibility.
Q. Does this work only for IPv4? We’re moving towards IPv6-only environments.
A. Both IPv4 and
IPv6 can be used.
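As a minimal illustration (the addresses below are placeholders, and the well-known NVMe discovery TCP port 8009 is assumed), a Linux host with nvme-cli could query a CDC over either address family:

    nvme discover -t tcp -a 192.168.10.5 -s 8009     # IPv4 CDC address (illustrative)
    nvme discover -t tcp -a fd00:1234::5 -s 8009     # IPv6 CDC address (illustrative)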
Q. Will the
discovery domains for this be limited to a single Ethernet broadcast domain or will
there be mechanisms to scale out discovery to alternate subnets, like we see
with DHCP & DHCP Relay?
A. By default, mDNS is constrained to a single broadcast domain. As a result, when you deploy a CDC as a VM and the IP SAN consists of multiple broadcast domains per IP SAN instance (e.g., IP SAN A = VLANs/subnets 1, 2 and 3; IP SAN B = VLANs/subnets 10, 11 and 13), you’ll need to ensure an interface from the VM is attached to each VLAN/subnet. However, creating a VM interface for each subnet is a sub-optimal user experience, so the standard includes the concept of an mDNS proxy. The mDNS proxy is simply an mDNS responder that resides on the switch (a similar concept to a DHCP proxy) and can respond to mDNS requests on each broadcast domain, pointing the end devices to the IP address of the CDC (which could be on a different subnet). When you are selecting a switch vendor for your IP SAN, make sure you ask whether they support an mDNS proxy. If they do not, you will need to do extra work to get your IP SAN configured properly. When you deploy the CDC as a collection of services running on the switches in an IP SAN, one of those services could be an mDNS responder. This is how Dell’s SFSS will handle this situation.
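As a quick sanity check, you can verify whether a CDC’s mDNS advertisement is actually reaching a given host. This sketch assumes the avahi-utils package is installed and that the CDC advertises the _nvme-disc._tcp DNS-SD service type used for automated NVMe-oF discovery:

    # Browse and resolve NVMe discovery service advertisements visible on this broadcast domain
    avahi-browse --resolve --terminate _nvme-disc._tcp

If nothing shows up on a subnet that is not directly attached to the CDC, that is a hint the switch’s mDNS proxy (or an equivalent responder) is not in place.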
One final point about IP SANs that span multiple subnets: Historically, these
types of configurations have been difficult to administer because of the need
to configure and maintain static route table entries on each host. NVM Express
has done an extensive amount of work with 8010 to ensure that we can eliminate
the need to configure static routes. For more information about the solution to
this problem, take a look at nvme-stas on GitHub.
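To make the historical pain point concrete, here is the kind of per-host entry administrators have had to maintain when a storage subnet is not directly attached to the host’s SAN interface (the addresses and interface name are purely illustrative):

    # Illustrative static route: reach storage subnet 192.168.20.0/24 via the
    # gateway on the host's dedicated SAN interface
    ip route add 192.168.20.0/24 via 192.168.10.1 dev eth1

Multiply this by every host and every subnet in the IP SAN and it becomes clear why eliminating static route maintenance matters.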
Q. A question for
Erik: If mDNS turned out to be a problem, how did you work around it?
A. mDNS is a problem right now because none of the implementations in the first release support it; this limitation is resolved in the second release. In any case, the only issue I expect with mDNS will be environments that don’t want to use it (for one reason or another) or can’t use it (because the switch vendor does not support an mDNS proxy). In these situations, you can administratively configure the IP address of the CDC on the host and storage subsystem.
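On a Linux host running nvme-stas, that administrative configuration amounts to adding the CDC to the stafd configuration. A minimal sketch with an illustrative address (check the stafd.conf documentation shipped with nvme-stas for the authoritative syntax):

    # /etc/stas/stafd.conf
    [Controllers]
    # Point the discovery service at the CDC explicitly instead of relying on mDNS
    controller = transport=tcp;traddr=192.168.10.5;trsvcid=8009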
Q. Just a comment. In my humble opinion, slide 15 is the most important slide for helping people see, at a glance, what these things are. Nice slide.
A. Thanks, slide
15 was one of the earliest diagrams we created to communicate the concept of
what we’re trying to do.
Q. With Fibre Channel there are really
only two HBA vendors and two switch vendors, so interoperability, even with
vendor-specific implementations, is manageable. For Ethernet, there are many
NIC and Switch vendors. How is interoperability going to be ensured in this
more complex ecosystem?
A. The FC discovery protocol is very stable. We have not seen any basic interop issues related to login and discovery for years. However, back in the early days (1998-2002), we needed to validate each HBA/driver/firmware version, and do so for each OS we wanted to support. With NVMe/TCP, each discovery client is software-based and OS-specific (not HBA-specific). As a result, we will have only two host-based discovery client implementations for now (ESX and Linux; see nvme-stas) and a discovery client for each storage OS. To date, we have been pleasantly surprised at the lack of interoperability issues we’ve seen as storage platforms have started integrating with CDC instances, although it is likely we will see some issues as other storage vendors start to integrate with CDC instances from different storage vendors.
Q. A lot of companies will want to use NVMe-oF via IP/Ethernet in a micro-segmented network, where there are many L3/routing hops to reach the target. This presentation did not go into that part of scalability, only into scalability of discovery. Today, networks are all L3, with smaller and smaller subnets and a lot of L3 points.
A. This is a fair point. We didn’t have time to go into the work we have done to address the types of routing issues we’re anticipating and what we’ve done to mitigate them. However, nvme-stas, the Dell-sponsored open-source discovery client for Linux, demonstrates how we use the CDC to work around these types of issues. Erik is planning to write about this topic on his blog, brasstacksblog.typepad.com. You can follow him on Twitter @provandal to make sure you don’t miss it.