At our recent SNIA Networking Storage Forum (NSF) webcast “How Fibre Channel Hosts and Targets Really Communicate” our Fibre Channel (FC) experts explained exactly how Fibre Channel works, starting with the basics on the FC networking stack, link initialization, port types, and flow control, and then dove into the details on host/target logins and host/target IO. It was a great tutorial on Fibre Channel. If you missed it, you can view it on-demand. The audience asked several questions during the live event. Here are answers to them all:
Q. What is the most common problem that we face in the FC protocol?
A. Much the same as with any other network protocol, congestion is the most common problem found in FC SANs. It can take a couple of forms including, but not limited to, host oversubscription and "fan-in/fan-out" ratios of host ports to storage ports, and it is probably the single largest generator of support cases. Another common issue is the "host cannot see target" class of problem.
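As a rough illustration of the fan-in/fan-out arithmetic mentioned above, the short Python sketch below computes port-count and bandwidth oversubscription ratios. The port counts and link speeds here are made-up examples, not figures from the webcast.

```python
# Rough sketch of fan-in/fan-out (oversubscription) arithmetic.
# The port counts and speeds below are hypothetical examples.
host_ports = 12          # hosts, each with a 16 Gb/s HBA port
host_speed_gbps = 16
storage_ports = 2        # storage array ports at 32 Gb/s each
storage_speed_gbps = 32

fan_in_ratio = host_ports / storage_ports                      # 6:1 by port count
bandwidth_ratio = (host_ports * host_speed_gbps) / (storage_ports * storage_speed_gbps)

print(f"Fan-in ratio (ports): {fan_in_ratio:.0f}:1")
print(f"Oversubscription (bandwidth): {bandwidth_ratio:.0f}:1")  # 3:1 in this example
```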
Q. What are typical latencies for N-to-N (node-to-node) Port and N-F-N (one switch between)?
A. Latencies vary from switch type to switch type and also with the type of forwarding that is done. Port to port on a single switch, latency is generally in the range of 1µs to 5µs.
Q. Has the Fabric Shortest Path First (FSPF) always been there or is there a minimum FC speed at which it was introduced in? Also, how is the FSPF determined? Is it via shortest path only or does it also take into account for speeds of the switches along the path?
A. While Fibre Channel has existed since 1993 at 133 Mbit/s, FSPF was developed by the INCITS T11 Technical Committee and was published in 2000 as a cost-based link state routing protocol. Costs are based on link speeds: the higher the link speed, the lower the cost, with cost = 10¹² / bandwidth (in bps). There have been implementations that allowed the network administrator to artificially set a link cost and force traffic onto a particular path, but the better practice is to simply allow FSPF to do its normal work. And yes, the link costs are considered for all of the intermediate devices along the path.
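As a quick illustration of that cost formula, the sketch below applies cost = 10¹² / bandwidth to a few nominal FC data rates. The rates used are approximations, and real switch implementations may scale or round costs differently.

```python
# Sketch of the FSPF link-cost formula described above: cost = 10**12 / bandwidth (bps).
# Nominal data rates are approximate; real switches may scale or round costs differently.
nominal_rates_gbps = {"8GFC": 8, "16GFC": 16, "32GFC": 32, "64GFC": 64}

for name, gbps in nominal_rates_gbps.items():
    bandwidth_bps = gbps * 10**9
    cost = 10**12 / bandwidth_bps
    print(f"{name}: link cost ≈ {cost:.0f}")   # faster links get lower costs
```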
Q. All of this FSPF happens without even us noticing, right? Or do we need to manually configure it?
A. Yes, all of the FSPF routing happens without any manual configuration. Most users don't even realize there is an underlying routing protocol.
Q. Is it a best practice to have all ports in the system run at the same speed? We have storage connected at 32Gb interfaces and a hundred clients with 16Gb interfaces. Would this make the switch's job easier?
A. It's virtually impossible to have all ports of an FC SAN (or any network of size) connect at the same speed. In fact, the more common environment is one where multiple generations of server and storage technology have been "organically grown over time" in the datacenter. Even if uniform speeds were somehow achieved, there could still be congestion caused by hosts and targets requesting data from multiple simultaneous sources. So, having a uniform speed doesn't really fix anything, even if it might make some things a bit better. That said, it is always helpful to make certain that your HBA device drivers and firmware versions are up to date.
Q. From your experience, is there any place where the IO has gone wrong?
A. Not sure what "IO gone wrong" means. All frames that traverse the SAN are cyclic redundancy check (CRC) checked. That check might happen on each hop, or it might only happen at the end devices. But frames that are found to be corrupted should never be incorporated into the LUN.
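For illustration, FC frames carry a 32-bit CRC, and the sketch below shows the basic "compute the CRC, compare it to the received CRC" idea using Python's zlib.crc32, which uses the same generator polynomial as Fibre Channel's CRC-32 (the on-the-wire bit and byte ordering in a real FC frame differ). The payload here is made up.

```python
import zlib

# Illustrative only: shows the "compute CRC, compare to the received CRC" idea.
# FC's CRC-32 shares its generator polynomial with zlib.crc32, but the actual
# on-the-wire bit/byte ordering in a Fibre Channel frame differs.
def frame_is_intact(payload: bytes, received_crc: int) -> bool:
    return zlib.crc32(payload) == received_crc

payload = b"example SCSI write data"            # hypothetical frame payload
good_crc = zlib.crc32(payload)                  # what the sender would compute
assert frame_is_intact(payload, good_crc)                   # clean frame passes
assert not frame_is_intact(payload + b"\x00", good_crc)     # corruption is detected
```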
Q. Is there a fabric notification feature for these backpressure events?
A. Yes, the recent standards have several mechanisms for notification, collectively called Fabric Performance Impact Notifications (FPIN). FPIN includes ELS (extended link service) notifications sent through software to identify congestion, link integrity, and SCSI command delivery issues. On Gen 7/64Gb platforms it also includes an in-band hardware signal for credit stall and oversubscription conditions. Today both RHEL and AIX support the receipt of FPIN link integrity notifications and integrate them into their respective MPIO interfaces, allowing them to load balance around or avoid a "sick but not dead" link. Additional operating systems are on the way, and the first array vendors to support this are expected "soonish." While there is no "silver bullet" that solves every congestion problem, FPIN, as a tool that engages the whole ecosystem instead of leaving the "switch in the middle" to interpret data on its own, is a huge potential benefit.
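To make the "sick but not dead" behavior concrete, here is a purely hypothetical sketch of how an MPIO layer might react to an FPIN link-integrity notification by de-prioritizing the affected path. The classes and handler names are invented for illustration and do not reflect any real operating system or driver API.

```python
from dataclasses import dataclass, field

# Hypothetical sketch only: the classes and handler below are invented to show
# the idea of MPIO steering I/O away from a "sick but not dead" path after an
# FPIN link-integrity notification. Real OS/driver APIs differ.
@dataclass
class Path:
    port_wwn: str
    healthy: bool = True

@dataclass
class MultipathDevice:
    paths: list = field(default_factory=list)

    def on_fpin_link_integrity(self, affected_port_wwn: str) -> None:
        # Mark the reported path as degraded so I/O prefers other paths,
        # without failing it outright (it is "sick", not dead).
        for path in self.paths:
            if path.port_wwn == affected_port_wwn:
                path.healthy = False

    def select_path(self) -> Path:
        healthy = [p for p in self.paths if p.healthy]
        return (healthy or self.paths)[0]   # fall back if every path is degraded

dev = MultipathDevice(paths=[Path("10:00:00:00:c9:aa:bb:01"),
                             Path("10:00:00:00:c9:aa:bb:02")])
dev.on_fpin_link_integrity("10:00:00:00:c9:aa:bb:01")
print(dev.select_path().port_wwn)   # traffic now prefers the unaffected path
```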
Q. There is so much good information here. Are the slides available?
A. Yes, the session has been recorded and is available on-demand, along with the slides, at the SNIA Educational Library, where you can also search a wealth of other educational storage content.