The Evolution of Congestion Management in Fibre Channel

Erik Smith

Jun 20, 2024

The Fibre Channel (FC) industry introduced Fabric Notifications in 2021 as a key resiliency mechanism for storage networks, designed to combat congestion, link integrity issues, and delivery errors. Since then, numerous manufacturers of FC SAN solutions have implemented Fabric Notifications and enhanced the overall user experience when deploying FC SANs. On August 27, 2024, the SNIA Data, Networking & Storage Forum is hosting a live webinar, “The Evolution of Congestion Management in Fibre Channel,” for a deep dive into Fibre Channel congestion management. We’ve convened a stellar, multi-vendor group of Fibre Channel experts with extensive knowledge and different technology viewpoints to explore the evolution of Fabric Notifications and the solutions available for this exciting new technology. You’ll learn:
  • The state of Fabric Notifications as defined by the Fibre Channel standards.
  • The mechanisms and techniques for implementing Fabric Notifications.
  • The currently available solutions deploying Fabric Notifications.
Register today and bring your questions for our experts. We hope you’ll join us on August 27th.

Fibre Channel SAN Hosts and Targets Q&A

John Kim

Oct 25, 2021

At our recent SNIA Networking Storage Forum (NSF) webcast “How Fibre Channel Hosts and Targets Really Communicate” our Fibre Channel (FC) experts explained exactly how Fibre Channel works, starting with the basics on the FC networking stack, link initialization, port types, and flow control, and then dove into the details on host/target logins and host/target IO. It was a great tutorial on Fibre Channel. If you missed it, you can view it on-demand. The audience asked several questions during the live event. Here are answers to them all:

Q. What is the most common problem that we face in the FC protocol?

A. Much the same as with any other network protocol, congestion is the most common problem found in FC SANs. It can take a couple of forms, including, but not limited to, host oversubscription and “fan-in/fan-out” ratios of host ports to storage ports, and it is probably the single largest generator of support cases. Another common issue is the “host cannot see target” class of problem.
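
As a rough illustration of why fan-in/fan-out ratios matter, here is a minimal sketch (with hypothetical port counts and speeds, not taken from the webcast) that compares the bandwidth hosts can request against what a single storage port can deliver:

```python
# Hypothetical fan-in check: is a storage port oversubscribed?
# The port counts and speeds below are illustrative only.

def oversubscription_ratio(host_speeds_gbps, storage_port_speed_gbps):
    """Aggregate host bandwidth divided by the storage port bandwidth."""
    return sum(host_speeds_gbps) / storage_port_speed_gbps

# Twenty hosts with 16G HBAs fanned in to a single 32G storage port.
hosts = [16] * 20
ratio = oversubscription_ratio(hosts, 32)
print(f"Fan-in 20:1, oversubscription {ratio:.0f}:1")
# A ratio far beyond what the array can service concurrently is a
# common source of congestion-related support cases.
```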

Q. What are typical latencies for N-to-N (node-to-node) Port and N-F-N (one switch between)?

A. Latencies vary from switch type to switch type and also vary based on the type of forwarding that is done. Port to port on a single switch is generally in the range of 1 µs to 5 µs.

Q. Has the Fabric Shortest Path First (FSPF) protocol always been there, or is there a minimum FC speed at which it was introduced? Also, how is the FSPF path determined? Is it via shortest path only, or does it also take into account the speeds of the switches along the path?

A. While Fibre Channel has existed since 1993 at 133 Mbit speed, FSPF was developed by the INCITS T11 Technical Committee and was published in 2000 as a cost-based Link State routing protocol. Costs are based on link speeds: the higher the link speed, the lower the cost, with cost = 10^12 / bandwidth (bps). There have been variations of implementations that allowed the network administrator to artificially set a link cost and force traffic into a path, but the better practice is to simply allow FSPF to do its normal work. And yes, the link costs are considered for all of the intermediate devices along the path.
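
As a minimal sketch of the relationship described above (using the 10^12 / bandwidth figure quoted in the answer; the FC-SW standard defines the normative formula and any administrative scaling), link cost falls as link speed rises:

```python
# Illustrative FSPF-style link cost: higher link speed -> lower cost.
# Based on the 10^12 / bandwidth (bps) relationship quoted above.

def fspf_link_cost(link_speed_gbps: float) -> int:
    bandwidth_bps = link_speed_gbps * 1_000_000_000
    return round(1e12 / bandwidth_bps)

for speed_gbps in (8, 16, 32, 64):
    print(f"{speed_gbps}GFC link cost ~ {fspf_link_cost(speed_gbps)}")
# 8G ~ 125, 16G ~ 62, 32G ~ 31, 64G ~ 16
```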

Q. All of this FSPF happens without even us noticing, right? Or do we need to manually configure?

A. Yes, all of the FSPF routing happens without any manual configuration. Most users don't even realize there is an underlying routing protocol. 

Q. Is it a best practice to have all ports in the system run at the same speed? We have storage connected at 32Gb interfaces and a hundred clients with 16Gb interfaces. Would this make the switch's job easier?

A. It's virtually impossible to have all ports of an FC SAN (or any network of size) connect at the same speed. In fact, the more common environment is for multiple versions of server and storage technology to have been “organically grown over time” in the datacenter. Even if uniform speeds were somehow achieved, there could still be congestion caused by hosts and targets requesting data from multiple simultaneous sources. So, having a uniform speed doesn't really fix anything, even if it might make some things a bit better. That said, it is always helpful to make certain that your HBA device drivers and firmware versions are up to date.

Q. From your experience, is there any place where the IO has gone wrong?

A. Not sure what 'IO gone wrong' means. All frames that traverse the SAN are cyclic redundancy check (CRC) checked. That might happen on each hop or it might just happen at the end devices, but frames that are found to be corrupted should never be incorporated into the LUN.
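
Fibre Channel frames carry a 32-bit CRC. Purely as an illustration of the receive-side principle (not the exact FC frame format or CRC procedure), the check looks like this, using the CRC-32 routine from Python's standard library:

```python
import zlib

# Conceptual receive-side integrity check: recompute the CRC over the
# payload and compare it with the CRC that accompanied the frame.
# Illustrative only; this is not the exact FC framing or CRC procedure.

def crc_ok(payload: bytes, received_crc: int) -> bool:
    return zlib.crc32(payload) == received_crc

payload = b"example frame payload"
sent_crc = zlib.crc32(payload)               # computed by the sender
assert crc_ok(payload, sent_crc)             # intact frame passes
assert not crc_ok(payload + b"!", sent_crc)  # corrupted frame is rejected
```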

Q. Is there a fabric notification feature for these backpressure events?

A. Yes, the recent standards have several mechanisms for notification, collectively called Fabric Performance Impact Notifications (FPIN). They include ELS (extended link service) notifications sent through software to identify congestion, link integrity, and SCSI command delivery issues. On Gen 7/64Gb platforms they also include an in-band hardware signal for credit stall and oversubscription conditions. Today both RHEL and AIX support the receipt of FPIN link integrity notifications and integrate them into their respective MPIO interfaces, allowing them to load balance around or avoid a “sick but not dead” link. Additional operating systems are on the way, and the first of the array vendors to support this are expected “soonish.” While there is no “silver bullet” that solves every congestion problem, FPIN is a tool that engages the whole ecosystem instead of leaving the “switch in the middle” to interpret data on its own, which is a huge potential benefit.
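
The “sick but not dead” behavior can be pictured with a small toy model (purely illustrative; real MPIO stacks such as those in RHEL and AIX implement this in the kernel against the actual FPIN ELS descriptors):

```python
from enum import Enum, auto

# Toy model of FPIN-aware path selection. The notification categories mirror
# the ones mentioned above; the classes and thresholds are hypothetical.

class Fpin(Enum):
    LINK_INTEGRITY = auto()
    CONGESTION = auto()
    DELIVERY = auto()

class Path:
    def __init__(self, name: str):
        self.name = name
        self.link_integrity_events = 0

    def on_fpin(self, kind: Fpin) -> None:
        if kind is Fpin.LINK_INTEGRITY:
            self.link_integrity_events += 1

    @property
    def marginal(self) -> bool:
        # A path that keeps reporting link-integrity events is "sick but not dead".
        return self.link_integrity_events >= 3

def usable_paths(paths):
    healthy = [p for p in paths if not p.marginal]
    return healthy or paths  # fall back to marginal paths if nothing else remains

a, b = Path("host1-hba0"), Path("host1-hba1")
for _ in range(3):
    b.on_fpin(Fpin.LINK_INTEGRITY)
print([p.name for p in usable_paths([a, b])])  # -> ['host1-hba0']
```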

Q. There is so much good information here. Are the slides available?

A. Yes, the session has been recorded and is available on-demand, along with the slides, in the SNIA Educational Library, where you can also search a wealth of educational storage content.

Demystifying the Fibre Channel SAN Protocol

John Kim

Sep 10, 2021

Ever wonder how Fibre Channel (FC) hosts and targets really communicate? Join the SNIA Networking Storage Forum (NSF) on September 23, 2021 for a live webcast, “How Fibre Channel Hosts and Targets Really Communicate.” This SAN overview will dive into details on how initiators (hosts) and targets (storage arrays) communicate and will address key questions, like:

  • How do FC links activate?
  • Is FC routable?
  • What kind of flow control is present in FC?
  • How do initiators find targets and set up their communication?
  • Finally, how does actual data get transferred between initiators and targets, since that is the ultimate goal?

Each SAN transport has its own way to initialize and transfer data. This is an opportunity to learn how it works in the Fibre Channel world. Storage experts will introduce the concepts and demystify the inner workings of FC SAN. Register today.

Beyond NVMe-oF Performance Hero Numbers

Erik Smith

Jan 28, 2021

When it comes to selecting the right NVMe over Fabrics™ (NVMe-oF™) solution, one should look beyond test results that demonstrate NVMe-oF’s dramatic reduction in latency and consider the other, more important, questions such as “How does the transport really impact application performance?” and “How does the transport holistically fit into my environment?”

To date, the focus has been on specialized fabrics like RDMA (e.g., RoCE) because it provides the lowest possible latency, as well as Fibre Channel because it is generally considered to be the most reliable.  However, with the introduction of NVMe-oF/TCP this conversation must be expanded to also include considerations regarding scale, cost, and operations. That’s why the SNIA Networking Storage Forum (NSF) is hosting a webcast series that will dive into answering these questions beyond the standard answer “it depends.”

The first in this series, “NVMe-oF: Looking Beyond Performance Hero Numbers,” will be held on March 25, 2021, where SNIA experts with deep NVMe and fabric technology expertise will discuss the thought process you can use to determine the pros and cons of a fabric for your environment, including:

  • Use cases driving fabric choices  
  • NVMe transports and their strengths
  • Industry dynamics driving adoption
  • Considerations for scale, security, and efficiency

Future webcasts will dive deeper and cover operating and managing NVMe-oF, discovery automation, and securing NVMe-oF. I hope you will register today. Our expert panel will be available on March 25th to answer your questions.

Notable Questions on NVMe-oF 1.1

Tim Lustig

Jul 14, 2020

At our recent SNIA Networking Storage Forum (NSF) webcast, “Notable Updates in NVMe-oF™ 1.1,” we explored the latest features of NVMe over Fabrics (NVMe-oF), discussing what's new in the NVMe-oF 1.1 release, support for CMB and PMR, managing and provisioning NVMe-oF devices with SNIA Swordfish™, and FC-NVMe-2. If you missed the live event, you can watch it here. Our presenters received many interesting questions on NVMe-oF and here are answers to them all:

Q. Is there an implementation of NVMe-oF with direct CMB access?

A. The Controller Memory Buffer (CMB) was introduced in NVMe 1.2 and first supported in the NVMe-oF 1.0 specification. It's supported if the storage vendor has implemented this within the hardware and the network supports it. We recommend that you ask your favorite vendor if they support the feature.

Q. What is the difference between PMR in an NVMe device and persistent memory in general?

A. The Persistent Memory Region (PMR) is a region within the SSD controller and it is reserved for system level persistent memory that is exposed to the host. Just like a Controller Memory Buffer (CMB), the PMR may be used to store command data, but because it's persistent it allows the content to remain even after power cycles and resets. To go further into this answer would require a follow up webinar.

Q. Are any special actions required on the host side over Controller Memory Buffers to maintain the data consistency?

A. To prevent possible disruption and to maintain data consistency, first the controller address range must be configured so that addresses will not overlap, as described in the latest specification. There is also a flush command so that persistent memory can be cleared (also described in the specification).

Q. Is there a field to know the size of CMB and PMR supported by the controller? What is the general size of CMB in current devices?

A. The general size of CMB/PMR is vendor-specific, but each has a size register field defined in the specification.
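
As a rough sketch of what decoding such a size field involves, the snippet below converts a CMBSZ-style register value into bytes; the bit layout assumed here (a size-units nibble in bits 11:8 and a unit count in bits 31:12) should be verified against the NVMe base specification revision you are working with:

```python
# Hedged sketch: decode a CMBSZ-style register into a size in bytes.
# Assumed layout: SZU (size units) in bits 11:8, SZ (unit count) in
# bits 31:12, with each SZU step being 16x the previous granularity.
# Verify the fields and offsets against the NVMe base specification.

SZU_GRANULARITY = {
    0: 4 * 1024,          # 4 KiB
    1: 64 * 1024,         # 64 KiB
    2: 1024 ** 2,         # 1 MiB
    3: 16 * 1024 ** 2,    # 16 MiB
    4: 256 * 1024 ** 2,   # 256 MiB
    5: 4 * 1024 ** 3,     # 4 GiB
    6: 64 * 1024 ** 3,    # 64 GiB
}

def cmb_size_bytes(cmbsz: int) -> int:
    szu = (cmbsz >> 8) & 0xF        # size-units code
    sz = (cmbsz >> 12) & 0xFFFFF    # number of units
    return sz * SZU_GRANULARITY[szu]

# Example: SZ=256 units of 64 KiB (SZU=1) -> a 16 MiB controller memory buffer.
example_register = (256 << 12) | (1 << 8)
print(cmb_size_bytes(example_register) // (1024 ** 2), "MiB")
```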

Q. Does having PMR guarantee that write requests to the PMR region have been committed to media, even though they have not been acknowledged before the power fail? Is there a max time limit in spec, within which NVMe drive should recover after power fail?

A. The implementation must ensure that the previous write has completed and that it is persistent. Time limit is vendor-specific.

Q. What is the average latency of an unladen swallow using NVMe-oF 1.1?

A. Average latency will depend on the media, the network and the way the devices are implemented. It also depends on whether or not the swallow is African or European (African swallows are non-migratory).

Q. Doesn't RDMA provide an 'implicit' queue on the controller side (negating the need for CMB for queues)? Can the CMB also be used for data?

A. Yes, the CMB can be used to hold both commands and command data and the queues are managed by RDMA within host memory or within the adapter. By having the queue in the CMB you can gain performance advantages.

Q. What is a ballpark latency difference number between CMB and PMR access, can you provide a number based on assumption that both of these are accessed over RDMA fabric?

A. When using CMB, latency goes down but there are no specific latency numbers available as of this writing.

Q. What is the performance of NVMe/TCP in terms of IOPS as compared to NVMe/RDMA? (Good implementation assumed)

A. This is heavily implementation dependent as the network adapter may provide offloads for TCP. NVMe/RDMA generally will have lower latency.

Q. If there are several sequence-level errors, how can we correct the errors in an appropriate order?

Q. How could we control the right order for the error corrections in FC-NVMe-2?

A. These two questions are related and the response below is applicable to both.

As mentioned in the presentation, Sequence-level error recovery provides the ability to detect and recover from lost commands, lost data, and lost status responses. For Fibre Channel, a Sequence consists of one or more frames: e.g., a Sequence containing an NVMe command, a Sequence containing data, or a Sequence containing a status response.

The order for error correction is based on information returned from the target about the state of the Exchange, compared to the state of the Exchange at the initiator. At a high level, upon sending an Exchange containing an NVMe command, a timer is started at the initiator. The default value for this timer is 2 seconds; if a response is not received for the Exchange before the timer expires, a message is sent from the initiator to the target to determine the status of the Exchange.

Also, a response from the target may be received before the Exchange information is obtained from the target. If this occurs, the command just continues on as normal, the timer is restarted if the Exchange is still in progress, and all is good. Otherwise, if no response from the target has been received since sending the Exchange information message, then one of two actions usually takes place:

a) If the information returned from the target indicates the Exchange is not known, then the Exchange resources are cleaned up and released, and the Exchange containing the NVMe command is re-transmitted; or

b) If the information returned from the target indicates the Exchange is known and the target is still working on the command, then no error recovery is needed; the timer is restarted, and the initiator continues to wait for a response from the target.

An example of this behavior is a format command, where it may take a while for the command to complete, and the status response to be sent.

Some other typical cases, based on the information returned from the target for the Exchange status query:

  1. If the information returned from the target indicates the Exchange is known, and a ready to receive data message was sent by the target (e.g., a write operation), then the initiator requests the target to re-transmit the ready-to-receive data message, and the write operation continues at the transport level;
  2. If the information returned from the target indicates the Exchange is known, and data was sent by the target (e.g., a read operation), then the initiator requests the target to re-transmit the data and the status response, and the read operation continues at the transport level; and
  3. If the information returned from the target indicates the Exchange is known, and the status response was sent by the target, then the initiator requests the target to re-transmit the status response and the command completes accordingly at the transport level.
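
The initiator-side flow described in this answer can be condensed into a simplified sketch (a conceptual outline of the cases listed above, not the normative FC-NVMe-2 procedure, which is diagrammed in Annex E):

```python
import enum

# Simplified model of the initiator-side recovery described above: a timer
# runs per Exchange, the target is queried on expiry, and the action taken
# depends on the Exchange state the target reports. Conceptual only.

class ExchangeState(enum.Enum):
    UNKNOWN = enum.auto()       # target has no record of the Exchange
    IN_PROGRESS = enum.auto()   # target is still working on the command
    READY_SENT = enum.auto()    # target sent ready-to-receive (write case)
    DATA_SENT = enum.auto()     # target sent data (read case)
    STATUS_SENT = enum.auto()   # target already sent the status response

DEFAULT_EXCHANGE_TIMEOUT_S = 2.0  # default timer value mentioned above

def on_timer_expiry(reported: ExchangeState) -> str:
    """Recovery action the initiator takes for each reported Exchange state."""
    if reported is ExchangeState.UNKNOWN:
        return "release Exchange resources and retransmit the NVMe command"
    if reported is ExchangeState.IN_PROGRESS:
        return "no recovery needed; restart the timer and keep waiting"
    if reported is ExchangeState.READY_SENT:
        return "ask the target to resend ready-to-receive; continue the write"
    if reported is ExchangeState.DATA_SENT:
        return "ask the target to resend the data and status; continue the read"
    return "ask the target to resend the status response and complete the command"

for state in ExchangeState:
    print(f"{state.name}: {on_timer_expiry(state)}")
```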

For further information, detailed informative Sequence-level error recovery diagrams are provided in Annex E of the FC-NVMe-2 standard, available via INCITS.
