Nonstop Forwarding Commands

A switch can be described in terms of three semi-independent functions: the forwarding plane, the control plane, and the management plane. The forwarding plane, which is implemented in hardware, forwards data packets. The control plane is the set of protocols that determine how the forwarding plane should forward packets, deciding which data packets are allowed to be forwarded and where they should go. Application software on the management unit acts as the control plane. The management plane is application software running on the management unit that provides interfaces allowing a network administrator to configure and monitor the device.

Nonstop forwarding (NSF) allows the forwarding plane of stack units to continue to forward packets while the control and management planes restart as a result of a power failure, hardware failure, or software fault on the management unit. A nonstop forwarding failover can also be manually initiated using the initiate failover command. Traffic flows that enter and exit the stack through physical ports on a unit other than the management unit continue with at most sub-second interruption when the management unit fails.

To prepare the backup management unit in case of a failover, applications on the management unit continuously checkpoint some state information to the backup unit. Changes to the running configuration are automatically copied to the backup unit. MAC addresses stay the same across a nonstop forwarding failover so that neighbors do not have to relearn them.

When a nonstop forwarding failover occurs, the control plane on the backup unit starts from a partially initialized state and applies the checkpointed state information. While the control plane is initializing, the stack cannot react to external changes, such as network topology changes. Once the control plane is fully operational on the new management unit, the control plane ensures that the hardware state is updated as necessary. Control plane failover time depends on the size of the stack, the complexity of the configuration, and the speed of the CPU.

The management plane restarts when a failover occurs. Management connections must be reestablished.

For NSF to be effective, adjacent networking devices must not reroute traffic around the restarting device.

TJ2911 uses three techniques to prevent traffic from being rerouted:

- A protocol may distribute a part of its control plane to stack units so that the protocol can give the appearance that it is still functional during the restart. Spanning tree and port channels use this technique.
- A protocol may enlist the cooperation of its neighbors through a technique known as graceful restart. OSPF uses graceful restart if it is enabled (see OSPF Graceful Restart Commands).
- A protocol may simply restart after the failover if neighbors react slowly enough that they will not normally detect the outage. The IP multicast routing protocols are a good example of this behavior.

To take full advantage of nonstop forwarding, layer 2 connections to neighbors should be via port channels that span two or more stack units, and layer 3 routes should be ECMP routes with next hops via physical ports on two or more units. The hardware can quickly move traffic flows from port channel members or ECMP paths on a failed unit to a surviving unit.
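
The recommendation above can be checked mechanically. The following is a minimal sketch, not part of the switch software: it assumes interface names use a unit/slot/port form such as "1/0/5" (an assumption, adjust to the actual naming), and the helper names are hypothetical. It reports whether the members of a port channel, or the next-hop ports of an ECMP route, sit on two or more stack units.

```python
# Hypothetical helpers. Interface names are ASSUMED to look like
# "unit/slot/port" (for example "1/0/5"); this format is not taken
# from the command reference.

def units_spanned(member_ports):
    """Return the set of stack unit numbers used by the given ports."""
    return {int(port.split("/")[0]) for port in member_ports}


def spans_multiple_units(member_ports):
    """True if the ports are spread over two or more stack units."""
    return len(units_spanned(member_ports)) >= 2


print(spans_multiple_units(["1/0/1", "2/0/1"]))  # True
print(spans_multiple_units(["1/0/1", "1/0/2"]))  # False
```

A port channel or ECMP route that fails this check has all of its paths on one unit, so a failure of that unit leaves no surviving member for the hardware to move traffic to.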

nsf (Stack Global Config Mode)

This command enables the nonstop forwarding feature on the stack. When nonstop forwarding is enabled, if the management unit of a stack fails, the backup unit takes over as the management unit without clearing the hardware tables of any of the surviving units. Data traffic continues to be forwarded in hardware while the management functions initialize on the backup unit.

NSF is enabled by default on platforms that support it. The administrator may wish to disable NSF in order to redirect the CPU resources consumed by data checkpointing.

If a unit that does not support NSF is connected to the stack, then NSF is disabled on all stack members. When a unit that does not support NSF is disconnected from the stack and all other units support NSF, and NSF is administratively enabled, then NSF operation resumes.
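
The relationship between the administrative setting and the operational state described above can be summarized in a few lines. This is an illustrative sketch only; the function name and the dictionary of per-unit capabilities are hypothetical and do not correspond to any interface on the switch.

```python
# Illustrative model only: names and data layout are assumptions.

def nsf_operational(admin_enabled, unit_supports_nsf):
    """NSF operates only if it is administratively enabled AND every
    stack member supports it.

    unit_supports_nsf maps a unit number to True or False.
    """
    return admin_enabled and all(unit_supports_nsf.values())


stack = {1: True, 2: True, 3: False}   # unit 3 lacks NSF support
print(nsf_operational(True, stack))    # False

del stack[3]                           # unit 3 is disconnected
print(nsf_operational(True, stack))    # True
print(nsf_operational(False, stack))   # False
```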

Default: enabled
Format: nsf
Mode: Stack Global Config Mode

no nsf

This command disables NSF on the stack.

Format: no nsf
Mode: Stack Global Config Mode

show nsf

This command displays global and per-unit information on NSF configuration on the stack.

Format: show nsf
Mode: Privileged Exec
NSF Administrative Status: Whether nonstop forwarding is administratively enabled or disabled. Default: Enabled
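NSF Operational Status: Whether NSF is currently operational on the stack. NSF can be administratively enabled but operationally disabled, for example when a unit that does not support NSF is connected to the stack.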
Last Startup Reason: The type of activation that caused the software to start the last time:
- "Power-On" means that the switch rebooted. This could have been caused by a power cycle or an administrative "Reload" command.
- "Administrative Move" means that the administrator issued the movemanagement command for the stand-by manager to take over.
- "Warm-Auto-Restart" means that the primary management card restarted due to a failure, and the system executed a nonstop forwarding failover.
- "Cold-Auto-Restart" means that the system switched from the active manager to the backup manager and was unable to maintain user data traffic. This is usually caused by multiple failures occurring close together.
Time Since Last Restart: Time since the current management unit became the active management unit.
Restart In Progress: Whether a restart is in progress.
Warm Restart Ready: Whether the system is ready to perform a nonstop forwarding failover from the management unit to the backup unit.
Copy of Running Configuration to Backup Unit:
- Status: Whether the running configuration on the backup unit includes all changes made on the management unit. Displays as Current or Stale.
- Time Since Last Copy: When the running configuration was last copied from the management unit to the backup unit.

initiate failover

This command forces the backup unit to take over as the management unit and perform a "warm restart" of the stack. On a warm restart, the backup unit becomes the management unit without clearing its hardware tables (on a cold restart, hardware tables are cleared). Applications apply checkpointed data from the former management unit. The original management unit reboots.

If the system is not ready for a warm restart, for example because no backup unit has been elected or one or more members of the stack do not support nonstop forwarding, the command fails with a warning message.
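
The readiness conditions named above can be pictured as a pre-check that collects the reasons a warm restart would be refused. This is a hedged sketch: the function name, arguments, and message strings are all hypothetical, the two conditions are only the examples given in the text, and the actual warning printed by the switch is not reproduced here.

```python
# Hypothetical pre-check. The message strings are invented for
# illustration and are NOT the switch's actual warning text.

def warm_restart_blockers(backup_unit, unit_supports_nsf):
    """Return a list of reasons a warm restart cannot proceed.

    backup_unit is the elected backup unit number, or None.
    unit_supports_nsf maps a unit number to True or False.
    An empty list means neither example condition applies.
    """
    blockers = []
    if backup_unit is None:
        blockers.append("no backup unit elected")
    unsupported = sorted(u for u, ok in unit_supports_nsf.items() if not ok)
    if unsupported:
        units = ", ".join(str(u) for u in unsupported)
        blockers.append("units without NSF support: " + units)
    return blockers


print(warm_restart_blockers(2, {1: True, 2: True}))      # []
print(warm_restart_blockers(None, {1: True, 3: False}))
```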

The movemanagement command also transfers control from the current management unit; however, the hardware is cleared and all units reinitialize.

Format: initiate failover
Mode: Stack Global Config Mode

show checkpoint statistics

This command displays general information about the checkpoint service operation.

Format: show checkpoint statistics

Mode: Privileged Exec

Messages Checkpointed: Number of checkpoint messages transmitted to the backup unit. Range: Integer. Default: 0

Bytes Checkpointed: Number of bytes transmitted to the backup unit. Range: Integer. Default: 0

Time Since Counters Cleared: Number of days, hours, minutes, and seconds since the counters were reset to zero. The counters are cleared when a unit becomes the manager and when a support command is issued. Range: Time Stamp. Default: 0d00:00:00

Checkpoint Message Rate: Average number of checkpoint messages per second. The average is computed over the time period since the counters were cleared. Range: Integer. Default: 0

Last 10-second Message Rate: Average number of checkpoint messages per second in the last 10-second interval. This average is updated once every 10 seconds. Range: Integer. Default: 0

Highest 10-second Message Rate: The highest rate recorded over a 10-second interval since the counters were cleared. Range: Integer. Default: 0
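
The three rate fields above differ only in the window they average over. The sketch below shows one way such counters could be maintained; it is an assumption-laden model, not the checkpoint service's implementation. The class and method names are hypothetical, and integer division is assumed because the fields are documented with an Integer range.

```python
# Hypothetical model of the rate counters. Integer (floor) division
# is an ASSUMPTION based on the documented "Range: Integer".

class CheckpointRates:
    INTERVAL = 10  # seconds per sampling interval

    def __init__(self):
        self.clear()

    def clear(self):
        """Reset all counters to their initial values."""
        self.messages = 0       # Messages Checkpointed
        self.elapsed = 0        # seconds since counters were cleared
        self.last_rate = 0      # Last 10-second Message Rate
        self.highest_rate = 0   # Highest 10-second Message Rate

    def close_interval(self, messages_in_interval):
        """Record the message count of one finished 10-second interval."""
        self.messages += messages_in_interval
        self.elapsed += self.INTERVAL
        self.last_rate = messages_in_interval // self.INTERVAL
        self.highest_rate = max(self.highest_rate, self.last_rate)

    @property
    def average_rate(self):
        """Checkpoint Message Rate: average since counters were cleared."""
        return self.messages // self.elapsed if self.elapsed else 0


rates = CheckpointRates()
for count in (50, 200, 20):
    rates.close_interval(count)
print(rates.average_rate, rates.last_rate, rates.highest_rate)  # 9 2 20
```

In this example 270 messages over 30 seconds give an average of 9 per second, while the most recent interval ran at 2 per second and the busiest at 20 per second.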

clear checkpoint statistics

This command clears all checkpoint statistics to their initial values.

Format: clear checkpoint statistics
Mode: Privileged Exec