Optoelectronics & Communications

Fast fault detection and localization in WDM networks

Finding and localizing faults in optical WDM networks that carry communications traffic is time-critical and must be cost-effective.
2 March 2006, SPIE Newsroom. DOI: 10.1117/2.1200601.0081

Advances in WDM technology mean that one fiber can carry 192 or more wavelengths.1,2 Data rates for each wavelength have risen from 2.5 Gbps and 10 Gbps to 40 Gbps.3 It is also widely believed that optical networks will eventually offer end users dynamic lightpath provisioning, in which end-to-end connections are formed on one wavelength of light.

Even a very short service disruption caused by a fault in such a network may lead to very high data loss, so it is essential to be able to find and localize such faults quickly in order to ensure reliable network operation. Numerous schemes have been proposed to improve the survivability of such networks.4 Fault detection and localization is a vital part of such schemes but has received disproportionately little attention.

In a typical WDM network, as shown in Figure 1, a node consists of an optical switch and an electronic controller. The controller provides an interface for upper-layer protocols to manipulate the switch, and also maintains information about the network topology, wavelength occupation, and port mappings of the optical switch. Network nodes are connected by WDM links (fibers) that carry a number of optical channels. These optical channels carry user data and form the data plane. The controllers communicate with each other using dedicated electronic or optical channels, which form the control plane.

Figure 1. The architecture of wavelength-routed WDM networks.

Fault localization finds the minimum set of potentially failed network resources, based on the alarms generated during fault detection. Two main approaches have been proposed so far, based on either black-box or network models.5 Black-box methods use expert systems, artificial neural networks, or other learning systems to learn the relationship between network faults and their diagnosis. Network-model methods compare the expected network behavior, based on the model, with the actual observed behavior of the network. They include probabilistic reasoning systems, finite-state-machine models, and deterministic fault-propagation models. However, dynamic lightpath provisioning demands either frequent retraining for black-box methods or dynamic updates of the network model, both of which can make fault localization inaccurate and slow.

To handle these issues, we propose an end-to-end fault-detection-and-localization protocol.6 The sender keeps sending keepalive packets in a certain pattern to the receiver along the lightpath of the user's data route. At the lightpath destination, if a certain number of consecutive keepalive packets are missed in a given time, the receiver triggers an alarm and starts a fault-recovery process. The potentially faulty resources will lie in the common parts of the lightpaths that have alarms.
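The receiver-side check described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `KeepaliveMonitor`, the `miss_threshold` parameter, and the use of elapsed whole intervals to count missed packets are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class KeepaliveMonitor:
    """Receiver-side detector for one lightpath (illustrative names only)."""
    interval: float          # tau: expected spacing between keepalives, in seconds
    miss_threshold: int      # assumed: consecutive misses that trigger an alarm
    last_seen: float = 0.0   # arrival time of the most recent keepalive

    def on_keepalive(self, now: float) -> None:
        # Any arriving keepalive resets the count of consecutive misses.
        self.last_seen = now

    def missed(self, now: float) -> int:
        # Whole keepalive intervals that have elapsed with nothing received.
        return max(0, int((now - self.last_seen) / self.interval))

    def alarm(self, now: float) -> bool:
        # Alarm once the number of consecutive misses reaches the threshold.
        return self.missed(now) >= self.miss_threshold


monitor = KeepaliveMonitor(interval=1.0, miss_threshold=3)
monitor.on_keepalive(now=10.0)
print(monitor.alarm(now=12.5))  # two intervals elapsed: below the threshold
print(monitor.alarm(now=13.0))  # three intervals elapsed: alarm is raised
```

A real receiver would also have to allow for propagation delay and jitter before declaring a packet missed; the sketch ignores both.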

The fault notification is sent to the affected nodes to start the traffic restoration scheme: this goes only to the source node for path restoration, or to all upstream nodes along the lightpath for segment (link) restoration. A network management system collects alarms in real time and executes the fault-localization algorithm based on how alarms are distributed in the lightpaths. Figure 2 shows an example where a network fault is localized to the common parts of two lightpaths that are generating alarms.

Figure 2. Fault detection and localization using end-to-end keepalive packets.
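The idea of localizing a fault to the common parts of the alarmed lightpaths can be illustrated with a set intersection. The sketch below assumes a single fault and represents each lightpath as a list of resource labels; the function name `localize_fault` and the link labels are hypothetical and do not come from the protocol itself.

```python
from __future__ import annotations


def localize_fault(lightpaths: dict[str, list[str]], alarmed: set[str]) -> set[str]:
    """Return the resources shared by every alarmed lightpath.

    Assumes a single fault, so the failed resource must lie on all
    lightpaths that raised an alarm.
    """
    suspects: set[str] | None = None
    for name in alarmed:
        resources = set(lightpaths[name])
        # Keep only resources that also appear on this alarmed lightpath.
        suspects = resources if suspects is None else suspects & resources
    return suspects if suspects is not None else set()


# Hypothetical topology: two alarmed lightpaths share the link "B-C".
paths = {
    "lp1": ["A-B", "B-C", "C-D"],
    "lp2": ["E-B", "B-C", "C-F"],
    "lp3": ["A-E"],
}
print(localize_fault(paths, alarmed={"lp1", "lp2"}))  # {'B-C'}
```

With only one alarmed lightpath, every resource on it remains a suspect; each further alarm narrows the set.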

The protocol needs neither prior knowledge of the network topology and characterization nor a lengthy training/learning process. It could be integrated into the destination-initiating path-restoration protocols used for WDM networks. Further, the keepalive packets could be implemented by reusing the MPLS echo request/reply packet.7

If the destination assumes a fault when keepalive packets are missed, the fault detection time, TD, can be estimated as:


where τ is the time between two consecutive keepalive packets at the source and Tprop is the link propagation delay. Inequality (1) shows that the fault detection time can be reduced by decreasing τ, although a smaller τ increases the risk of false alarms and brings more overhead. If Lu is the user data rate and Lh the data rate of the keepalive packets, the protocol's overhead can be measured by the ratio of keepalive packets to the total traffic on the data plane:

\[ \text{overhead} = \frac{L_h}{L_u + L_h} \]

The typical data rate in SDH/SONET networks is 155 or 622 Mbps and the length of the MPLS echo request packet is 44 or 68 bytes.7 Even for τ = 0.25 ms, the data rate of keepalive packets is only about 2 Mbps (68 bytes × 8 bits / 0.25 ms), which is a small overhead on the data channel.
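The arithmetic behind this estimate can be checked with a short calculation. The sketch assumes one fixed-length keepalive packet is sent every τ seconds and uses the overhead definition given above (keepalive traffic divided by total data-plane traffic); the function names are illustrative.

```python
def keepalive_rate_bps(packet_bytes: int, interval_s: float) -> float:
    """Data rate of keepalives when one packet is sent every interval_s seconds."""
    return packet_bytes * 8 / interval_s


def overhead_ratio(user_rate_bps: float, keepalive_bps: float) -> float:
    """Share of total data-plane traffic taken by keepalive packets."""
    return keepalive_bps / (user_rate_bps + keepalive_bps)


if __name__ == "__main__":
    tau = 0.25e-3  # keepalive interval in seconds (0.25 ms)
    for packet_bytes in (44, 68):          # MPLS echo request lengths quoted above
        l_h = keepalive_rate_bps(packet_bytes, tau)
        for l_u in (155e6, 622e6):         # SDH/SONET user data rates in bit/s
            ratio = overhead_ratio(l_u, l_h)
            print(f"{packet_bytes} B, {l_u / 1e6:.0f} Mbps: "
                  f"keepalive {l_h / 1e6:.3f} Mbps, overhead {ratio:.2%}")
```

Changing `tau` in the loop shows directly how a shorter keepalive interval trades faster detection for more overhead.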

We have described a way of improving the reliability of optical communication networks through better fault detection and localization. Our end-to-end lightpath fault-detection scheme applies to the data plane, with fault localization and notification in the control plane. Statistical analysis shows that the fault detection time is small and the overhead for the user's data is negligible. Our next step will be to validate the approach on a testbed network and to try to integrate it with existing recovery schemes, such as destination-initiating path-restoration protocols.

Hongqing Zeng
Optical Network Lab, Department of Systems and Computer Engineering, Carleton University
Ottawa, Ontario, Canada
Broadband Network Technologies Research Branch
Ottawa, Ontario, Canada
Mr. Hongqing Zeng received his B.Eng. and M.Sc. in electrical engineering from Huazhong University of Science and Technology (1990) and Wuhan University (1995), China, respectively. He worked for the Industrial and Commercial Bank of China from 1995 to 2000 as a network engineer. Since 2002 he has been a research engineer at the Communications Research Centre Canada, and he is currently also a Ph.D. candidate at Carleton University. His research interest is in optical communication networks. He has published and presented a series of papers about fault management in optical networks at SPIE conferences such as Photonics North 2004 and 2005.
Alex Vukovic
Communications Research Centre Canada
Ottawa, Ontario, Canada
Dr. Alex Vukovic earned his M.Sc. and Ph.D. degrees from the University of Belgrade, Yugoslavia, in 1987 and 1990 respectively. He is currently at the Communications Research Centre Canada. His focus is on research leadership to verify and validate network concepts and key building blocks for next generation communication networks. His contributions involve over 60 journal and conference papers, patents, white papers, technology roadmaps and book chapters.
Changcheng Huang
Optical Network Lab, Department of Systems and Computer Engineering, Carleton University
Ottawa, Ontario, Canada
Dr. Changcheng Huang received B.Eng., M.Eng., and Ph.D. degrees in electrical engineering from Tsinghua University (1985, 1988), China, and Carleton University (1997), Canada, respectively. He worked for Nortel Networks from 1996 to 1998 and Tellabs, IL, from 1998 to 2000. Since July 2000, he has been with the Dept. of Systems and Computer Engineering at Carleton, where he is currently an associate professor. He is also currently an associate editor of IEEE Communications Letters.