
Merit Network Downtime, Outages and other Events
TT6009 -- Network Outage Update
- From: Elwood J. Downing
- Date: Tue Aug 21 16:54:20 2007
First of all, thank you for participating in the Webcast and Teleconference
Network Outage update today. The network started to return to normal
operations around 12:00 PM, EDT, Tuesday, August 21, 2007.
Below is a recap of the outage we experienced today, along with our
assurance that we are addressing the problem.
In the early morning hours of Tuesday, August 21, 2007, EDT, a scheduled
fiber maintenance of Level 3 fiber on the Lansing (MSU) to Detroit (WSU)
path was completed. This path was successfully tested at the switch ports
and placed back in service after the fiber maintenance. What was not known
at the time was that the optical amplifier on this fiber segment had shut
down during the maintenance.
At approximately 7:30 AM, EDT, Tuesday, August 21, 2007, the first problems
with the 10Gig switch ring were reported. The network was becoming sluggish
and unresponsive. Network engineers immediately started to work on the
problem and found that several of the switches were hung due to overloaded
management processors. At approximately 8:30 AM the switch ports on the
Lansing to Detroit path were determined to be the problem and were shut
down. The weak signal on the fiber path caused the switches to go into
recovery cycles so rapidly that switch memory filled and the switches
crashed.
Once the offending path was removed from service, the switches at Lansing
(MSU), Ann Arbor (Arbor Lakes) and Detroit (WSU) were restarted to clear
their hung condition, and major portions of the network started to return
to normal operation.
We had two hung switches in Grand Rapids and Kalamazoo that effectively cut
the network ring in half until Merit technicians arrived and restarts were
completed at approximately 11:50 AM. During this hung state various
network sites were degraded or inaccessible. The degradation was
experienced because too much traffic was being forwarded through the
Detroit Internet access point while Chicago was unavailable to some users.
The inaccessibility that some users experienced was due to some address
ranges that were not being announced out of Chicago.
At approximately 12:00 PM the network was fully operational with the
Lansing (MSU) to Detroit (WSU) path being held offline until the amplifier
is reset and further testing is completed.
We are taking several short-term steps, including analyzing the condition
of the amplifiers, assessing the contributing/complicating factors of
Extreme's EAPS (Ethernet Automatic Protection Switching), and reviewing
Network Operations procedures. Longer-term actions to prevent such network
troubles in the future have also been initiated. We will inform the
community regularly of both the short-term and long-term actions we
implement.
If you are continuing to experience any network performance problems,
please contact Merit's Network Operations Center (NOC) immediately. We have
engineering, NOC, and support staff available to work with you on resolving
any issues you are experiencing.
We sincerely apologize for the inconvenience this outage has caused your
organization. We continuously strive to provide the highest level of
service to our Membership and regret these service issues.
If you have any questions about this network event, please contact your
support team at http://www.merit.edu/members/ for additional information
and assistance.
Sincere Regards,
--Elwood
---------------------------------------------------------------------------
Elwood J. Downing              e-mail: ejd@merit.edu
Merit Network                  Phone: (734) 936-2040
Director of Member Services    Fax: (734) 647-3185
Merit Network -- Connecting Organizations, Building Community