Merit Network

Discussion Communities: Merit Network Email List Archives

North American Network Operators Group


Re: HE.net, Fremont-2 outage?

  • From: Valdis.Kletnieks
  • Date: Wed Nov 04 21:58:15 2009

On Wed, 04 Nov 2009 12:26:15 CST, Joe Greco said:

> With power:
> 
> N+1 is usually better than N
> Best to assume full load when doing math
> Things will go wrong, predict common failures

And uncommon ones. :)

So as part of a major compute-cluster install, we upgraded our UPS and diesel
generator one weekend, and breathed a collective sigh of relief that we were
now safe from power outages and had mostly dodged a bullet. We *did* have some
scary moments when we discovered that (a) of the 400 or so disks on our Sun
E10K, about 10 didn't spin up again and (b) several of the boot disks on said
box weren't mirrored.  Fortunately, none of the 10 failures were on a
non-mirrored disk.  By Tuesday, all the non-mirrored boot disks were in fact
mirrored.

That Friday, a bozo contractor relocating a doorway managed to set off the
Halon. Only lost two disks on the E10K.  Guess which two? ;)

And a month later, we discovered that the nice shiny new automatic cutover
switch was wired in backwards, necessitating another power outage to re-wire it
correctly.

So much for safe from power outages... :)
