Magellan disk crash

Magellan disk crash, loss of /var and /var/imap/user

An air-conditioning unit failure caused a hard disk crash on Magellan. The crash was of the "catastrophic" variety, which required a system rebuild from backups. Physics IT Support staff attended the scene, and by midnight Thursday all Internet services on Magellan except mail were operational. All mailboxes were rebuilt by 7am Friday, at which point all services on Magellan had returned to normal. Our apologies for the downtime. On Friday morning Magellan was a little busier than usual, so people using Webmail are likely to have noticed a temporary drop in performance. As is usual after a mailbox rebuild, some mail clients did not handle it well (reporting a couple of duplicate messages in some mailboxes, or showing incorrect Read/Unread states), while other mail clients were unaffected.

Recovery process

The job took a few hours longer than a plain restore from tape would have, because we found we were able to recover the mailbox state to within minutes of the hard disk crash/corruption (that is, up to midday Thursday 27, when Magellan had to be shut down). We achieved this by performing a non-destructive media analysis on the crashed hard disk and overlaying the recent mailbox state changes on the data we restored from tape.
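For readers curious about what "overlaying" means here, the following is a minimal sketch of the general idea, not the procedure actually used on Magellan. It assumes the files salvaged from the crashed disk and the tape restore are available as two ordinary directory trees, and that file modification time is a good enough signal of which copy is more recent. The function name overlay_newer and both directory arguments are hypothetical.

```python
import os
import shutil


def overlay_newer(recovered_root, restored_root):
    """Copy files from recovered_root onto restored_root when they are
    missing there or have a newer modification time.

    Assumption: both arguments are plain directory trees with the same
    layout (hypothetical stand-ins for the salvaged disk data and the
    tape restore). Returns the sorted relative paths that were copied.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(recovered_root):
        rel_dir = os.path.relpath(dirpath, recovered_root)
        target_dir = os.path.normpath(os.path.join(restored_root, rel_dir))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            # Only overwrite when the salvaged copy is strictly newer,
            # so data restored from tape is never replaced by older data.
            if not os.path.exists(dst) or os.path.getmtime(src) > os.path.getmtime(dst):
                shutil.copy2(src, dst)  # copy2 preserves timestamps
                copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(copied)
```

Only reading from the salvaged tree and writing to the restored tree keeps the salvaged data untouched, in the same spirit as the non-destructive analysis described above. A real mail store would also need its index and seen-state files checked for consistency afterwards, which this sketch does not attempt.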

Risk management

We are addressing risk management issues more thoroughly year by year (as funding and time permit). In line with this, we have installed an ATA hard disk in Magellan so that we can set up a mirror arrangement to prevent such prolonged downtime in future. Mirroring with the former type of SCSI hard disk (i.e. $3.5K per unit, of which we would require two) is prohibitively expensive.

Would you like to know more?

Contact Physics IT Support at help@phys.unsw.edu.au if you'd like further details.

School of Physics - The University of New South Wales - Sydney Australia 2052
Site comments physicsweb@phys.unsw.edu.au © School of Physics UNSW