Disk server failure
MAJOR Closed General
STATUS
Closed
CREATED
Feb 22, 08:03 AM (10 years ago)
AFFECTED
General
STARTED
Feb 22, 12:39 AM (10 years ago)
CLOSED
Feb 22, 01:00 PM (10 years ago)
REFERENCE
1894 / AA1894
INFORMATION
  • INITIAL
    10 years ago by Adrian

    A disk server has failed, and this is affecting email and all the web sites we host. Engineers are working on this now.

  • UPDATE
    10 years ago by Adrian

    There is a major issue with one of the disk servers, and we are planning to switch to a backup, but that is likely to involve an engineer visit to the data centre.

  • UPDATE
    10 years ago by Adrian

    Engineer is on his way to the data centre now.

  • UPDATE
    10 years ago by Adrian

    This is looking more complex than expected - we have switched to the secondary controller, but there are issues with one of the disk arrays as well. Engineer still on site.

  • UPDATE
    10 years ago by Adrian

    Disk array is rebuilding now. We should have email working shortly and then web pages once the disk array rebuilds.

  • UPDATE
    10 years ago by Adrian

    Web space up, and mail servers being reconnected to disk array now.

  • UPDATE
    10 years ago by Adrian

    Issues with web pages again, investigating.

  • UPDATE
    10 years ago by Adrian

    The secondary disk server is now showing problems too. We are working on it.

  • UPDATE
    10 years ago by Adrian

    This is proving to be quite a serious issue - we appear to have problems with two separate disk controllers, with some of the RAID disks, and with the file system on one of the disks. This is a very odd multiple failure, especially given that all of this is monitored constantly and was not showing any issues yesterday. We do have daily backups, so if all else fails we can restore service from backups, with some loss of recent emails or changes. At this stage we are working to repair the failed file systems before considering that move.

  • UPDATE
    10 years ago by Adrian

    It looks like we have the mail file store repaired, and mail should be back online shortly.

  • UPDATE
    10 years ago by Adrian

    Web pages back.

  • UPDATE
    10 years ago by Adrian

    Incoming email should now be working again.

  • UPDATE
    10 years ago by Adrian

    We are checking all mail and web servers now to confirm all is well again.

  • RESOLUTION
    10 years ago by Adrian

    Obviously this sort of multiple failure is somewhat unexpected. We do have plans for new disk servers anyway, and this type of failure will be considered as part of that system design.

  • Closed