Disk server failure -- AAISP's status page

Disk server failure

MAJOR Closed General

STATUS

Closed

CREATED

Feb 22, 08:03 AM (10 years ago)

AFFECTED

General

STARTED

Feb 22, 12:39 AM (10 years ago)

CLOSED

Feb 22, 01:00 PM (10 years ago)

REFERENCE

1894 / AA1894

PERMALINK

https://aastatus.net/1894

INFORMATION

INITIAL
10 years ago by Adrian
A disk server has failed; it is impacting email and all web sites we host. Engineers are working on this now.
UPDATE
10 years ago by Adrian
There is a major issue with one of the disk servers, and we are planning to switch to a backup, but that is likely to involve an engineer visit to the data centre.
UPDATE
10 years ago by Adrian
Engineer is on his way to the data centre now.
UPDATE
10 years ago by Adrian
This is looking more complex than expected: we have switched to the secondary controller, but there are issues with one of the disk arrays as well. The engineer is still on site.
UPDATE
10 years ago by Adrian
Disk array is rebuilding now. We should have email working shortly and then web pages once the disk array rebuilds.
UPDATE
10 years ago by Adrian
Web space up, and mail servers being reconnected to disk array now.
UPDATE
10 years ago by Adrian
Issues with web pages again, investigating.
UPDATE
10 years ago by Adrian
The secondary disk server is now showing problems too. We are working on it.
UPDATE
10 years ago by Adrian
This is proving to be quite a serious issue: we appear to have problems with two separate disk controllers, with some of the RAID disks, and with the file system on one of the disks. This is a very odd multiple failure, especially given that all of this is monitored constantly and was not showing any issues yesterday. We do have daily backups, so if all else fails we can restore service from backups, with some loss of recent emails or changes. At this stage we are working to repair the failed file systems before considering that move.
UPDATE
10 years ago by Adrian
It looks like we have the mail file store repaired, and mail should be back online shortly.
UPDATE
10 years ago by Adrian
Web pages back.
UPDATE
10 years ago by Adrian
Incoming email should now be working again.
UPDATE
10 years ago by Adrian
We are checking all mail and web servers now to confirm all is well again.
RESOLUTION
10 years ago by Adrian
Obviously this sort of multiple failure is somewhat unexpected. We do have plans for new disk servers anyway, and this type of failure will be considered as part of that system design.
Closed

Last updated: 10 years ago
