Yesterday 11:09:46
Details
9 Dec 11:20:04
Some lines on the LOWER HOLLOWAY exchange are experiencing peak time packet loss. We have reported this to BT and they are investigating the issue.
Update
11 Dec 10:46:42
BT have passed this to TSO for investigation. We are waiting for a further update.
Update
12 Dec 14:23:56
BT's TSO are currently investigating the issue.
Update
16 Dec 12:07:31
Other ISPs are seeing the same problem. The BT Capacity team are now looking in to this.
Update
Wednesday 16:21:04
No update to report yet, we're still chasing BT...
Update
Yesterday 11:09:46
The latest update from this morning is: "The BT capacity team have investigated and confirmed that the port is not being over utilized, tech services have been engaged and are currently investigating from their side."
Update expected Today 14:00:00
Expected close Today 15:14:17 (Estimated Resolution Time from AAISP)

3 Dec
Details
3 Dec 08:26:55
Having sorted the BGP issue this week, we will be rolling out upgrades over the next few days, with LNS upgrades overnight and some router upgrades early in the mornings.
Started 3 Dec
Previously expected 8 Dec

18 Aug 10:00:00
Details
18 Aug 10:48:39

Our legacy 'C' VoIP platform will be removed from service on March 2nd 2015.

This platform is now old, tired and we have a better VoIP platform: our FireBrick-based 'Voiceless' platform.

We have created a wiki page with details for customers needing to move platforms: http://wiki.aa.org.uk/VoIP_-_Moving_Platform

We will be contacting customers individually by email later in the year, but we'd recommend that customers start moving now. The wiki page above explains how to move, and in most cases it is simply changing the server details in your VoIP device. Please do contact Support for help though.

Started 18 Aug 10:00:00 by AAISP Staff
Update expected 02 Mar 2015 11:00:00
Expected close 02 Mar 2015 10:00:00

29 Jul 11:42:12
Details
17 Jul 10:08:44
Our email services can learn spam/non-spam messages. This feature is currently down for maintenance as we work on the back-end systems. This means that if you move email in to the various 'learn' folders, the messages will stay there and will not be processed at the moment. For now, we advise customers not to use this feature. We will post updates in the next week or so, as we may well be changing how this feature works. This should not affect any spam scores etc, but do contact support if needed.
Update
29 Jul 11:42:12
This project is still ongoing. This should not be causing too many problems though, as the spam checking system has many other ways to determine whether a message is spam. However, for now, if customers have email that is misclassified by the spam checking system, please email the headers in to support and we can make some suggestions.
Started 17 Jul 10:00:00

3 Jun 17:00:00
Details
3 Jun 18:20:39
The router upgrades went well, and now there is a new factory release we'll be doing some rolling upgrades over the next few days. Should be minimal disruption.
Update
3 Jun 18:47:21
First batch of updates done.
Started 3 Jun 17:00:00
Previously expected 7 Jun

14 Apr
Details
13 Apr 17:29:53
We handle SMS, both outgoing from customers, and incoming via various carriers, and we are now linking in once again to SMS with mobile voice SIM cards. The original code for this is getting a tad worn out, so we are working on a new system. It will have ingress gateways for the various ways SMS can arrive at us, core SMS routing, and then output gateways for the ways we can send on SMS. The plan is to convert all SMS to/from standard GSM 03.40 TPDUs. This is a tad technical I know, but it will mean that we have a common format internally. This will not be easy as there are a lot of character set conversion issues, and multiple TPDUs where concatenation of texts is used. The upshot for us is a more consistent and maintainable platform. The benefit for customers is more ways to submit and receive text messages, including using 17094009 to make an ETSI in-band modem text call from suitable equipment (we think gigasets do this). It also means customers will be able to send/receive texts in a raw GSM 03.40 TPDU format, which will be of use to some customers. It also makes it easier for us to add other formats later. There will be some changes to the existing interfaces over time, but we want to keep these to a minimum, obviously.
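As a rough illustration of why character sets and concatenation matter, the sketch below estimates how many TPDUs a text would need. It is not the code of the new system: the function name `sms_parts` is hypothetical, and it assumes the commonly quoted GSM limits (160 septets or 70 UCS-2 code units for a single message, 153 or 67 per part once a 6-byte concatenation header is added).

```python
# GSM 7-bit default alphabet (basic table, without the ESC code itself).
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table characters: each one costs two septets (ESC + code).
GSM7_EXT = set("^{}\\[~]|€\f")


def sms_parts(text):
    """Return (encoding, estimated number of parts) for a text message."""
    if all(c in GSM7_BASIC or c in GSM7_EXT for c in text):
        septets = sum(2 if c in GSM7_EXT else 1 for c in text)
        if septets <= 160:
            return ("gsm7", 1)
        # Estimate only: an escape pair must not straddle two parts,
        # so a real splitter can occasionally need one more part.
        return ("gsm7", -(-septets // 153))
    # Anything outside the GSM alphabet forces UCS-2 (16-bit code units).
    units = len(text.encode("utf-16-be")) // 2
    if units <= 70:
        return ("ucs2", 1)
    return ("ucs2", -(-units // 67))
```

Note how a single character outside the GSM alphabet switches the whole text to UCS-2 and more than halves the space available per part.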
Update
21 Apr 16:27:23

Work is going well on this, and we hope to switch Mobile Originated texting (i.e. texts from the SIP2SIM) over to the new system this week. If that goes to plan we can move some of the other ingress texting over to the new system one by one.

We'll be updating documentation at the same time.

The new system should be a lot more maintainable. We have a number of open tickets with the mobile carrier and other operators to try and improve the functionality of texting to/from us. These cover things like correct handling of multi-part texts, and correct character set coding.

The plan is ultimately to have full UTF-8 unicode support on all texts, but that could take a while. It seems telcos like to mess with things rather than giving us a clean GSM TPDU for texts. All good fun.

Update
22 Apr 08:51:09
We have updated the web site documentation on this to the new system, but this is not fully in use yet. Hopefully this week we will have it all switched over. Right now we have removed some features from the documentation (such as delivery reports), but we plan to have these re-instated soon once we have the new system handling them sensibly.
Update
22 Apr 09:50:44
MO texts from SIP2SIM are now using the new system - please let support know of any issues.
Update
22 Apr 12:32:07
Texts from Three are now working to ALL of our 01, 02, and 03 numbers. These are delivered by email, http, or direct to SIP2SIM depending on the configuration on our control pages.
Update
23 Apr 09:23:20
We have switched over one of our incoming SMS gateways to the new system now. So most messages coming from outside will use this. Any issues, please let support know ASAP.
Update
25 Apr 10:29:50
We are currently running all SMS via the new platform - we expect there to be more work still to be done, but it should be operating as per the current documentation now. Please let support know of any issues.
Update
26 Apr 13:27:37
We have switched the DNS to point SMS to the new servers running the new system. Any issues, please let support know.
Started 14 Apr
Previously expected 1 May

11 Apr 15:50:28
Details
11 Apr 15:53:42
There is a problem with the C server and it needs to be restarted again after the maintenance yesterday evening. We are going to do this at 17:00 as we need it to be done as soon as possible. Sorry for the short notice.
Started 11 Apr 15:50:28

7 Apr 13:45:09
Details
7 Apr 13:52:31
We will be carrying out some maintenance on our 'C' SIP server outside office hours. It will cause disruption to calls, but is likely only to last a couple of minutes and will only affect calls on the A and C servers. It will not affect calls on our "voiceless" SIP platform or SIP2SIM. We will do this on Thursday evening at around 22:30. Please contact support if you have any questions.
Update
10 Apr 23:19:59
Completed earlier this evening.
Started 7 Apr 13:45:09
Previously expected 10 Apr 22:45:00

25 Sep 2013
Details
18 Sep 2013 16:32:41
We have received notification that Three's network team will be carrying out maintenance on one of the nodes that routes our data SIM traffic between 00:00 and 06:00 on Weds 25th September. Some customers may notice a momentary drop in connections during this time as any SIMs using that route will disconnect when the link is shut down. Any affected SIMs will automatically take an alternate route when they try and reconnect. Unfortunately, we have no control over the timing of this as it is dependent on the retry strategy of your devices. During the window, the affected node will be offline therefore SIM connectivity should be considered at risk throughout.
Started 25 Sep 2013

Yesterday 20:00:52
Details
Yesterday 20:00:52
The working days between Christmas and New Year are charged at "Christmas" rate. This means that any usage on 29th, 30th, and 31st December is not counted towards your Units allowance. As usual, bank holidays are treated as 'Weekend' rate.
(This doesn't apply to Home::1 or Office::1 customers.)
We wish all our customers a Merry Christmas!
Started Yesterday 19:00:00

Wednesday 11:45:55
Details
Wednesday 11:09:30
We are looking in to a problem on secondary-dns.co.uk at the moment as it is requesting transfers from the 'wrong' IP address. We hope to 'resolve' this soon.
Resolution This has been fixed, zones are catching up now. Sorry for the inconvenience caused.
Started Wednesday 09:00:00
Closed Wednesday 11:45:55

15 Dec 10:00:00
Details
11 Dec 15:52:46
The mobile carrier (Three) have a problem affecting the activating and suspending of SIMs. They are aware of this and it is being worked on.
Update
15 Dec 14:50:04
This is still open, we are chasing this with the carrier.
Resolution From the carrier: "Three have monitored the issue over the weekend and no further incidents of this have been reported. They have restarted the affected platform and all requests have been processed as normal. We're sorry for any inconvenience for the delay in processing the requests."
Started 10 Dec 13:00:00
Closed 15 Dec 10:00:00

12 Dec 11:00:40
Details
11 Dec 10:42:15
We are seeing some TT connected lines with packetloss starting at 9AM yesterday and today. The loss lasts until 10AM, after which a low level of loss continues. We have reported this to TalkTalk.
Update
11 Dec 10:46:34
This is the pattern of loss we are seeing:
Update
12 Dec 12:00:04
No loss has been seen on these lines today. We're still chasing TT for any update though.
Resolution The problem went away... TT were unable to find the cause.
Broadband Users Affected 7%
Started 11 Dec 09:00:00
Closed 12 Dec 11:00:40

12 Dec 11:28:34
Details
12 Dec 11:28:34
During the holiday season our offices will be closed on the usual (English) bank holidays and weekends.
Wed December 24th - Open as usual
Thu December 25th - Closed
Fri December 26th - Closed
Sat December 27th - Closed
Sun December 28th - Closed
Mon December 29th - Open as usual
Tue December 30th - Open as usual
Wed December 31st - Open as usual
Thu January 1st - Closed
Fri January 2nd - Open as usual
Started 12 Dec 11:00:00

11 Dec 14:15:00
Details
11 Dec 14:13:58
BT issue affecting SOHO AKA GERRARD STREET 21CN-ACC-ALN1-L-GER. We have reported this to BT and they are now investigating.
Update
11 Dec 14:19:33
BT are investigating, however the circuits are mostly back online.
Started 11 Dec 13:42:11 by AAISP Pro Active Monitoring Systems
Closed 11 Dec 14:15:00
Previously expected 11 Dec 18:13:11 (Last Estimated Resolution Time from AAISP)

4 Dec 10:18:06
Details
21 Jul 15:49:07
We now have a new official URL for our Status Pages: https://aastatus.net. The reason for the change is to make the status pages completely independent of any AAISP infrastructure. They were already hosted on a server in Amsterdam outside of our network, and now the DNS is independent too. Anyone using status.aa.net.uk should update to use aastatus.net.
Started 21 Jul 15:45:00

4 Dec 10:15:46
Details
4 Jul 11:00:06
Just to update - we have the physical SIM cards now, and we have pricing agreed. They are not yet provisioned on the network; that will hopefully happen at the start of next week, at which point we'll be able to start selling them. Thank you all for your patience.
Started 4 Jul
Previously expected 8 Jul

2 Dec 09:05:00
Details
1 Dec 21:54:24
All FTTP circuits on Bradwell Abbey have packetloss. This started at about 23:45 on 30th November. This is affecting other ISPs too. BT did have an Incident open, but this has been closed. They restarted a line card last night, but it seems the problem has been present since the card was restarted. We are chasing BT.
Example graph:
Update
1 Dec 22:38:39
It has been a struggle to get the front line support and the Incident Desk at BT to accept that this is a problem. We have passed this on to our Account Manager and other contacts within BT in the hope of a speedy fix.
Update
2 Dec 07:28:40
BT have tried doing something overnight, but the packetloss still exists at 7am 2nd December. Our monitoring shows:
  • The packet loss stops at 00:30
  • The lines go off between 04:20 and 06:00
  • The packet loss starts again at 06:00 when they come back online.
We've passed this on to BT.
Update
2 Dec 09:04:56
Since 7AM today, the lines have been OK... we will continue to monitor.
Started 30 Nov 23:45:00
Closed 2 Dec 09:05:00

3 Dec 09:44:00
Details
27 Nov 16:31:03
We are seeing what looks like congestion on the Walworth exchange. Customers will be experiencing high latency, packetloss and slow throughput in the evenings and weekends. We have reported this to TalkTalk.
Update
2 Dec 09:39:27
Talk Talk are still investigating this issue.
Update
2 Dec 12:22:04
The congestion issue has been discovered on Walworth Exchange and Talk Talk are in the process of traffic balancing.
Update
3 Dec 10:30:14
Capacity has been increased and the exchange is looking much better now.
Started 27 Nov 16:28:35
Closed 3 Dec 09:44:00

3 Dec 18:20:00
Details
3 Dec 10:45:55
We are seeing MTU issues on some 21CN lines this morning where lines are unable to pass more than 1462 byte IP packets. It isn't affecting all lines, and the common factor appears to be that they are on ACC-ALN1 BRASs in London, but not all lines on those BRASs are affected. We have reported the issue to BT Wholesale and they are investigating the issue.
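One way to pin down a limit like this is to send pings with the don't-fragment bit set and search for the largest size that gets through. The sketch below shows only the arithmetic and the search, not how we tested: `probe` is a hypothetical callable that returns True when an IP packet of the given total size passes, and the header sizes assume IPv4 with no options.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header


def ping_payload_for(ip_packet_size):
    """ICMP payload size that produces an IP packet of the given size."""
    return ip_packet_size - IPV4_HEADER - ICMP_HEADER


def largest_passing_size(probe, low=576, high=1500):
    """Binary search for the largest packet size that probe() accepts.

    Assumes that if a size passes, every smaller size passes too.
    Returns None if even the smallest size fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid       # mid passes, so the answer is mid or larger
        else:
            high = mid - 1  # mid fails, so the answer is smaller
    return low


# Simulated line showing the fault described above.
affected_line = lambda size: size <= 1462
print(largest_passing_size(affected_line))  # 1462
print(ping_payload_for(1462))               # 1434
```

On an unaffected line the same search would simply return the upper bound of 1500.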
Update
3 Dec 16:18:23
BT have now raised an incident and are investigating.
Update
3 Dec 18:57:49
This has been fixed. We've been speaking to some network guys at BT this evening and helping them find the fault.
Resolution BT fixed this last night. We believe BT had equipment in the network that was misconfigured.
Started 3 Dec 09:00:00
Closed 3 Dec 18:20:00

22 Nov 09:00:00
Details
19 Nov 21:29:04
Customers with SIMs are currently unable to activate them. The fault has been escalated within the Network carrier (Three). We'll update this post when we get further news.
Update
21 Nov 09:13:50
SIMs are now activating correctly. We do apologise for the delay. Customers can now activate SIMs via the control pages.
Started 18 Nov 15:00:00
Closed 22 Nov 09:00:00

19 Nov 17:00:20
Details
17 Nov 08:42:08

Our outgoing email service, smtp.aa.net.uk, is made up of multiple servers for resilience. One of these servers had a disk system failure over the weekend and has been taken out of service for repair. Before the server was removed from the pool it would have been reporting errors to customers trying to send email and not accepting connections.

At the moment we are running on a single server whilst the faulty one is repaired. We don't expect this to be a problem. Customers are able to send email OK.

Email that was on the server before it died has been recovered and will be relayed on this morning. This may mean a small number of emails will be delayed.

Update
19 Nov 11:45:23
Queued mail on the faulty server was restored on Monday evening.
Started 17 Nov 08:36:01
Closed 19 Nov 17:00:20

19 Nov 16:20:46
Details
19 Nov 15:11:12
Lonap (one of the main Internet peering points in the UK) has a problem. We have stopped passing traffic over Lonap. Customers may have seen packetloss for a short while, but routing should be OK now. We are monitoring the traffic and will bring back Lonap when all is well.
Update
19 Nov 16:21:29
The Lonap problem has been fixed, and we've re-enabled our peering.
Started 19 Nov 15:00:00
Closed 19 Nov 16:20:46

21 Nov 00:18:00
Details
21 Nov 10:58:09
We have a number of TT lines down all on the same RAS: HOST-62-24-203-36-AS13285-NET. We are chasing this with TalkTalk.
Update
21 Nov 11:01:29
Most lines are now back. We have informed TalkTalk.
Update
21 Nov 12:18:22
TT have come back to us. They were aware of the problem, it was caused by a software problem on an LTS.
Started 21 Nov 10:45:00
Closed 21 Nov 00:18:00

25 Nov 10:43:46
Details
21 Oct 14:10:19
We're seeing congestion from 10am up to 11:30pm across the BT Rose Street, Pimlico and High Wycombe exchanges. A fault has been raised with BT and we will post updates as soon as we can. Thanks for your patience.
Update
28 Oct 11:23:44
Rose Street and High Wycombe are now clear. We are still investigating Pimlico.
Update
3 Nov 14:41:45
Pimlico has now been passed to BT's capacity team to deal with. Further capacity is needed and will be added ASAP. We will provide updates as soon as they are available.
Update
5 Nov 10:12:30
We have just been informed by the BT capacity team that end users will be moved to a different VLAN on Friday morning. We will post further updates when we have them.
Update
11 Nov 10:23:59
Most of the Pimlico exchange is now fixed. Sorry for the delay.
Update
19 Nov 11:01:57
There is further planned work on the Pimlico exchange for the 20th November. This should resolve the congestion on the Exchange.
Update
25 Nov 10:44:43
Pimlico lines are now running as expected. Thanks for your patience.
Started 21 Oct 13:31:50
Closed 25 Nov 10:43:46

2 Dec 08:52:25
Details
20 Nov 09:37:01
We have had a couple of incidents over the last few weeks with some external routes vanishing from our network. Whilst this may seem quite minor it simply should not happen. As such we are working on some investigation over the next few days. This may mean re-loading some routers to add additional diagnostics. In general this is a pretty seamless operation as packets are re-routed around the equipment that is being reloaded. However, there is a small risk of issues.
Update
25 Nov 10:25:38
Investigations are going well and have not needed any changes yet. We may be reloading two routers later today (Tuesday) which should have little or no impact, but will help us with diagnostics.
Update
1 Dec 09:22:04
We think we may have found the cause of the routing issue, and plan to upgrade some routers during the week. This should be relatively seamless.
Update
1 Dec 17:11:04
Some router upgrades this evening (Monday). We have seen this cause a blip on TalkTalk lines before, but hopefully that will not happen this time. In any case we expect any disruption to be a few seconds at most, and for most people none at all.
Resolution The changes seem to have worked. We will also be upgrading again later in the week. Thank you all for your patience.
Started 21 Nov
Closed 2 Dec 08:52:25
Previously expected 8 Dec

4 Nov 16:47:11
Details
4 Nov 09:42:18
Several graphs have been missing in recent weeks, some days, and some LNSs. This is something we are working on. Unfortunately, today, one of the LNSs is not showing live graphs again, and so these will not be logged over night. We hope to have a fix for this in the next few days. Sorry for any inconvenience.
Resolution The underlying cause has been identified and a fix will be deployed over the next few days.
Started 1 Oct
Closed 4 Nov 16:47:11
Previously expected 10 Nov

3 Nov 15:00:00
Details
3 Nov 10:07:42
Due to a customer managing to send spam through our outgoing mail relays (smtp.aa.net.uk) some of the server IP addresses have been blacklisted. We're working on getting the IPs removed from the blacklists. In the mean time, apologies for any inconvenience.
Update
3 Nov 13:50:58
This shouldn't be affecting customers today as we've changed relay IP addresses. If you see any bounces now, please contact support.
Started 3 Nov 10:05:41
Closed 3 Nov 15:00:00
Previously expected 3 Nov 14:05:41

10 Nov 09:00:00
Details
5 Nov 13:30:50
Some routers will be updated tomorrow morning, this should have little or no impact. We are also doing LNS upgrades over the next 3 nights anyway.
Update
6 Nov 06:09:29
The upgrades went as planned, but we are extending this PEW to mornings over next few days.
Started 6 Nov
Closed 10 Nov 09:00:00
Previously expected 10 Nov 08:00:00

5 Nov 02:27:31
Details
4 Nov 09:50:36
Once again we expect to reset one of the LNSs early in the morning. This will not be the usual LNS switch, with the preferred time of night, but all lines on the LNS at once. The exact time depends on staff availability, sorry. This means a clear of PPP which can immediately reconnect. This may be followed by a second PPP reset a few minutes later. We do hope to have a proper solution to this issue in a couple of days.
Resolution Reset completed. We will do a normal rolling update of LNSs over next three nights. This should address the cause of the problem. If we have issues with graphs before that is complete, we may have to do a reset like this again.
Broadband Users Affected 33%
Started 5 Nov
Closed 5 Nov 02:27:31
Previously expected 5 Nov 07:00:00

1 Nov 11:35:11
[Broadband] - Blip - Closed
Details
1 Nov 11:55:38
There appears to be something of a small DoS attack which resulted in a blip around 11:29:16 today, and caused some issues with broadband lines and other services. We're looking in to this at present and graphs are not currently visible on one of the LNSs for customers.
Update
1 Nov 13:09:44
We expect graphs on a.gormless to be back tomorrow morning after some planned work.
Resolution Being investigated further.
Started 1 Nov 11:29:16
Closed 1 Nov 11:35:11

2 Nov 04:08:38
Details
1 Nov 13:07:11
We normally do LNS switch overs without a specific planned notice - the process is routine for maintenance and means clearing the PPP session to reconnect immediately on another LNS. We do one line at a time, slowly. We even have a control on the clueless so you can state preferred time of night.

However, tomorrow morning, we will be moving lines off a.gormless (one third of customers) using a different process. It should be much the same, but all lines will be at one time of night, and this may mean some are slower to reconnect.

The plan is to do this in the early morning - the exact time depends on when staff are available. Sorry for any inconvenience.

Resolution Completed as planned. Graphs back from now on a.gormless.
Broadband Users Affected 33%
Started 1 Nov 03:00:00
Closed 2 Nov 04:08:38
Previously expected 1 Nov 07:00:00

29 Oct 20:43:36
[Email and Web Hosting] - Delays to incoming email - Closed
Details
29 Oct 14:12:02
We've currently got a backlog of incoming mail on our servers, which means that some inbound email is being delayed. Sorry for any inconvenience. This does not affect outgoing email, IMAP, POP3 or webmail access.
Update
29 Oct 18:13:27
We're getting through the backlog, but there's still a large queue of email waiting. Things are catching up.
Update
29 Oct 20:39:45
Back to normal now. If you have any questions, please contact support.
Closed 29 Oct 20:43:36
Previously expected 29 Oct 16:08:11

24 Oct 18:13:53
Details
1 Sep 09:40:46
Once again, the Direct Debits have not gone through on the 1st and so have caused an emailed notice for collection, and hence they are going out on the 8th. Obviously they are going out on the date notified in the email, but I appreciate that a few extra days credit may be inconvenient for some people expecting the DD on the 1st. We are working on this. The problem is that the system has been designed very "defensively", so that any doubt at all about the emailed advance notice results in a new emailed notice of 5 working days, to be absolutely sure we are meeting the DD rules.
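The date arithmetic is easy to check. The sketch below counts forward five working days from the 1st. It assumes the year is 2014 (the post does not state it), treats only Saturdays and Sundays as non-working days and ignores bank holidays. The function name `add_working_days` is illustrative and is not taken from the billing system.

```python
from datetime import date, timedelta


def add_working_days(start, days):
    """Return the date that is `days` working days (Mon-Fri) after `start`."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


# Notice re-issued on 1 September, assumed to be 2014 (a Monday).
print(add_working_days(date(2014, 9, 1), 5))  # 2014-09-08
```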
Resolution This is resolved now
Started 1 Sep
Closed 24 Oct 18:13:53

28 Oct 12:14:13
Details
23 Oct 08:16:55
Over the next few days we expect to do some minor updates. Previous work has made it so that these are seamless, but there is always a risk of some impact. Plans are to do work before 8am, though some backup routers may be updated at other times (and we expect these to carry no risk). We will also be doing some overnight rolling LNS updates. As ever, we may update test routers/LNS at any time.
Update
24 Oct 18:15:46
Updates this morning were around 5am with one extra at 8am, but they show as virtually no disruption to any traffic. Several "B" routers have been done today. We are doing an LNS roll over tonight and more routers in the morning. The testing over the last few weeks has been very good, and allowed us to track down some minor issues that simply did not show on the bench test systems.
Resolution Updates were all completed over the weekend.
Started 24 Oct
Closed 28 Oct 12:14:13
Previously expected 30 Oct

22 Oct 14:45:53
Details
17 Oct 16:43:25
Planning to do some work early Saturday and possibly Sunday, before 8am. It should be virtually no disruption though. This is on the Maidenhead routers so any impact would be VoIP, colocation and Ethernet links from there. We have some slight tweaks to apply.
Started 18 Oct
Previously expected 20 Oct

15 Oct 17:14:55
[Control Pages] - New Usage Graph - Info
Details
We have added a graph to the usage section of the Control Pages. This will show upload and download usage over the past year. We welcome any feedback!
Started 30 Sep 15:00:00
Closed 15 Oct 17:14:55

7 Oct 06:17:13
Details
3 Oct 16:25:24
As we advised, we have had to make some radical changes to our billing to fix database load issues. These have gone quite well overall, but there have been a few snags. We think we have them all now, but this month we had to revert some usage charging giving some free usage.

We have identified that quarterly billed customers on units tariffs were not charged, so these are being applied shortly as a new invoice. Anyone with excess usage as a result, please do ask accounts for a credit.

We have also identified that call charges have not been billed - these can be billed to date if anyone asks, or if you leave it then they should finally catch up on next month's bill.

Sorry for any inconvenience.

Started 1 Oct
Previously expected 1 Nov

15 Oct 17:14:18
Details
6 Oct 14:22:50
For the next week or so we're considering 5am-7am to be a PEW window for some very low disruption work (a few seconds of "blip"). We're still trying very hard to improve our network configuration and router code to create a much more stable network. It seems, from recent experience, that this sort of window will be least disruptive to customers. It is a time where issues can be resolved by staff if needed (which is harder at times like 3am) and we get more feedback from end users. As before, we expect this work to have no impact in most cases, and maybe a couple of seconds of routing issues if it is not quite to plan. Sadly, all of our efforts to create the same test scenarios "on the bench" have not worked well. At this stage we are reviewing code to understand Sunday morning's work better, and this may take some time before we start. We'll update here and on irc before work is done. Thank you for your patience.
Update
7 Oct 09:06:41
We did do work around 6:15 to 6:30 today - I thought I had posted an update here before I started but somehow it did not show. If we do any more, I'll try and make it a little earlier.
Update
8 Oct 05:43:11
Doing work a little earlier today. We don't believe we caused any blips with today's testing.
Update
9 Oct 05:47:53
Another early start and went very well.
Update
10 Oct 08:22:53
We updated the remaining core routers this morning, and it seemed to go very well. Indeed pings we ran had zero loss when upgrading routers in Telecity. However, we did lose TalkTalk broadband lines in the process. These all reconnected straight away, but we are now reviewing how this happens to try and avoid it in future.
Resolution Closing this PEW from last week. We may need to do more work at some point, but we are getting quite good at this now.
Started 7 Oct 06:00:00
Closed 15 Oct 17:14:18
Previously expected 14 Oct 07:00:00

5 Oct 07:26:50
Details
3 Oct 10:41:59
We do plan to upgrade routers again over the weekend, probably early Saturday morning (before 9am). I'll post on irc at the time and update this notice.

The work this week means we expect this to be totally seamless, but the only way to actually be sure is to try it.

If we still see any issues we'll do more on Sunday.

Update
4 Oct 06:54:19
Upgrades starting shortly.
Update
4 Oct 07:24:47
Almost perfect!

We loaded four routers, each at different points in the network. We ran a ping that went through all four routers whilst doing this. For three of them we did see ping drop a packet. The fourth we did not see a drop at all.

This may sound good, but it should be better - we should not lose a single packet doing this. We're looking at the logs to work out why, and may try again Sunday morning.

Thank you for your patience.

Update
4 Oct 07:53:52
Plan for tomorrow is to pick one of the routers that did drop a ping, and shut it down and hold it without restarting - at that point we can investigate what is still routing via it and why. This should help us explain the dropped ping. Assuming that provides the clues we need we may load or reconfigure routers later on Sunday to fix it.
Update
5 Oct 06:57:39
We are starting work shortly.
Update
5 Oct 07:11:00
We are doing the upgrades as planned, but not able to do the level of additional diagnostics we wanted. We may look in to that next weekend.
Resolution Only 3 routers were upgraded, the 3rd having several seconds of issues. We will investigate the logs and do another planned work. It seems early morning like this is less disruptive to customers.
Started 4 Oct
Closed 5 Oct 07:26:50
Previously expected 6 Oct

2 Oct 19:05:55
Details
2 Oct 19:05:15
We'd like to thank customers for patience this week. The tests we have been doing in the evenings have been invaluable. The issues seen have mostly related to links to Maidenhead (so voice calls rather than broadband connections).

The work we are doing has involved a lot of testing "on the bench" and even in our offices (to the annoyance of staff) but ultimately testing on the live customer services is the final test. The results have been informative and we are very close to our goal now.

The goal is to allow router maintenance with zero packet loss. We finally have the last piece in the jigsaw for this, and so should have this in place soon. Even so, there may be some further work to achieve this.

Apart from a "Nice to have" goal, this also relates to failures of hardware, power cuts, and software crashes. The work is making the network configuration more robust and should allow for key component failures with outages as short as 300ms in some cases. LNS issues tend to take longer for PPP to reconnect, but we want to try and be as robust as possible.

So, once again, thank you all for your patience while we work on this. There may be some more planned works which really should now be invisible to customers.

Started 2 Oct 19:00:41

2 Oct 11:05:01
Details
2 Oct 11:05:01
We're updating SSL certificates for our customer facing servers this morning (email, webmail). Users who don't have the CAcert root certificate installed may see errors. Details on http://aa.net.uk/cacert.html
Started 2 Oct 11:04:16

1 Oct 17:49:32
Details
30 Sep 18:04:06
Having been very successful with the router upgrade tonight, we are looking to move to the next router on Wednesday. Signs so far are that this should be equally seamless. We are, however, taking this slowly, one step at a time, to be sure.
Resolution We loaded 4 routers in all; some were almost seamless and some had a few seconds of outage. It was not perfect, but way better than previously. We are now going to look in to the logs in detail and try to understand what we do next.

Our goal here is zero packet loss for maintenance.

I'd like to thank all those on irc for their useful feedback during these tests.

Started 1 Oct 17:00:00
Closed 1 Oct 17:49:32
Previously expected 1 Oct 18:00:00

30 Sep 18:02:25
Details
29 Sep 21:57:11
We are going to spend much of tomorrow trying to track down why things did not go smoothly tonight, and hope to have a solution by tomorrow (Tuesday) evening.

This time I hope to make a test load before the peak period at 6pm, so between 5pm and 6pm, when there is a bit of a lull between business and home use.

If all goes to plan there will be NO impact at all, and that is what we hope. If so we will update three routers with increasing risk of impact, and abort if there are any issues.

Please follow things on irc tomorrow.

If this works as planned we will finally have all routers under "seamless upgrade" processes.

Update
30 Sep 08:29:42
Tests on our internal systems this morning confirm we understand what went wrong last night, and as such the upgrade tonight should be seamless.

For the technically minded, we had an issue with VRRP becoming master too soon, i.e. before all routes are installed. The routing logic is now linked to VRRP to avoid this scenario, regardless of how long routing takes.
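
As a rough illustration only (this is not the actual router code), the idea of linking VRRP to routing can be sketched as below. The `Router` class, its fields and `elect_master` are all hypothetical names made up for this example; the assumption is simply that a router is not allowed to be considered for master until it reports all of its expected routes installed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Router:
    # All fields are hypothetical, for illustration only.
    name: str
    priority: int            # configured VRRP priority (higher wins)
    routes_expected: int     # routes we expect before forwarding is safe
    routes_installed: int = 0

    def routes_ready(self) -> bool:
        # The router is only safe to forward once every expected route is in.
        return self.routes_installed >= self.routes_expected


def elect_master(routers: List[Router]) -> Optional[Router]:
    # Only routers with routing fully installed may become master,
    # however long that takes; among those, highest priority wins.
    eligible = [r for r in routers if r.routes_ready()]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r.priority)


# A freshly reloaded high-priority router must wait for its routes.
reloaded = Router("a", priority=200, routes_expected=1000, routes_installed=10)
standby = Router("b", priority=100, routes_expected=1000, routes_installed=1000)

first = elect_master([reloaded, standby])    # standby stays master for now
reloaded.routes_installed = 1000             # routing has finished loading
second = elect_master([reloaded, standby])   # now the reloaded router can take over
```

The point of the sketch is that readiness is a condition on routing state rather than a fixed timer, so a slow route load delays the takeover instead of causing a blip.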

Resolution The upgrade went very nearly perfectly on the first router - we believe the only noticeable impact was the link to our office, which we think we understand now. However, we did only do the one router this time.
Started 30 Sep 17:00:00
Closed 30 Sep 18:02:25
Previously expected 30 Sep 18:00:00

29 Sep 22:37:36
Details
21 Aug 12:50:32
Over the past week or so we have been missing data on some monitoring graphs; this is shown as purple for the first hour in the morning. This is being caused by delays in collecting the data, and is being looked into.
Resolution We believe this has been fixed now. We have been monitoring it for a fortnight after making an initial fix, and it looks to have been successful.
Closed 29 Sep 22:37:36

29 Sep 19:29:19
Details
29 Sep 14:06:12
We expect to reload a router this evening, which is likely to cause a few seconds of routing issues. This is part of trying to address the blips caused by router upgrades, which are meant to be seamless.
Update
29 Sep 18:48:37
The reload is expected shortly, and will be on two boxes at least. We are monitoring the effect of the changes we have made. They should be a big improvement.
Resolution The upgrade was tested on only one router (Maidenhead) and caused some nasty impact on routing to call servers and control systems - general DSL was unaffected. Changes are backed out now, and it is back to the drawing board. Further PEW will be announced as necessary.
Started 29 Sep 17:00:00
Closed 29 Sep 19:29:19
Previously expected 29 Sep 23:00:00

29 Sep 13:17:50
Details
29 Sep 08:48:37
Some updates to the billing system have caused a problem for units-billed customers, resulting in their usage for next month starting early, i.e. usage is now being logged for October.

Because of the way usage carries forward, this is unlikely to have much impact on customers in terms of additional charges. However, any customers that think they have lost out, please let us know and we'll make a manual adjustment.

The problem has been corrected for next month.

Update
29 Sep 08:57:00
It looks like customers won't get billed top-up and may not get billed units either, so we are working on un-doing this issue so that billing is done normally. Please bear with us.
Update
29 Sep 09:23:40
We are working on this now and should have usage billing back to normal later this morning.
Resolution Usage billing has been restored to around 1am Saturday, giving customers 2.5 days of unmetered usage.
Started 29 Sep 08:45:12
Closed 29 Sep 13:17:50

28 Sep 19:20:54
Details
28 Sep 18:52:50
We are experiencing a network problem affecting our broadband customers. Staff are investigating.
Update
28 Sep 19:08:28
This is looking like some sort of Denial of Service attack. We're looking at mitigating this.
Update
28 Sep 19:16:36
The traffic has died down, things are starting to look better.
Update
28 Sep 19:21:46
Traffic is now back to normal.
Started 28 Sep 18:30:00
Closed 28 Sep 19:20:54

20 Sep 07:09:09
Details
20 Sep 11:59:13
RADIUS accounting is behind at the moment. This is causing the usage data to appear as missing from customer lines. The accounting is behind, but it's not broken, and is catching up. The usage data doesn't appear to be lost, and should appear later in the day.
Update
21 Sep 08:12:52
Records have now caught up.
Closed 20 Sep 07:09:09
Previously expected 20 Sep 15:57:11

25 Sep 12:07:57
Details
25 Sep 11:48:00
We are investigating a network problem affecting our offices.
Update
25 Sep 11:51:31
This is affecting our telephones.
Update
25 Sep 12:08:45
The office is back online. We had lost IPv4, and we're looking into the cause of this.
Started 25 Sep 11:47:00
Closed 25 Sep 12:07:57