
Yesterday 16:31:03
Yesterday 16:31:03
We are seeing what looks like congestion on the Walworth exchange. Customers will be experiencing high latency, packet loss and slow throughput in the evenings and at weekends. We have reported this to TalkTalk.
Started Yesterday 16:28:35
Update expected Today 16:00:00

21 Nov 10:58:09
21 Nov 10:58:09
We have a number of TT lines down all on the same RAS: HOST-62-24-203-36-AS13285-NET. We are chasing this with TalkTalk.
21 Nov 11:01:29
Most lines are now back. We have informed TalkTalk.
21 Nov 12:18:22
TT have come back to us. They were aware of the problem; it was caused by a software fault on an LTS.
Started 21 Nov 10:45:00

21 Nov
20 Nov 09:37:01
We have had a couple of incidents over the last few weeks of some external routes vanishing from our network. Whilst this may seem quite minor, it simply should not happen. As such we are doing some investigation over the next few days. This may mean re-loading some routers to add additional diagnostics. In general this is a pretty seamless operation, as packets are re-routed around the equipment being reloaded. However, there is a small risk of issues.
25 Nov 10:25:38
Investigations are going well and have not needed any changes yet. We may be reloading two routers later today (Tuesday) which should have little or no impact, but will help us with diagnostics.
Started 21 Nov
Expected close 1 Dec

19 Nov 11:45:23
17 Nov 08:42:08

Our outgoing email service, smtp.aa.net.uk, is made up of multiple servers for resilience. One of these servers had a disk system failure over the weekend and has been taken out of service for repair. Before the server was removed from the pool it would have been refusing connections and reporting errors to customers trying to send email.

At the moment we are running on a single server whilst the faulty one is repaired. We don't expect this to be a problem. Customers are able to send email OK.
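The pool arrangement is what lets sending carry on when one server fails: a client (or our relay logic) simply tries the next server. A minimal sketch of that failover behaviour, with hypothetical host names (real clients would normally get the list from DNS), might look like this:

```python
import smtplib

def send_via_pool(hosts, mail_from, rcpt_to, body):
    """Try each server in an SMTP pool until one accepts the message.
    The host names are hypothetical examples, not real servers."""
    last_error = None
    for host in hosts:
        try:
            with smtplib.SMTP(host, timeout=10) as smtp:
                smtp.sendmail(mail_from, rcpt_to, body)
                return host  # this server accepted the mail
        except OSError as e:
            last_error = e  # server down or refusing: try the next one
    raise RuntimeError(f"no server in the pool accepted the mail: {last_error}")
```

With only one healthy server in the pool, as now, the loop simply succeeds on the first working host.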

Email that was on the server before it died has been recovered and will be relayed on this morning. This may mean a small number of emails are delayed.

19 Nov 11:45:23
Queued mail on the faulty server was restored on Monday evening.
Started 17 Nov 08:36:01

18 Aug 10:00:00
18 Aug 10:48:39

Our legacy 'C' VoIP platform will be removed from service on March 2nd 2015.

This platform is now old and tired, and we have a better VoIP platform: our FireBrick-based 'Voiceless' platform.

We have created a wiki page with details for customers needing to move platforms: http://wiki.aa.org.uk/VoIP_-_Moving_Platform

We will be contacting customers individually by email later in the year, but we'd recommend that customers start moving now. The wiki page above explains how to move, and in most cases it is simply changing the server details in your VoIP device. Please do contact Support for help though.

Started 18 Aug 10:00:00 by AAISP Staff
Update expected 02 Mar 2015 11:00:00
Expected close 02 Mar 2015 10:00:00

29 Jul 11:42:12
17 Jul 10:08:44
Our email services can learn spam/non-spam messages. This feature is currently down for maintenance while we work on the back-end systems. This means that if you move email into the various 'learn' folders it will stay there and will not be processed at the moment. For now, we advise customers not to use this feature. We will post updates in the next week or so, as we may well be changing how this feature works. This should not affect any spam scores etc, but do contact support if needed.
29 Jul 11:42:12
This project is still ongoing. It should not be causing too many problems though, as the spam checking system has many other ways to determine whether a message is spam. However, for now, if customers have email that is misclassified by the spam checking system then please email the headers to support and we can make some suggestions.
Started 17 Jul 10:00:00

3 Jun 17:00:00
3 Jun 18:20:39
The router upgrades went well, and now that there is a new factory release we'll be doing some rolling upgrades over the next few days. There should be minimal disruption.
3 Jun 18:47:21
First batch of updates done.
Started 3 Jun 17:00:00
Previously expected 7 Jun

14 Apr
13 Apr 17:29:53
We handle SMS, both outgoing from customers and incoming via various carriers, and we are now linking in once again to SMS with mobile voice SIM cards. The original code for this is getting a tad worn out, so we are working on a new system. It will have ingress gateways for the various ways SMS can arrive at us, core SMS routing, and then output gateways for the ways we can send on SMS.

The plan is to convert all SMS to/from standard GSM 03.40 TPDUs. This is a tad technical, I know, but it means we have a common format internally. This will not be easy, as there are a lot of character set conversion issues, and multiple TPDUs where concatenation of texts is used.

The upshot for us is a more consistent and maintainable platform. The benefit for customers is more ways to submit and receive text messages, including using 17094009 to make an ETSI in-band modem text call from suitable equipment (we think Gigasets do this). It also means customers will be able to send/receive texts in raw GSM 03.40 TPDU format, which will be of use to some customers, and it makes it easier for us to add other formats later. There will be some changes to the existing interfaces over time, but we want to keep these to a minimum, obviously.
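To give a flavour of the character-set fiddliness involved: the user data of a GSM 03.40 TPDU packs 7-bit characters into octets, LSB first. A minimal sketch of that packing (purely illustrative, not our actual code):

```python
def pack_septets(septets):
    """Pack 7-bit character values into octets, LSB-first,
    as used in the user data of a GSM 03.40 TPDU."""
    bits = 0       # accumulated bits, least significant first
    nbits = 0      # number of bits currently accumulated
    out = bytearray()
    for s in septets:
        bits |= (s & 0x7F) << nbits
        nbits += 7
        while nbits >= 8:
            out.append(bits & 0xFF)
            bits >>= 8
            nbits -= 8
    if nbits:
        out.append(bits & 0xFF)  # final partial octet
    return bytes(out)

# "hello" - these GSM default alphabet values happen to match ASCII
print(pack_septets([0x68, 0x65, 0x6C, 0x6C, 0x6F]).hex())  # e8329bfd06
```

Note that 160 such septets fit in 140 octets, which is where the classic 160-character limit comes from.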
21 Apr 16:27:23

Work is going well on this, and we hope to switch Mobile Originated texting (i.e. texts from the SIP2SIM) over to the new system this week. If that goes to plan we can move some of the other ingress texting over to the new system one by one.

We'll be updating documentation at the same time.

The new system should be a lot more maintainable. We have a number of open tickets with the mobile carrier and other operators to try and improve the functionality of texting to/from us. These cover things like correct handling of multi-part texts, and correct character set coding.

The plan is ultimately to have full UTF-8 unicode support on all texts, but that could take a while. It seems telcos like to mess with things rather than giving us a clean GSM TPDU for texts. All good fun.
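The multi-part handling mentioned above hinges on the User Data Header that splits texts carry: a concatenation information element gives a reference number, the total part count, and this part's sequence number. A rough sketch of extracting it (illustrative only, not our internal code):

```python
def parse_concat(udh: bytes):
    """Scan a User Data Header for the 8-bit concatenation
    information element (IEI 0x00) and return (ref, total, seq),
    or None if the message is not a multi-part text."""
    i = 0
    while i + 2 <= len(udh):
        iei, ielen = udh[i], udh[i + 1]
        data = udh[i + 2:i + 2 + ielen]
        if iei == 0x00 and ielen == 3:
            return data[0], data[1], data[2]
        i += 2 + ielen  # skip to the next information element
    return None

# Part 1 of 2, reference 0xAB
print(parse_concat(bytes([0x00, 0x03, 0xAB, 0x02, 0x01])))  # (171, 2, 1)
```

Reassembly then means collecting all `total` parts sharing the same reference before delivering, which is exactly the sort of thing telcos in the middle tend to mangle.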

22 Apr 08:51:09
We have updated the web site documentation to describe the new system, but this is not fully in use yet. Hopefully we will have it all switched over this week. For now we have removed some features from the documentation (such as delivery reports), but we plan to reinstate these soon, once the new system handles them sensibly.
22 Apr 09:50:44
MO texts from SIP2SIM are now using the new system - please let support know of any issues.
22 Apr 12:32:07
Texts from Three are now working to ALL of our 01, 02, and 03 numbers. These are delivered by email, http, or direct to SIP2SIM depending on the configuration on our control pages.
23 Apr 09:23:20
We have switched over one of our incoming SMS gateways to the new system now. So most messages coming from outside will use this. Any issues, please let support know ASAP.
25 Apr 10:29:50
We are currently running all SMS via the new platform - we expect there to be more work still to be done, but it should be operating as per the current documentation now. Please let support know of any issues.
26 Apr 13:27:37
We have switched the DNS to point SMS to the new servers running the new system. Any issues, please let support know.
Started 14 Apr
Previously expected 1 May

11 Apr 15:50:28
11 Apr 15:53:42
There is a problem with the C server and it needs to be restarted again after the maintenance yesterday evening. We are going to do this at 17:00 as we need it to be done as soon as possible. Sorry for the short notice.
Started 11 Apr 15:50:28

7 Apr 13:45:09
7 Apr 13:52:31
We will be carrying out some maintenance on our 'C' SIP server outside office hours. It will cause disruption to calls, but is likely only to last a couple of minutes and will only affect calls on the A and C servers. It will not affect calls on our "voiceless" SIP platform or SIP2SIM. We will do this on Thursday evening at around 22:30. Please contact support if you have any questions.
10 Apr 23:19:59
Completed earlier this evening.
Started 7 Apr 13:45:09
Previously expected 10 Apr 22:45:00

25 Sep 2013
18 Sep 2013 16:32:41
We have received notification that Three's network team will be carrying out maintenance on one of the nodes that routes our data SIM traffic between 00:00 and 06:00 on Weds 25th September. Some customers may notice a momentary drop in connections during this time as any SIMs using that route will disconnect when the link is shut down. Any affected SIMs will automatically take an alternate route when they try and reconnect. Unfortunately, we have no control over the timing of this as it is dependent on the retry strategy of your devices. During the window, the affected node will be offline therefore SIM connectivity should be considered at risk throughout.
Started 25 Sep 2013

25 Nov 10:43:46
21 Oct 14:10:19
We're seeing congestion from 10am up to 11:30pm across the BT Rose Street, Pimlico and High Wycombe exchanges. A fault has been raised with BT and we will post updates as soon as we can. Thanks for your patience.
28 Oct 11:23:44
Rose Street and High Wycombe are now clear. We are still investigating Pimlico.
3 Nov 14:41:45
Pimlico has now been passed to BT's capacity team to deal with. Further capacity is needed and will be added ASAP. We will provide updates as soon as we have them.
5 Nov 10:12:30
We have just been informed by the BT capacity team that end users will be moved to a different VLAN on Friday morning. We will post further updates when we have them.
11 Nov 10:23:59
Most of the Pimlico exchange is now fixed. Sorry for the delay.
19 Nov 11:01:57
There is further planned work on the Pimlico exchange for the 20th November. This should resolve the congestion on the Exchange.
25 Nov 10:44:43
Pimlico lines are now running as expected. Thanks for your patience.
Started 21 Oct 13:31:50
Closed 25 Nov 10:43:46

21 Nov 09:00:00
19 Nov 21:29:04
Customers with SIMs are currently unable to activate them. The fault has been escalated within the Network carrier (Three). We'll update this post when we get further news.
21 Nov 09:13:50
SIMs are now activating correctly. We do apologise for the delay. Customers can now activate SIMs via the control pages.
Started 18 Nov 15:00:00
Closed 21 Nov 09:00:00

19 Nov 16:20:46
19 Nov 15:11:12
Lonap (one of the main Internet peering points in the UK) has a problem. We have stopped passing traffic over Lonap. Customers may have seen packet loss for a short while, but routing should be OK now. We are monitoring the traffic and will bring Lonap back when all is well.
19 Nov 16:21:29
The Lonap problem has been fixed, and we've re-enabled our peering.
Started 19 Nov 15:00:00
Closed 19 Nov 16:20:46

4 Nov 16:47:11
4 Nov 09:42:18
Several graphs have been missing in recent weeks, on some days and on some LNSs. This is something we are working on. Unfortunately, today, one of the LNSs is not showing live graphs again, and so these will not be logged overnight. We hope to have a fix for this in the next few days. Sorry for any inconvenience.
Resolution The underlying cause has been identified and a fix will be deployed over the next few days.
Started 1 Oct
Closed 4 Nov 16:47:11
Previously expected 10 Nov

3 Nov 15:00:00
3 Nov 10:07:42
Due to a customer managing to send spam through our outgoing mail relays (smtp.aa.net.uk) some of the server IP addresses have been blacklisted. We're working on getting the IPs removed from the blacklists. In the mean time, apologies for any inconvenience.
3 Nov 13:50:58
This shouldn't be affecting customers today as we've changed relay IP addresses. If you see any bounces now, please contact support.
Started 3 Nov 10:05:41
Closed 3 Nov 15:00:00
Previously expected 3 Nov 14:05:41

10 Nov 09:00:00
5 Nov 13:30:50
Some routers will be updated tomorrow morning; this should have little or no impact. We are also doing LNS upgrades over the next three nights.
6 Nov 06:09:29
The upgrades went as planned, but we are extending this PEW to mornings over the next few days.
Started 6 Nov
Closed 10 Nov 09:00:00
Previously expected 10 Nov 08:00:00

5 Nov 02:27:31
4 Nov 09:50:36
Once again we expect to reset one of the LNSs early in the morning. This will not be the usual LNS switch with a preferred time of night; all lines on the LNS will be moved at once. The exact time depends on staff availability, sorry. This means a clear of the PPP session, which can immediately reconnect. It may be followed by a second PPP reset a few minutes later. We do hope to have a proper solution to this issue in a couple of days.
Resolution Reset completed. We will do a normal rolling update of the LNSs over the next three nights, which should address the cause of the problem. If we have issues with graphs before that is complete, we may have to do a reset like this again.
Broadband Users Affected 33%
Started 5 Nov
Closed 5 Nov 02:27:31
Previously expected 5 Nov 07:00:00

1 Nov 11:35:11
[Broadband] - Blip - Closed
1 Nov 11:55:38
There appears to have been a small DoS attack which resulted in a blip around 11:29:16 today, causing some issues with broadband lines and other services. We're looking into this at present, and customer graphs are not currently visible on one of the LNSs.
1 Nov 13:09:44
We expect graphs on a.gormless to be back tomorrow morning after some planned work.
Resolution Being investigated further.
Started 1 Nov 11:29:16
Closed 1 Nov 11:35:11

2 Nov 04:08:38
1 Nov 13:07:11
We normally do LNS switch-overs without a specific planned notice - the process is routine maintenance and means clearing the PPP session, which reconnects immediately on another LNS. We do one line at a time, slowly. We even have a control on the clueless so you can state a preferred time of night.

However, tomorrow morning, we will be moving lines off a.gormless (one third of customers) using a different process. It should be much the same, but all lines will be at one time of night, and this may mean some are slower to reconnect.

The plan is to do this in the early morning - the exact time depends on when staff are available. Sorry for any inconvenience.

Resolution Completed as planned. Graphs back from now on a.gormless.
Broadband Users Affected 33%
Started 1 Nov 03:00:00
Closed 2 Nov 04:08:38
Previously expected 1 Nov 07:00:00

29 Oct 20:43:36
[Email and Web Hosting] - Delays to incoming email - Closed
29 Oct 14:12:02
We've currently got a backlog of incoming mail on our servers, which means that some inbound email is being delayed. Sorry for any inconvenience. This does not affect outgoing email, IMAP, POP3 or webmail access.
29 Oct 18:13:27
We're getting through the backlog, but there's still a large queue of email waiting. Things are catching up.
29 Oct 20:39:45
Back to normal now. If you have any questions, please contact support.
Closed 29 Oct 20:43:36
Previously expected 29 Oct 16:08:11

24 Oct 18:13:53
1 Sep 09:40:46
Once again, the Direct Debits did not go through on the 1st; this triggered an emailed notice for collection, and so they are going out on the 8th. Obviously they are going out on the date notified in the email, but I appreciate that a few extra days' credit may be inconvenient for some people expecting the DD on the 1st. We are working on this. The problem is that the system has been designed very "defensively", so that any doubt at all over the emailed advance notice results in a new emailed five-working-day notice, to be absolutely sure we are meeting the DD rules.
Resolution This is now resolved.
Started 1 Sep
Closed 24 Oct 18:13:53

28 Oct 12:14:13
23 Oct 08:16:55
Over the next few days we expect to do some minor updates. Previous work has made these seamless, but there is always a risk of some impact. The plan is to do the work before 8am, though some backup routers may be updated at other times (where we expect no risk). We will also be doing some overnight rolling LNS updates. As ever, we may update test routers/LNSs at any time.
24 Oct 18:15:46
Updates this morning were around 5am, with one extra at 8am, and they showed virtually no disruption to any traffic. Several "B" routers have been done today. We are doing an LNS roll-over tonight and more routers in the morning. The testing over the last few weeks has been very good, and has allowed us to track down some minor issues that simply did not show on the bench test systems.
Resolution Updates were all completed over the weekend.
Started 24 Oct
Closed 28 Oct 12:14:13
Previously expected 30 Oct

22 Oct 14:45:53
17 Oct 16:43:25
Planning to do some work early Saturday, and possibly Sunday, before 8am. There should be virtually no disruption. This is on the Maidenhead routers, so any impact would be to VoIP, colocation and Ethernet links from there. We have some slight tweaks to apply.
Started 18 Oct
Previously expected 20 Oct

15 Oct 17:14:55
[Control Pages] - New Usage Graph - Info
We have added a graph to the usage section of the Control Pages. This will show upload and download usage over the past year. We welcome any feedback!
Started 30 Sep 15:00:00
Closed 15 Oct 17:14:55

7 Oct 06:17:13
3 Oct 16:25:24
As we advised, we have had to make some radical changes to our billing to fix database load issues. These have gone quite well overall, but there have been a few snags. We think we have them all now, but this month we had to revert some usage charging giving some free usage.

We have identified that quarterly billed customers on units tariffs were not charged, so these are being applied shortly as a new invoice. Anyone with excess usage as a result, please do ask accounts for a credit.

We have also identified that call charges have not been billed - these can be billed to date if anyone asks, or if you leave it then they should finally catch up on next month's bill.

Sorry for any inconvenience.

Started 1 Oct
Previously expected 1 Nov

15 Oct 17:14:18
6 Oct 14:22:50
For the next week or so we're considering 5am-7am to be a PEW window for some very low disruption work (a few seconds of "blip"). We're still trying very hard to improve our network configuration and router code to create a much more stable network. It seems, from recent experience, that this sort of window is least disruptive to customers: it is a time when issues can be resolved by staff if needed (which is harder at times like 3am) and we get more feedback from end users. As before, we expect this work to have no impact in most cases, and maybe a couple of seconds of routing issues if it does not quite go to plan. Sadly, all of our efforts to create the same test scenarios "on the bench" have not worked well. At this stage we are reviewing code to understand Sunday morning's work better, and this may take some time before we start. We'll update here and on irc before work is done. Thank you for your patience.
7 Oct 09:06:41
We did do work around 6:15 to 6:30 today - I thought I had posted an update here before I started but somehow it did not show. If we do any more, I'll try and make it a little earlier.
8 Oct 05:43:11
Doing work a little earlier today. We don't believe we caused any blips with today's testing.
9 Oct 05:47:53
Another early start and went very well.
10 Oct 08:22:53
We updated the remaining core routers this morning, and it seemed to go very well. Indeed, pings we ran had zero loss when upgrading routers in Telecity. However, we did lose TalkTalk broadband lines in the process. These all reconnected straight away, but we are now reviewing how this happens to try and avoid it in future.
Resolution Closing this PEW from last week. We may need to do more work at some point, but we are getting quite good at this now.
Started 7 Oct 06:00:00
Closed 15 Oct 17:14:18
Previously expected 14 Oct 07:00:00

5 Oct 07:26:50
3 Oct 10:41:59
We do plan to upgrade routers again over the weekend, probably early Saturday morning (before 9am). I'll post on irc at the time and update this notice.

The work this week means we expect this to be totally seamless, but the only way to actually be sure is to try it.

If we still see any issues we'll do more on Sunday.

4 Oct 06:54:19
Upgrades starting shortly.
4 Oct 07:24:47
Almost perfect!

We loaded four routers, each at a different point in the network. We ran a ping that went through all four routers whilst doing this. For three of them we did see the ping drop a packet. On the fourth we did not see a drop at all.

This may sound good, but it should be better - we should not lose a single packet doing this. We're looking at the logs to work out why, and may try again Sunday morning.

Thank you for your patience.

4 Oct 07:53:52
Plan for tomorrow is to pick one of the routers that did drop a ping, and shut it down and hold it without restarting - at that point we can investigate what is still routing via it and why. This should help us explain the dropped ping. Assuming that provides the clues we need we may load or reconfigure routers later on Sunday to fix it.
5 Oct 06:57:39
We are starting work shortly.
5 Oct 07:11:00
We are doing the upgrades as planned, but not able to do the level of additional diagnostics we wanted. We may look in to that next weekend.
Resolution Only three routers were upgraded, the third having several seconds of issues. We will investigate the logs and do further planned work. It seems early morning like this is less disruptive to customers.
Started 4 Oct
Closed 5 Oct 07:26:50
Previously expected 6 Oct

2 Oct 19:05:55
2 Oct 19:05:15
We'd like to thank customers for patience this week. The tests we have been doing in the evenings have been invaluable. The issues seen have mostly related to links to Maidenhead (so voice calls rather than broadband connections).

The work we are doing has involved a lot of testing "on the bench" and even in our offices (to the annoyance of staff), but ultimately testing on the live customer services is the final test. The results have been informative and we are very close to our goal now.

The goal is to allow router maintenance with zero packet loss. We finally have the last piece in the jigsaw for this, and so should have this in place soon. Even so, there may be some further work to achieve this.

Apart from being a "nice to have" in itself, this also relates to failures of hardware, power cuts, and software crashes. The work is making the network configuration more robust and should allow for key component failures with outages as short as 300ms in some cases. LNS issues tend to take longer, as PPP has to reconnect, but we want to be as robust as possible.

So, once again, thank you all for your patience while we work on this. There may be some more planned works which really should now be invisible to customers.

Started 2 Oct 19:00:41

2 Oct 11:05:01
2 Oct 11:05:01
We're updating SSL certificates for our customer facing servers this morning (email, webmail). Users who don't have the CAcert root certificate installed may see errors. Details on http://aa.net.uk/cacert.html
Started 2 Oct 11:04:16

1 Oct 17:49:32
30 Sep 18:04:06
Having been very successful with the router upgrade tonight, we are looking to move to the next router on Wednesday. Signs so far are that this should be equally seamless. We are, however, taking this slowly, one step at a time, to be sure.
Resolution We loaded 4 routers in all; some were almost seamless and some had a few seconds of outage. It was not perfect, but way better than previously. We are now going to look into the logs in detail and work out what we do next.

Our goal here is zero packet loss for maintenance.

I'd like to thank all those on irc for their useful feedback during these tests.

Started 1 Oct 17:00:00
Closed 1 Oct 17:49:32
Previously expected 1 Oct 18:00:00

30 Sep 18:02:25
29 Sep 21:57:11
We are going to spend much of tomorrow trying to track down why things did not go smoothly tonight, and hope to have a solution by tomorrow (Tuesday) evening.

This time I hope to make a test load before the peak period at 6pm, so between 5pm and 6pm, when things are in a bit of a lull between business and home use.

If all goes to plan there will be NO impact at all, and that is what we hope. If so we will update three routers with increasing risk of impact, and abort if there are any issues.

Please follow things on irc tomorrow.

If this works as planned we will finally have all routers under "seamless upgrade" processes.

30 Sep 08:29:42
Tests on our internal systems this morning confirm we understand what went wrong last night, and as such the upgrade tonight should be seamless.

For the technically minded: we had an issue with VRRP becoming master too soon, i.e. before all routes were installed. The routing logic is now linked to VRRP to avoid this scenario, regardless of how long routing takes.
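The fix described above can be thought of as a simple gate: the router only advertises itself as VRRP master once both the election result and route installation are in place, whichever order they happen in. A toy model of that idea (purely illustrative, nothing like the actual FireBrick code):

```python
class VrrpGate:
    """Toy model: gate VRRP mastership on routing convergence."""

    def __init__(self):
        self.election_won = False      # VRRP election says we should be master
        self.routes_installed = False  # routing table has fully converged

    @property
    def master(self):
        # Advertise as master only once routing has converged,
        # regardless of how long route installation takes.
        return self.election_won and self.routes_installed

gate = VrrpGate()
gate.election_won = True        # winning the election alone is not enough...
assert not gate.master
gate.routes_installed = True    # ...routes must be installed too
assert gate.master
```

Without the gate, a freshly-loaded router would attract traffic the instant it won the election and then blackhole it until routes finished installing - which matches the blips seen.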

Resolution The upgrade went very nearly perfectly on the first router - we believe the only noticeable impact was the link to our office, which we think we understand now. However, we did only do the one router this time.
Started 30 Sep 17:00:00
Closed 30 Sep 18:02:25
Previously expected 30 Sep 18:00:00

29 Sep 22:37:36
21 Aug 12:50:32
Over the past week or so we have been missing data on some monitoring graphs, shown as purple for the first hour in the morning. This is caused by delays in collecting the data, and is being looked into.
Resolution We believe this has been fixed now. We have been monitoring it for a fortnight after making an initial fix, and it looks to have been successful.
Closed 29 Sep 22:37:36

29 Sep 19:29:19
29 Sep 14:06:12
We expect to reload a router this evening, which is likely to cause a few seconds of routing issues. This is part of trying to address the blips caused by router upgrades, which are meant to be seamless.
29 Sep 18:48:37
The reload is expected shortly, and will be on two boxes at least. We are monitoring the effect of the changes we have made. They should be a big improvement.
Resolution The upgrade was tested on only one router (Maidenhead) and caused some nasty impact on routing to call servers and control systems - general DSL was unaffected. The changes are backed out now, and it's back to the drawing board. Further PEW will be announced as necessary.
Started 29 Sep 17:00:00
Closed 29 Sep 19:29:19
Previously expected 29 Sep 23:00:00

29 Sep 13:17:50
29 Sep 08:48:37
Some updates to the billing system have caused a problem for units-billed customers, resulting in their usage for next month starting early, i.e. usage is now being logged against October.

Because of the way usage carries forward, this is unlikely to have much impact on customers in terms of additional charges. However, any customers who think they have lost out, please let us know and we'll make a manual adjustment.

The problem has been corrected for next month.

29 Sep 08:57:00
It looks like customers won't get billed for top-up, and may not get billed for units either, so we are working on undoing this issue so that billing is done normally. Please bear with us.
29 Sep 09:23:40
We are working on this now and should have usage billing back to normal later this morning.
Resolution Usage billing was restored as of around 1am Saturday, giving customers 2.5 days of unmetered usage.
Started 29 Sep 08:45:12
Closed 29 Sep 13:17:50

28 Sep 19:20:54
28 Sep 18:52:50
We are experiencing a network problem affecting our broadband customers. Staff are investigating.
28 Sep 19:08:28
This is looking like some sort of Denial of Service attack. We're looking at mitigating this.
28 Sep 19:16:36
The traffic has died down, things are starting to look better.
28 Sep 19:21:46
Traffic is now back to normal.
Started 28 Sep 18:30:00
Closed 28 Sep 19:20:54

20 Sep 07:09:09
20 Sep 11:59:13
RADIUS accounting is behind at the moment. This is causing usage data to appear to be missing from customer lines. The accounting is behind, but it's not broken, and is catching up. The usage data doesn't appear to be lost, and should appear later in the day.
21 Sep 08:12:52
Records have now caught up.
Closed 20 Sep 07:09:09
Previously expected 20 Sep 15:57:11

25 Sep 12:07:57
25 Sep 11:48:00
We are investigating a network problem affecting our offices.
25 Sep 11:51:31
This is affecting our telephones.
25 Sep 12:08:45
The office is back online. We had lost IPv4; we're looking into the cause of this.
Started 25 Sep 11:47:00
Closed 25 Sep 12:07:57

25 Sep 22:25:27
17 Sep 11:52:39
We will be performing some minor maintenance on our POP3 and IMAP servers from 10PM on 25th September 2014. Part of this work will involve a reboot of the servers. This will mean that access to email will be unavailable for about 15 minutes. This status post will be updated during the maintenance.
25 Sep 22:00:06
This work has started.
Resolution This work has been completed.
Started 17 Sep 12:00:00 by AAISP Staff
Closed 25 Sep 22:25:27
Previously expected 25 Sep 22:00:00

29 Aug 09:00:00
29 Aug 15:39:43
We have had a slight issue with one of our routers, which caused a few seconds of routing blips to some destinations on a couple of occasions. We're working on this now.
Started 28 Aug
Closed 29 Aug 09:00:00

26 Aug 09:15:00
26 Aug 09:02:02
Yesterday's and today's line graphs are not being shown at the moment. We are working on restoring this.
26 Aug 09:42:18
Today's graphs are back; yesterday's are lost though.
Started 26 Aug 08:00:00
Closed 26 Aug 09:15:00

29 Sep 16:57:23
2 Sep 17:15:50
We had a blip on one of the LNSs yesterday, so we are looking to roll out some updates over this week which should help address this, and some of the other issues last month. As usual LNS upgrades would be over night. We'll be rolling out to some of the other routers first, which may mean a few seconds of routing changes.
7 Sep 09:43:40
Upgrades are going well, but we are taking this slowly, and have not touched the LNSs yet. Addressing stability issues is always tricky, as it can be weeks or months before we know we have actually fixed the problems. So far we have managed to identify some specific issues that we have been able to fix. We obviously have to be very careful to ensure these "fixes" do not impact normal service in any way. As such, I have extended this PEW by another week.
13 Sep 11:07:13
We are making significant progress on this. Two upgrades are expected today (Saturday 13th) which should not have any impact. We are also working on ways to make upgrades properly seamless (which is often the case, but not always).
14 Sep 17:21:35
Over the weekend we have done a number of tests, and we have managed to identify specific issues and put fixes in place on some of the routers on the network to see how they go.

This did lead to some blips (around 9am and 5pm on Sunday for example). We think we have a clearer idea on what happened with these too, and so we expect that we will load some new code early tomorrow or late tonight which may mean another brief blip. This should allow us to be much more seamless in future.

Later in the week we expect to roll out code to more routers.

16 Sep 16:57:07
We really think we have this sussed now - including reloads that have near zero impact on customers. We have a couple more loads to do this week (including one at 5pm today), and some over night rolling LNS updates.
17 Sep 12:23:59
The new release is now out, and we are planning upgrades this evening (from 5pm) and one of the LNSs over night. This should be pretty seamless now. At the end of the month we'll upgrade the second half of the core routers, assuming all goes well. Thank you for your patience.
18 Sep 17:15:27
FYI, there were a couple of issues with core routers today, at least one of which would have impacted internet routing to some destinations for several seconds. These issues were on the routers which have not yet been upgraded, which is rather encouraging. We are, of course, monitoring the situation carefully. The plan is still to upgrade the second half of the routers at the end of the month.
19 Sep 12:12:42
One of our LNSs (d.gormless) restarted unexpectedly this morning; this router is scheduled to be upgraded tonight.
28 Sep 13:25:10
The new release has been very stable for the last week and is being loaded on the remaining routers during Sunday.
Resolution Stable releases loaded at weekend
Started 2 Sep 18:00:00
Closed 29 Sep 16:57:23
Previously expected 19 Sep

2 Sep 17:08:13
2 Sep 15:38:09
Some people use the test LNS (doubtless) for various reasons, and it is also used some of the time for our NAT64 gateway.

We normally do re-loads on doubtless to test things with no notice, but we expect there may be quite a few this afternoon/evening as we are trying to track down an issue with new code that is not showing on the bench test systems.

As usual this is a PPP reset and reconnect, and if it crashes there may be a few seconds of extra outage. With any luck this will not take many resets to find the issue.

Resolution Testing went well.
Started 2 Sep 15:40:00
Closed 2 Sep 17:08:13
Previously expected 3 Sep

1 Sep 19:42:08
1 Sep 19:42:56
c.gormless rebooted, lines moved to other LNS automatically. We are investigating.
Broadband Users Affected 33%
Started 1 Sep 19:39:19
Closed 1 Sep 19:42:08

1 Sep 10:50:07
1 Sep 10:49:41
In an effort to make the billing for VoIP easier to read by nicely formatting the phone numbers, we managed to make SIM billing show the ICCID on the bill as "(null)".

Sorry about that - if anyone needs the billing re-done so you know which SIMs used which data, please let accounts know.

Started 1 Sep

26 Aug 10:08:21
26 Aug 10:08:21
We now support Sieve Filters on our mail servers. In short, much like the filter features in many email programs, this enables customers to set up filters on the server side to move email into folders.
More information on: http://wiki.aa.org.uk/Sieve_Filtering
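As a flavour of the syntax, here is a minimal Sieve script; the mailing list name and folder are invented for illustration, see the wiki page for real usage:

```sieve
require ["fileinto"];

# File messages from an example mailing list into a subfolder.
if header :contains "List-Id" "example-list.lists.example.com" {
    fileinto "INBOX.example-list";
}
```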
Started 26 Aug 10:00:00

23 Apr 10:21:03
01 Nov 2013 15:05:00
We have identified an issue that appears to be affecting some customers with FTTC modems. The issue is stupidly complex, and we are still trying to pin down the exact details. The symptoms appear to be that some packets are not passing correctly, some of the time.

Unfortunately, one of the types of packet that refuse to pass correctly is FireBrick FB105 tunnel packets. This means customers relying on FB105 tunnels over FTTC are seeing issues.

The workaround is to unplug the Ethernet lead from the modem and then reconnect it. This seems to fix the issue, at least until the next PPP restart. If you have remote access to a FireBrick, e.g. via a WAN IP, you can instead change the Ethernet port settings to force a re-negotiation, which has the same effect - this only works if the FireBrick is directly connected to the FTTC modem, as the fix does need the modem's Ethernet to restart.

We are asking BT about this, and we are currently assuming this is a firmware issue on the BT FTTC modems.

We have confirmed that modems re-flashed with non-BT firmware do not have the same problem, though we don't usually recommend doing this as it is a BT modem and part of the service.

04 Nov 2013 16:52:49
We have been working on getting more specific information regarding this, we hope to post an update tomorrow.
05 Nov 2013 09:34:14
We have reproduced this problem by sending UDP packets using 'Scapy'. We are doing further testing today, and hope to write up a more detailed report about what we are seeing and what we have tested.
05 Nov 2013 14:27:26
We have some quite good demonstrations of the problem now, and it looks like it will mess up most VPNs based on UDP. We can show how a whole range of UDP ports can be blacklisted by the modem somehow on the next PPP restart. It is crazy. We hope to post a little video of our testing shortly.
05 Nov 2013 15:08:16
Here is an update/overview of the situation. (from http://revk.www.me.uk/2013/11/bt-huawei-fttc-modem-bug-breaking-vpns.html )

We have confirmed that the latest code in the BT FTTC modems appears to have a serious bug that is affecting almost anyone running any sort of VPN over FTTC.

Existing modems seem to be upgrading, presumably due to a rollout of new code by BT. An older modem that has not been online for a while is fine. A re-flashed modem with non-BT firmware is fine. A modem that had been working on the line for a while suddenly stopped working, presumably having been upgraded.

The bug appears to be that the modem manages to "blacklist" some UDP packets after a PPP restart.

If we send a number of UDP packets, using various UDP ports, then cause PPP to drop and reconnect, we then find that around 254 combinations of UDP IP/ports are now blacklisted. I.e. they no longer get sent on the line. Other packets are fine.

If we send 500 different packets, around 254 of them will not work again after the PPP restart. It is not simply the first or last 254 packets - some in the middle are affected - but it seems to be 254 combinations. They work as much as you like before the PPP restart, and then never work after it.

We can send a batch of packets, wait 5 minutes, PPP restart, and still find that packets are now blacklisted. We have tried a wide range of ports, high and low, different src and dst ports, and so on - they are all affected.

The only way to "fix" it, is to disconnect the Ethernet port on the modem and reconnect. This does not even have to be long enough to drop PPP. Then it is fine until the next PPP restart. And yes, we have been running a load of scripts to systematically test this and reproduce the fault.
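The probe pattern those scripts follow can be sketched with the standard library; our actual testing used Scapy to craft the packets, and the function names and example address below are purely illustrative:

```python
import socket

def probe_combos(n=500, base_sport=40000, base_dport=50000):
    """Return n distinct (source port, destination port) pairs to probe."""
    return [(base_sport + i, base_dport + i) for i in range(n)]

def send_probes(dst_ip, combos):
    """Send one small UDP datagram per port pair. Run before and after a
    PPP restart, then compare which pairs still reach the far end."""
    for sport, dport in combos:
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind(("", sport))                 # fix the source port
        s.sendto(b"probe", (dst_ip, dport))
        s.close()

combos = probe_combos()
print(len(combos))  # 500 distinct port pairs to compare across the restart
```

Any pair that carried traffic before the restart but is silent afterwards has been "blacklisted" by the modem.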

The problem is that a lot of VPNs use UDP and use the same set of ports for all of the packets, so if that combination is blacklisted by the modem the VPN stops after a PPP restart. The only way to fix it is manual intervention.

The modem is meant to be an Ethernet bridge. It should not know anything about PPP restarting or UDP packets and ports. It makes no sense that it would do this. We have tested swapping working and broken modems back and forth. We have tested with a variety of different equipment doing PPPoE and IP behind the modem.

BT are working on this, but it is a serious concern that this is being rolled out.
12 Nov 2013 10:20:18
Work on this is still ongoing... We have tested this on a standard BT retail FTTC 'Infinity' line, and the problem cannot be reproduced. We suspect this is because a different IP address is allocated each time the PPP re-establishes, so whatever is doing the session tracking does not match the new connection.
12 Nov 2013 11:08:17

Here is an update with a more specific explanation of the problem we are seeing:

On WBC FTTC, we can send a UDP packet inside the PPP and then drop the PPP a few seconds later. After the PPP re-establishes, UDP packets with the same source and destination IP and ports won't pass; they do not reach the LNS at the ISP.

Further to that, it's not just one src+dst IP and port tuple which is affected. We can send 254 UDP packets using different src+dst ports before we drop the PPP. After it comes back up, all 254 port combinations will fail. It is worth noting that this cannot be reproduced on an FTTC service which allocates a dynamic IP that changes each time the PPP re-establishes.

If we send more than 254 packets, only 254 will be broken and the others will work. It's not always the first 254 or the last 254; the broken ones move around between tests.

So it sounds like the modem (or, less likely, something in the cab or exchange) is creating state table entries for packets it is passing which tie them to a particular PPP session, and then failing to flush the table when the PPP goes down.

This is a little crazy in the first place. It's a modem. It shouldn't even be aware that it's passing PPPoE frames, let alone looking inside them to see that they are UDP.

This only happens with an Openreach Huawei HG612 modem that we suspect has been remotely and automatically upgraded by Openreach in the past couple of months. Further, an HG612 modem with the 'unlocked' firmware does not have this problem, and an HG612 modem that has probably not been automatically/remotely upgraded does not have this problem.

Side note: One theory is that the brokenness is actually happening in the street cab and not the modem. And that the new firmware in the modem which is triggering it has enabled 'link-state forwarding' on the modem's Ethernet interface.

27 Nov 2013 10:09:42
This post has been a little quiet, but we are still working with BT/Openreach regarding this issue. We hope to have some more information to post in the next day or two.
27 Nov 2013 10:10:13
We have also had reports from someone outside of AAISP reproducing this problem.
27 Nov 2013 14:19:19
We have spent the morning with some nice chaps from Openreach and Huawei. We have demonstrated the problem and they were able to do traffic captures at various points on their side. Huawei HQ can now reproduce the problem and will investigate the problem further.
28 Nov 2013 10:39:36
Adrian has posted about this on his blog: http://revk.www.me.uk/2013/11/bt-huawei-working-with-us.html
13 Jan 14:09:08
We are still chasing this with BT.
3 Apr 15:47:59
We have seen this affect SIP registrations (which use 5060 as the source and target)... Customers can contact us and we'll arrange a modem swap.
23 Apr 10:21:03
BT are in the process of testing an updated firmware for the modems with customers. Any customers affected by this can contact us and we can arrange a new modem to be sent out.
Resolution BT are testing a fix in the lab and will deploy in due course, but this could take months. However, if any customers are adversely affected by this bug, please let us know and we can arrange for BT to send a replacement ECI modem instead of the Huawei modem. Thank you all for your patience.

BT do have a new firmware that they are rolling out to the modems. So far it does seem to have fixed the fault and we have not heard of any other issues as of yet. If you do still have the issue, please reboot your modem, if the problem remains, please contact support@aa.net.uk and we will try and get the firmware rolled out to you.
Started 25 Oct 2013
Closed 23 Apr 10:21:03

25 Aug 23:49:30
25 Aug 22:15:51
We are seeing what looks to be routing problems within our network with traffic to/from our Maidenhead datacentre. Routes seem to be flapping and disrupting connectivity with increased latency and packet loss. This would be affecting Ethernet services from Maidenhead as well as customers accessing web and email services that we host in Maidenhead. Customers are also reporting DNS problems.
25 Aug 22:19:09
Engineers are investigating...
25 Aug 23:33:53
Staff are still working on this. The cause of the problem has been identified and is being worked on.
25 Aug 23:50:13
The problem has been resolved, traffic is now back to normal, we apologise for this inconvenience.
Started 25 Aug 21:45:00
Closed 25 Aug 23:49:30

22 Aug 12:17:37
22 Aug 11:56:10
We have added a new section clarifying engineer visits and missed appointments. This confirms the "point of no return" for rearranging appointments, and clarifies compensation either way when an appointment is missed.

We have also added two additional reasons for charging an admin fee (£5+VAT). We hope you think these are reasonable. It is a bit of a shame that such things are necessary. We think it is not fair for such costs to be part of our overheads and so affect the price for everyone else who is being reasonable.

1. If you send us a bogus invoice which we validly reject (e.g. trying to invoice us for a delayed install when we do not guarantee install dates). Also for each further exchange of correspondence on such invoices.

2. If you attempt to take us to ADR when you are not entitled to (e.g. if you have not followed our complaints procedures, or you are a company of more than 10 staff, or you are, or have said you are, a communications provider). We will also charge any fees we end up paying as a result of such an attempt if accepted by the ADR provider.

Any questions, please let us know.

Started 22 Aug

19 Aug 12:59:53
19 Aug 00:36:05
Initial reports suggest one of our fibre links to TalkTalk is down. This is affecting broadband lines using TalkTalk backhaul.
19 Aug 00:43:35
00:05 TT Lines drop, looked like we had a router blip and a TT fibre blip - reasons yet unknown
00:15 Lines start to log back in
However, we are getting reports of intermittent access to some sites on the internet - possibly MTU related.
19 Aug 01:33:16
MTU is still a problem. A workaround for the moment is to lower the MTU setting on your router to 1432. Ideally this should not be needed, but try this until the problem is resolved.
19 Aug 01:58:30
Other wholesalers using TT are reporting the same problem. The TT helpdesk is aware of planned work that may be causing this. We have requested that they pass this MTU report on to the team involved in the planned work.
19 Aug 07:14:05
TT tell us they think the problem with MTU has been fixed. We're still unsure at this moment, and will work with customers who still have problems.
19 Aug 07:55:02
This is still a problem affecting customers using TT backhaul. TT are aware and are investigating. This is a result of a router upgrade within TT which looks to have been given incorrect settings.
Where possible, customers can change the MTU on their routers to 1432
19 Aug 08:55:47
We have been in contact with the TT Service Director who will be chasing this up internally at TT.
19 Aug 09:05:48
Customers with bonded lines using TT and BT can turn off their TT modem or router for the time being.
19 Aug 09:20:11
We are looking at re-routing TT connections through our secondary connection to TT...
19 Aug 09:30:55
Traffic is now routing via our secondary connection to TT. This looks like it is not being routed via the faulty TT router, and it looks as if lines are passing traffic as normal
19 Aug 09:55:32
Some customers are working OK, some are not.
The reason is that we have two interconnects to TT. We are still seeing connections from both of them; however, we have a 1600 byte path from one but only 1500 from the other. The 1500 one is the one that TT did an upgrade on last night. So it looks like TT forgot to configure jumbo frames on an interface after the upgrade.
Needless to say, we've passed this information on to people at various levels within TT
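The jumbo-frame requirement is simple header arithmetic: a full-size 1500 byte customer packet tunnelled to our LNS over L2TP needs more than 1500 bytes on the wire. A rough sketch, using typical header sizes rather than figures measured on the TT interconnect itself:

```python
# Back-of-envelope arithmetic for why the interconnect needs >1500 byte
# frames. Header sizes are typical values, not measured from the TT link.
CUSTOMER_PACKET = 1500  # full-size customer IP packet
PPP_HEADER = 2          # PPP protocol field
L2TP_HEADER = 8         # L2TP data header, no optional fields
UDP_HEADER = 8
IP_HEADER = 20

tunnel_payload = (CUSTOMER_PACKET + PPP_HEADER + L2TP_HEADER
                  + UDP_HEADER + IP_HEADER)
print(tunnel_payload)  # 1538
```

With roughly 38 bytes of tunnel overhead, a 1500 byte path cannot carry a full 1500 byte customer packet, which is why lowering the MTU on customer routers works around the fault.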
19 Aug 09:57:02
We are working on only accepting connections from TT via the working interconnect.
19 Aug 10:39:32
We are forcing TT lines to reconnect, this should mean they then reconnect over the working interconnect and not the one with the faulty TT router.
19 Aug 11:21:53
We are blocking connections from the faulty TT router and only accepting them from the working one. This means that when customers connect they have a working connection. However, it does mean that logins are rejected until they are routed via the working interconnect, so it may take a few attempts for customers to connect.
19 Aug 11:24:09
Some lines are taking a long time to come back. This is because they are still coming in via the broken interconnect - that we're rejecting. Unfortunately, affected lines just have to be left until they attempt to log in via the working interconnect. So, if we appear to be rejecting your login please leave your router to keep trying and it should fix itself.
19 Aug 11:32:11
TT are reverting their upgrade from last night. This looks like it's underway at the moment.
19 Aug 11:35:00
Latest from TT: "The roll back has been completed and the associated equipment has been restarted. Our (TT) engineers are currently performing system checks and a retest before confirming resolution on this incident. Further information will be provided shortly. "
19 Aug 11:43:32
TT have completed their downgrade. It looks like the faulty link is working OK again, we'll be testing this before we unblock the link our side.
19 Aug 13:01:55
We've re-enabled the faulty link, we are now back to normality! We do apologise for this outage. We will be discussing this fault and future upgrades of these TT routers with TT staff.
Started 19 Aug 00:05:00
Closed 19 Aug 12:59:53