
18 Aug 10:00:00
Details
18 Aug 10:48:39

Our legacy 'C' VoIP platform will be removed from service on March 2nd 2015.

This platform is now old, tired and we have a better VoIP platform: our FireBrick-based 'Voiceless' platform.

We have created a wiki page with details for customers needing to move platforms: http://wiki.aa.org.uk/VoIP_-_Moving_Platform

We will be contacting customers individually by email later in the year, but we'd recommend that customers start moving now. The wiki page above explains how to move, and in most cases it is simply changing the server details in your VoIP device. Please do contact Support for help though.

Started 18 Aug 10:00:00 by AAISP Staff
Update expected 02 Mar 2015 11:00:00
Expected close 02 Mar 2015 10:00:00

29 Jul 11:42:12
Details
17 Jul 10:08:44
Our email services can learn spam/non-spam messages. This feature is currently down for maintenance as we work on the back-end systems. This means that if you move email in to the various 'learn' folders they will stay there and will not be processed at the moment. For the moment, we advise customers not to use this feature. We will post updates in the next week or so as we may well be changing how this feature works. This should not affect any spam scores etc, but do contact support if needed.
Update
29 Jul 11:42:12
This project is still ongoing. This should not be causing too many problems though, as the spam checking system has many other ways to determine if a message is spam or not. However, for now, if customers have email that is misclassified by the spam checking system then please email the headers in to support and we can make some suggestions.
Started 17 Jul 10:00:00

3 Jun 17:00:00
Details
3 Jun 18:20:39
The router upgrades went well, and now there is a new factory release we'll be doing some rolling upgrades over the next few days. Should be minimal disruption.
Update
3 Jun 18:47:21
First batch of updates done.
Started 3 Jun 17:00:00
Previously expected 7 Jun

23 May 12:00:00
Details
23 May 12:07:01

Our legacy 'A' VoIP platform will be removed from service on November 10th 2014.

This platform is our original Asterisk-based system; it is now old, tired and we have a better VoIP platform: our FireBrick-based 'Voiceless' platform.

We have created a wiki page with details for customers needing to move platforms: http://wiki.aa.org.uk/VoIP_-_Moving_Platform

We will be contacting customers individually by email in the coming weeks.

Our Asterisk platform has historically been used by customers using IAX or who have phones behind NAT. Our current 'Voiceless' platform should work just as well for phones behind NAT. This does mean that we will no longer be providing an IAX service. The wiki pages have details on configuring Asterisk to use SIP instead.

The feature set on Voiceless is the same as (if not better than) that on 'A' (apart from the IAX support).

Please see the wiki page for more information: http://wiki.aa.org.uk/VoIP_-_Moving_Platform

Update
19 Aug 13:42:19
We are starting to email customers about this work this week.
Started 23 May 12:00:00 by AAISP Staff
Expected close 10 Nov 10:00:00

14 Apr
Details
13 Apr 17:29:53
We handle SMS, both outgoing from customers, and incoming via various carriers, and we are now linking in once again to SMS with mobile voice SIM cards. The original code for this is getting a tad worn out, so we are working on a new system. It will have ingress gateways for the various ways SMS can arrive at us, core SMS routing, and then output gateways for the ways we can send on SMS. The plan is to convert all SMS to/from standard GSM 03.40 TPDUs. This is a tad technical I know, but it will mean that we have a common format internally. This will not be easy as there are a lot of character set conversion issues, and multiple TPDUs where concatenation of texts is used. The upshot for us is a more consistent and maintainable platform. The benefit for customers is more ways to submit and receive text messages, including using 17094009 to make an ETSI in-band modem text call from suitable equipment (we think gigasets do this). It also means customers will be able to send/receive texts in a raw GSM 03.40 TPDU format, which will be of use to some customers. It also makes it easier for us to add other formats later. There will be some changes to the existing interfaces over time, but we want to keep these to a minimum, obviously.
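As a rough illustration of the character set and concatenation issues mentioned above, here is a minimal Python sketch of splitting one text into parts. It is not our implementation. The function names (`pick_encoding`, `split_sms`) are made up for this example, it only knows the basic GSM 7-bit alphabet (anything else, including extension-table characters such as the euro sign, is treated as needing UCS-2), and it assumes all characters are in the Basic Multilingual Plane. The limits used are the usual 160/153 characters for 7-bit texts and 70/67 for UCS-2, with a 6-byte concatenation header (8-bit reference) on each part.

```python
# Sketch only: split a text into SMS parts, adding a concatenation header
# (UDH) when more than one part is needed.

# GSM default alphabet, basic table only (128 characters). Characters from
# the extension table are treated here as "not GSM", a simplification.
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)


def pick_encoding(text):
    """Return 'gsm7' if every character is in the basic table, else 'ucs2'."""
    return "gsm7" if all(ch in GSM_BASIC for ch in text) else "ucs2"


def split_sms(text, ref=0):
    """Return a list of (udh_bytes, part_text) tuples for one message."""
    encoding = pick_encoding(text)
    # Character limits: (single message, each part when concatenated).
    single, multi = (160, 153) if encoding == "gsm7" else (70, 67)
    if len(text) <= single:
        return [(b"", text)]  # fits in one TPDU, no header needed
    chunks = [text[i:i + multi] for i in range(0, len(text), multi)]
    if len(chunks) > 255:
        raise ValueError("too long for an 8-bit part count")
    parts = []
    for seq, chunk in enumerate(chunks, start=1):
        # UDHL=5, IEI=0x00 (concatenation, 8-bit ref), IE length=3,
        # then reference number, total parts, this part's number.
        udh = bytes([0x05, 0x00, 0x03, ref & 0xFF, len(chunks), seq])
        parts.append((udh, chunk))
    return parts
```

For example, `split_sms("a" * 161, ref=7)` gives two parts of 153 and 8 characters, each carrying a header that ties them to reference 7.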
Update
21 Apr 16:27:23

Work is going well on this, and we hope to switch Mobile Originated texting (i.e. texts from the SIP2SIM) over to the new system this week. If that goes to plan we can move some of the other ingress texting over to the new system one by one.

We'll be updating documentation at the same time.

The new system should be a lot more maintainable. We have a number of open tickets with the mobile carrier and other operators to try and improve the functionality of texting to/from us. These cover things like correct handling of multi-part texts, and correct character set coding.

The plan is ultimately to have full UTF-8 unicode support on all texts, but that could take a while. It seems telcos like to mess with things rather than giving us a clean GSM TPDU for texts. All good fun.

Update
22 Apr 08:51:09
We have updated the web site documentation on this to the new system, but this is not fully in use yet. Hopefully this week we will have it all switched over. Right now we have removed some features from documentation (such as delivery reports), but we plan to have these re-instated soon once we have the new system handling them sensibly.
Update
22 Apr 09:50:44
MO texts from SIP2SIM are now using the new system - please let support know of any issues.
Update
22 Apr 12:32:07
Texts from Three are now working to ALL of our 01, 02, and 03 numbers. These are delivered by email, http, or direct to SIP2SIM depending on the configuration on our control pages.
Update
23 Apr 09:23:20
We have switched over one of our incoming SMS gateways to the new system now. So most messages coming from outside will use this. Any issues, please let support know ASAP.
Update
25 Apr 10:29:50
We are currently running all SMS via the new platform - we expect there to be more work still to be done, but it should be operating as per the current documentation now. Please let support know of any issues.
Update
26 Apr 13:27:37
We have switched the DNS to point SMS to the new servers running the new system. Any issues, please let support know.
Started 14 Apr
Previously expected 1 May

11 Apr 15:50:28
Details
11 Apr 15:53:42
There is a problem with the C server and it needs to be restarted again after the maintenance yesterday evening. We are going to do this at 17:00 as we need it to be done as soon as possible. Sorry for the short notice.
Started 11 Apr 15:50:28

7 Apr 13:45:09
Details
7 Apr 13:52:31
We will be carrying out some maintenance on our 'C' SIP server outside office hours. It will cause disruption to calls, but is likely only to last a couple of minutes and will only affect calls on the A and C servers. It will not affect calls on our "voiceless" SIP platform or SIP2SIM. We will do this on Thursday evening at around 22:30. Please contact support if you have any questions.
Update
10 Apr 23:19:59
Completed earlier this evening.
Started 7 Apr 13:45:09
Previously expected 10 Apr 22:45:00

25 Sep 2013
Details
18 Sep 2013 16:32:41
We have received notification that Three's network team will be carrying out maintenance on one of the nodes that routes our data SIM traffic between 00:00 and 06:00 on Weds 25th September. Some customers may notice a momentary drop in connections during this time as any SIMs using that route will disconnect when the link is shut down. Any affected SIMs will automatically take an alternate route when they try and reconnect. Unfortunately, we have no control over the timing of this as it is dependent on the retry strategy of your devices. During the window, the affected node will be offline, so SIM connectivity should be considered at risk throughout.
Started 25 Sep 2013

Today 14:11:34
Details
Today 14:10:19
We're seeing congestion from 10am up to 11:30pm across the BT Rose Street exchange at the moment. A fault has been raised with BT and we're awaiting an update. Thanks for your patience.
Started Today 13:31:50
Expected close Tomorrow 17:31:50

17 Oct 16:43:25
Details
17 Oct 16:43:25
Planning to do some work early Saturday and possibly Sunday, before 8am. It should be virtually no disruption though. This is on the Maidenhead routers so any impact would be VoIP, colocation and Ethernet links from there. We have some slight tweaks to apply.
Started 18 Oct
Previously expected Yesterday

15 Oct 17:14:55
[Control Pages] - New Usage Graph - Info
Details
We have added a graph to the usage section of the Control Pages. This will show upload and download usage over the past year. We welcome any feedback!
Started 30 Sep 15:00:00
Closed 15 Oct 17:14:55

7 Oct 06:17:13
Details
3 Oct 16:25:24
As we advised, we have had to make some radical changes to our billing to fix database load issues. These have gone quite well overall, but there have been a few snags. We think we have them all now, but this month we had to revert some usage charging giving some free usage.

We have identified that quarterly billed customers on units tariffs were not charged, so these are being applied shortly as a new invoice. Anyone with excess usage as a result, please do ask accounts for a credit.

We have also identified that call charges have not been billed - these can be billed to date if anyone asks, or if you leave it then they should finally catch up on next month's bill.

Sorry for any inconvenience.

Started 1 Oct
Expected close 1 Nov

15 Oct 17:14:18
Details
6 Oct 14:22:50
For the next week or so we're considering 5am-7am to be a PEW window for some very low disruption work (a few seconds of "blip"). We're still trying very hard to improve our network configuration and router code to create a much more stable network. It seems, from recent experience, that this sort of window will be least disruptive to customers. It is a time where issues can be resolved by staff if needed (which is harder at times like 3am) and we get more feedback from end users. As before, we expect this work to have no impact in most cases, and maybe a couple of seconds of routing issues if it is not quite to plan. Sadly, all of our efforts to create the same test scenarios "on the bench" have not worked well. At this stage we are reviewing code to understand Sunday morning's work better, and this may take some time before we start. We'll update here and on irc before work is done. Thank you for your patience.
Update
7 Oct 09:06:41
We did do work around 6:15 to 6:30 today - I thought I had posted an update here before I started but somehow it did not show. If we do any more, I'll try and make it a little earlier.
Update
8 Oct 05:43:11
Doing work a little earlier today. We don't believe we caused any blips with today's testing.
Update
9 Oct 05:47:53
Another early start and went very well.
Update
10 Oct 08:22:53
We updated remaining core routers this morning, and it seemed to go very well. Indeed pings we ran had zero loss when upgrading routers in Telecity. However, we did lose TalkTalk broadband lines in the process. These all reconnected straight away, but we are now reviewing how this happens to try and avoid it in future.
Resolution Closing this PEW from last week. We may need to do more work at some point, but we are getting quite good at this now.
Started 7 Oct 06:00:00
Closed 15 Oct 17:14:18
Previously expected 14 Oct 07:00:00

5 Oct 07:26:50
Details
3 Oct 10:41:59
We do plan to upgrade routers again over the weekend, probably early Saturday morning (before 9am). I'll post on irc at the time and update this notice.

The work this week means we expect this to be totally seamless, but the only way to actually be sure is to try it.

If we still see any issues we'll do more on Sunday.

Update
4 Oct 06:54:19
Upgrades starting shortly.
Update
4 Oct 07:24:47
Almost perfect!

We loaded four routers, each at different points in the network. We ran a ping that went through all four routers whilst doing this. For three of them we did see ping drop a packet. The fourth we did not see a drop at all.

This may sound good, but it should be better - we should not lose a single packet doing this. We're looking at the logs to work out why, and may try again Sunday morning.
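For anyone wanting to run a similar check, a minimal Python sketch of counting drops from the sequence numbers of ping replies follows. The function names are made up for this example, and it assumes you already have the list of sequence numbers that got a reply (for instance, taken from ping output).

```python
def missing_sequences(received, first, last):
    """Return the sequence numbers in [first, last] that got no reply."""
    seen = set(received)
    return [seq for seq in range(first, last + 1) if seq not in seen]


def loss_percent(received, first, last):
    """Return packet loss as a percentage of the packets sent."""
    sent = last - first + 1
    return 100.0 * len(missing_sequences(received, first, last)) / sent
```

The target for this work is an empty list from `missing_sequences` for the whole of a router reload.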

Thank you for your patience.

Update
4 Oct 07:53:52
Plan for tomorrow is to pick one of the routers that did drop a ping, and shut it down and hold it without restarting - at that point we can investigate what is still routing via it and why. This should help us explain the dropped ping. Assuming that provides the clues we need we may load or reconfigure routers later on Sunday to fix it.
Update
5 Oct 06:57:39
We are starting work shortly.
Update
5 Oct 07:11:00
We are doing the upgrades as planned, but not able to do the level of additional diagnostics we wanted. We may look in to that next weekend.
Resolution Only 3 routers were upgraded, the 3rd having several seconds of issues. We will investigate the logs and do another planned work. It seems early morning like this is less disruptive to customers.
Started 4 Oct
Closed 5 Oct 07:26:50
Previously expected 6 Oct

2 Oct 19:05:55
Details
2 Oct 19:05:15
We'd like to thank customers for patience this week. The tests we have been doing in the evenings have been invaluable. The issues seen have mostly related to links to Maidenhead (so voice calls rather than broadband connections).

The work we are doing has involved a lot of testing "on the bench" and even in our offices (to the annoyance of staff) but ultimately testing on the live customer services is the final test. The results have been informative and we are very close to our goal now.

The goal is to allow router maintenance with zero packet loss. We finally have the last piece in the jigsaw for this, and so should have this in place soon. Even so, there may be some further work to achieve this.

Apart from a "Nice to have" goal, this also relates to failures of hardware, power cuts, and software crashes. The work is making the network configuration more robust and should allow for key component failures with outages as short as 300ms in some cases. LNS issues tend to take longer for PPP to reconnect, but we want to try and be as robust as possible.

So, once again, thank you all for your patience while we work on this. There may be some more planned works which really should now be invisible to customers.

Started 2 Oct 19:00:41

2 Oct 11:05:01
Details
2 Oct 11:05:01
We're updating SSL certificates for our customer facing servers this morning (email, webmail). Users who don't have the CAcert root certificate installed may see errors. Details on http://aa.net.uk/cacert.html
Started 2 Oct 11:04:16

1 Oct 17:49:32
Details
30 Sep 18:04:06
Having been very successful with the router upgrade tonight, we are looking to move to the next router on Wednesday. Signs so far are that this should be equally seamless. We are, however, taking this slowly, one step at a time, to be sure.
Resolution We loaded 4 routers in all. Some were almost seamless and some had a few seconds of outage; it was not perfect but way better than previously. We are now going to look in to the logs in detail and try to understand what we do next.

Our goal here is zero packet loss for maintenance.

I'd like to thank all those on irc for their useful feedback during these tests.

Started 1 Oct 17:00:00
Closed 1 Oct 17:49:32
Previously expected 1 Oct 18:00:00

30 Sep 18:02:25
Details
29 Sep 21:57:11
We are going to spend much of tomorrow trying to track down why things did not go smoothly tonight, and hope to have a solution by tomorrow (Tuesday) evening.

This time I hope to make a test load before the peak period at 6pm, so between 5pm and 6pm when things are a bit of a lull between business and home use.

If all goes to plan there will be NO impact at all, and that is what we hope. If so we will update three routers with increasing risk of impact, and abort if there are any issues.

Please follow things on irc tomorrow.

If this works as planned we will finally have all routers under "seamless upgrade" processes.

Update
30 Sep 08:29:42
Tests on our internal systems this morning confirm we understand what went wrong last night, and as such the upgrade tonight should be seamless.

For the technically minded, we had an issue with VRRP becoming master too soon, i.e. before all routes are installed. The routing logic is now linked to VRRP to avoid this scenario, regardless of how long routing takes.
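The idea can be illustrated with a toy model in Python. This is only a sketch of the principle, not the router code: the class, field names and priority values are all made up, and real VRRP elections involve more than comparing priorities. The point is simply that a router advertises a low holding priority until routing reports it is ready, so it cannot win the election early however long route installation takes.

```python
from dataclasses import dataclass


@dataclass
class RouterState:
    """Toy model of one router in a VRRP group (all names hypothetical)."""
    configured_priority: int = 200  # priority used once fully ready
    holding_priority: int = 1       # priority advertised while not ready
    routes_installed: bool = False  # set by the routing logic when done

    def advertised_priority(self) -> int:
        # Only compete for master once routing says it is ready.
        if self.routes_installed:
            return self.configured_priority
        return self.holding_priority


def elect_master(routers):
    """Return the name of the router advertising the highest priority."""
    return max(routers, key=lambda name: routers[name].advertised_priority())
```

In this model a freshly reloaded router with a higher configured priority stays backup until `routes_installed` becomes true, and only then takes over.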

Resolution The upgrade went very nearly perfectly on the first router - we believe the only noticeable impact was the link to our office, which we think we understand now. However, we did only do the one router this time.
Started 30 Sep 17:00:00
Closed 30 Sep 18:02:25
Previously expected 30 Sep 18:00:00

29 Sep 22:37:36
Details
21 Aug 12:50:32
Over the past week or so we have been missing data on some monitoring graphs; this is shown as purple for the first hour in the morning. This is being caused by delays in collecting the data. This is being looked in to.
Resolution We believe this has been fixed now. We have been monitoring it for a fortnight after making an initial fix, and it looks to have been successful.
Closed 29 Sep 22:37:36

29 Sep 19:29:19
Details
29 Sep 14:06:12
We expect to reload a router this evening, which is likely to cause a few seconds of routing issues. This is part of trying to address the blips caused by router upgrades, which are meant to be seamless.
Update
29 Sep 18:48:37
The reload is expected shortly, and will be on two boxes at least. We are monitoring the effect of the changes we have made. They should be a big improvement.
Resolution Upgrade was tested only on one router (Maidenhead) and caused some nasty impact on routing to call servers and control systems - general DSL was unaffected. Changes are backed out now, and back to drawing board. Further PEW will be announced as necessary.
Started 29 Sep 17:00:00
Closed 29 Sep 19:29:19
Previously expected 29 Sep 23:00:00

29 Sep 13:17:50
Details
29 Sep 08:48:37
Some updates to the billing system have caused a problem for units billed customers resulting in their usage for next month starting early, i.e. usage is now being logged for October.

Because of the way usage carries forward, this is unlikely to have much impact on customers in terms of additional charges. However, any customers that think they have lost out, please let us know and we'll make a manual adjustment.

The problem has been corrected for next month.

Update
29 Sep 08:57:00
It looks like customers won't get billed top-up and may not get billed units either, so we are working on un-doing this issue so that billing is done normally. Please bear with us.
Update
29 Sep 09:23:40
We are working on this now and should have usage billing back to normal later this morning.
Resolution Usage billing has been restored to around 1am Saturday, giving customers 2.5 days of unmetered usage.
Started 29 Sep 08:45:12
Closed 29 Sep 13:17:50

28 Sep 19:20:54
Details
28 Sep 18:52:50
We are experiencing a network problem affecting our broadband customers. Staff are investigating.
Update
28 Sep 19:08:28
This is looking like some sort of Denial of Service attack. We're looking at mitigating this.
Update
28 Sep 19:16:36
The traffic has died down, things are starting to look better.
Update
28 Sep 19:21:46
Traffic is now back to normal.
Started 28 Sep 18:30:00
Closed 28 Sep 19:20:54

20 Sep 07:09:09
Details
20 Sep 11:59:13
RADIUS accounting is behind at the moment. This is causing the usage data to appear as missing from customer lines. The accounting is behind, but it's not broken, and is catching up. The usage data doesn't appear to be lost, and should appear later in the day.
Update
21 Sep 08:12:52
Records have now caught up.
Closed 20 Sep 07:09:09
Previously expected 20 Sep 15:57:11

25 Sep 12:07:57
Details
25 Sep 11:48:00
We are investigating a network problem affecting our offices.
Update
25 Sep 11:51:31
This is affecting our telephones.
Update
25 Sep 12:08:45
The office is back online. We had lost IPv4, we're looking in to the cause of this.
Started 25 Sep 11:47:00
Closed 25 Sep 12:07:57

25 Sep 22:25:27
Details
17 Sep 11:52:39
We will be performing some minor maintenance on our POP3 and IMAP servers from 10PM on 25th September 2014. Part of this work will involve a reboot of the servers. This will mean that access to email will be unavailable for about 15 minutes. This status post will be updated during the maintenance.
Update
25 Sep 22:00:06
This work has started.
Resolution This work has been completed.
Started 17 Sep 12:00:00 by AAISP Staff
Closed 25 Sep 22:25:27
Previously expected 25 Sep 22:00:00

29 Aug 09:00:00
Details
29 Aug 15:39:43
We have had a slight issue with one of our routers which has caused a few seconds of routing blips to some destinations, on a couple of occasions. We're working on this now.
Started 28 Aug
Closed 29 Aug 09:00:00

26 Aug 09:15:00
Details
26 Aug 09:02:02
Yesterday's and today's line graphs are not being shown at the moment. We are working on restoring this.
Update
26 Aug 09:42:18
Today's graphs are back, yesterday's are lost though.
Started 26 Aug 08:00:00
Closed 26 Aug 09:15:00

29 Sep 16:57:23
Details
2 Sep 17:15:50
We had a blip on one of the LNSs yesterday, so we are looking to roll out some updates over this week which should help address this, and some of the other issues last month. As usual LNS upgrades would be over night. We'll be rolling out to some of the other routers first, which may mean a few seconds of routing changes.
Update
7 Sep 09:43:40
Upgrades are going well, but we are taking this slowly, and have not touched the LNSs yet. Addressing stability issues is always tricky as it can be weeks or months before we know we have actually fixed the problems. So far we have managed to identify some specific issues that we have been able to fix. We obviously have to be very careful to ensure these "fixes" do not impact normal service in any way. As such I have extended this PEW another week.
Update
13 Sep 11:07:13
We are making significant progress on this. Two upgrades are expected today (Saturday 13th) which should not have any impact. We are also working on ways to make upgrades properly seamless (which is often the case, but not always).
Update
14 Sep 17:21:35
Over the weekend we have done a number of tests, and we have managed to identify specific issues and put fixes in place on some of the routers on the network to see how they go.

This did lead to some blips (around 9am and 5pm on Sunday for example). We think we have a clearer idea on what happened with these too, and so we expect that we will load some new code early tomorrow or late tonight which may mean another brief blip. This should allow us to be much more seamless in future.

Later in the week we expect to roll out code to more routers.

Update
16 Sep 16:57:07
We really think we have this sussed now - including reloads that have near zero impact on customers. We have a couple more loads to do this week (including one at 5pm today), and some over night rolling LNS updates.
Update
17 Sep 12:23:59
The new release is now out, and we are planning upgrades this evening (from 5pm) and one of the LNSs over night. This should be pretty seamless now. At the end of the month we'll upgrade the second half of the core routers, assuming all goes well. Thank you for your patience.
Update
18 Sep 17:15:27
FYI, there were a couple of issues with core routers today, at least one of which would have impacted internet routing for some destinations for several seconds. These issues were on the routers which have not yet been upgraded, which is rather encouraging. We are, of course, monitoring the situation carefully. The plan is still to upgrade the second half of the routers at the end of the month.
Update
19 Sep 12:12:42
One of our LNSs (d.gormless) did restart unexpectedly this morning - this router is scheduled to be upgraded tonight.
Update
28 Sep 13:25:10
The new release has been very stable for the last week and is being upgraded on remaining routers during Sunday.
Resolution Stable releases loaded at weekend
Started 2 Sep 18:00:00
Closed 29 Sep 16:57:23
Previously expected 19 Sep

2 Sep 17:08:13
Details
2 Sep 15:38:09
Some people use the test LNS (doubtless) for various reasons, and it is also used some of the time for our NAT64 gateway.

We normally do re-loads on doubtless to test things with no notice, but we expect there may be quite a few this afternoon/evening as we are trying to track down an issue with new code that is not showing on the bench test systems.

As usual this is a PPP reset and reconnect and if it crashes may be a few seconds extra outage. With any luck this will not take many resets to find the issue.

Resolution Testing went well.
Started 2 Sep 15:40:00
Closed 2 Sep 17:08:13
Previously expected 3 Sep

1 Sep 19:42:08
Details
1 Sep 19:42:56
c.gormless rebooted, lines moved to other LNS automatically. We are investigating.
Broadband Users Affected 33%
Started 1 Sep 19:39:19
Closed 1 Sep 19:42:08

1 Sep 10:50:07
Details
1 Sep 10:49:41
In an effort to make the billing for VoIP easier to read by nicely formatting the phone numbers, we managed to make SIM billing show the ICCID on the bill as "(null)".

Sorry about that - if anyone needs the billing re-done so you know which SIMs used which data, please let accounts know.

Started 1 Sep

1 Sep 09:41:14
Details
1 Sep 09:40:46
Once again, the Direct Debits have not gone through on the 1st and so have caused an emailed notice for collection and hence they are going out on the 8th. Obviously they are going out on the date notified in the email, but I appreciate that a few extra days credit may be inconvenient for some people expecting the DD on the 1st. We are working on this. The problem is that the system has been designed very "defensively" so that any doubt at all on the emailed advance notice will result in a new emailed 5 working days notice to be absolutely sure we are meeting the DD rules.
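To show where the 8th comes from, here is a small Python sketch of the date arithmetic. It assumes the new notice was emailed on Monday 1 September 2014 and counts Monday to Friday only; bank holidays are ignored, and the function name is made up for this example.

```python
from datetime import date, timedelta


def add_working_days(start, days):
    """Return the date `days` working days after `start` (Mon-Fri only)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current
```

With those assumptions, `add_working_days(date(2014, 9, 1), 5)` returns `date(2014, 9, 8)`, the collection date given in the email.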
Started 1 Sep

26 Aug 10:08:21
Details
26 Aug 10:08:21
We now support Sieve Filters on our mail servers. In short, much like the filters feature on many email programs this enables customers to set up filters on the server side to move email in to folders.
More information on: http://wiki.aa.org.uk/Sieve_Filtering
Started 26 Aug 10:00:00

23 Apr 10:21:03
Details
01 Nov 2013 15:05:00
We have identified an issue that appears to be affecting some customers with FTTC modems. The issue is stupidly complex, and we are still trying to pin down the exact details. The symptoms appear to be that some packets are not passing correctly, some of the time.

Unfortunately one of the types of packet that refuses to pass correctly is FireBrick FB105 tunnel packets. This means customers relying on FB105 tunnels over FTTC are seeing issues.

The work around is to remove the ethernet lead to the modem and then reconnect it. This seems to fix the issue, at least until the next PPP restart. If you have remote access to a FireBrick, e.g. via WAN IP, and need to do this you can change the Ethernet port settings to force it to re-negotiate, and this has the same effect - this only works if directly connected to the FTTC modem as the fix does need the modem Ethernet to restart.

We are asking BT about this, and we are currently assuming this is a firmware issue on the BT FTTC modems.

We have confirmed that modems re-flashed with non-BT firmware do not have the same problem, though we don't usually recommend doing this as it is a BT modem and part of the service.

Update
04 Nov 2013 16:52:49
We have been working on getting more specific information regarding this, we hope to post an update tomorrow.
Update
05 Nov 2013 09:34:14
We have reproduced this problem by sending UDP packets using 'Scapy'. We are doing further testing today, and hope to write up a more detailed report about what we are seeing and what we have tested.
Update
05 Nov 2013 14:27:26
We have some quite good demonstrations of the problem now, and it looks like it will mess up most VPNs based on UDP. We can show how a whole range of UDP ports can be blacklisted by the modem somehow on the next PPP restart. It is crazy. We hope to post a little video of our testing shortly.
Update
05 Nov 2013 15:08:16
Here is an update/overview of the situation. (from http://revk.www.me.uk/2013/11/bt-huawei-fttc-modem-bug-breaking-vpns.html )

We have confirmed that the latest code in the BT FTTC modems appears to have a serious bug that is affecting almost anyone running any sort of VPN over FTTC.

Existing modems seem to be upgrading, presumably due to a roll out of new code in BT. An older modem that has not been on-line a while is fine. A re-flashed modem with non-BT firmware is fine. A working modem on the line for a while suddenly stopped working, presumably upgraded.

The bug appears to be that the modem manages to "blacklist" some UDP packets after a PPP restart.

If we send a number of UDP packets, using various UDP ports, then cause PPP to drop and reconnect, we then find that around 254 combinations of UDP IP/ports are now blacklisted. I.e. they no longer get sent on the line. Other packets are fine.

Sending 500 different packets, around 254 of them will not work again after the PPP restart. It is not actually the first or last 254 packets, some in the middle, but it seems to be 254 combinations. They work as much as you like before the PPP restart, and then never work after it.

We can send a batch of packets, wait 5 minutes, PPP restart, and still find that packets are now blacklisted. We have tried a wide range of ports, high and low, different src and dst ports, and so on - they are all affected.

The only way to "fix" it, is to disconnect the Ethernet port on the modem and reconnect. This does not even have to be long enough to drop PPP. Then it is fine until the next PPP restart. And yes, we have been running a load of scripts to systematically test this and reproduce the fault.

The problem is that a lot of VPNs use UDP and use the same set of ports for all of the packets, so if that combination is blacklisted by the modem the VPN stops after a PPP restart. The only way to fix it is manual intervention.

The modem is meant to be an Ethernet bridge. It should not know anything about PPP restarting or UDP packets and ports. It makes no sense that it would do this. We have tested swapping working and broken modems back and forth. We have tested with a variety of different equipment doing PPPoE and IP behind the modem.

BT are working on this, but it is a serious concern that this is being rolled out.
Update
12 Nov 2013 10:20:18
Work on this is still ongoing... We have tested this on a standard BT retail FTTC 'Infinity' line, and the problem cannot be reproduced. We suspect this is because when the PPP re-establishes a different IP address is allocated each time, and whatever is session tracking does not match the new connection.
Update
12 Nov 2013 11:08:17

Here is an update with a more specific explanation of the problem we are seeing:

On WBC FTTC, we can send a UDP packet inside the PPP and then drop the PPP a few seconds later. After the PPP re-establishes, UDP packets with the same source and destination IP and ports won't pass; they do not reach the LNS at the ISP.

Further to that, it's not just one src+dst IP and port tuple which is affected. We can send 254 UDP packets using different src+dst ports before we drop the PPP. After it comes back up, all 254 port combinations will fail. It is worth noting here that this cannot be reproduced on an FTTC service which allocates a dynamic IP which changes each time PPP re-establishes.

If we send more than 254 packets, only 254 will be broken and the others will work. It's not always the first 254 or last 254, the broken ones move around between tests.
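The test described above can be sketched roughly as follows. This is a minimal illustration and not the actual scripts we ran: the function names, the port ranges and the idea of recording which pairs arrive at the far end are all assumptions made for the example.

```python
import socket


def port_pairs(count, src_start=20000, dst_start=30000):
    """Return `count` distinct (src_port, dst_port) pairs.

    The starting ports are arbitrary values chosen for illustration.
    """
    return [(src_start + i, dst_start + i) for i in range(count)]


def send_probe(dst_ip, src_port, dst_port, payload=b"probe"):
    """Send one UDP packet from a fixed source port to dst_ip:dst_port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("", src_port))
        s.sendto(payload, (dst_ip, dst_port))


def blocked_pairs(sent, received):
    """Pairs that were sent but never seen at the far end."""
    return sorted(set(sent) - set(received))
```

The idea is to send one probe per pair before dropping the PPP, send the same set again after it re-establishes, and pass the list of pairs seen at the far end (for example from a packet capture) to `blocked_pairs`. On an affected modem we would expect that to return 254 pairs, however many were sent.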

So it sounds like the modem (or, less likely, something in the cab or exchange) is creating state table entries for packets it is passing which tie them to a particular PPP session, and then failing to flush the table when the PPP goes down.

This is a little crazy in the first place. It's a modem. It shouldn't even be aware that it's passing PPPoE frames, let alone looking inside them to see that they are UDP.

This only happens when using an Openreach Huawei HG612 modem that we suspect has been remotely and automatically upgraded by Openreach in the past couple of months. Further - a HG612 modem with the 'unlocked' firmware does not have this problem. A HG612 modem that has probably not been automatically/remotely upgraded does not have this problem.

Side note: One theory is that the brokenness is actually happening in the street cab and not the modem. And that the new firmware in the modem which is triggering it has enabled 'link-state forwarding' on the modem's Ethernet interface.

Update
27 Nov 2013 10:09:42
This post has been a little quiet, but we are still working with BT/Openreach regarding this issue. We hope to have some more information to post in the next day or two.
Update
27 Nov 2013 10:10:13
We have also had reports from someone outside of AAISP reproducing this problem.
Update
27 Nov 2013 14:19:19
We have spent the morning with some nice chaps from Openreach and Huawei. We have demonstrated the problem and they were able to do traffic captures at various points on their side. Huawei HQ can now reproduce the problem and will investigate the problem further.
Update
28 Nov 2013 10:39:36
Adrian has posted about this on his blog: http://revk.www.me.uk/2013/11/bt-huawei-working-with-us.html
Update
13 Jan 14:09:08
We are still chasing this with BT.
Update
3 Apr 15:47:59
We have seen this affect SIP registrations (which use 5060 as the source and target)... Customers can contact us and we'll arrange a modem swap.
Update
23 Apr 10:21:03
BT are in the process of testing an updated firmware for the modems with customers. Any customers affected by this can contact us and we can arrange a new modem to be sent out.
Resolution BT are testing a fix in the lab and will deploy in due course, but this could take months. However, if any customers are adversely affected by this bug, please let us know and we can arrange for BT to send a replacement ECI modem instead of the Huawei modem. Thank you all for your patience.

--Update--
BT do have a new firmware that they are rolling out to the modems. So far it does seem to have fixed the fault and we have not heard of any other issues as yet. If you do still have the issue, please reboot your modem. If the problem remains, please contact support@aa.net.uk and we will try to get the firmware rolled out to you.
Started 25 Oct 2013
Closed 23 Apr 10:21:03

25 Aug 23:49:30
Details
25 Aug 22:15:51
We are seeing what looks to be routing problems within our network with traffic to/from our Maidenhead datacentre. Routes seem to be flapping and disrupting connectivity with increased latency and packet loss. This would be affecting Ethernet services from Maidenhead as well as customers accessing web and email services that we host in Maidenhead. Customers are also reporting DNS problems.
Update
25 Aug 22:19:09
Engineers are investigating...
Update
25 Aug 23:33:53
Staff are still working on this. The cause of the problem has been identified and is being worked on.
Update
25 Aug 23:50:13
The problem has been resolved and traffic is now back to normal. We apologise for this inconvenience.
Started 25 Aug 21:45:00
Closed 25 Aug 23:49:30

22 Aug 12:17:37
Details
22 Aug 11:56:10
We have added a new section clarifying engineer visits and missed appointments. This confirms the "point of no return" for rearranging appointments, and clarifies compensation either way when an appointment is missed.

We have also added two additional reasons for charging an admin fee (£5+VAT). We hope you think these are reasonable. It is a bit of a shame that such things are necessary. We think it is not fair for such costs to be part of our overheads and so affect the price for everyone else who is being reasonable.

1. If you send us a bogus invoice which we validly reject (e.g. trying to invoice us for a delayed install when we do not guarantee install dates). Also for each further exchange of correspondence on such invoices.

2. If you attempt to take us to ADR when you are not entitled to (e.g. if you have not followed our complaints procedures, or you are a company of more than 10 staff, or you are, or have said you are, a communications provider). We will also charge any fees we end up paying as a result of such an attempt if accepted by the ADR provider.

Any questions, please let us know.

Started 22 Aug

19 Aug 12:59:53
Details
19 Aug 00:36:05
Initial reports suggest one of our fibre links to TalkTalk is down. This is affecting broadband lines using TalkTalk backhaul.
Update
19 Aug 00:43:35
00:05 TT Lines drop, looked like we had a router blip and a TT fibre blip - reasons yet unknown
00:15 Lines start to log back in
However, we are getting reports of intermittent access to some sites on the internet - possibly MTU related.
Update
19 Aug 01:33:16
MTU is still a problem. A workaround for the moment, is to lower the MTU setting in your router to 1432. Ideally this should not be needed, but try this until the problem is resolved.
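For customers who want to check what size of packet their line will currently pass, the search can be sketched as below. This is a rough illustration only: `probe` is a hypothetical function you would supply yourself (for example a wrapper around a ping with the "don't fragment" flag set), and the search assumes that if one size passes then every smaller size passes too.

```python
def largest_passing_mtu(probe, low=576, high=1500):
    """Binary search for the largest MTU in [low, high] that probe() accepts.

    `probe(mtu)` must return True if a packet of `mtu` bytes gets through.
    Returns None if even `low` fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


def icmp_payload_for_mtu(mtu):
    """IPv4 ping payload size giving a packet of `mtu` bytes.

    Subtracts the 20 byte IPv4 header (no options) and 8 byte ICMP header.
    """
    return mtu - 28
```

On Linux, one way to build `probe` would be to run `ping -M do -s <payload>` with the payload size from `icmp_payload_for_mtu` and check whether replies come back; other systems use different ping options.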
Update
19 Aug 01:58:30
Other wholesalers using TT are reporting the same problem. TT helpdesk is aware of planned work that may be causing this. We have requested that they pass this MTU report on to the team involved in the planned work.
Update
19 Aug 07:14:05
TT tell us they think the problem with MTU has been fixed. We're still unsure at this moment, and will work with customers who still have problems.
Update
19 Aug 07:55:02
This is still a problem affecting customers using TT backhaul. TT are aware and are investigating. This is a result of a router upgrade within TT which looks to have been given incorrect settings.
Where possible, customers can change the MTU on their routers to be 1432
Update
19 Aug 08:55:47
We have been in contact with the TT Service Director who will be chasing this up internally at TT.
Update
19 Aug 09:05:48
Customers with bonded lines using TT and BT can turn off their TT modem or router for the time being.
Update
19 Aug 09:20:11
We are looking at re-routing TT connections through our secondary connection to TT...
Update
19 Aug 09:30:55
Traffic is now routing via our secondary connection to TT. This does not appear to be routed via the faulty TT router, and it looks as if lines are passing traffic as normal.
Update
19 Aug 09:55:32
Some customers are working OK, some are not.
The reason is that we have 2 interconnects to TT. We are still seeing connections from both of them, however, we have a 1600 byte path from one but only 1500 from the other. The 1500 one is the one that TT did an upgrade on last night. So it looks like TT forgot to configure jumbo frames on an interface after the upgrade.
Needless to say, we've passed this information on to people at various levels within TT
Update
19 Aug 09:57:02
We are working on only accepting connections from TT via the working interconnect.
Update
19 Aug 10:39:32
We are forcing TT lines to reconnect, this should mean they then reconnect over the working interconnect and not the one with the faulty TT router.
Update
19 Aug 11:21:53
We are blocking connections from the faulty TT router and only accepting from the working one. This means when customers connect they have a working connection. However, this does mean that logins are being rejected from customers until they are routed via the working interconnect. It may take a few attempts for customers to connect first time.
Update
19 Aug 11:24:09
Some lines are taking a long time to come back. This is because they are still coming in via the broken interconnect - that we're rejecting. Unfortunately, affected lines just have to be left until they attempt to log in via the working interconnect. So, if we appear to be rejecting your login please leave your router to keep trying and it should fix itself.
Update
19 Aug 11:32:11
TT are reverting their upgrade from last night. This looks like it's underway at the moment.
Update
19 Aug 11:35:00
Latest from TT: "The roll back has been completed and the associated equipment has been restarted. Our (TT) engineers are currently performing system checks and a retest before confirming resolution on this incident. Further information will be provided shortly. "
Update
19 Aug 11:43:32
TT have completed their downgrade. It looks like the faulty link is working OK again, we'll be testing this before we unblock the link our side.
Update
19 Aug 13:01:55
We've re-enabled the faulty link, we are now back to normality! We do apologise for this outage. We will be discussing this fault and future upgrades of these TT routers with TT staff.
Started 19 Aug 00:05:00
Closed 19 Aug 12:59:53

21 Aug 10:30:00
[Mobile SIMs] - Cheap data SIMs - Info
Details
18 Aug 13:21:06
We have an issue with printing the DATA SIM cards at present. (No issue with VOICE SIM cards).

For a few days at least we'll be shipping SIMs unprinted (plain white) instead. We also have a number of mis-prints, which we are doing for half price. Staff will call when you order to confirm if you want a mis-print.

We hope to have the printer working again soon. Sorry for any inconvenience.

Resolution Card printing is working well again. We still have some mis-printed cards, so do ask our Sales Dept if you are interested.
Started 18 Aug
Closed 21 Aug 10:30:00
Previously expected 25 Aug

13 Aug 09:15:00
Details
13 Aug 11:26:08
Due to a radius issue we were not receiving line statistics from just after midnight. As a result we needed to force lines to login again. This would have caused lines to lose their PPP connection and then reconnect at around 9AM. We apologise for this, and will be investigating the cause.
Started 13 Aug 09:00:00
Closed 13 Aug 09:15:00

15 Aug
Details
12 Aug 08:48:28
The recent router upgrades have now seen some issues (last night). This means we expect to do more upgrades (or downgrades) over the next few days. We'll know more later today. If there are further issues this may end up being done during the day even, but this looks unlikely.
Update
12 Aug 17:42:46
One of the routers showing problems (a.aimless) had a further issue today, and as part of the defensive design of our kit has automatically downgraded to the previous release. We are still investigating the cause of this issue.
Update
14 Aug 17:30:32
We have a much better handle on the problem, and it looks related to "stuff" out on the internet having an unexpected knock-on effect on our routers. We have some plans for further changes that will address this.
Started 12 Aug 08:45:30
Closed 15 Aug
Previously expected 15 Aug

16 Aug
Details
8 Aug 14:33:12
We will be doing some router upgrades over the next week or so. These will usually have little or no disruption, and LNS upgrades will be done over night as usual.
Started 9 Aug
Closed 16 Aug
Previously expected 16 Aug

8 Aug 15:25:00
Details
8 Aug 15:42:28
At 15:15 we saw customers on the 'D' LNSs lose their connection and reconnect a few moments later. The cause of this is being looked into.
Resolution Lines quickly came back online, we apologise for the drop though. The cause will be investigated.
Started 8 Aug 15:15:00
Closed 8 Aug 15:25:00

1 Aug 10:00:00
Details
We saw what looks to be congestion on some lines on the Rugby exchange (BT lines). This shows a slight packet loss on Sunday evening. We'll report this to BT.
Update
30 Jul 11:03:08
Card replaced early hours this morning, which should have fixed the congestion problems.
Started 27 Jul 21:00:00
Closed 1 Aug 10:00:00

4 Aug 13:44:05
Details
Due to a database problem, viewing graphs, submitting tests and other things relating to a Line is problematic at the moment. This is being worked on and should be restored shortly.
Resolution Database problem resolved. Sorry for the inconvenience.
Started 4 Aug 13:00:00
Closed 4 Aug 13:44:05

2 Aug 12:02:18
Details
2 Aug 12:02:18
We are planning an expansion of our VoIP services which will require the use of more IP addresses than we have told customers to allow through their firewalls, so have added additional IP ranges to the list.

We are not using these new IP ranges yet, but are giving advanced notice of the change to give customers time to update their firewall rules.

The list is at http://wiki.aa.org.uk/VoIP_Firewall

Started 2 Aug 11:50:35

9 Aug
Details
1 Aug 08:01:24
Many customers have regular payments on 1st of the month and so do not get a separate Direct Debit notice each month.

Unfortunately, this month, the system did not run correctly meaning that Direct Debits were not collected today. As such they are being re-notified with the agreed 5 working days notice for collection on 8th or 9th.

This should return to normal next month.

Sorry for any confusion that may be caused by this.

Started 1 Aug
Closed 9 Aug

25 Jul 21:00:00
Details
7 Jul 15:34:10
There is a problem activating some new data SIMs which is being caused by a problem at Three. This has been escalated within Three and we expect an update by the end of the day. Please note that this only affects the activation of new Three data SIMs. Existing data SIMs are not affected, nor is the data on our O2 based voice SIMs.
Update
8 Jul 09:15:12
Three are still working on this problem as a Priority 1 case. We should have updates every 4 hours and will post them here.
Update
8 Jul 16:19:30
An update: "Three have advised that the interface card that they believe was causing the fault has been replaced as of 01:00 this morning. We are continuing to see API failures, however these have lessened since 11:00. We have asked Three to investigate further and feed back."
Update
9 Jul 13:12:12
This seems to be fixed for new SIMs being activated now. SIMs which customers attempted to activate over the past few days may be stuck in a broken partially-activated state. We are getting these reset by the carrier so we can activate them again.
Update
14 Jul 16:11:18
Activating seems broken again. We've reported it and are awaiting an update.
Update
15 Jul 17:18:11
We do have a small number of SIMs that are in a 'stuck' state, and we're waiting for the carrier to clear them.
Update
17 Jul 15:48:25
Three are still having further problems with activating SIMs. This has been happening for the last couple of days, and is ongoing. This has been raised with Three.
Update
21 Jul 15:33:17
This is still an ongoing problem, the latest from our supplier is:

"The current understanding is that some of Three's load balanced platforms are returning an inconsistent response when the requests are submitted. This is in turn causing our platform to lock the SIMs from further activation attempts so that they can be investigated."

Update
24 Jul 11:27:28
This is still ongoing, but is more of an intermittent problem. We have been able to activate most SIMs.
Resolution Problems with upstream carrier now resolved. Activating SIMs is now working ok, we apologise for the inconvenience this caused.
Started 6 Jul 12:00:00
Closed 25 Jul 21:00:00

28 Jul 11:00:00
Details
28 Jul 09:20:03
Customers may have seen a drop and reconnect of their broadband lines this morning. Due to a problem with our RADIUS accounting on Sunday we have needed to restart our customer database server, Clueless. This has been done, and Clueless is back online. Due to the initial problem with RADIUS accounting most DSL lines have had to be restarted.
Update
28 Jul 10:02:13
We are also sending out order update messages in error - e.g. emails about orders that have already completed. We apologise for this confusion and are investigating this.
Started 28 Jul 09:00:00
Closed 28 Jul 11:00:00

29 Jul 13:30:58
Details
29 Jul 13:30:58
The special offer price on the O2/EU SIMs, i.e. free apart from postage, ends on Thursday. After this they go back to the £5+VAT per SIM.
Started 29 Jul
Previously expected 1 Aug

29 Jul 12:22:25
Details
8 Jul 13:19:27
We have the SIMs and should be able to ship today with any luck. We still have some checks to do. However, I have enabled ordering. See http://aa.net.uk/telecoms-sip2sim.html

We have a number of people wanting to upgrade the UK only O2 SIMs to roaming. For this month these are being supplied at zero buy cost with postage added at cost. After this month we will be back to £5+VAT.

Update
8 Jul 16:33:21
We have not been able to finish testing yet - just waiting on a minor change in the mobile network. We have the SIMs and hope to be able to ship tomorrow.
Update
8 Jul 17:48:31
The good news is that we now have the keys for provisioning the SIMs, and are all ready to go on that front. I see several orders waiting to ship.

However we have just been advised that there is a snag in the mobile operator tariffing systems which may take a couple of days to sort. Until that is sorted we cannot really send SIMs out as the call rates will be all screwed up.

So please bear with us.

Update
9 Jul 12:55:43
We are still waiting on an update from the mobile carrier. We have checked, and the tariffing is way out at present which means we still can't ship the SIMs. It could be a couple more days by the look of it.
Update
11 Jul 08:18:04
I am sorry to say that we are being told it may be another week before the tariff is sorted. Please bear with us.
Update
18 Jul 07:04:21
Still waiting on the mobile operator, sorry for delay.
Update
18 Jul 13:14:18
We should be posting SIMs Monday - or possibly today - we are just going through testing with the mobile carrier now to confirm everything is set up right. Thank you all for your patience.
Update
18 Jul 17:07:44
Still testing, I'm afraid - going well but not quite there yet, so we are looking at Monday for the SIMs to be shipped.
Update
21 Jul 15:30:37
Still not sorted - not sure if they will ship today. I appreciate that this is very frustrating.
Update
21 Jul 17:47:58
The good news is that testing has finally got close enough to ship the SIMs, in that the main part of the tariffing code is right. We have a couple of small snags to sort and texts from the mobile while roaming are not working yet but hopefully that can be sorted tomorrow. It is not a reason to avoid shipping them I feel.
Update
21 Jul 18:59:16
OK, SIMs will go out tomorrow. At present texts from the mobile while on the EU profile are not working, but we expect that to be sorted very soon.
Update
22 Jul 13:51:24
SIMs are shipping today - at present outgoing texts are not working on the EU profile but we expect that to be fixed very soon.
Update
22 Jul 18:49:07
All of the back orders were shipped today. We expect to put prices back to £5 to buy the SIM from next month.
Update
29 Jul 12:22:25
The outgoing SMS issue when roaming is fixed.

Also, many SIMs went out without an operator name set in the SIM, but SIMs are now going out with the selected operator name. Customers can ask support to send a SIM update to change operator name.

Started 8 Jul 13:00:00