
Friday 15:47:47
Details
9 Dec 11:20:04
Some lines on the LOWER HOLLOWAY exchange are experiencing peak time packet loss. We have reported this to BT and they are investigating the issue.
Update
11 Dec 10:46:42
BT have passed this to TSO for investigation. We are waiting for a further update.
Update
12 Dec 14:23:56
BT's TSO are currently investigating the issue.
Update
16 Dec 12:07:31
Other ISPs are seeing the same problem. The BT Capacity team are now looking into this.
Update
17 Dec 16:21:04
No update to report yet, we're still chasing BT...
Update
18 Dec 11:09:46
The latest update from this morning is: "The BT capacity team have investigated and confirmed that the port is not being over utilized, tech services have been engaged and are currently investigating from their side."
Update
Friday 15:47:47
BT are looking to move our affected circuits on to other ports.
Update expected Tuesday 14:00:00
Previously expected Friday 15:14:17 (Last Estimated Resolution Time from AAISP)

Friday 14:45:23
Details
Friday 09:44:48
Today the CVE-2014-9222 router vulnerability, AKA 'misfortune cookie', has been announced at http://mis.fortunecook.ie/. This is reported to affect many broadband routers all over the world. The web page has further details.
We are contacting our suppliers for their take on this, we'll post follow-ups to this status post shortly.
It is also worth noting that at the time of writing CVE-2014-9222 is still 'reserved': http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-9222
Update
Friday 09:52:28
Technicolor Routers: These routers are not (yet?) on the list; we are awaiting a response from Technicolor regarding this.
Update
Friday 09:59:46
ZyXEL P-660R-D1: This router is on the list. We are awaiting a response from ZyXEL though. We do already have this page regarding the web interface on ZyXELs: http://wiki.aa.org.uk/Router_-_ZyXEL_P660R-D1#Closing_WAN_HTTP and closing the Web server from the WAN may help with this vulnerability.
Update: The version of RomPager (the web server) on the ZyXELs that we have been shipping for some time is 4.51. Allegedly only versions before 4.34 are vulnerable, so the routers we ship may not be affected. You can tell the version with either:
wget -S IP.of.ZyXEL
or
curl --head IP.of.ZyXEL
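As a rough, unofficial illustration, the same check can be scripted in Python; the address below is only a placeholder for your ZyXEL's LAN IP, and the exact Server header string depends on the firmware:

# Minimal sketch (assumption: the ZyXEL web interface answers HEAD requests on port 80).
import http.client
ZYXEL_IP = "192.168.1.1"  # placeholder - substitute your router's address
conn = http.client.HTTPConnection(ZYXEL_IP, 80, timeout=5)
conn.request("HEAD", "/")
print(conn.getresponse().getheader("Server"))  # the RomPager version appears here

Compare the reported version against the 4.34 threshold mentioned above.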
Update
Friday 10:00:57
D-Link 320B: We supply these in bridge mode, and therefore they are not vulnerable.
Update
Friday 10:02:38
FireBrick: FireBricks are not vulnerable.
Started Friday 09:00:00

18 Dec 20:00:52
Details
18 Dec 20:00:52
The working days between Christmas and New Year are "Christmas" rate, this means that any usage on 29th, 30th, and 31st December is not counted towards your Units allowance. As usual, bank holidays are treated as 'Weekend' rate.
(This doesn't apply to Home::1 or Office::1 customers.)
We wish all our customers a Merry Christmas!
Started 18 Dec 19:00:00

12 Dec 11:00:40
Details
11 Dec 10:42:15
We are seeing packet loss on some TT connected lines, starting at 9AM yesterday and today. The loss lasts until 10AM, and then a low level of loss continues. We have reported this to TalkTalk.
Update
11 Dec 10:46:34
This is the pattern of loss we are seeing:
Update
12 Dec 12:00:04
No loss has been seen on these lines today. We're still chasing TT for any update though.
Resolution The problem went away... TT were unable to find the cause.
Broadband Users Affected 7%
Started 11 Dec 09:00:00
Closed 12 Dec 11:00:40

11 Dec 14:15:00
Details
11 Dec 14:13:58
BT issue affecting SOHO AKA GERRARD STREET 21CN-ACC-ALN1-L-GER. We have reported this to BT and they are now investigating.
Update
11 Dec 14:19:33
BT are investigating, however the circuits are mostly back online.
Started 11 Dec 13:42:11 by AAISP Pro Active Monitoring Systems
Closed 11 Dec 14:15:00
Previously expected 11 Dec 18:13:11 (Last Estimated Resolution Time from AAISP)

2 Dec 09:05:00
Details
1 Dec 21:54:24
All FTTP circuits on Bradwell Abbey have packet loss. This started at about 23:45 on 30th November. This is affecting other ISPs too. BT did have an incident open, but this has been closed. They restarted a line card last night, but it seems the problem has been present since the card was restarted. We are chasing BT.
Example graph:
Update
1 Dec 22:38:39
It has been a struggle to get the front line support and the Incident Desk at BT to accept that this is a problem. We have passed this on to our Account Manager and other contacts within BT in the hope of a speedy fix.
Update
2 Dec 07:28:40
BT have tried doing something overnight, but the packetloss still exists at 7am 2nd December. Our monitoring shows:
  • Packet loss stops at 00:30
  • The lines go off between 04:20 and 06:00
  • The packet loss starts again at 06:00 when the lines come back online.
We've passed this on to BT.
Update
2 Dec 09:04:56
Since 7AM today, the lines have been OK... we will continue to monitor.
Started 30 Nov 23:45:00
Closed 2 Dec 09:05:00

3 Dec 09:44:00
Details
27 Nov 16:31:03
We are seeing what looks like congestion on the Walworth exchange. Customers will be experiencing high latency, packetloss and slow throughput in the evenings and weekends. We have reported this to TalkTalk.
Update
2 Dec 09:39:27
TalkTalk are still investigating this issue.
Update
2 Dec 12:22:04
The congestion issue has been located on the Walworth exchange and TalkTalk are in the process of traffic balancing.
Update
3 Dec 10:30:14
Capacity has been increased and the exchange is looking much better now.
Started 27 Nov 16:28:35
Closed 3 Dec 09:44:00

3 Dec 18:20:00
Details
3 Dec 10:45:55
We are seeing MTU issues on some 21CN lines this morning where lines are unable to pass more than 1462 byte IP packets. It isn't affecting all lines, and the common factor appears to be that they are on ACC-ALN1 BRASs in London, but not all lines on those BRASs are affected. We have reported the issue to BT Wholesale and they are investigating the issue.
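If you want to check whether a particular line is affected, one rough way (not part of the original report) is to send ICMP echoes with the don't-fragment bit set at increasing sizes and see where replies stop. The sketch below uses scapy, needs root, and the target host and sizes are placeholders:

# Rough sketch only - assumes scapy is installed and TARGET answers pings.
from scapy.all import IP, ICMP, Raw, sr1
TARGET = "198.51.100.1"  # placeholder: any reachable host that answers ping
for total in (1462, 1463, 1500):  # total IP packet size in bytes
    payload = b"A" * (total - 20 - 8)  # minus IP (20) and ICMP (8) headers
    reply = sr1(IP(dst=TARGET, flags="DF") / ICMP() / Raw(load=payload), timeout=2, verbose=0)
    print(total, "reply" if reply else "no reply")

On an affected line you would expect the 1462 byte probe to get a reply and the larger ones not to.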
Update
3 Dec 16:18:23
BT have now raised an incident and are investigating.
Update
3 Dec 18:57:49
This has been fixed. We've been speaking to some network guys at BT this evening and helping them find the fault.
Resolution BT fixed this last night. We believe BT had equipment in the network that was misconfigured.
Started 3 Dec 09:00:00
Closed 3 Dec 18:20:00

19 Nov 16:20:46
Details
19 Nov 15:11:12
Lonap (one of the main Internet peering points in the UK) has a problem. We have stopped passing traffic over Lonap. Customers may have seen packetloss for a short while, but routing should be OK now. We are monitoring the traffic and will bring back Lonap when all is well.
Update
19 Nov 16:21:29
The Lonap problem has been fixed, and we've re-enabled our peering.
Started 19 Nov 15:00:00
Closed 19 Nov 16:20:46

21 Nov 00:18:00
Details
21 Nov 10:58:09
We have a number of TT lines down all on the same RAS: HOST-62-24-203-36-AS13285-NET. We are chasing this with TalkTalk.
Update
21 Nov 11:01:29
Most lines are now back. We have informed TalkTalk.
Update
21 Nov 12:18:22
TT have come back to us. They were aware of the problem; it was caused by a software problem on an LTS.
Started 21 Nov 10:45:00
Closed 21 Nov 00:18:00

25 Nov 10:43:46
Details
21 Oct 14:10:19
We're seeing congestion from 10am up to 11:30pm across the BT Rose Street, PIMLICO and High Wycombe exchanges. A fault has been raised with BT and we will post updates as soon as we can. Thanks for your patience.
Update
28 Oct 11:23:44
Rose Street and High Wycombe are now clear. Still investigating Pimlico.
Update
3 Nov 14:41:45
Pimlico has now been passed to BT's capacity team to deal with. Further capacity is needed and will be added ASAP. We will provide updates as soon as we have them.
Update
5 Nov 10:12:30
We have just been informed by the BT capacity team that end users will be moved to a different VLAN on Friday morning. We will post further updates when we have them.
Update
11 Nov 10:23:59
Most of the Pimlico exchange is now fixed. Sorry for the delay.
Update
19 Nov 11:01:57
There is further planned work on the Pimlico exchange for the 20th November. This should resolve the congestion on the Exchange.
Update
25 Nov 10:44:43
Pimlico lines are now running as expected. Thanks for your patience.
Started 21 Oct 13:31:50
Closed 25 Nov 10:43:46

4 Nov 16:47:11
Details
4 Nov 09:42:18
Several graphs have been missing in recent weeks, on some days and for some LNSs. This is something we are working on. Unfortunately, today, one of the LNSs is not showing live graphs again, and so these will not be logged overnight. We hope to have a fix for this in the next few days. Sorry for any inconvenience.
Resolution The underlying cause has been identified and a fix will be deployed over the next few days.
Started 1 Oct
Closed 4 Nov 16:47:11
Previously expected 10 Nov

5 Nov 02:27:31
Details
4 Nov 09:50:36
Once again we expect to reset one of the LNSs early in the morning. This will not be the usual LNS switch honouring the preferred time of night, but all lines on the LNS at once. The exact time depends on staff availability, sorry. This means a clear of the PPP session, which can immediately reconnect. This may be followed by a second PPP reset a few minutes later. We do hope to have a proper solution to this issue in a couple of days.
Resolution Reset completed. We will do a normal rolling update of LNSs over next three nights. This should address the cause of the problem. If we have issues with graphs before that is complete, we may have to do a reset like this again.
Broadband Users Affected 33%
Started 5 Nov
Closed 5 Nov 02:27:31
Previously expected 5 Nov 07:00:00

1 Nov 11:35:11
[Broadband] - Blip - Closed
Details
1 Nov 11:55:38
There appears to have been a small DoS attack which resulted in a blip around 11:29:16 today, and caused some issues with broadband lines and other services. We're looking into this at present, and graphs are not currently visible on one of the LNSs for customers.
Update
1 Nov 13:09:44
We expect graphs on a.gormless to be back tomorrow morning after some planned work.
Resolution Being investigated further.
Started 1 Nov 11:29:16
Closed 1 Nov 11:35:11

2 Nov 04:08:38
Details
1 Nov 13:07:11
We normally do LNS switch-overs without a specific planned notice - the process is routine maintenance and means clearing the PPP session to reconnect immediately on another LNS. We do one line at a time, slowly. We even have a control on clueless so you can state a preferred time of night.

However, tomorrow morning, we will be moving lines off a.gormless (one third of customers) using a different process. It should be much the same, but all lines will be at one time of night, and this may mean some are slower to reconnect.

The plan is to do this early in the morning - the exact time depends on when staff are available. Sorry for any inconvenience.

Resolution Completed as planned. Graphs back from now on a.gormless.
Broadband Users Affected 33%
Started 1 Nov 03:00:00
Closed 2 Nov 04:08:38
Previously expected 1 Nov 07:00:00

7 Oct 06:17:13
Details
3 Oct 16:25:24
As we advised, we have had to make some radical changes to our billing to fix database load issues. These have gone quite well overall, but there have been a few snags. We think we have caught them all now, but this month we had to revert some usage charging, giving some free usage.

We have identified that quarterly billed customers on units tariffs were not charged, so these are being applied shortly as a new invoice. Anyone with excess usage as a result, please do ask accounts for a credit.

We have also identified that call charges have not been billed - these can be billed to date if anyone asks, or if you leave it then they should finally catch up on next month's bill.

Sorry for any inconvenience.

Started 1 Oct
Previously expected 1 Nov

15 Oct 17:14:18
Details
6 Oct 14:22:50
For the next week or so we're considering 5am-7am to be a PEW window for some very low disruption work (a few seconds of "blip"). We're still trying very hard to improve our network configuration and router code to create a much more stable network. It seems, from recent experience, that this sort of window will be least disruptive to customers. It is a time when issues can be resolved by staff if needed (which is harder at times like 3am) and we get more feedback from end users. As before, we expect this work to have no impact in most cases, and maybe a couple of seconds of routing issues if it is not quite to plan. Sadly, all of our efforts to create the same test scenarios "on the bench" have not worked well. At this stage we are reviewing code to understand Sunday morning's work better, and this may take some time before we start. We'll update here and on irc before work is done. Thank you for your patience.
Update
7 Oct 09:06:41
We did do work around 6:15 to 6:30 today - I thought I had posted an update here before I started but somehow it did not show. If we do any more, I'll try and make it a little earlier.
Update
8 Oct 05:43:11
Doing work a little earlier today. We don't believe we caused any blips with today's testing.
Update
9 Oct 05:47:53
Another early start and went very well.
Update
10 Oct 08:22:53
We updated the remaining core routers this morning, and it seemed to go very well. Indeed, pings we ran had zero loss when upgrading routers in Telecity. However, we did lose TalkTalk broadband lines in the process. These all reconnected straight away, but we are now reviewing how this happens to try and avoid it in future.
Resolution Closing this PEW from last week. We may need to do more work at some point, but we are getting quite good at this now.
Started 7 Oct 06:00:00
Closed 15 Oct 17:14:18
Previously expected 14 Oct 07:00:00

5 Oct 07:26:50
Details
3 Oct 10:41:59
We do plan to upgrade routers again over the weekend, probably early Saturday morning (before 9am). I'll post on irc at the time and update this notice.

The work this week means we expect this to be totally seamless, but the only way to actually be sure is to try it.

If we still see any issues we'll do more on Sunday.

Update
4 Oct 06:54:19
Upgrades starting shortly.
Update
4 Oct 07:24:47
Almost perfect!

We loaded four routers, each at different points in the network. We ran a ping that went through all four routers whilst doing this. For three of them we did see ping drop a packet. The fourth we did not see a drop at all.

This may sound good, but it should be better - we should not lose a single packet doing this. We're looking at the logs to work out why, and may try again Sunday morning.

Thank you for your patience.

Update
4 Oct 07:53:52
Plan for tomorrow is to pick one of the routers that did drop a ping, and shut it down and hold it without restarting - at that point we can investigate what is still routing via it and why. This should help us explain the dropped ping. Assuming that provides the clues we need we may load or reconfigure routers later on Sunday to fix it.
Update
5 Oct 06:57:39
We are starting work shortly.
Update
5 Oct 07:11:00
We are doing the upgrades as planned, but are not able to do the level of additional diagnostics we wanted. We may look into that next weekend.
Resolution Only 3 routers were upgraded, the 3rd having several seconds of issues. We will investigate the logs and do another planned work. It seems early morning like this is less disruptive to customers.
Started 4 Oct
Closed 5 Oct 07:26:50
Previously expected 6 Oct

2 Oct 19:05:55
Details
2 Oct 19:05:15
We'd like to thank customers for patience this week. The tests we have been doing in the evenings have been invaluable. The issues seen have mostly related to links to Maidenhead (so voice calls rather than broadband connections).

The work we are doing has involved a lot of testing "on the bench" and even in our offices (to the annoyance of staff), but ultimately testing on the live customer services is the final test. The results have been informative and we are very close to our goal now.

The goal is to allow router maintenance with zero packet loss. We finally have the last piece in the jigsaw for this, and so should have this in place soon. Even so, there may be some further work to achieve this.

Apart from a "Nice to have" goal, this also relates to failures of hardware, power cuts, and software crashes. The work is making the network configuration more robust and should allow for key component failures with outages as short as 300ms in some cases. LNS issues tend to take longer for PPP to reconnect, but we want to try and be as robust as possible.

So, once again, thank you all for your patience while we work on this. There may be some more planned works which really should now be invisible to customers.

Started 2 Oct 19:00:41

1 Oct 17:49:32
Details
30 Sep 18:04:06
Having been very successful with the router upgrade tonight, we are looking to move to the next router on Wednesday. Signs so far are that this should be equally seamless. We are, however, taking this slowly, one step at a time, to be sure.
Resolution We loaded 4 routers in all; some were almost seamless and some had a few seconds of outage. It was not perfect, but way better than previously. We are now going to look into the logs in detail and try to understand what we do next.

Our goal here is zero packet loss for maintenance.

I'd like to thank all those on irc for their useful feedback during these tests.

Started 1 Oct 17:00:00
Closed 1 Oct 17:49:32
Previously expected 1 Oct 18:00:00

30 Sep 18:02:25
Details
29 Sep 21:57:11
We are going to spend much of tomorrow trying to track down why things did not go smoothly tonight, and hope to have a solution by tomorrow (Tuesday) evening.

This time I hope to make a test load before the peak period at 6pm, so between 5pm and 6pm, when things are in a bit of a lull between business and home use.

If all goes to plan there will be NO impact at all, and that is what we hope. If so we will update three routers with increasing risk of impact, and abort if there are any issues.

Please follow things on irc tomorrow.

If this works as planned we will finally have all routers under "seamless upgrade" processes.

Update
30 Sep 08:29:42
Tests on our internal systems this morning confirm we understand what went wrong last night, and as such the upgrade tonight should be seamless.

For the technically minded, we had an issue with VRRP becoming master too soon, i.e. before all routes are installed. The routing logic is now linked to VRRP to avoid this scenario, regardless of how long routing takes.

Resolution The upgrade went very nearly perfectly on the first router - we believe the only noticeable impact was the link to our office, which we think we understand now. However, we did only do the one router this time.
Started 30 Sep 17:00:00
Closed 30 Sep 18:02:25
Previously expected 30 Sep 18:00:00

29 Sep 22:37:36
Details
21 Aug 12:50:32
Over the past week or so we have been missing data on some monitoring graphs; this shows as purple for the first hour in the morning. It is being caused by delays in collecting the data, and is being looked into.
Resolution We believe this has been fixed now. We have been monitoring it for a fortnight after making an initial fix, and it looks to have been successful.
Closed 29 Sep 22:37:36

29 Sep 19:29:19
Details
29 Sep 14:06:12
We expect to reload a router this evening, which is likely to cause a few seconds of routing issues. This is part of trying to address the blips caused by router upgrades, which are meant to be seamless.
Update
29 Sep 18:48:37
The reload is expected shortly, and will be on two boxes at least. We are monitoring the effect of the changes we have made. They should be a big improvement.
Resolution Upgrade was tested only on one router (Maidenhead) and caused some nasty impact on routing to call servers and control systems - general DSL was unaffected. Changes are backed out now, and back to drawing board. Further PEW will be announced as necessary.
Started 29 Sep 17:00:00
Closed 29 Sep 19:29:19
Previously expected 29 Sep 23:00:00

29 Sep 13:17:50
Details
29 Sep 08:48:37
Some updates to the billing system have caused a problem for units billed customers resulting in their usage for next month starting early, i.e. usage is now being logged for October.

Because of the way usage carries forward, this is unlikely to have much impact on customers in terms of additional charges. However, any customers who think they have lost out, please let us know and we'll make a manual adjustment.

The problem has been corrected for next month.

Update
29 Sep 08:57:00
It looks like customers won't get billed top-up and may not get billed units either, so we are working on undoing this issue so that billing is done normally. Please bear with us.
Update
29 Sep 09:23:40
We are working on this now and should have usage billing back to normal later this morning.
Resolution Usage billing has been restored to around 1am Saturday, giving customers 2.5 days of unmetered usage.
Started 29 Sep 08:45:12
Closed 29 Sep 13:17:50

28 Sep 19:20:54
Details
28 Sep 18:52:50
We are experiencing a network problem affecting our broadband customers. Staff are investigating.
Update
28 Sep 19:08:28
This is looking like some sort of Denial of Service attack. We're looking at mitigating this.
Update
28 Sep 19:16:36
The traffic has died down, things are starting to look better.
Update
28 Sep 19:21:46
Traffic is now back to normal.
Started 28 Sep 18:30:00
Closed 28 Sep 19:20:54

20 Sep 07:09:09
Details
20 Sep 11:59:13
RADIUS accounting is behind at the moment. This is causing usage data to appear to be missing from customer lines. The accounting is behind, but it is not broken, and is catching up. The usage data does not appear to be lost, and should appear later in the day.
Update
21 Sep 08:12:52
Records have now caught up.
Closed 20 Sep 07:09:09
Previously expected 20 Sep 15:57:11

26 Aug 09:15:00
Details
26 Aug 09:02:02
Yesterday's and today's line graphs are not being shown at the moment. We are working on restoring this.
Update
26 Aug 09:42:18
Today's graphs are back; yesterday's are lost though.
Started 26 Aug 08:00:00
Closed 26 Aug 09:15:00

29 Sep 16:57:23
Details
2 Sep 17:15:50
We had a blip on one of the LNSs yesterday, so we are looking to roll out some updates over this week which should help address this, and some of the other issues last month. As usual, LNS upgrades will be done overnight. We'll be rolling out to some of the other routers first, which may mean a few seconds of routing changes.
Update
7 Sep 09:43:40
Upgrades are going well, but we are taking this slowly, and have not touched the LNSs yet. Addressing stability issues is always tricky as it can be weeks or months before we know we have actually fixed the problems. So far we have managed to identify some specific issues that we have been able to fix. We obviously have to be very careful to ensure these "fixes" do not impact normal service in any way. As such I have extended this PEW another week.
Update
13 Sep 11:07:13
We are making significant progress on this. Two upgrades are expected today (Saturday 13th) which should not have any impact. We are also working on ways to make upgrades properly seamless (which is often the case, but not always).
Update
14 Sep 17:21:35
Over the weekend we have done a number of tests, and we have managed to identify specific issues and put fixes in place on some of the routers on the network to see how they go.

This did lead to some blips (around 9am and 5pm on Sunday for example). We think we have a clearer idea on what happened with these too, and so we expect that we will load some new code early tomorrow or late tonight which may mean another brief blip. This should allow us to be much more seamless in future.

Later in the week we expect to roll out code to more routers.

Update
16 Sep 16:57:07
We really think we have this sussed now - including reloads that have near zero impact on customers. We have a couple more loads to do this week (including one at 5pm today), and some over night rolling LNS updates.
Update
17 Sep 12:23:59
The new release is now out, and we are planning upgrades this evening (from 5pm) and one of the LNSs over night. This should be pretty seamless now. At the end of the month we'll upgrade the second half of the core routers, assuming all goes well. Thank you for your patience.
Update
18 Sep 17:15:27
FYI, there were a couple of issues with core routers today, at least one of which would have impacted internet routing for some destinations for several seconds. These issues were on the routers which have not yet been upgraded, which is rather encouraging. We are, of course, monitoring the situation carefully. The plan is still to upgrade the second half of the routers at the end of the month.
Update
19 Sep 12:12:42
One of our LNSs (d.gormless) did restart unexpectedly this morning - this router is scheduled to be upgraded tonight.
Update
28 Sep 13:25:10
The new release has been very stable for the last week and is being upgraded on remaining routers during Sunday.
Resolution Stable releases loaded at weekend
Started 2 Sep 18:00:00
Closed 29 Sep 16:57:23
Previously expected 19 Sep

2 Sep 17:08:13
Details
2 Sep 15:38:09
Some people use the test LNS (doubtless) for various reasons, and it is also used some of the time for our NAT64 gateway.

We normally do re-loads on doubtless to test things with no notice, but we expect there may be quite a few this afternoon/evening as we are trying to track down an issue with new code that is not showing on the bench test systems.

As usual this is a PPP reset and reconnect, and if it crashes there may be a few seconds of extra outage. With any luck this will not take many resets to find the issue.

Resolution Testing went well.
Started 2 Sep 15:40:00
Closed 2 Sep 17:08:13
Previously expected 3 Sep

1 Sep 19:42:08
Details
1 Sep 19:42:56
c.gormless rebooted, lines moved to other LNS automatically. We are investigating.
Broadband Users Affected 33%
Started 1 Sep 19:39:19
Closed 1 Sep 19:42:08

23 Apr 10:21:03
Details
01 Nov 2013 15:05:00
We have identified an issue that appears to be affecting some customers with FTTC modems. The issue is stupidly complex, and we are still trying to pin down the exact details. The symptoms appear to be that some packets are not passing correctly, some of the time.

Unfortunately one of the types of packet that refuse to pass correctly is FireBrick FB105 tunnel packets. This means customers relying on FB105 tunnels over FTTC are seeing issues.

The workaround is to remove the Ethernet lead to the modem and then reconnect it. This seems to fix the issue, at least until the next PPP restart. If you have remote access to a FireBrick, e.g. via WAN IP, and need to do this you can change the Ethernet port settings to force it to re-negotiate, and this has the same effect - this only works if directly connected to the FTTC modem, as the fix does need the modem Ethernet to restart.

We are asking BT about this, and we are currently assuming this is a firmware issue on the BT FTTC modems.

We have confirmed that modems re-flashed with non-BT firmware do not have the same problem, though we don't usually recommend doing this as it is a BT modem and part of the service.

Update
04 Nov 2013 16:52:49
We have been working on getting more specific information regarding this, we hope to post an update tomorrow.
Update
05 Nov 2013 09:34:14
We have reproduced this problem by sending UDP packets using 'Scapy'. We are doing further testing today, and hope to write up a more detailed report about what we are seeing and what we have tested.
Update
05 Nov 2013 14:27:26
We have some quite good demonstrations of the problem now, and it looks like it will mess up most VPNs based on UDP. We can show how a whole range of UDP ports can be blacklisted by the modem somehow on the next PPP restart. It is crazy. We hope to post a little video of our testing shortly.
Update
05 Nov 2013 15:08:16
Here is an update/overview of the situation. (from http://revk.www.me.uk/2013/11/bt-huawei-fttc-modem-bug-breaking-vpns.html )

We have confirmed that the latest code in the BT FTTC modems appears to have a serious bug that is affecting almost anyone running any sort of VPN over FTTC.

Existing modems seem to be upgrading, presumably due to a roll-out of new code by BT. An older modem that has not been online for a while is fine. A re-flashed modem with non-BT firmware is fine. A working modem that had been on the line for a while suddenly stopped working, presumably upgraded.

The bug appears to be that the modem manages to "blacklist" some UDP packets after a PPP restart.

If we send a number of UDP packets, using various UDP ports, then cause PPP to drop and reconnect, we then find that around 254 combinations of UDP IP/ports are now blacklisted. I.e. they no longer get sent on the line. Other packets are fine.

If we send 500 different packets, around 254 of them will not work again after the PPP restart. It is not the first or last 254 packets; the broken ones are scattered through the set, but it always seems to be about 254 combinations. They work as much as you like before the PPP restart, and then never work after it.

We can send a batch of packets, wait 5 minutes, PPP restart, and still find that packets are now blacklisted. We have tried a wide range of ports, high and low, different src and dst ports, and so on - they are all affected.

The only way to "fix" it, is to disconnect the Ethernet port on the modem and reconnect. This does not even have to be long enough to drop PPP. Then it is fine until the next PPP restart. And yes, we have been running a load of scripts to systematically test this and reproduce the fault.
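For illustration only, a test along the lines described might look like the following scapy sketch; the destination address and port ranges are placeholders, and this is not AAISP's actual test script:

# Illustrative sketch, not AAISP's scripts. Needs root and scapy; DST is a placeholder
# host beyond the LNS that can log which UDP packets arrive.
from scapy.all import IP, UDP, Raw, send
DST = "203.0.113.10"  # placeholder
def send_batch(count=500):
    for i in range(count):
        send(IP(dst=DST) / UDP(sport=40000 + i, dport=50000 + i) / Raw(load=b"probe-%d" % i), verbose=0)
send_batch()
# Now drop and re-establish PPP, run send_batch() again, and compare at the far end
# which source/destination port combinations no longer arrive - per the fault
# description above, roughly 254 of the 500 stop passing.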

The problem is that a lot of VPNs use UDP and use the same set of ports for all of the packets, so if that combination is blacklisted by the modem the VPN stops after a PPP restart. The only way to fix it is manual intervention.

The modem is meant to be an Ethernet bridge. It should not know anything about PPP restarting or UDP packets and ports. It makes no sense that it would do this. We have tested swapping working and broken modems back and forth. We have tested with a variety of different equipment doing PPPoE and IP behind the modem.

BT are working on this, but it is a serious concern that this is being rolled out.
Update
12 Nov 2013 10:20:18
Work on this is still ongoing... We have tested this on a standard BT retail FTTC 'Infinity' line, and the problem cannot be reproduced. We suspect this is because a different IP address is allocated each time the PPP re-establishes, and whatever is doing the session tracking does not match the new connection.
Update
12 Nov 2013 11:08:17

Here is a more specific explanation of the problem we are seeing:

On WBC FTTC, we can send a UDP packet inside the PPP and then drop the PPP a few seconds later. After the PPP re-establishes, UDP packets with the same source and destination IP and ports won't pass; they do not reach the LNS at the ISP.

Further to that, it's not just one src+dst IP and port tuple which is affected. We can send 254 UDP packets using different src+dst ports before we drop the PPP. After it comes back up, all 254 port combinations will fail. It is worth noting here that this cannot be reproduced on an FTTC service which allocates a dynamic IP that changes each time PPP re-establishes.

If we send more than 254 packets, only 254 will be broken and the others will work. It's not always the first 254 or the last 254; the broken ones move around between tests.

So it sounds like the modem (or, less likely, something in the cab or exchange) is creating state table entries for packets it is passing which tie them to a particular PPP session, and then failing to flush the table when the PPP goes down.

This is a little crazy in the first place. It's a modem. It shouldn't even be aware that it's passing PPPoE frames, let alone looking inside them to see that they are UDP.

This only happens when using an Openreach Huawei HG612 modem that we suspect has been remotely and automatically upgraded by Openreach in the past couple of months. Further - an HG612 modem with the 'unlocked' firmware does not have this problem. An HG612 modem that has probably not been automatically/remotely upgraded does not have this problem.

Side note: One theory is that the brokenness is actually happening in the street cab and not the modem. And that the new firmware in the modem which is triggering it has enabled 'link-state forwarding' on the modem's Ethernet interface.

Update
27 Nov 2013 10:09:42
This post has been a little quiet, but we are still working with BT/Openreach regarding this issue. We hope to have some more information to post in the next day or two.
Update
27 Nov 2013 10:10:13
We have also had reports from someone outside of AAISP reproducing this problem.
Update
27 Nov 2013 14:19:19
We have spent the morning with some nice chaps from Openreach and Huawei. We have demonstrated the problem and they were able to do traffic captures at various points on their side. Huawei HQ can now reproduce the problem and will investigate the problem further.
Update
28 Nov 2013 10:39:36
Adrian has posted about this on his blog: http://revk.www.me.uk/2013/11/bt-huawei-working-with-us.html
Update
13 Jan 14:09:08
We are still chasing this with BT.
Update
3 Apr 15:47:59
We have seen this affect SIP registrations (which use 5060 as the source and target)... Customers can contact us and we'll arrange a modem swap.
Update
23 Apr 10:21:03
BT are in the process of testing an updated firmware for the modems with customers. Any customers affected by this can contact us and we can arrange a new modem to be sent out.
Resolution BT are testing a fix in the lab and will deploy in due course, but this could take months. However, if any customers are adversely affected by this bug, please let us know and we can arrange for BT to send a replacement ECI modem instead of the Huawei modem. Thank you all for your patience.

--Update--
BT do have a new firmware that they are rolling out to the modems. So far it does seem to have fixed the fault and we have not heard of any other issues as of yet. If you do still have the issue, please reboot your modem, if the problem remains, please contact support@aa.net.uk and we will try and get the firmware rolled out to you.
Started 25 Oct 2013
Closed 23 Apr 10:21:03

19 Aug 12:59:53
Details
19 Aug 00:36:05
Initial reports suggest one of our fibre links to TalkTalk is down. This is affecting broadband lines using TalkTalk backhaul.
Update
19 Aug 00:43:35
00:05 TT lines drop; it looked like we had a router blip and a TT fibre blip - reasons as yet unknown
00:15 Lines start to log back in
However, we are getting reports of intermittent access to some sites on the internet - possibly MTU related.
Update
19 Aug 01:33:16
MTU is still a problem. A workaround for the moment is to lower the MTU setting in your router to 1432. Ideally this should not be needed, but try this until the problem is resolved.
Update
19 Aug 01:58:30
Other wholesalers using TT are reporting the same problem. The TT helpdesk is aware of planned work that may be causing this. We have requested that they pass this MTU report on to the team involved in the planned work.
Update
19 Aug 07:14:05
TT tell us they think the problem with MTU has been fixed. We're still unsure at this moment, and will work with customers who still have problems.
Update
19 Aug 07:55:02
This is still a problem affecting customers using TT backhaul. TT are aware and are investigating. This is a result of a router upgrade within TT which looks to have been given incorrect settings.
Where possible, customers can change the MTU on their routers to 1432.
Update
19 Aug 08:55:47
We have been in contact with the TT Service Director who will be chasing this up internally at TT.
Update
19 Aug 09:05:48
Customers with bonded lines using TT and BT can turn off their TT modem or router for the time being.
Update
19 Aug 09:20:11
We are looking at re-routing TT connections through our secondary connection to TT...
Update
19 Aug 09:30:55
Traffic is now routing via our secondary connection to TT. This looks like it is not being routed via the faulty TT router, and it looks as if lines are passing traffic as normal.
Update
19 Aug 09:55:32
Some customers are working OK, some are not.
The reason is that we have 2 interconnects to TT. We are still seeing connections from both of them; however, we have a 1600 byte path from one but only 1500 from the other. The 1500 one is the one that TT did an upgrade on last night. So it looks like TT forgot to configure jumbo frames on an interface after the upgrade.
Needless to say, we've passed this information on to people at various levels within TT.
Update
19 Aug 09:57:02
We are working on only accepting connections from TT via the working interconnect.
Update
19 Aug 10:39:32
We are forcing TT lines to reconnect, this should mean they then reconnect over the working interconnect and not the one with the faulty TT router.
Update
19 Aug 11:21:53
We are blocking connections from the faulty TT router and only accepting from the working one. This means when customers connect they have a working connection. However, this does mean that logins are being rejected from customers until they are routed via the working interconnect. It may take a few attempts for customers to connect first time.
Update
19 Aug 11:24:09
Some lines are taking a long time to come back. This is because they are still coming in via the broken interconnect - that we're rejecting. Unfortunately, affected lines just have to be left until they attempt to log in via the working interconnect. So, if we appear to be rejecting your login please leave your router to keep trying and it should fix itself.
Update
19 Aug 11:32:11
TT are reverting their upgrade from last night. This looks like it's underway at the moment.
Update
19 Aug 11:35:00
Latest from TT: "The roll back has been completed and the associated equipment has been restarted. Our (TT) engineers are currently performing system checks and a retest before confirming resolution on this incident. Further information will be provided shortly. "
Update
19 Aug 11:43:32
TT have completed their downgrade. It looks like the faulty link is working OK again, we'll be testing this before we unblock the link our side.
Update
19 Aug 13:01:55
We've re-enabled the faulty link, we are now back to normality! We do apologise for this outage. We will be discussing this fault and future upgrades of these TT routers with TT staff.
Started 19 Aug 00:05:00
Closed 19 Aug 12:59:53

13 Aug 09:15:00
Details
13 Aug 11:26:08
Due to a RADIUS issue we were not receiving line statistics from just after midnight. As a result we needed to force lines to log in again. This would have caused lines to lose their PPP connection and then reconnect at around 9AM. We apologise for this, and will be investigating the cause.
Started 13 Aug 09:00:00
Closed 13 Aug 09:15:00

8 Aug 15:25:00
Details
8 Aug 15:42:28
At 15:15 we saw customers on the 'D' LNS lose their connection and reconnect a few moments later. The cause of this is being looked into.
Resolution Lines quickly came back online, we apologise for the drop though. The cause will be investigated.
Started 8 Aug 15:15:00
Closed 8 Aug 15:25:00

1 Aug 10:00:00
Details
We saw what looks to be congestion on some lines on the Rugby exchange (BT lines). This shows as slight packet loss on Sunday evening. We'll report this to BT.
Update
30 Jul 11:03:08
Card replaced early hours this morning, which should have fixed the congestion problems.
Started 27 Jul 21:00:00
Closed 1 Aug 10:00:00

28 Jul 11:00:00
Details
28 Jul 09:20:03
Customers may have seen a drop and reconnect of their broadband lines this morning. Due to a problem with our RADIUS accounting on Sunday we have needed to restart our customer database server, Clueless. This has been done, and Clueless is back online. Due to the initial problem with RADIUS accounting most DSL lines have had to be restarted.
Update
28 Jul 10:02:13
We are also sending out order update messages in error - e.g. emails about orders that have already completed. We apologise for the confusion and are investigating.
Started 28 Jul 09:00:00
Closed 28 Jul 11:00:00

4 Aug
Details
29 Jul 07:19:26
We'll be moving some lines from "C" to "D" tonight after an issue early this morning. Later in the week we expect to do a rolling LNS upgrade over several nights. As usual this will be a PPP restart. You can set a preferred time of night on the control pages.
Update
29 Jul 17:16:37
It is likely that the automated system to move lines from "C" to "D" will not work, so this may be done in one go during the night or early morning. The knock-on effects of the RADIUS issues early this morning have also caused some free usage, and some unexpected "line down" emails/texts/tweets. Sorry for any inconvenience.
Started 29 Jul 07:18:08
Closed 4 Aug
Previously expected 4 Aug

29 Jul 01:17:44
Details
28 Jul 21:38:18
We are having reports this evening of some lines that are in sync but unable to log in. We are investigating.
Update
28 Jul 22:00:52
We believe we have identified the problem and are working on a fix.
Update
28 Jul 22:17:51
Lines are logging in successfully now. If you are still off, please keep trying.
Resolution An issue with authentication on the "C" LNS, and then on the "D" LNS. We have found the issue, and lines are connecting to "D" cleanly now. The underlying issue causing this is being investigated.
Started 28 Jul 21:37:18
Closed 29 Jul 01:17:44
Cause BT

17 Jul 17:45:00
Details
17 Jul 16:23:15
We have a few reports from customers, and a vague incident report from BT, that suggest there may be a PPP problem within the BT network which is affecting customers logging in to us. Customers may see their ADSL router in sync, but not able to log in (no PPP).
Update
17 Jul 16:40:31
This looks to be affecting BT ADSL and FTTC circuits. A line which tries to log in may well fail.
Update
17 Jul 16:42:34
Some lines are logging in successfully now.
Update
17 Jul 16:54:15
Not all lines are back yet, but lines are still logging back in, so if you are still offline it may take a little more time.
Resolution This was a BT incident, reference IMT26151/14. This was closed by BT at 17:45 without giving us further details about what the problem was or what they did to restore service.
Started 17 Jul 16:00:00
Closed 17 Jul 17:45:00

28 Jul 12:10:28
Details
15 Jul 10:41:58
We are reworking the SMS/twitter/email line up/down notifications and hope to have the new system launched later this week. There may be slightly different wording of the messages.
Update
15 Jul 18:06:51
We're looking to do this in stages. i.e. switch over emails then texts then tweets or something like that. So please bear with us. Ideally the changes should not lose any messages.
Update
17 Jul 09:21:13
We have switched over to the new system - the most noticeable change is that SMS and tweets are now independent. You can have either or both if you require - settings are on the control pages. SMS still has a back-off if you have lots of line flaps, but tweets and emails are not delayed. Do let us have any feedback on the new system.
Started 16 Jul
Closed 28 Jul 12:10:28
Previously expected 20 Jul

15 Jul 12:52:51
Details
15 Jul 12:52:51
The usage reports sent on the 15th of the month, for customers who have requested them, have apparently not all worked. Some were blank.

These are being resent now, so apologies if you get two of them.

Started 15 Jul

11 Jul 11:03:55
Details
11 Jul 17:00:48
The "B" LNS restarted today, unexpectedly. All lines reconnected within minutes (however fast the model retries). We'll clear some traffic off the "D" server back to the "B" server later this evening.
Resolution We're investigating the cause of this.
Broadband Users Affected 33%
Started 11 Jul 11:03:52
Closed 11 Jul 11:03:55

10 Jul 20:10:00
Details
10 Jul 19:18:35
We are seeing a problem with BT 21CN ADSL and FTTC circuits being unable to log in since approximately 18:00 today. Existing sessions are working fine but are failing to reconnect when they drop. 20CN ADSL and TalkTalk backhaul circuits are working fine.

BT have raised incident IMT25152/14 which looks to be related, but just says they are investigating a problem.

Update
10 Jul 22:16:28
BT have reported that service should have been restored as of 20:10 this evening.

Customers who are still having problems should attempt to re-connect as they may be stuck on a BT holding session.

Anyone still having problems after doing that should contact tech support.

Started 10 Jul 17:15:00
Closed 10 Jul 20:10:00
Cause BT

1 Jul 23:25:00
Details
1 Jul 20:50:32
We have identified some TalkTalk backhaul lines with congestion starting around 16:20, currently showing 100ms latency with 2% loss. This affects around 3% of our TT lines.

We have techies in TalkTalk on the case and hope to have it resolved soon.

Update
1 Jul 20:56:19
"On call engineers are being scrambled now - we have an issue in the wider Oxford area and you should see an incident coming through shortly."
Resolution Engineers fixed the issue last night.
Started 1 Jul 16:20:00
Closed 1 Jul 23:25:00
Previously expected 2 Jul

19 Jun 14:33:59
Details
11 Mar 10:11:55
We are seeing multiple exchanges with packet loss over BT wholesale. We are chasing BT on this and will update as and when we have updates. GOODMAYES CANONBURY HAINAULT SOUTHWARK LOUGHTON HARLOW NINE ELMS UPPER HOLLOWAY ABERDEEN DENBURN HAMPTON INGREBOURNE COVENTRY 21CN-BRAS-RED6-SF
Update
14 Mar 12:49:28
This has now been escalated to the next level for further investigation.
Update
17 Mar 15:42:38
BT are now raising faults on each individual exchange.
Update
21 Mar 10:19:24
Below are the exchanges/RAS which have been fixed by capacity upgrades. We are hoping for the remaining four exchanges to be fixed in the next few days.
HAINAULT
SOUTHWARK
LOUGHTON
HARLOW
ABERDEEN DENBURN
HAMPTON
INGREBOURNE
GOODMAYES
RAS 21CN-BRAS-RED6-SF
Update
21 Mar 15:52:45
COVENTRY should be resolved later this evening when a new link is installed between Nottingham and Derby. CANONBURY is waiting for CVLAN moves that begin 19/03/2014 and will be completed 01/04/2014.
Update
25 Mar 10:09:23
CANONBURY - Planned engineering works took place on 19.3.14, and there are three more planned for 25.3.14, 26.3.14 and 1.4.14.
COVENTRY - Is now fixed
NINE ELMS and UPPER HOLLOWAY - Still suffering from packet loss; BT are investigating further.
Update
2 Apr 15:27:11
BT are still investigating congestion on Canonbury, Nine Elms and Upper Holloway.
Update
23 Apr 11:45:44
CANONBURY - further PEWs on 7th and 8th May
NINE ELMS - A total of 384 EUs have been migrated. A further 614 are planned to be migrated in the early hours of 25/04/14.
UPPER HOLLOWAY - Planned Engineering Work on 28th April
BEULAH HILL and TEWKESBURY - Seeing congestion at peak times; chasing BT on this also.
Update
30 Apr 12:51:24
NINE ELMS - T11003 - Investigations are still ongoing for Nine Elms.
UPPER HOLLOWAY - T11004 - BT are working on this and a resolution should be available soon.
TEWKESBURY - T11200 - This is on the Backhaul list and will be dealt with shortly. Work request closed as no investigation required. BT are working on this and a resolution should be available soon.
MONMOUTH - T11182 - ALS583669 - This was balanced. I have advised BT that this is still not up to standard. They will continue to investigate. This is on the Backhaul Spreadsheet also, so it is being investigated by the capacity team.
BEULAH HILL - Being investigated.
Update
2 May 12:45:16
CANONBURY - 580 EUs being migrated on 7th May and 359 EUs on 8th May
NINE ELMS - Emergency PEW PW238650 will take place in the early hours of 02/05/14. This is to move 500 circuits off 4 ISPVs onto other ISPVs.
UPPER HOLLOWAY - Currently BT TSO have 12 projects scheduled for Upper Holloway.
TEWKESBURY - This is with BT TSO / Backhaul upgrades.
MONMOUTH - This is with BT TSO / Backhaul upgrades.
BEULAH HILL - Possibly fixed last night. Will monitor to see if any better this evening
BAYSWATER - Packet loss identified and reported to BT
Update
6 May 11:44:59
TEWKESBURY - Fixed
CANONBURY - EUs being migrated on 7th May and 359 EUs on 8th May

Still seeing some lines with issues after the upgrade. Passed back to BT.
NINE ELMS
MONMOUTH
UPPER HOLLOWAY
BEULAH HILL
READING EARLEY
Update
9 May 16:16:33
CANONBURY and NINE ELMS - Now fixed
UPPER HOLLOWAY - Have asked the team dealing with this for the latest update. Email sent today, 9/05/2014.
MONMOUTH - BT TSO are still chasing this.
BEULAH HILL - BT TSO are chasing for a date for a PEW so the work can be carried out.
BAYSWATER - BT TSO are still chasing this
READING EARLEY - Unbalanced LAG identified. Rebalancing will be completed out of hours. No ETA on this, sorry.
Update
15 May 10:47:22
UPPER HOLLOWAY - Now fixed
MONMOUTH - We have been advised that the target date for the capacity increase is the 22nd May.
BEULAH HILL - Escalated this to a Duty Manager asking if he can gain an update.
EARLEY - TSO advised Capacity team have replied and hope to get the new 10gig links into service this month. No further updates, so escalated to Duty Manager to try and ascertain a specific date in May 2014 when this will take place.
Update
21 May 09:32:00
Reading Earley / Monmouth - Now fixed
Bayswater - We have received a reply from the capacity management team, advising that to alleviate capacity issues, moves are taking place on May 23rd and May 28th.
Beulah Hill - Due to issues with cabling this has been delayed. We are currently awaiting a date when the cables can be run so that the integration team can bring this into service.
Update
2 Jun 15:15:55
Bayswater - Now fixed
Beulah Hill - To alleviate capacity issues, moves are taking place between June 2nd and June 6th.
Update
10 Jun 12:16:52
Beulah Hill - Now fixed
AYR - Seeing congestion on many lines, which has been reported.
Update
19 Jun 14:33:06
AYR - Is now fixed
Broadband Users Affected 1%
Started 9 Mar 10:08:25 by AAISP Pro Active Monitoring Systems
Closed 19 Jun 14:33:59

11 Jun 15:08:59
Details
11 Jun 15:12:53
It looks like one of our LNSs restarted. This will have affected a third of our broadband customers. Lines all reconnected straight away and customers should not see any further problems. The usage graphs from midnight until the restart will have been lost.
Broadband Users Affected 33%
Started 11 Jun 15:05:00
Closed 11 Jun 15:08:59

25 May 08:02:51
Details
23 May 20:05:56
We are making a number of changes to the main page on clueless, and the search options for dealers/managers. This should be gradually applied over the weekend as work is done. The end result should be faster and more flexible. Any issues do ask RevK on irc.
Resolution We have done the main work on this - changing over the search system completely. This has meant some small details have been removed; these will be added back over the coming week depending on demand.
Started 23 May 16:00:00
Closed 25 May 08:02:51
Previously expected 27 May

12 May 08:55:06
Details
10 May 15:52:02
At 15:33 all 20CN lines on Kingston RASs dropped. We are chasing BT now.
Update
10 May 16:05:18
BT have raised an incident. Apparently the issue has been caused by power problems at London Kingston.
Update
12 May 08:55:29
This was fixed after power was restored and a remote reset was performed.
Started 10 May 15:50:27 by AAISP Staff
Closed 12 May 08:55:06
Cause BT

10 May 13:18:53
Details
10 May 13:18:53
A number of customers had asked us about recent news reports that ISPs will be sending educational letters to customers suspected of downloading media without appropriate permission from the copyright holder.

Please be assured that AAISP are under no obligation to send such letters, any more than the power companies that power and charge the devices used for such activities, or the device manufacturers. We have no intention of sending such letters.

As always, if we receive an abuse report it will either go directly to the customer via the contact details on the whois for the IP addresses, or come to us and we will simply pass it on (as well as pointing out to the sender that we are neither the police nor the civil courts).

I hope this clears up any misunderstandings.

Started 10 May

28 Apr 13:37:28
Details
24 Apr 14:23:02
Some TalkTalk connected lines dropped at around 14:14. They are reconnecting now though. We'll investigate and will update this post.
Update
24 Apr 14:29:01
This looked like it was a wider TalkTalk problem as other ISPs were also affected.
Most lines are back online now though. We will investigate further.
Update
24 Apr 14:40:50
TalkTalk have been contacted and a Reason for Outage has been requested.
Update
24 Apr 15:02:33
TalkTalk have confirmed the outage on their status page: http://solutions.opal.co.uk/network-status-report.php?reportid=3893
Update
24 Apr 16:24:24
Update from TalkTalk: 15:59 24/04/2014 Supplier has noticed a link flap between two exchanges which resulted in brief loss of service for some DSL customers. The traffic was reconverged over alternative links. Supplier is still investigating for the root cause.
Resolution Incident was due to a transmission failure which the supplier is investigating with the switch vendor. We've also had this update from TalkTalk: The cause was identified as a blown rectifier.
Started 24 Apr 14:14:00
Closed 28 Apr 13:37:28