
7 Oct 06:17:13
Details
3 Oct 16:25:24
As we advised, we have had to make some radical changes to our billing to fix database load issues. These have gone quite well overall, but there have been a few snags. We think we have them all now, but this month we had to revert some usage charging, which resulted in some free usage.

We have identified that quarterly billed customers on units tariffs were not charged, so these charges are being applied shortly as a new invoice. Anyone with excess usage charges as a result, please do ask accounts for a credit.

We have also identified that call charges have not been billed - these can be billed to date if anyone asks, or if you leave it then they should finally catch up on next month's bill.

Sorry for any inconvenience.

Started 1 Oct
Expected close 1 Nov

15 Oct 17:14:18
Details
6 Oct 14:22:50
For the next week or so we're considering 5am-7am to be a PEW window for some very low disruption work (a few seconds of "blip"). We're still trying very hard to improve our network configuration and router code to create a much more stable network. It seems, from recent experience, that this sort of window will be least disruptive to customers. It is a time when issues can be resolved by staff if needed (which is harder at times like 3am) and when we get more feedback from end users. As before, we expect this work to have no impact in most cases, and maybe a couple of seconds of routing issues if it does not quite go to plan. Sadly, all of our efforts to recreate the same test scenarios "on the bench" have not worked well. At this stage we are reviewing code to understand Sunday morning's work better, and this may take some time before we start. We'll update here and on irc before work is done. Thank you for your patience.
Update
7 Oct 09:06:41
We did do work around 6:15 to 6:30 today - I thought I had posted an update here before I started but somehow it did not show. If we do any more, I'll try and make it a little earlier.
Update
8 Oct 05:43:11
Doing work a little earlier today. We don't believe we caused any blips with today's testing.
Update
9 Oct 05:47:53
Another early start and went very well.
Update
10 Oct 08:22:53
We updated the remaining core routers this morning, and it seemed to go very well. Indeed, the pings we ran showed zero loss while upgrading the routers in Telecity. However, we did lose TalkTalk broadband lines in the process. These all reconnected straight away, but we are now reviewing how this happens to try and avoid it in future.
Resolution Closing this PEW from last week. We may need to do more work at some point, but we are getting quite good at this now.
Started 7 Oct 06:00:00
Closed 15 Oct 17:14:18
Previously expected 14 Oct 07:00:00

5 Oct 07:26:50
Details
3 Oct 10:41:59
We do plan to upgrade routers again over the weekend, probably early Saturday morning (before 9am). I'll post on irc at the time and update this notice.

The work this week means we expect this to be totally seamless, but the only way to actually be sure is to try it.

If we still see any issues we'll do more on Sunday.

Update
4 Oct 06:54:19
Upgrades starting shortly.
Update
4 Oct 07:24:47
Almost perfect!

We loaded four routers, each at different points in the network. We ran a ping that went through all four routers whilst doing this. For three of them we did see ping drop a packet. On the fourth we did not see a drop at all.

This may sound good, but it should be better - we should not lose a single packet doing this. We're looking at the logs to work out why, and may try again Sunday morning.
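For anyone who wants to watch for loss during one of these windows, here is a minimal sketch of the sort of check involved (illustrative only, not our actual test tooling; the target address is a placeholder and a Linux-style ping with -c/-W flags is assumed):

```python
#!/usr/bin/env python3
# Minimal packet-loss watcher for a maintenance window (illustrative only,
# not AAISP's actual test tooling). Assumes a Linux-style `ping` that
# accepts -c (count) and -W (timeout in seconds); the target is a placeholder.
import subprocess
import time
from datetime import datetime

TARGET = "192.0.2.1"  # placeholder address beyond the routers being upgraded

sent = lost = 0
try:
    while True:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "1", TARGET],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        sent += 1
        if result.returncode != 0:
            lost += 1
            print(f"{datetime.now().isoformat()} lost packet ({lost}/{sent})")
        time.sleep(1)
except KeyboardInterrupt:
    print(f"sent={sent} lost={lost}")
```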

Thank you for your patience.

Update
4 Oct 07:53:52
Plan for tomorrow is to pick one of the routers that did drop a ping, and shut it down and hold it without restarting - at that point we can investigate what is still routing via it and why. This should help us explain the dropped ping. Assuming that provides the clues we need we may load or reconfigure routers later on Sunday to fix it.
Update
5 Oct 06:57:39
We are starting work shortly.
Update
5 Oct 07:11:00
We are doing the upgrades as planned, but not able to do the level of additional diagnostics we wanted. We may look in to that next weekend.
Resolution Only 3 routers were upgraded, with the 3rd having several seconds of issues. We will investigate the logs and schedule further planned work. It seems that early morning work like this is less disruptive to customers.
Started 4 Oct
Closed 5 Oct 07:26:50
Previously expected 6 Oct

2 Oct 19:05:55
Details
2 Oct 19:05:15
We'd like to thank customers for patience this week. The tests we have been doing in the evenings have been invaluable. The issues seen have mostly related to links to Maidenhead (so voice calls rather than broadband connections).

The work we are doing has involved a lot of testing "on the bench" and even in our offices (to the annoyance of staff) but ultimately testing on the live customer services is the final test. The results have been informative and we are very close to our goal now.

The goal is to allow router maintenance with zero packet loss. We finally have the last piece in the jigsaw for this, and so should have this in place soon. Even so, there may be some further work to achieve this.

Apart from being a "nice to have" goal, this also relates to failures of hardware, power cuts, and software crashes. The work is making the network configuration more robust and should allow for key component failures with outages as short as 300ms in some cases. LNS issues tend to take longer for PPP to reconnect, but we want to try and be as robust as possible.

So, once again, thank you all for your patience while we work on this. There may be some more planned works which really should now be invisible to customers.

Started 2 Oct 19:00:41

1 Oct 17:49:32
Details
30 Sep 18:04:06
Having been very successful with the router upgrade tonight, we are looking to move to the next router on Wednesday. Signs so far are that this should be equally seamless. We are, however, taking this slowly, one step at a time, to be sure.
Resolution We loaded 4 routers in all; some were almost seamless and some had a few seconds of outage. It was not perfect, but way better than previously. We are now going to look into the logs in detail and work out what to do next.

Our goal here is zero packet loss for maintenance.

I'd like to thank all those on irc for their useful feedback during these tests.

Started 1 Oct 17:00:00
Closed 1 Oct 17:49:32
Previously expected 1 Oct 18:00:00

30 Sep 18:02:25
Details
29 Sep 21:57:11
We are going to spend much of tomorrow trying to track down why things did not go smoothly tonight, and hope to have a solution by tomorrow (Tuesday) evening.

This time I hope to make a test load before the peak period at 6pm, so between 5pm and 6pm when things are in a bit of a lull between business and home use.

If all goes to plan there will be NO impact at all, and that is what we hope. If so we will update three routers with increasing risk of impact, and abort if there are any issues.

Please follow things on irc tomorrow.

If this works as planned we will finally have all routers under "seamless upgrade" processes.

Update
30 Sep 08:29:42
Tests on our internal systems this morning confirm we understand what went wrong last night, and as such the upgrade tonight should be seamless.

For the technically minded, we had an issue with VRRP becoming master too soon, i.e. before all routes are installed. The routing logic is now linked to VRRP to avoid this scenario, regardless of how long routing takes.
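To illustrate the general idea for those less familiar with VRRP (this is a keepalived-style sketch and an assumption on our part, not the FireBrick mechanism described above): the VRRP daemon is only allowed to take mastership once the expected routes are present, for example via a track script along these lines. The prefixes are placeholders.

```python
#!/usr/bin/env python3
# Illustrative keepalived-style track script (NOT the FireBrick mechanism
# described above): exit 0 only once the expected routes are installed, so
# that VRRP does not take mastership before routing is ready. Prefixes are
# placeholders; requires Linux iproute2.
import subprocess
import sys

EXPECTED_PREFIXES = ["198.51.100.0/24", "203.0.113.0/24"]  # example prefixes

routes = subprocess.run(
    ["ip", "route", "show"], capture_output=True, text=True
).stdout

missing = [p for p in EXPECTED_PREFIXES if p not in routes]
sys.exit(0 if not missing else 1)
```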

Resolution The upgrade went very nearly perfectly on the first router - we believe the only noticeable impact was the link to our office, which we think we understand now. However, we did only do the one router this time.
Started 30 Sep 17:00:00
Closed 30 Sep 18:02:25
Previously expected 30 Sep 18:00:00

29 Sep 22:37:36
Details
21 Aug 12:50:32
Over the past week or so we have been missing data on some monitoring graphs; this is shown as purple for the first hour in the morning. It is being caused by delays in collecting the data, and is being looked into.
Resolution We believe this has been fixed now. We have been monitoring it for a fortnight after making an initial fix, and it looks to have been successful.
Closed 29 Sep 22:37:36

29 Sep 19:29:19
Details
29 Sep 14:06:12
We expect to reload a router this evening, which is likely to cause a few seconds of routing issues. This is part of trying to address the blips caused by router upgrades, which are meant to be seamless.
Update
29 Sep 18:48:37
The reload is expected shortly, and will be on two boxes at least. We are monitoring the effect of the changes we have made. They should be a big improvement.
Resolution The upgrade was tested on only one router (Maidenhead) and caused some nasty impact on routing to call servers and control systems - general DSL was unaffected. Changes are backed out now, and it's back to the drawing board. Further PEW will be announced as necessary.
Started 29 Sep 17:00:00
Closed 29 Sep 19:29:19
Previously expected 29 Sep 23:00:00

29 Sep 13:17:50
Details
29 Sep 08:48:37
Some updates to the billing system have caused a problem for units billed customers resulting in their usage for next month starting early, i.e. usage is now being logged for October.

Because of the way usage carries forward, this is unlikely to have much impact on customers in terms of additional charges. However, any customers that think they have lost out, please let us know and we'll make a manual adjustment.

The problem has been corrected for next month.

Update
29 Sep 08:57:00
It looks like customers won't get billed for top-up and may not get billed for units either, so we are working on undoing this issue so that billing is done normally. Please bear with us.
Update
29 Sep 09:23:40
We are working on this now and should have usage billing back to normal later this morning.
Resolution Usage billing has been restored; the issue went back to around 1am Saturday, giving customers about 2.5 days of unmetered usage.
Started 29 Sep 08:45:12
Closed 29 Sep 13:17:50

28 Sep 19:20:54
Details
28 Sep 18:52:50
We are experiencing a network problem affecting our broadband customers. Staff are investigating.
Update
28 Sep 19:08:28
This is looking like some sort of Denial of Service attack. We're looking at mitigating this.
Update
28 Sep 19:16:36
The traffic has died down, things are starting to look better.
Update
28 Sep 19:21:46
Traffic is now back to normal.
Started 28 Sep 18:30:00
Closed 28 Sep 19:20:54

20 Sep 07:09:09
Details
20 Sep 11:59:13
RADIUS accounting is behind at the moment. This is causing usage data to appear to be missing from customer lines. The accounting is behind, but it's not broken, and is catching up. The usage data does not appear to be lost, and should appear later in the day.
Update
21 Sep 08:12:52
Records have now caught up.
Closed 20 Sep 07:09:09
Previously expected 20 Sep 15:57:11

26 Aug 09:15:00
Details
26 Aug 09:02:02
Yesterday's and today's line graphs are not being shown at the moment. We are working on restoring this.
Update
26 Aug 09:42:18
Today's graphs are back; yesterday's are lost though.
Started 26 Aug 08:00:00
Closed 26 Aug 09:15:00

29 Sep 16:57:23
Details
2 Sep 17:15:50
We had a blip on one of the LNSs yesterday, so we are looking to roll out some updates over this week which should help address this, and some of the other issues last month. As usual LNS upgrades would be over night. We'll be rolling out to some of the other routers first, which may mean a few seconds of routing changes.
Update
7 Sep 09:43:40
Upgrades are going well, but we are taking this slowly, and have not touched the LNSs yet. Addressing stability issues is always tricky as it can be weeks or months before we know we have actually fixed the problems. So far we have managed to identify some specific issues that we have been able to fix. We obviously have to be very careful to ensure these "fixes" do not impact normal service in any way. As such I have extended this PEW another week.
Update
13 Sep 11:07:13
We are making significant progress on this. Two upgrades are expected today (Saturday 13th) which should not have any impact. We are also working on ways to make upgrades properly seamless (which is often the case, but not always).
Update
14 Sep 17:21:35
Over the weekend we have done a number of tests, and we have managed to identify specific issues and put fixes in place on some of the routers on the network to see how they go.

This did lead to some blips (around 9am and 5pm on Sunday for example). We think we have a clearer idea on what happened with these too, and so we expect that we will load some new code early tomorrow or late tonight which may mean another brief blip. This should allow us to be much more seamless in future.

Later in the week we expect to roll out code to more routers.

Update
16 Sep 16:57:07
We really think we have this sussed now - including reloads that have near zero impact on customers. We have a couple more loads to do this week (including one at 5pm today), and some overnight rolling LNS updates.
Update
17 Sep 12:23:59
The new release is now out, and we are planning upgrades this evening (from 5pm) and one of the LNSs overnight. This should be pretty seamless now. At the end of the month we'll upgrade the second half of the core routers, assuming all goes well. Thank you for your patience.
Update
18 Sep 17:15:27
FYI, there were a couple of issues with core routers today, at least one of which would have impacted internet routing for some destinations for several seconds. These issues were on the routers which have not yet been upgraded, which is rather encouraging. We are, of course, monitoring the situation carefully. The plan is still to upgrade the second half of the routers at the end of the month.
Update
19 Sep 12:12:42
One of our LNS's (d.gormless) did restart unexpectedly this morning - this router is scheduled to be upgraded tonight.
Update
28 Sep 13:25:10
The new release has been very stable for the last week and is being upgraded on remaining routers during Sunday.
Resolution Stable releases loaded at weekend
Started 2 Sep 18:00:00
Closed 29 Sep 16:57:23
Previously expected 19 Sep

2 Sep 17:08:13
Details
2 Sep 15:38:09
Some people use the test LNS (doubtless) for various reasons, and it is also used some of the time for our NAT64 gateway.

We normally do re-loads on doubtless to test things with no notice, but we expect there may be quite a few this afternoon/evening as we are trying to track down an issue with new code that is not showing on the bench test systems.

As usual this is a PPP reset and reconnect and if it crashes may be a few seconds extra outage. With any luck this will not take many resets to find the issue.

Resolution Testing went well.
Started 2 Sep 15:40:00
Closed 2 Sep 17:08:13
Previously expected 3 Sep

1 Sep 19:42:08
Details
1 Sep 19:42:56
c.gormless rebooted, lines moved to other LNS automatically. We are investigating.
Broadband Users Affected 33%
Started 1 Sep 19:39:19
Closed 1 Sep 19:42:08

23 Apr 10:21:03
Details
01 Nov 2013 15:05:00
We have identified an issue that appears to be affecting some customers with FTTC modems. The issue is stupidly complex, and we are still trying to pin down the exact details. The symptoms appear to be that some packets are not passing correctly, some of the time.

Unfortunately, one of the types of packet that refuse to pass correctly is FireBrick FB105 tunnel packets. This means customers relying on FB105 tunnels over FTTC are seeing issues.

The workaround is to remove the Ethernet lead from the modem and then reconnect it. This seems to fix the issue, at least until the next PPP restart. If you have remote access to a FireBrick, e.g. via a WAN IP, and need to do this, you can change the Ethernet port settings to force it to re-negotiate, which has the same effect - this only works if it is directly connected to the FTTC modem, as the fix does need the modem's Ethernet to restart.

We are asking BT about this, and we are currently assuming this is a firmware issue on the BT FTTC modems.

We have confirmed that modems re-flashed with non-BT firmware do not have the same problem, though we don't usually recommend doing this as it is a BT modem and part of the service.

Update
04 Nov 2013 16:52:49
We have been working on getting more specific information regarding this, we hope to post an update tomorrow.
Update
05 Nov 2013 09:34:14
We have reproduced this problem by sending UDP packets using 'Scapy'. We are doing further testing today, and hope to write up a more detailed report about what we are seeing and what we have tested.
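For a rough idea of the sort of test involved (a sketch only, not our exact scripts; addresses and ports are placeholders), Scapy can generate a batch of UDP packets across a range of source ports like this, which can then be re-sent after a PPP restart to see which combinations no longer pass:

```python
#!/usr/bin/env python3
# Rough sketch of the kind of Scapy test described above (not the exact
# scripts used): send a batch of UDP packets, each from a different source
# port, so the same batch can be re-sent after a PPP restart to see which
# port combinations no longer pass. Addresses/ports are placeholders; root
# access is required to send raw packets.
from scapy.all import IP, UDP, Raw, send

DST = "192.0.2.10"               # placeholder test server beyond the modem
DST_PORT = 9000                  # placeholder destination port
SRC_PORTS = range(40000, 40500)  # 500 source ports, as in the tests described

for sport in SRC_PORTS:
    send(IP(dst=DST) / UDP(sport=sport, dport=DST_PORT) / Raw(b"test"),
         verbose=False)
```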
Update
05 Nov 2013 14:27:26
We have some quite good demonstrations of the problem now, and it looks like it will mess up most VPNs based on UDP. We can show how a whole range of UDP ports can be blacklisted by the modem somehow on the next PPP restart. It is crazy. We hope to post a little video of our testing shortly.
Update
05 Nov 2013 15:08:16
Here is an update/overview of the situation. (from http://revk.www.me.uk/2013/11/bt-huawei-fttc-modem-bug-breaking-vpns.html )

We have confirmed that the latest code in the BT FTTC modems appears to have a serious bug that is affecting almost anyone running any sort of VPN over FTTC.

Existing modems seem to be upgrading, presumably due to a roll-out of new code by BT. An older modem that has not been on-line for a while is fine. A re-flashed modem with non-BT firmware is fine. A working modem that had been on the line for a while suddenly stopped working, presumably upgraded.

The bug appears to be that the modem manages to "blacklist" some UDP packets after a PPP restart.

If we send a number of UDP packets, using various UDP ports, then cause PPP to drop and reconnect, we then find that around 254 combinations of UDP IP/ports are now blacklisted. I.e. they no longer get sent on the line. Other packets are fine.

If we send 500 different packets, around 254 of them will not work again after the PPP restart. It is not simply the first or last 254 packets - some in the middle are affected - but it does seem to be 254 combinations. They work as much as you like before the PPP restart, and then never work after it.

We can send a batch of packets, wait 5 minutes, PPP restart, and still find that packets are now blacklisted. We have tried a wide range of ports, high and low, different src and dst ports, and so on - they are all affected.

The only way to "fix" it, is to disconnect the Ethernet port on the modem and reconnect. This does not even have to be long enough to drop PPP. Then it is fine until the next PPP restart. And yes, we have been running a load of scripts to systematically test this and reproduce the fault.

The problem is that a lot of VPNs use UDP and use the same set of ports for all of the packets, so if that combination is blacklisted by the modem the VPN stops after a PPP restart. The only way to fix it is manual intervention.

The modem is meant to be an Ethernet bridge. It should not know anything about PPP restarting or UDP packets and ports. It makes no sense that it would do this. We have tested swapping working and broken modems back and forth. We have tested with a variety of different equipment doing PPPoE and IP behind the modem.

BT are working on this, but it is a serious concern that this is being rolled out.
Update
12 Nov 2013 10:20:18
Work on this is still ongoing... We have tested this on a standard BT retail FTTC 'Infinity' line, and the problem cannot be reproduced. We suspect this is because a different IP address is allocated each time the PPP re-establishes, and whatever is doing the session tracking does not match the new connection.
Update
12 Nov 2013 11:08:17

Here is an update with a more specific explanation of the problem we are seeing:

On WBC FTTC, we can send a UDP packet inside the PPP and then drop the PPP a few seconds later. After the PPP re-establishes, UDP packets with the same source and destination IP and ports won't pass; they do not reach the LNS at the ISP.

Further to that, it's not just one src+dst IP and port tuple which is affected. We can send 254 UDP packets using different src+dst ports before we drop the PPP. After it comes back up, all 254 port combinations will fail. It is worth noting here that this cannot be reproduced on an FTTC service which allocates a dynamic IP that changes each time the PPP re-establishes.

If we send more than 254 packets, only 254 will be broken and the others will work. It's not always the first 254 or last 254, the broken ones move around between tests.

So it sounds like the modem (or, less likely, something in the cab or exchange) is creating state table entries for packets it is passing which tie them to a particular PPP session, and then failing to flush the table when the PPP goes down.

This is a little crazy in the first place. It's a modem. It shouldn't even be aware that it's passing PPPoE frames, let alone looking inside them to see that they are UDP.

This only happens when using an Openreach Huawei HG612 modem that we suspect has been remotely and automatically upgraded by Openreach in the past couple of months. Further, an HG612 modem with the 'unlocked' firmware does not have this problem. An HG612 modem that has probably not been automatically/remotely upgraded does not have this problem.

Side note: One theory is that the brokenness is actually happening in the street cab and not the modem. And that the new firmware in the modem which is triggering it has enabled 'link-state forwarding' on the modem's Ethernet interface.

Update
27 Nov 2013 10:09:42
This post has been a little quiet, but we are still working with BT/Openreach regarding this issue. We hope to have some more information to post in the next day or two.
Update
27 Nov 2013 10:10:13
We have also had reports from someone outside of AAISP reproducing this problem.
Update
27 Nov 2013 14:19:19
We have spent the morning with some nice chaps from Openreach and Huawei. We have demonstrated the problem and they were able to do traffic captures at various points on their side. Huawei HQ can now reproduce the problem and will investigate the problem further.
Update
28 Nov 2013 10:39:36
Adrian has posted about this on his blog: http://revk.www.me.uk/2013/11/bt-huawei-working-with-us.html
Update
13 Jan 14:09:08
We are still chasing this with BT.
Update
3 Apr 15:47:59
We have seen this affect SIP registrations (which use 5060 as the source and target)... Customers can contact us and we'll arrange a modem swap.
Update
23 Apr 10:21:03
BT are in the process of testing an updated firmware for the modems with customers. Any customers affected by this can contact us and we can arrange a new modem to be sent out.
Resolution BT are testing a fix in the lab and will deploy in due course, but this could take months. However, if any customers are adversely affected by this bug, please let us know and we can arrange for BT to send a replacement ECI modem instead of the Huawei modem. Thank you all for your patience.

--Update--
BT do have a new firmware that they are rolling out to the modems. So far it does seem to have fixed the fault and we have not heard of any other issues as of yet. If you do still have the issue, please reboot your modem, if the problem remains, please contact support@aa.net.uk and we will try and get the firmware rolled out to you.
Started 25 Oct 2013
Closed 23 Apr 10:21:03

19 Aug 12:59:53
Details
19 Aug 00:36:05
Initial reports suggest one of our fibre links to TalkTalk is down. This is affecting broadband lines using TalkTalk backhaul.
Update
19 Aug 00:43:35
00:05 TT Lines drop, looked like we had a router blip and a TT fibre blip - reasons yet unknown
00:15 Lines start to log back in
However, we are getting reports of intermittent access to some sites on the internet - possibly MTU related.
Update
19 Aug 01:33:16
MTU is still a problem. A workaround for the moment is to lower the MTU setting in your router to 1432. Ideally this should not be needed, but try this until the problem is resolved.
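If you want to check what actually fits through your line before changing settings, here is a rough Scapy-based probe (illustrative only; the target address is a placeholder and root access is needed). With ping-style probes the payload is the size being tested minus 28 bytes of IP and ICMP headers:

```python
#!/usr/bin/env python3
# Quick path-MTU probe sketch (illustrative only, not an official tool):
# send ICMP echo requests with DF set and see which sizes get an echo reply.
# Payload = size being tested minus 28 bytes (20 IP + 8 ICMP). Needs root.
from scapy.all import IP, ICMP, Raw, sr1

TARGET = "198.51.100.1"          # placeholder host to ping
for mtu in (1500, 1492, 1432):   # sizes worth trying on a PPP/L2TP path
    reply = sr1(IP(dst=TARGET, flags="DF") / ICMP() / Raw(b"x" * (mtu - 28)),
                timeout=2, verbose=False)
    ok = reply is not None and reply.haslayer(ICMP) and reply[ICMP].type == 0
    print(f"{mtu}: {'ok' if ok else 'did not fit (or no reply)'}")
```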
Update
19 Aug 01:58:30
Other wholesalers using TT are reporting the same problem. The TT helpdesk is aware of planned work that may be causing this. We have requested that they pass this MTU report on to the team involved in the planned work.
Update
19 Aug 07:14:05
TT tell us they think the problem with MTU has been fixed. We're still unsure at this moment, and will work with customers who still have problems.
Update
19 Aug 07:55:02
This is still a problem affecting customers using TT backhaul. TT are aware and are investigating. This is a result of a router upgrade within TT which looks to have been given incorrect settings.
Where possible, customers can change the MTU on their routers to be 1432
Update
19 Aug 08:55:47
We have been in contact with the TT Service Director who will be chasing this up internally at TT.
Update
19 Aug 09:05:48
Customers with bonded lines using TT and BT can turn off their TT modem or router for the time being.
Update
19 Aug 09:20:11
We are looking at re-routing TT connections through our secondary connection to TT...
Update
19 Aug 09:30:55
Traffic is now routing via our secondary connection to TT. This looks like it is not being routed via the faulty TT router, and it looks as if lines are passing traffic as normal.
Update
19 Aug 09:55:32
Some customers are working OK, some are not.
The reason is that we have 2 interconnects to TT. We are still seeing connections from both of them; however, we have a 1600 byte path from one but only 1500 from the other. The 1500 one is the one that TT did an upgrade on last night. So it looks like TT forgot to configure jumbo frames on an interface after the upgrade.
Needless to say, we've passed this information on to people at various levels within TT
Update
19 Aug 09:57:02
We are working on only accepting connections from TT via the working interconnect.
Update
19 Aug 10:39:32
We are forcing TT lines to reconnect, this should mean they then reconnect over the working interconnect and not the one with the faulty TT router.
Update
19 Aug 11:21:53
We are blocking connections from the faulty TT router and only accepting from the working one. This means when customers connect they have a working connection. However, this does mean that logins are being rejected from customers until they are routed via the working interconnect. It may take a few attempts for customers to connect first time.
Update
19 Aug 11:24:09
Some lines are taking a long time to come back. This is because they are still coming in via the broken interconnect - that we're rejecting. Unfortunately, affected lines just have to be left until they attempt to log in via the working interconnect. So, if we appear to be rejecting your login please leave your router to keep trying and it should fix itself.
Update
19 Aug 11:32:11
TT are reverting their upgrade from last night. This looks like it's underway at the moment.
Update
19 Aug 11:35:00
Latest from TT: "The roll back has been completed and the associated equipment has been restarted. Our (TT) engineers are currently performing system checks and a retest before confirming resolution on this incident. Further information will be provided shortly. "
Update
19 Aug 11:43:32
TT have completed their downgrade. It looks like the faulty link is working OK again, we'll be testing this before we unblock the link our side.
Update
19 Aug 13:01:55
We've re-enabled the faulty link, we are now back to normality! We do apologise for this outage. We will be discussing this fault and future upgrades of these TT routers with TT staff.
Started 19 Aug 00:05:00
Closed 19 Aug 12:59:53

13 Aug 09:15:00
Details
13 Aug 11:26:08
Due to a radius issue we were not receiving line statistics from just after midnight. As a result we needed to force lines to login again. This would have caused lines to lose their PPP connection and then reconnect at around 9AM. We apologise for this, and will be investigating the cause.
Started 13 Aug 09:00:00
Closed 13 Aug 09:15:00

8 Aug 15:25:00
Details
8 Aug 15:42:28
At 15:15 we saw customers on the 'D' LNS lose their connection and reconnect a few moments later. The cause of this is being looked into.
Resolution Lines quickly came back online, we apologise for the drop though. The cause will be investigated.
Started 8 Aug 15:15:00
Closed 8 Aug 15:25:00

1 Aug 10:00:00
Details
We saw what looks to be congestion on some lines on the Rugby exchange (BT lines). This showed as slight packet loss on Sunday evening. We'll report this to BT.
Update
30 Jul 11:03:08
Card replaced early hours this morning, which should have fixed the congestion problems.
Started 27 Jul 21:00:00
Closed 1 Aug 10:00:00

28 Jul 11:00:00
Details
28 Jul 09:20:03
Customers may have seen a drop and reconnect of their broadband lines this morning. Due to a problem with our RADIUS accounting on Sunday we have needed to restart our customer database server, Clueless. This has been done, and Clueless is back online. Due to the initial problem with RADIUS accounting most DSL lines have had to be restarted.
Update
28 Jul 10:02:13
We are also sending out order update messages in error - eg, emails about orders that have already completed. We apologise for this confusion and are investigating.
Started 28 Jul 09:00:00
Closed 28 Jul 11:00:00

4 Aug
Details
29 Jul 07:19:26
We'll be moving some lines from "C" to "D" tonight after an issue early this morning. Later in the week we expect to do a rolling LNS upgrade over several nights. As usual this will be a PPP restart. You can set your preferred time of night on the control pages.
Update
29 Jul 17:16:37
It is likely that the automated system to move lines from "C" to "D" will not work, so this may be done in one go during the night or early morning. The knock-on effects of the RADIUS issues early this morning have also resulted in some free usage, and some unexpected "line down" emails/texts/tweets. Sorry for any inconvenience.
Started 29 Jul 07:18:08
Closed 4 Aug
Previously expected 4 Aug

29 Jul 01:17:44
Details
28 Jul 21:38:18
We are having reports this evening of some lines that are in sync but unable to log in. We are investigating.
Update
28 Jul 22:00:52
We believe we have identified the problem and are working on a fix.
Update
28 Jul 22:17:51
Lines are logging in successfully now. If you are still off, please keep trying.
Resolution An issue with authentication on the "C" LNS, and then on the "D" LNS. We have found the issue, and lines are connecting to "D" cleanly now. The underlying issue causing this is being investigated.
Started 28 Jul 21:37:18
Closed 29 Jul 01:17:44
Cause BT

17 Jul 17:45:00
Details
17 Jul 16:23:15
We have a few reports from customers, and a vague incident report from BT, that suggest there may be a PPP problem within the BT network which is affecting customers logging in to us. Customers may see their ADSL router in sync, but not able to log in (no PPP).
Update
17 Jul 16:40:31
This looks to be affecting BT ADSL and FTTC circuits. A line which tries to log in may well fail.
Update
17 Jul 16:42:34
Some lines are logging in successfully now.
Update
17 Jul 16:54:15
Not all lines are back yet, but lines are still logging back in, so if you are still offline it may take a little more time.
Resolution This was a BT incident, reference IMT26151/14. This was closed by BT at 17:45 without giving us further details about what the problem was or what they did to restore service.
Started 17 Jul 16:00:00
Closed 17 Jul 17:45:00

28 Jul 12:10:28
Details
15 Jul 10:41:58
We are reworking the SMS/twitter/email line up/down notifications and hope to have the new system launched later this week. There may be slightly different wording of the messages.
Update
15 Jul 18:06:51
We're looking to do this in stages, i.e. switch over emails, then texts, then tweets, or something like that. So please bear with us. Ideally the changes should not lose any messages.
Update
17 Jul 09:21:13
We have switched over to the new system - the most noticeable change is that SMS and Tweets are now independent. You can have either or both if you require - settings are on the control pages. SMS still has a back-off if you have lots of line flaps, but tweets and emails are not delayed. Do let us have any feedback on the new system.
Started 16 Jul
Closed 28 Jul 12:10:28
Previously expected 20 Jul

15 Jul 12:52:51
Details
15 Jul 12:52:51
The usage reports sent on the 15th of the month, for customers that have requested them, have apparently not all worked. Some were blank.

These are being resent now, so apologies if you get two of them.

Started 15 Jul

11 Jul 11:03:55
Details
11 Jul 17:00:48
The "B" LNS restarted today, unexpectedly. All lines reconnected within minutes (however fast the model retries). We'll clear some traffic off the "D" server back to the "B" server later this evening.
Resolution We're investigating the cause of this.
Broadband Users Affected 33%
Started 11 Jul 11:03:52
Closed 11 Jul 11:03:55

10 Jul 20:10:00
Details
10 Jul 19:18:35
We are seeing a problem with BT 21CN ADSL and FTTC circuits being unable to log in since approximately 18:00 today. Existing sessions are working fine but are failing to reconnect when they drop. 20CN ADSL and TalkTalk backhaul circuits are working fine.

BT have raised incident IMT25152/14 which looks to be related, but just says they are investigating a problem.

Update
10 Jul 22:16:28
BT have reported that service should have been restored as of 20:10 this evening.

Customers who are still having problems should attempt to re-connect as they may be stuck on a BT holding session.

Anyone still having problems after doing that should contact tech support.

Started 10 Jul 17:15:00
Closed 10 Jul 20:10:00
Cause BT

1 Jul 23:25:00
Details
1 Jul 20:50:32
We have identified some TalkTalk back haul lines with congestion starting around 16:20, now showing around 100ms of latency with 2% packet loss. This affects around 3% of our TT lines.

We have techies in TalkTalk on the case and hope to have it resolved soon.

Update
1 Jul 20:56:19
"On call engineers are being scrambled now - we have an issue in the wider Oxford area and you should see an incident coming through shortly."
Resolution Engineers fixed the issue last night.
Started 1 Jul 16:20:00
Closed 1 Jul 23:25:00
Previously expected 2 Jul

19 Jun 14:33:59
Details
11 Mar 10:11:55
We are seeing multiple exchanges with packet loss over BT wholesale. We are chasing BT on this and will update as and when we have updates. The affected exchanges/RAS are:
GOODMAYES
CANONBURY
HAINAULT
SOUTHWARK
LOUGHTON
HARLOW
NINE ELMS
UPPER HOLLOWAY
ABERDEEN DENBURN
HAMPTON
INGREBOURNE
COVENTRY
21CN-BRAS-RED6-SF
Update
14 Mar 12:49:28
This has now been escalated to the next level for further investigation.
Update
17 Mar 15:42:38
BT are now raising faults on each Individual exchange.
Update
21 Mar 10:19:24
Below are the exchanges/RAS which have been fixed by capacity upgrades. We are hoping for the remaining four exchanges to be fixed in the next few days.
HAINAULT
SOUTHWARK
LOUGHTON
HARLOW
ABERDEEN DENBURN
HAMPTON
INGREBOURNE
GOODMAYERS
RAS 21CN-BRAS-RED6-SF
Update
21 Mar 15:52:45
COVENTRY should be resolved later this evening when a new link is installed between Nottingham and Derby. CANONBURY is waiting for CVLAN moves that began 19/03/2014 and will be completed 01/04/2014.
Update
25 Mar 10:09:23
CANONBURY - Planned Engineering works took place on 19.3.14, and there are three more planned for 25.3.14, 26.3.14 and 1.4.14.
COVENTRY - Is now fixed
NINE ELMS and UPPER HOLLOWAY- Still suffering from packet loss and BT are investigating further.
Update
2 Apr 15:27:11
BT are still investigating congestion on Canonbury, Nine Elms and Upper Holloway.
Update
23 Apr 11:45:44
CANONBURY - further PEW's on 7th and 8th May
NINE ELMS - A total of 384 EU’s have been migrated. A further 614 are planned to be migrated in the early hours of the 25/04/14.
UPPER HOLLOWAY - Planned Engineering Work on 28th April
BEULAH HILL and TEWKESBURY - Seeing congestion at peak times; chasing BT on this also.
Update
30 Apr 12:51:24
NINE ELMS - T11003 - Still ongoing investigations for nine elms.
UPPER HOLLOWAY - T11004 - BT are working on this and a resolution should be available soon.
TEWKESBURY - T11200 - This is on the Backhaul list and will be dealt with shortly. Work request closed as no investigation required. BT are working on this and a resolution should be available soon.
MONMOUTH - T11182 - ALS583669 - This was balanced. I have advised BT that this is still not up to standard. They will continue to investigate. This is on the Backhaul Spreadsheet also, so it is being investigated by the capacity team.
BEULAH HILL - Being investigated.
Update
2 May 12:45:16
CANONBURY - 580 EU's being migrated on 7th May and 359 EU's on 8th May
NINE ELMS - Emergency PEW PW238650 will take place in the early hours of 02/05/14. This is to move 500 circuits off 4 ISPVs onto other ISPVs.
UPPER HOLLOWAY - Currently BT TSO have 12 projects scheduled for upper Holloway.
TEWKESBURY - This is with BT TSO / Backhaul upgrades.
MONMOUTH - This is with BT TSO / Backhaul upgrades.
BEULAH HILL - Possibly fixed last night. Will monitor to see if any better this evening
BAYSWATER - Packet loss identified and reported to BT
Update
6 May 11:44:59
TEWKESBURY - Fixed
CANONBURY - EU's being migrated on 7th May and 359 EU's on 8th May

Still seeing some lines with issues after the upgrade. Passed back to BT.
NINE ELMS
MONMOUTH
UPPER HOLLOWAY
BEULAH HILL
READING EARLEY
Update
9 May 16:16:33
CANONBURY - NINE ELMS- Now fixed
UPPER HOLLOWAY - Have asked the team dealing with this for the latest update. Email sent today 9/05/2014.
MONMOUTH - BT TSO are still chasing this.
BEULAH HILL - BT TSO chasing for a date on a PEW for the work to be carried out.
BAYSWATER - BT TSO are still chasing this
READING EARLEY - Unbalanced LAG identified. Rebalancing will be completed out of hours. No ETA on this sorry.
Update
15 May 10:47:22
UPPER HOLLOWAY - Now fixed
MONMOUTH - We have been advised that the target date for the capacity increase is the 22nd May.
BEULAH HILL - Escalated this to a Duty Manager asking if he can gain an update.
EARLEY - TSO advised Capacity team have replied and hope to get the new 10gig links into service this month. No further updates, so escalated to Duty Manager to try and ascertain a specific date in May 2014 when this will take place.
Update
21 May 09:32:00
Reading Earley / Monmouth - Now fixed
Bayswater - We have received a reply from the capacity management team, advising that to alleviate capacity issues, moves are taking place on May 23rd and May 28th.
Beulah Hill - Due to issues with cabling this has been delayed. We are currently awaiting a date when the cables can be run so that the integration team can bring this into service.
Update
2 Jun 15:15:55
Bayswater - Now fixed
Beulah Hill - To alleviate capacity issues, moves are taking place between June 2nd and June 6th.
Update
10 Jun 12:16:52
Beulah Hill - Now fixed
AYR - Seeing congestion on many lines, which has been reported.
Update
19 Jun 14:33:06
AYR - Is now fixed
Broadband Users Affected 1%
Started 9 Mar 10:08:25 by AAISP Pro Active Monitoring Systems
Closed 19 Jun 14:33:59

11 Jun 15:08:59
Details
11 Jun 15:12:53
It looks like one of our LNSs restarted. This will have affected a third of our broadband customers. Lines all reconnected straight away and customers should not see any further problems. The usage graphs from midnight until the restart will have been lost.
Broadband Users Affected 33%
Started 11 Jun 15:05:00
Closed 11 Jun 15:08:59

25 May 08:02:51
Details
23 May 20:05:56
We are making a number of changes to the main page on clueless, and the search options for dealers/managers. This should be gradually applied over the weekend as work is done. The end result should be faster and more flexible. Any issues do ask RevK on irc.
Resolution We have done the main work on this - changing over the search system completely. This has meant some small details have been removed; they will be added back over the coming week depending on demand.
Started 23 May 16:00:00
Closed 25 May 08:02:51
Previously expected 27 May

12 May 08:55:06
Details
10 May 15:52:02
At 15:33 all 20CN lines on Kingston RASs dropped. We are chasing BT now.
Update
10 May 16:05:18
BT have raised an incident. Apparently the issue has been caused by power problems at London Kingston.
Update
12 May 08:55:29
This was fixed after power was restored and a remote reset was performed.
Started 10 May 15:50:27 by AAISP Staff
Closed 12 May 08:55:06
Cause BT

10 May 13:18:53
Details
10 May 13:18:53
A number of customers have asked us about recent news reports that ISPs will be sending educational letters to customers suspected of downloading media without appropriate permission from the copyright holder.

Please be assured that AAISP are under no obligation to send such letters, any more than the power companies that power and charge the devices used for such activities, or the device manufacturers. We have no intention of sending such letters.

As always, if we receive an abuse report it will either go directly to the customer as per the contact details on the whois for the IP addresses, or come to us and we will simply pass it on (as well as pointing out to the sender that we are neither the police nor the civil courts).

I hope this clears up any misunderstandings.

Started 10 May

28 Apr 13:37:28
Details
24 Apr 14:23:02
Some TalkTalk connected lines dropped at around 14:14. They are reconnecting now though. We'll investigate and will update this post.
Update
24 Apr 14:29:01
This looked like it was a wider TalkTalk problem as other ISPs were also affected.
Most lines are back online now though. We will investigate further.
Update
24 Apr 14:40:50
TalkTalk have been contacted and a Reason for Outage has been requested.
Update
24 Apr 15:02:33
TalkTalk have confirmed the outage on their status page: http://solutions.opal.co.uk/network-status-report.php?reportid=3893
Update
24 Apr 16:24:24
Update from TalkTalk: 15:59 24/04/2014 Supplier has noticed a link flap between two exchanges which resulted in brief loss of service for some DSL customers. The traffic was reconverged over alternative links. Supplier is still investigating for the root cause.
Resolution Incident was due to a transmission failure which the supplier is investigating with the switch vendor. We've also had this update from TalkTalk: The cause was identified as a blown rectifier.
Started 24 Apr 14:14:00
Closed 28 Apr 13:37:28

2 May 08:48:41
Details
2 May 08:13:36
We did some work yesterday to try and ensure we are correctly tracking lines being up and down. If ever there is any problem with RADIUS accounting this can get out of step. It is meant to sort itself out automatically, but there seemed to be some cases where that was not quite right.

Unfortunately the change led to lots of up/down emails, texts, and tweets overnight.

We think we have managed to address that now, and will be monitoring during the day.

Resolution We believe this is all sorted now.
Started 1 May 20:00:00
Closed 2 May 08:48:41
Previously expected 2 May 12:00:00

2 May 08:48:46
Details
22 Mar 07:36:41
We started to see yet more congestion on BT lines last night. This again looks a bit like a link aggregation issue (where one leg of a multiple-link trunk within BT is full). The pattern is not as obvious this time. Looking at the history we can see that some of the affected lines have had slight loss in the evenings. We did not spot this with our tools because of the rather odd pattern. Obviously we are trying to get this sorted with BT, but we are pleased to confirm that BT are now providing more data that shows which network components each circuit uses within their network. We plan to integrate this soon so that we can correlate some of these newer congestion issues and point BT in the right direction more quickly.
Started 21 Mar 18:00:00
Closed 2 May 08:48:46

24 Apr 13:36:17
Details
17 Feb 20:13:09
We are seeing packet loss at peak times on some lines on the Crouch End exchange. It's a small number of customers, and it looks like a congested SVLAN. This has been reported to BT.
Update
18 Feb 10:52:26
Initially BT were unable to see any problem; their monitoring was not showing any congestion and they wanted us to report individual line faults rather than this being dealt with as a specific BT network problem. However, we have spoken to another ISP who confirms the problem. BT have now opened an Incident and will be investigating.
Update
18 Feb 11:12:47
We have passed all our circuit details and graphs to proactive to investigate.
Update
18 Feb 16:31:17
TSO will investigate overnight
Update
20 Feb 10:15:02
No updates from TSO, proactive are chasing.
Update
27 Feb 13:24:38
There is still congestion, we are chasing BT again.
Update
28 Feb 09:34:50
It appears the issue is on the MSE router. Lines connected to the MSE are due to be migrated on 21st March, and BT are hoping to get this done by then.
Update
24 Apr 15:25:06
All lines on the Crouch End exchange are now showing clear.
Broadband Users Affected 0.10%
Started 17 Feb 20:10:29
Closed 24 Apr 13:36:17

4 Apr 17:05:09
Details
8 Apr 16:58:41
Some lines on the BT LEITH exchange have gone down. BT are aware and are investigating at the moment.
Started 8 Apr 16:30:20 by Customer report
Closed 4 Apr 17:05:09

3 Apr 12:26:40
Details
25 Mar 09:55:20

We are seeing customer routers being attacked this morning, which is causing them to drop. This was previously reported in the status post http://status.aa.net.uk/1877 where we saw that the attacks were affecting ZyXEL routers, as well as other makes.

Since that post we have updated the configuration of customer ZyXEL routers where possible, and these are no longer being affected. However, these attacks are affecting other types of routers.

We suggest that customers with lines that are dropping check their router configuration and disable access to the router's web interface from the internet, or at least change the port used (eg to one in the range 1024-65535).
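As a quick way to verify this from outside your own network (a sketch only; the address and ports are placeholders), a simple TCP connect test will show whether the web interface is answering on the WAN side:

```python
#!/usr/bin/env python3
# Quick check (run from OUTSIDE the network in question) of whether a
# router's web interface answers on the WAN side. Illustrative sketch only;
# the address and ports are placeholders.
import socket

WAN_IP = "203.0.113.5"   # placeholder WAN address
PORTS = [80, 443, 8080]  # common web-interface ports

for port in PORTS:
    try:
        with socket.create_connection((WAN_IP, port), timeout=3):
            print(f"port {port}: OPEN - consider disabling or moving the web UI")
    except OSError:
        print(f"port {port}: closed or filtered")
```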

Please speak to Support for more information.

Update
28 Mar 10:13:13
This is happening again; do speak to support if you need help changing the web interface settings.
Customers with ZyXELs can change the port from the control pages.
Started 25 Mar 09:00:40
Closed 3 Apr 12:26:40

1 Apr 10:00:00
Details
1 Apr 12:13:31
Some TalkTalk connected lines dropped at around 09:50 and reconnected a few minutes after. It looks like a connectivity problem between us and TalkTalk on one of our connections to them. We are investigating further.
Started 1 Apr 09:50:00
Closed 1 Apr 10:00:00

31 Mar 15:03:25
Details
31 Mar 09:40:40
Some TalkTalk line diagnostics (Signal graphs and line tests) as available from the Control Pages are not working at the moment. This is being looked in to.
Update
31 Mar 15:03:17
This is resolved. The TalkTalk side appears to have a bug relating to timezones.
Resolution This is resolved. The TalkTalk side appears to have a bug relating to timezones.
Started 31 Mar 09:00:00
Closed 31 Mar 15:03:25

20 Mar 11:17:21
Details
20 Mar 08:38:52
Customers will be seeing what looks like 'duplicated' usage reporting on the control pages for last night and this morning. This has been caused by a database migration that is taking longer than expected. The usage 'duplication' has been caused by usage reports being missed, and so on subsequent hours the usage has been spread equally across the missed hours.
This means that overall the usage reporting will be correct, but an individual hour will be incorrect.
This has also affected a few other related things such as the Line Colour states.
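As a toy illustration of that catch-up behaviour (not the actual billing code), spreading a late-reported total evenly across the missed hours looks roughly like this:

```python
#!/usr/bin/env python3
# Toy illustration of the catch-up behaviour described above (not the real
# billing code): when hourly usage reports are missed, the total that
# eventually arrives is spread evenly across the missed hours.
def spread_missed_usage(total_bytes, missed_hours):
    """Divide a late-reported usage total equally across the missed hours."""
    per_hour = total_bytes / len(missed_hours)
    return {hour: per_hour for hour in missed_hours}

# Example: 6 GB reported late, covering three missed hours.
print(spread_missed_usage(6_000_000_000, ["01:00", "02:00", "03:00"]))
```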
Update
20 Mar 11:17:55
Usage reporting is now back to normal.
Started 19 Mar 18:00:00
Closed 20 Mar 11:17:21

2 Mar 11:33:29
Details
1 Mar 04:24:02
Lines: 100% 21CN-REGION-GI-B dropped at 2014-03-01 04:22:17
We have advised BT
This is likely to have affected multiple internet providers using BT
Update
1 Mar 04:25:06
Lines: 100% 21CN-REGION-GI-B dropped again at 2014-03-01 04:23:21.
Broadband Users Affected 2%
Started 1 Mar 04:22:17 by AAISP automated checking
Closed 2 Mar 11:33:29
Cause BT

18 Mar 11:32:53
Details
18 Mar 11:32:53
We have removed the 'Services' hyperlink from our Accounts (billing) system that logs you into Clueless directly.
The alternative is for staff to set your '@a' login to be a Group login. This will then mean that your @a login will be able to see all services on your billing account.
Do email in if you'd like this set up.
Started 18 Mar 09:00:00

11 Mar 09:32:42
Details
6 Mar 13:07:51

We have had a small number of reports from customers who have had the DNS settings on their routers altered. The IPs we are seeing set are 199.223.215.157 and 199.223.212.99 (there may be others)

This type of attack is called Pharming. In short, it means that any internet traffic could be redirected to servers controlled by the attacker.

There is more information about pharming on the following pages:

At the moment we are logging when customers try to access these IP addresses and we are then contacting the customers to make them aware.

To solve the problem we are suggesting that customers replace the router or speak to their local IT support.
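For anyone who wants a quick host-side check (the real problem is on the router itself; the IPs below are the ones listed above), something like this will flag the rogue resolvers if they have been picked up in a machine's DNS configuration:

```python
#!/usr/bin/env python3
# Quick host-side check (illustrative only - the actual problem is on the
# router): warn if the rogue DNS servers mentioned above appear in this
# machine's resolver configuration (/etc/resolv.conf on Linux/Unix).
ROGUE_DNS = {"199.223.215.157", "199.223.212.99"}

with open("/etc/resolv.conf") as f:
    nameservers = {
        line.split()[1]
        for line in f
        if line.startswith("nameserver") and len(line.split()) > 1
    }

bad = nameservers & ROGUE_DNS
if bad:
    print("WARNING: rogue DNS server(s) configured: " + ", ".join(sorted(bad)))
else:
    print("No known rogue DNS servers found in /etc/resolv.conf")
```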

Update
6 Mar 13:33:10
Changing the DNS settings back to auto, changing the administrator password and disabling WAN side access to the router may also prevent this from happening again.
Update
6 Mar 13:48:14
Also reported here: http://www.pcworld.com/article/2104380/
Resolution We have contacted the few affected customers.
Started 6 Mar 09:00:00
Closed 11 Mar 09:32:42

7 Mar 15:08:45
Details
7 Mar 15:10:59
Some broadband lines blipped at 15:05. This was a result of one of our LNSs restarting. Lines are back online and we'll investigate the cause.
Started 7 Mar 15:03:00
Closed 7 Mar 15:08:45

4 Mar 14:30:58
Details
4 Mar 14:30:58

We are pleased to confirm that we are extending the links to BT for broadband to a third gigabit hostlink. This means we will actually have six gigabit fibres to them, allowing lots of headroom and redundancy. This should be seamless to customers, but the LNSs known as "A", "B", "C", and "D" will have a new "E" and "F" added, and we will run 5 of the 6 LNSs as "live" and one as backup. We also have multiple gigabit links into TalkTalk.

This will happen over the next few months, and we will post planned work "at risk" announcements as necessary.

We are actually growing quite well now, and a lot of this has been put down to Baroness Howe mentioning us in The House of Lords recently. I'd really like to thank her for her support, even if unintentional. (see http://revk.www.me.uk/2014/01/mentioned-in-house-of-lords.html)

We have even put another person in to the sales team to handle the extra load.

Started 4 Mar 14:00:00

3 Mar 13:31:25
Details
17 Jan 16:13:23
It seems that BE/Sky are informing their customers that they can no longer have their public blocks of IPs on their service. As a one-off special offer, from now until the end of February (now extended to the end of March, see update), if an ex-BE customer migrates to our Home::1 tariff then we can include a /30, /29 or /28 block of IPv4 in addition to the usual IPv6 blocks, for no extra cost.
Information about Home::1 is here: http://aa.net.uk/broadband-home1.html Do mention this offer when ordering.
Do see our page about what we do when we run out of IPv4 though: http://aa.net.uk/kb-broadband-ipv6-plans.html
Update
3 Mar 13:30:56
Offer continued until the end of March.
Started 17 Jan 16:00:00

27 Feb 20:40:00
Details
27 Feb 20:29:14
We are seeing some TT lines dropping and a routing problem.
Update
27 Feb 20:39:20
Things are ok now, we're investigating. This looks to have affected some routing for broadband customers and caused some TT lines to drop.
Resolution We are not entirely sure what caused this, however we do believe it to be related to BGP flapping. This also looks to have affected other ISPs and networks too.
Started 27 Feb 20:18:00
Closed 27 Feb 20:40:00