
25 Jan 11:00:03
Posted: 25 Jan 11:32:23
Between 10:45:43 and 10:57:33 we saw several blips on half of our new BT host links. They do seem stable at the moment; however we are still waiting on BT to comment on the cause.
Resolution BT have confirmed that this was a problem on their side. We are awaiting further details about the incident.
Started 25 Jan 10:45:43
Closed 25 Jan 11:00:03

17 Nov 2017 18:34:27
Posted: 17 Nov 2017 11:36:33

We are seeing a denial of service attack, which is causing more problems than usual and is disrupting traffic to some customers - but it is moving, and so it affects different customers at different times.

Obviously we are working on this, and unfortunately cannot say a lot more about the details.

17 Nov 2017 12:09:40
We are still seeing problems; customers on this LNS would have seen drops and routing problems. We are working on this.
17 Nov 2017 12:10:32
This problem has also been affecting some of our transit routes, and so routing to parts of the internet will have had problems too.
17 Nov 2017 12:42:15
We are still working through this; we have moved lines off A.Gormless, and have had further problems with those lines. Please do bear with us, we hope things will calm down shortly.
17 Nov 2017 13:39:21
We cannot go in to a lot of detail but I will say this is a completely new sort of attack. We have made some changes and will be reviewing ways we can mitigate attacks like this in the future. I'll re-open this issue if problems continue. Thank you all for your patience.
17 Nov 2017 13:54:44
Hmm, that attacker is clearly back from lunch.
17 Nov 2017 15:15:18
Not gone away - we are working on more...
17 Nov 2017 16:21:02
This now appears to be affecting VoIP too.
17 Nov 2017 17:19:54
We're rebalancing some lines due to the issues in the early hours of today (as per https://aastatus.net/2457) and additionally due to today's issues.
Some PPP sessions will disconnect and shortly reconnect. This is to fix an imbalance in the number of sessions we have per LNS.
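For the curious, the rebalancing amounts to evening out session counts across the LNS pool. A minimal sketch of the idea in Python (an illustration only, not our actual provisioning code; LNS names and counts are invented):

    # Hypothetical sketch: pick how many sessions to move between LNSs
    # so each ends up near the average. Names and counts are made up.
    def rebalance(sessions_per_lns):
        """Return (src, dst, count) moves that even out session counts."""
        target = sum(sessions_per_lns.values()) // len(sessions_per_lns)
        surplus = {l: n - target for l, n in sessions_per_lns.items()}
        donors = [[l, s] for l, s in surplus.items() if s > 0]
        moves = []
        for dst, need in ((l, -s) for l, s in surplus.items() if s < 0):
            for donor in donors:
                if need == 0:
                    break
                n = min(donor[1], need)
                if n:
                    moves.append((donor[0], dst, n))
                    donor[1] -= n
                    need -= n
        return moves

    print(rebalance({"a.gormless": 900, "b.gormless": 500, "c.gormless": 400}))
    # -> [('a.gormless', 'b.gormless', 100), ('a.gormless', 'c.gormless', 200)]
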
17 Nov 2017 17:34:50
The line rebalancing is fine, and mostly broadband is fine, but we are tackling a moving target aiming at various services on an ongoing basis.
Resolution Quiet for now; we are still monitoring.
Started 17 Nov 2017 11:25:00
Closed 17 Nov 2017 18:34:27

17 Nov 2017 13:40:06
Posted: 17 Nov 2017 01:26:59
At 00:42, a large number of lines across all carriers disconnected, and then reconnected. Some lines may have taken longer than others to come back, however session numbers have now slowly recovered to their usual levels. During this outage, a significant number of BT lines ended up biased towards one LNS in particular, which will need dealing with.
As session numbers have stabilised and traffic levels look normal, a further investigation into this event will follow in the morning, along with plans to move customers off of I.gormless, where the large number of BT sessions have accumulated.
17 Nov 2017 10:50:48
We expect to do some PPP restarts in the early evening, before the evening peak traffic, to put lines back on the right LNS. This will not affect all customers. We are then looking to do some PPP restarts overnight during the weekend to distribute lines over the newly installed LNSs.
Resolution The original issue has gone away. Because of other problems today (DoS attack) we expect to be rebalancing lines later in the day and overnight anyway. Thank you for your understanding.
Broadband Users Affected 90%
Started 17 Nov 2017 00:41:25 by AA Staff
Closed 17 Nov 2017 13:40:06

02 Nov 2017 22:05:00
Posted: 02 Nov 2017 21:59:48
We're experiencing high traffic levels on some of our core routers - we're investigating, but this may be causing some disruption for customers.
02 Nov 2017 22:06:32
Things are looking back to normal now...
Started 02 Nov 2017 21:45:00
Closed 02 Nov 2017 22:05:00

14 Jul 2017 16:48:39
Posted: 14 Jul 2017 10:33:01
We've just seen BT and TT lines drop. We are investigating.
14 Jul 2017 10:57:12
This is similar to yesterday's problem. We have lost BGP connectivity to both our carriers. Over the past few minutes the BGP sessions have been going up and down, meaning customers are logging in and then out again. Updates to follow shortly.
14 Jul 2017 11:16:46
Sessions are looking a bit more stable now... customer lines are still reconnecting
14 Jul 2017 11:56:08
We have about half of our DSL lines logged in, but the remaining half are struggling due to what looks like a Layer 2 issue on our network.
14 Jul 2017 12:23:19
More DSL lines are now back up. If customers are still off, a reboot may help.
14 Jul 2017 12:35:51
A number of TT and BT lines just dropped. They are starting to reconnect now though.
14 Jul 2017 12:52:31
This is still ongoing - most DSL lines are back, but some have been dropping and the network is still unstable. We are continuing to investigate this.
14 Jul 2017 12:55:50
We managed to get a lot of lines up after around an hour, during which there was a lot of flapping. The majority of the rest (some of the TalkTalk backhaul lines) came up and flapped a bit at 12:00, and came back properly around 12:40. However, we are still trying to understand the issue, and still have some problems, and we suspect there may be more outages. The problem appears to be something at layer 2, but impacting several sites at once.
14 Jul 2017 14:41:23
We believe these outages are due to a partial failure of one of our core switches. We've moved most services away from this switch and are running diagnostics on it at the moment. We are not expecting these diagnostics to affect other services.
14 Jul 2017 17:00:40
The network has been stable for a good few hours now. One of our 40G Telehouse-to-Harbour Exchange interconnects has been taken down and some devices have been moved off of the suspect switch. We have further work to do in investigating the root cause of this and deciding what we plan to do to try to stop this from happening again. We do apologise to our customers for the disruption these two outages have caused.
Started 14 Jul 2017 10:30:00
Closed 14 Jul 2017 16:48:39

13 Jul 2017 21:38:01
Posted: 13 Jul 2017 18:56:13
Multiple circuits have disconnected and reconnected; staff are investigating.
13 Jul 2017 19:00:34
Sessions seem to be repeatedly flapping rather than reconnecting - staff are investigating.
13 Jul 2017 20:05:47
We are still working on this, it's a rather nasty outage I'm afraid and is proving difficult to track down.
13 Jul 2017 20:17:42
Lines are re-connecting now...
13 Jul 2017 20:19:16
Apologies for the loss of graphs on a.gormless. We usually assume the worst - that our own kit is the cause - and so tried a reboot of a FireBrick LNS. It did not help, but it did clarify that the real cause was the Cisco switches. Sorry for the time it took to track this one down.
13 Jul 2017 20:19:27
Some lines are being forced to reconnect so as to move them to the correct LNS; this will cause a logout/login for some customers...
13 Jul 2017 20:25:52
Lines are still connecting, not all are back, but the number of connected lines is increasing.
13 Jul 2017 20:29:55
We're doing some work which may cause some lines to go offline - we expect lines to start reconnecting in 10 minutes' time.
13 Jul 2017 20:34:35
We are rebooting stuff to try and find the issue. This is very unusual.
13 Jul 2017 20:42:56
Things are still not stable and lines are still dropping. We're needing to reboot some core network switches as part of our investigations and this is happening at the moment.
13 Jul 2017 20:52:29
Lines are reconnecting once more
13 Jul 2017 21:17:36
Looking stable.
13 Jul 2017 21:23:48
Most lines are back online now. If customers are still not online, a reboot of the router or modem may be required, as the session may have got stuck inside the backhaul network.
13 Jul 2017 21:41:38

We'll close this incident as lines have been stable for an hour. We'll update the post with further information as to the cause and any action we will be taking to help stop this type of incident from happening again.

We would like to thank our customers for their patience and support this evening. We had many customers in our IRC channel who were in good spirits and supportive to our staff whilst they worked on this incident.

14 Jul 2017 14:42:20
A similar problem occurred on Friday morning, this is covered on the following post: https://aastatus.net/2411
Closed 13 Jul 2017 21:38:01

27 Mar 2017 09:30:00
Posted: 19 Feb 2017 18:35:15
We have seen some cases with degraded performance on some TT lines, and we are investigating. Not a lot to go on yet, but be assured we are working on this and engaging the engineers within TT to address this.
21 Feb 2017 10:13:20

We have completed further tests and we are seeing congestion manifesting itself as slow throughput at peak times (evenings and weekends) on VDSL (FTTC) lines that connect to us through a certain Talk Talk LAC.

This has been reported to senior TalkTalk staff.

To explain further: VDSL circuits are routed from TalkTalk to us via two LACs. We are seeing slow throughput at peak times on one LAC and not the other.

27 Feb 2017 11:08:58
Very often with congestion it is easy to find the network port or system that is overloaded but so far, sadly, we've not found the cause. A&A staff and customers and TalkTalk network engineers have done a lot of checks and tests on various bits of the backhaul network but we are finding it difficult to locate the cause of the slow throughput. We are all still working on this and will update again tomorrow.
27 Feb 2017 13:31:39
We've been in discussions with other TalkTalk wholesalers who have also reported the same problem to TalkTalk. There does seem to be more of a general problem within the TalkTalk network.
27 Feb 2017 13:32:12
We have had an update from TalkTalk saying that, based on multiple reports from ISPs, they are investigating further.
27 Feb 2017 23:21:21
Further tests this evening by A&A staff show that the throughput problem is not related to a specific LAC, but that something in TalkTalk appears to be limiting single TCP sessions to 7-9M max during peak times. Running single iperf tests results in 7-9M, but running ten at the same time can fill a 70M circuit. We've passed these findings on to TalkTalk.
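For reference, this kind of single-stream vs multi-stream comparison can be scripted around iperf3; a minimal sketch (the server hostname is a placeholder, and it assumes iperf3 is installed on both ends):

    # Compare single-stream vs multi-stream throughput with iperf3.
    # "iperf.example.net" is a placeholder, not a real test server.
    import json
    import subprocess

    def iperf_mbps(host, streams=1):
        out = subprocess.run(
            ["iperf3", "-c", host, "-P", str(streams), "-J"],
            capture_output=True, text=True, check=True).stdout
        return json.loads(out)["end"]["sum_received"]["bits_per_second"] / 1e6

    print("1 stream:   %.1f Mb/s" % iperf_mbps("iperf.example.net", 1))
    print("10 streams: %.1f Mb/s" % iperf_mbps("iperf.example.net", 10))
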
28 Feb 2017 09:29:56
As expected the same iperf throughput tests are working fine this morning. TT are shaping at peak times. We are pursuing this with senior TalkTalk staff.
28 Feb 2017 11:27:45
TalkTalk are investigating. They have stated that circuits should not be rate limited and that they are not intentionally rate limiting. They are still investigating the cause.
28 Feb 2017 13:14:52
Update from TalkTalk: Investigations are currently underway with our NOC team who are liaising with Juniper to determine the root cause of this incident.
01 Mar 2017 16:38:54
TalkTalk are able to reproduce the throughput problem and investigations are still ongoing.
02 Mar 2017 16:51:12
Some customers did see better throughput on Wednesday evening, but not everyone. We've done some further testing with TalkTalk today and they continue to work on this.
02 Mar 2017 22:42:27
We've been in touch with the TalkTalk Network team this evening and have been performing further tests (see https://aastatus.net/2363 ). Investigations are still ongoing, but the work this evening has given a slight clue.
03 Mar 2017 14:24:48
During tests yesterday evening we saw slow throughput when using the Telehouse interconnect and fast (normal) throughput over the Harbour Exchange interconnect. Therefore, this morning, we disabled our Telehouse North interconnect. We will carry on running tests over the weekend and we welcome customers to do the same. We are expecting throughput to be fast for everyone. We will then liaise with TalkTalk engineers regarding this on Monday.
06 Mar 2017 15:39:33

Tests over the weekend suggest that speeds are good when we only use our Harbour Exchange interconnect.

TalkTalk are moving the interconnect we have at Telehouse to a different port at their side so as to rule out a possible hardware fault.

06 Mar 2017 16:38:28
TalkTalk have moved our THN port and we will be re-testing this evening. This may cause some TalkTalk customers to experience slow (single thread) downloads this evening. See: https://aastatus.net/2364 for the planned work notice.
06 Mar 2017 21:39:55
The testing has been completed, and sadly we still see slow speeds when using the THN interconnect. We are now back to using the Harbour Exchange interconnect where we are seeing fast speeds as usual.
08 Mar 2017 12:30:25
Further testing is happening on Thursday evening: https://aastatus.net/2366 This is to try to help narrow down where the problem is occurring.
09 Mar 2017 23:23:13
We've been testing this evening, this time with some more customers, so thank you to those who have been assisting. (We'd welcome more customers to be involved - you just need to run an iperf server on IPv4 or IPv6 and let one of our IPs through your firewall - contact Andrew if you're interested). We'll be passing the results on to TalkTalk, and the investigation continues.
10 Mar 2017 15:13:43
Last night we saw some lines slow and some lines fast, so having extra lines to test against should help in figuring out why this is the case. Quite a few customers have set up an iperf server for us and we are now testing 20+ lines. (Still happy to add more). Speed tests are being run three times an hour; we'll collate the results after the weekend and report the findings back to TalkTalk.
13 Mar 2017 15:22:43

We now have samples of lines which are affected by the slow throughput and those that are not.

Since 9pm Sunday we are using the Harbour Exchange interconnect in to TalkTalk and so all customers should be seeing fast speeds.

This is still being investigated by us and TalkTalk staff. We may do some more testing in the evenings this week and we are continuing to run iperf tests against the customers who have contacted us.
14 Mar 2017 15:59:18

TalkTalk are doing some work this evening and will be reporting back to us tomorrow. We are also going to be carrying out some tests ourselves this evening too.

Our tests will require us to move traffic over to the Telehouse interconnect, which may mean some customers will see slow (single thread) download speeds at times. This will be between 9pm and 11pm.

14 Mar 2017 16:45:49
This is from the weekend: [graph of weekend iperf test results]

17 Mar 2017 10:42:28
We've stopped the iperf testing for the time being. We will start it back up again once we or TalkTalk have made changes that require testing to see if things are better or not, but at the moment there is no need for the testing as all customers should be seeing fast speeds due to the Telehouse interconnect not being in use. Customers who would like quota top-ups, please do email in.
17 Mar 2017 18:10:41
To help with the investigations, we're also asking for customers with BT connected FTTC/VDSL lines to run iperf so we can test against them too - details on https://support.aa.net.uk/TTiperf Thank you!
20 Mar 2017 12:54:02
Thanks to those who have set up iperf for us to test against. We ran some tests over the weekend whilst swapping back to the Telehouse interconnect, and tested BT and TT circuits for comparison. Results are that around half the TT lines slowed down but the BT circuits were unaffected.

TalkTalk are arranging some further tests to be done with us which will happen Monday or Tuesday evening this week.

22 Mar 2017 09:37:30
We have scheduled testing of our Telehouse interlink with TalkTalk staff for this Thursday evening. This will not affect customers in any way.
22 Mar 2017 09:44:09
In addition to the interconnect testing on Thursday mentioned above, TalkTalk have also asked us to retest DSL circuits to see if they are still slow. We will perform these tests tonight, Wednesday evening.

TT have confirmed that they have made a configuration change on the switch at their end in Telehouse - this is the reason for the speed testing this evening.

22 Mar 2017 12:06:50
We'll be running iperf3 tests against our TT and BT volunteers this evening, every 15 minutes from 4pm through to midnight.
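Roughly what such a run looks like, as a sketch (volunteer hostnames are invented, and this is not the exact script we use):

    # Run an iperf3 test per volunteer line every 15 minutes from 4pm
    # until midnight, appending results to a CSV. Hostnames are invented.
    import csv, json, subprocess, time
    from datetime import datetime

    volunteers = ["line-a.example", "line-b.example"]

    def mbps(host):
        out = subprocess.run(["iperf3", "-c", host, "-J"],
                             capture_output=True, text=True, check=True).stdout
        return json.loads(out)["end"]["sum_received"]["bits_per_second"] / 1e6

    while datetime.now().hour < 16:
        time.sleep(60)                # wait for 4pm
    while datetime.now().hour >= 16:  # hour wraps to 0 at midnight, ending the run
        with open("results.csv", "a", newline="") as f:
            w = csv.writer(f)
            for host in volunteers:
                w.writerow([datetime.now().isoformat(), host, mbps(host)])
        time.sleep(15 * 60)           # every 15 minutes
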
22 Mar 2017 17:40:20
We'll be changing over to the Telehouse interconnect between 8pm and 9pm this evening for testing.
23 Mar 2017 10:36:06

Here are the results from last night: [graph of TT circuit throughput]

And BT circuits: [graph of BT circuit throughput]

Some of the results are rather up and down - these lines are in use by customers, so we would expect some fluctuation - but it's clear that a number of lines are unaffected and a number are affected.

Here's the interesting part. Since this problem started we have rolled out some extra logging on to our LNSs; this has taken some time as we only update one a day. However, we are now logging the IP address used at our side of L2TP tunnels from TalkTalk. We have eight live LNSs and each one has 16 IP addresses that are used. With this logging we've identified that circuits connecting over tunnels on 'odd' IPs are fast, whilst those on tunnels on 'even' IPs are slow. This points to a LAG issue within TalkTalk, which is what we have suspected from the start, but this data should hopefully help TalkTalk with their investigations.
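To illustrate the analysis (the data and layout here are invented; our real logs differ), grouping per-line throughput by the parity of the tunnel IP's last octet looks something like:

    # Group per-line throughput by odd/even last octet of the L2TP tunnel IP.
    # Sample data is made up (TEST-NET addresses), purely for illustration.
    from statistics import mean

    samples = [("192.0.2.17", 62.1), ("192.0.2.18", 8.4),
               ("192.0.2.19", 58.9), ("192.0.2.20", 7.7)]

    groups = {"odd": [], "even": []}
    for ip, mbps in samples:
        last_octet = int(ip.rsplit(".", 1)[1])
        groups["odd" if last_octet % 2 else "even"].append(mbps)

    for parity, speeds in groups.items():
        print("%s tunnels: mean %.1f Mb/s (%d lines)" % (parity, mean(speeds), len(speeds)))
    # Fast odd tunnels and slow even tunnels point at one member of a LAG.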

23 Mar 2017 16:27:28
As mentioned above, we have scheduled testing of our Telehouse interlink with TalkTalk staff for this evening. This will not affect customers in any way.
23 Mar 2017 22:28:53

We have been testing the Telehouse interconnect this evening with TalkTalk engineers. This involved an ~80 minute conference call, setting up a very simple test with a server on our side plugged in to the switch connected to our 10G interconnect, and running iperf3 tests against a laptop on the TalkTalk side.

The test has highlighted a problem at the TalkTalk end with the connection between two of their switches. When their laptop was plugged in to the second switch we got about 300Mbit/s, but when it was in the switch directly connected to our interconnect we got near full speed, around 900Mb/s.

This has hopefully given them a big clue and they will now involve the switch vendor for further investigations.

23 Mar 2017 23:02:34
TalkTalk have just called us back and have asked us to retest speeds on broadband circuits. We're moving traffic over to the Telehouse interconnect and will test....
23 Mar 2017 23:07:31
Initial reports show that speeds are back to normal! Hooray! We've asked TalkTalk for more details and if this is a temporary or permanent fix.
24 Mar 2017 09:22:13

Results from last night when we changed over to test the Telehouse interlink: [graph]

This shows that, unlike the previous times, speeds did not drop when we changed over to use the Telehouse interconnect at 11PM.

We will perform hourly iperf tests over the weekend to be sure that this has been fixed.

We're still awaiting details from TalkTalk as to what the fix was and if it is a temporary or permanent fix.

24 Mar 2017 16:40:24
We are running on the Telehouse interconnect and are running hourly iperf3 tests against a number of our customers over the weekend. This will tell us if the speed issues are fixed.
27 Mar 2017 09:37:12

Speed tests against customers over the weekend do not show the peak-time slowdowns, which confirms that what TalkTalk did on Thursday night has fixed the problem. We are still awaiting the report from TalkTalk regarding this incident.

[Graph: hourly iperf3 speed tests over the weekend]

The graph above shows iperf3 speed test results taken once an hour over the weekend against nearly 30 customers. Although some are a bit spiky, we are no longer seeing the drastic reduction in speeds at peak time. The spikiness is due to the lines being used as normal by the customers and so is expected.
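The check behind that conclusion is a simple peak vs off-peak comparison; a sketch, assuming the invented results.csv layout from the scheduler sketch above (timestamp, host, Mb/s):

    # Compare mean throughput at peak (6pm-11pm) vs off-peak hours.
    import csv
    from datetime import datetime
    from statistics import mean

    peak, offpeak = [], []
    with open("results.csv") as f:
        for ts, host, mbps in csv.reader(f):
            hour = datetime.fromisoformat(ts).hour
            (peak if 18 <= hour <= 22 else offpeak).append(float(mbps))

    print("peak mean:     %.1f Mb/s" % mean(peak))
    print("off-peak mean: %.1f Mb/s" % mean(offpeak))
    # Comparable means indicate the peak-time slowdown is gone.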

28 Mar 2017 10:52:25
We're expecting the report from TalkTalk at the end of this week or early next week (w/b 2017-04-03).
10 Apr 2017 16:43:03
We've not yet had the report from TalkTalk, but we do expect it soon...
04 May 2017 09:16:33
We've had an update saying: "The trigger & root cause of this problem is still un-explained; however investigations are continuing between our IP Operation engineers and vendor".

This testing is planned for 16th May.

Resolution From TT: Planned work took place on the 16th May which appears to have been a success. IP Ops engineers swapped the FPC 5 and a 10 gig module on the ldn-vc1.thn device. They also performed a full reload of the entire virtual chassis (as planned). This appears to have resolved the slow speed issues seen by the iperf testing onsite. Prior to this, IP Ops were seeing consistent slow speeds with egress traffic sourced from FPC5 to any other FPC; therefore they are confident that this has now been fixed. IP Ops have moved A&A's port back to FPC 5 on LDN-vc1.thn.
Started 18 Feb 2017
Closed 27 Mar 2017 09:30:00
Cause TT

14 Mar 2017 21:10:00
Posted: 14 Mar 2017 21:05:28
Looks like we just had some sort of blip affecting broadband customers. We're investigating.
Resolution This was an LNS crash, and so affected customers on the "i" LNS. The cause is being investigated, but preliminary investigations show that it's probably a problem that is fixed in software scheduled to be loaded on to this LNS in a couple of days' time as part of the rolling software update that we're performing at the moment.
Broadband Users Affected 12%
Started 14 Mar 2017 21:00:57
Closed 14 Mar 2017 21:10:00