We have seen some cases with degraded performance on some TT lines, and we are investigating. Not a lot to go on yet, but be assured we are working on this and engaging the engineers within TT to address this.
We have completed further tests and we are seeing congestion manifesting itself as slow throughput at peak times (evenings and weekends) on VDSL (FTTC) lines
that connect to us through a certain Talk Talk LAC.
This has been reported to senior TalkTalk staff.
To explain further; VDSL circuits are routed from TalkTalk to us via two LACs. We are seeing slow thoughput at peak times on one LAC and not the other.
Very often with congestion it is easy to find the network port or system that is overloaded but so far, sadly, we've not found the cause. A&A staff and customers and TalkTalk network engineers have done a lot of checks and tests on various bits of the backhaul network but we are finding it difficult to locate the cause of the slow throughput. We are all still working on this and will update again tomorrow.
We've been in discussions with other TalkTalk wholesalers who have also reported the same problem to TalkTalk. There does seem to be more of a general problem within the TalkTalk network.
We have had an update from TalkTalk saying that based on multiple reports from ISPs that they are investigating further.
Further tests this evening by A&A staff shows that the throughput is not relating to a specific LAC, but that it looks like something in TalkTalk is limiting single TCP sessions to 7-9M max during peak times. Running single iperf tests results in 7-9M, but running ten at the same time can fill a 70M circuit. We've passed these findings on to TalkTalk.
As expected the same iperf throughput tests are working fine this morning. TT are shaping at peak times. We are pursuing this with senior TalkTalk staff.
TalkTalk are investigating. They have stated that circuits should not be rate limited and that they are not intentionally rate limiting. They are still investigating the cause.
Update from TalkTalk: Investigations are currently underway with our NOC team who are liaising with Juniper to determine the root cause of this incident.
TalkTalk are able to reproduce the throughput problem and investigations are still on going.
Some customers did see better throughput on Wednesday evening, but not everyone. We've done some further testing with TalkTalk today and they continue to work on this.
We've been in touch with the TalkTalk Network team this evening and have been performing further tests (see https://aastatus.net/2363 ). Investigations are still ongoing, but the work this evening has given a slight clue.
During tests yesterday evening we saw slow throughput when using the Telehouse interconnect and fast (normal) throughput over Harbour Exchange interconnect. Therefore, this morning, we disabled our Telehouse North interconnect. We will carry on running tests over the weekend and we welcome customers to do the same. We are expecting throughput to but fast for everyone. We will then liaise with TalkTalk engineers regarding this on Monday.
Tests over the weekend suggest that speeds are good when we only use our Harbour Exchange interconnect.
TalkTalk are moving the interconnect we have at Telehouse to a different port at their side so as to rule out a possible hardware fault.
TalkTalk have moved our THN port and we will be re-testing this evening. This may cause some TalkTalk customers to experience slow (single thread) downloads this evening. See: https://aastatus.net/2364 for the planned work notice.
The testing has been completed, and sadly we still see slow speeds when using the THN interconnect. We are now back to using the Harbour Exchange interconnect where we are seeing fast speeds as usual.
Further testing happening
today: Thursday evening https://aastatus.net/2366 This is to try and help narrow down where the problem is occurring.
We've been testing, tis evening, this time with some more customers, so thank you to those who have been assisting. (We'd welcome more customers to be involved - you just need to run an iperf server on IPv4 or IPv6 and let one of our IPs through your firewall - contact Andrew if you're interested). We'll be passing the results on to TalkTalk, and the investigation continues.
Last night we saw some line slow and some line fast, so having extra lines to test against should help in figuring out why this is the case. Quite a few customers have set up iperf server for us and we are now testing 20+ lines. (Still happy to add more). Speed tests are being run three times an hour and we'll collate the results after the weekend and will report back to TalkTalk the findings.
We now have samples of lines which are affected by the slow throughput and those that are not.
Since 9pm Sunday we are using the Harbour Exchange interconnect in to TalkTalk and so all customers should be seeing fast speeds.This is still being investigated by us and TalkTalk staff. We may do some more testing in the evenings this week and we are continuing to run iperf tests against the customers who have contacted us.
TalkTalk are doing some work this evening and will be reporting back to us tomorrow. We are also going to be carrying out some tests ourselves this evening too.
Our tests will require us to move traffic over to the Telehouse interconnect, which may mean some customers will see slow (single thread) download speeds at times. This will be between 9pm and 11pm
We've stopped the iperf testing for the time being. We will start it back up again once we or TalkTalk have made changes that require testing to see if things are better or not, but at the moment there is no need for the testing as all customers should be seeing fast speeds due to the Telehouse interconnect not being in use. Customers who would like quota top-ups, please do email in.
To help with the investigations, we're also asking for customers with BT connected FTTC/VDSL lines to run iperf so we can test against them too - details on https://support.aa.net.uk/TTiperf Thank you!
Thanks to those who have set up iperf for us to test against. We ran some tests over the weekend whilst swapping back to the Telehouse interconnect, and tested BT and TT circuits for comparison. Results are that around half the TT lines slowed down but the BT circuits were unaffected.
TalkTalk are arranging some further tests to be done with us which will happen Monday or Tuesday evening this week.
We have scheduled testing of our Telehouse interlink with TalkTalk staff for this Thursday evening. This will not affect customers in any way.
In addition to the interconnect testing on Thursday mentioned above, TalkTalk have also asked us to retest DSL circuits to see if they are still slow. We will perform these tests this tonnight, Wednesday evening.
TT have confirmed that they have made a configuration change on the switch at their end in Telehouse - this is the reason for the speed testing this evening.
We'll be running iperf3 tests against our TT and BT volunteers this evening, very 15 minutes from 4pm through to midnight.
We'll be changing over to the Telehouse interconnect between 8pm and 9pm this evening for testing.
Here are the results from last night:
And BT Circuits:
Some of the results are rather up and down, but these lines are in use by customers so we would expect some fluctuations, but it's clear that a number of lines are unaffected and a number are affected.
Here's the interesting part. Since this problem started we have rolled out some extra logging on to our LNSs, this has taken some time as we only update one a day. However, we are now logging the IP address used at our side of L2TP tunnels from TalkTalk. We have eight live LNSs and each one has 16 IP addresses that are used. With this logging we've identified that circuits connecting over tunnels on 'odd' IPs are fast, whilst those on tunnels on 'even' IPs are slow. This points to a LAG issue within TalkTalk, which is what we have suspected from the start but this data should hopefully help TalkTalk with their investigations.
As mentioned above, we have scheduled testing of our Telehouse interlink with TalkTalk staff for this evening. This will not affect customers in any way.
We have been testing the Telehouse interconnect this evening with TalkTalk engineers. This involved a ~80 minute conference call and setting up a very simple test of a server our side plugged in to the switch which is connected to our 10G interconnect, and running iperf3 tests against a laptop on the TalkTalk side.
The test has highlighted a problem at the TalkTalk end with the connection between two of their switches. When plugged in to the second switch we got about 300Mbit/s, but when their laptop was in the switch directly connected to our interconnect we got near full speed or around 900Mb/s.
This has hopefully given them a big clue and they will now involve the switch vendor for further investigations.
TalkTalk have just called us back and have asked us to retest speeds on broadband circuits. We're moving traffic over to the Telehouse interconnect and will test....
Initial reports show that speeds are back to normal! Hooray! We've asked TalkTalk for more details and if this is a temporary or permanent fix.
Results from last night when we changed over to test the Telehouse interlink:
This shows that unlike the previous times, when we changed over to use the Telehouse interconnect at 11PM speeds did not drop.We will perform hourly iperf tests over the weekend to be sure that this has been fixed.
We're still awaiting details from TalkTalk as to what the fix was and if it is a temporary or permanent fix.
We are running on the Telehouse interconnect and are running hourly iperf3 tests against a number of our customers over the weekend. This will tell us if the speed issues are fixed.
Speed tests against customers over the weekend do not show the peak time slow downs, this confrims that what TalkTalk did on Thursday night has fixed the problem. We are still awaiting the report from TalkTalk regarding this incident.
The graph above shows iperf3 speed test results taken once an hour over the weekend against nearly 30 customers. Although some are a bit spiky we are no longer seeing the drastic reduction in speeds at peak time. The spikyness is due to the lines being used as normal by the customers and so is expected.
We're expecting the report from TalkTalk at the end of this week or early next week (w/b 2017-04-03).
We've not yet had the report from TalkTalk, but we do expect it soon...
We've had an update saying: "The trigger & root cause of this problem is still un-explained; however investigations are continuing between our IP Operation engineers and vendor".
This testing is planned for 16th May.
From TT: Planned work took place on the 16th May which appears to have been a success. IP Ops engineers swapped the FPC 5 and a 10 gig module on the ldn-vc1.thn device They also performed a full reload to the entire virtual chassis (as planned). This appears to have resolved the slow speed issues seen by the iperf testing onsite. Prior to this IP ops were seeing consistent slow speeds with egress traffic sourced from FPC5 to any other FPC; therefore they are confident that this has now been fixed. IP Ops have moved A&A's port back to FPC 5 on LDN-vc1.thn.