We do apologise for the inconvenience these drops are causing our customers. This is still an ongoing problem.
The drops can be seen as long vertical lines on our blip graph: https://aastatus.net/#blip
There have been a number of problems affecting TalkTalk ADSL and VDSL circuits this week. They have been a mix of overnight planned work, overnight planned work going wrong and causing disruption in the mornings, and drops during the day.
These are less of a problem where customers have bonded lines with services from different back-haul providers (TalkTalk and BT) but they are disruptive to single-line customers.
Earlier in the week we took our Telehouse interconnect out of service as it was connected to equipment at TalkTalk that was having serious problems. However, we have still seen a few drops during the day. These drops are not affecting all circuits but also don't seem to be localised to particular areas of the country.
TalkTalk say they have no more work to do on the Telehouse side and so over this weekend we will bring back the downed interconnect, which will restore resilience. This should not cause a blip, but will be done out of hours. We are still in dialogue with TalkTalk about the unexplained drops and will report the findings and outcome here.
There were further drops on Saturday at 00:06, 19:56 and 22:26. TalkTalk circuits either saw some packet loss (about 15 seconds) or had their PPP session drop and reconnect. We have had email and telephone calls with TalkTalk on Saturday evening and they are investigating but are not yet finding a network issue. From our side, we see 15 seconds or so of packet loss to the TalkTalk side of the L2TP tunnels from all of our LNSs. We see no errors on our physical interconnect to TalkTalk and our BGP sessions to TalkTalk stay up. It does seem as though the loss is on the TalkTalk side but investigations are still happening. As mentioned before, we have an interconnect to TalkTalk in a different datacentre (Telehouse) which is currently down, as TalkTalk had problems on their equipment earlier in the week and have asked us not to bring it into service. Usually for a problem like this we would move traffic off one interconnect to the other to see if the problem follows or stays, but we're unable to do that at this moment.
We've seen 6 drops on Sunday so far. We are currently waiting for TalkTalk to confirm whether we can bring up our Telehouse interconnect as, so far, the cause of the packet loss on our Equinix interconnect has not been discovered.
We've seen further drops on Sunday. TalkTalk have advised us not to re-enable our Telehouse interconnect as they say that will cause an increase in problems. They also have a meeting booked with their core engineers on Monday morning to go over the issues.
We're awaiting a call with TalkTalk and will update this post as soon as possible. To help things a little we've increased the timeout of PPP connections. This means that we will wait longer before dropping PPP, which should make lines a little more stable, although they will still have packet loss. Unfortunately this change won't apply until the next PPP login, which means it won't help on the next blip, but hopefully will on the one after.
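As a rough illustration of why the timeout change only helps from the second blip onwards, here is a minimal sketch. It assumes a simplified model in which each PPP session copies the timeout in force when it logs in and drops only if an outage lasts at least that long. The class names and the timeout values are hypothetical and are not taken from our actual configuration; only the 15 second blip length comes from the figures above.

```python
from dataclasses import dataclass


@dataclass
class PppSession:
    # Timeout in seconds, fixed at login time (simplified, hypothetical model).
    timeout_s: float

    def survives(self, outage_s: float) -> bool:
        """Return True if the session stays up through an outage of outage_s seconds."""
        return outage_s < self.timeout_s


class Lns:
    """Toy LNS holding a configured timeout that new logins pick up."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s

    def login(self) -> PppSession:
        # A new session copies whatever timeout is configured right now.
        return PppSession(timeout_s=self.timeout_s)


OLD_TIMEOUT_S = 10.0  # hypothetical value, for illustration only
NEW_TIMEOUT_S = 60.0  # hypothetical value, for illustration only
BLIP_S = 15.0         # roughly the packet loss duration seen during a blip

lns = Lns(OLD_TIMEOUT_S)
session = lns.login()          # line logged in before the change
lns.timeout_s = NEW_TIMEOUT_S  # timeout is increased on the LNS

# First blip: the existing session still uses the old timeout, so it drops.
first_blip_ok = session.survives(BLIP_S)
if not first_blip_ok:
    session = lns.login()      # the reconnect picks up the new timeout

# Second blip: the session now rides through the loss without dropping.
second_blip_ok = session.survives(BLIP_S)

print(first_blip_ok, second_blip_ok)  # prints: False True
```

In this model the line still loses packets for the length of the blip in both cases. The longer timeout only avoids the PPP drop and reconnect.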
Update Monday afternoon: TalkTalk engineers are still investigating the problems and are focusing on the Telehouse side of their network. The last drop we saw was at 00:30 on Monday 3rd October. We have added extra monitoring of our Equinix interconnect.
Having not had a drop since 00:30 on Monday morning, we had a drop on Tuesday evening at around 17:39. This is being treated very seriously. (As mentioned above, any further drops may have less of an impact on customers due to our 'timeout' change.)
We have a call booked with TalkTalk this morning to review the problems.
Summary of where we are at:
We have two independent interconnects to TalkTalk and both are experiencing problems, which is affecting our service to customers on TalkTalk backhaul.
TalkTalk do not want us to enable our Telehouse interconnect as they have capacity problems with their equipment. They are afraid that if we bring up our link the extra traffic would cause authentication problems for both our customers and other TalkTalk partners - lines would not be able to connect. In addition to the problems, (coincidentally?) TalkTalk have an ongoing project to move their partners in Telehouse to a new platform. This is a long term project with partners being moved over one by one and with our link being scheduled for February 2023.
For resilience, for cases just like this, we have a second link to TalkTalk in a separate datacentre, Equinix LD8. Ever since we downed our Telehouse link, this Equinix link has intermittently stopped passing traffic for 30 seconds to a minute or so - this causes broadband lines to drop and reconnect (or have packet loss). Neither TalkTalk nor we can see anything wrong with this link that points to the cause of this loss. The cause is as yet unknown.
Under normal circumstances, with a strange fault like the one we have on the Equinix link, we would simply move traffic over to Telehouse and investigate further without causing any interruption to our customers. However, due to TalkTalk having problems in Telehouse we are unable to do this.
Next steps: We have made it abundantly clear, without a shadow of doubt, that we are totally, 100%, not happy. TalkTalk engineers are, today, working on a plan to move our Telehouse link to their new platform earlier than planned. Initially they have told us it could take a week or more to move us over. This is unacceptable and we have asked them to go away and come back to us with a better plan. We will hear back from TalkTalk this afternoon.
PLANNED WORK THIS EVENING: TalkTalk have re-patched our Telehouse cable in to their new hardware. We are both in the process of configuring each end of the link. This means we should be able to re-enable our Telehouse link this evening or overnight tonight. We'll post further updates as work is carried out.
We're working on bringing the Telehouse link up now.
The Telehouse link is up and passing some traffic. We have further work to do with our configuration to move all traffic over to Telehouse before we can take down the Equinix port. So at the moment things are on their way to being better but we're still at risk of packetloss/drops.
We have successfully moved all TalkTalk traffic off our Equinix link and on to our Telehouse link. We are hopeful that the service will now be stable and we will begin further investigations into the problems with the Equinix link without further interruption.
The TalkTalk service remains stable whilst running on our Telehouse link. We are in the process of moving our Equinix cross-connect over to TalkTalk's new platform and this will be the first stage of diagnosing the packet loss issue we have with the existing connection. This move is likely to take a few days to organise.
We are still in the process of moving our Equinix interconnect on to TalkTalk's new platform. We're hoping this will complete later this week.
We are still waiting for the datacentre to move our cross-connect to TalkTalk's new platform. This is currently scheduled to happen on 20th October.
We are in the process of testing the new interconnect.
We will be carrying out some further planned work to bring the link back into service. This will be raised as separate incidents on the status page.
Our link to TalkTalk in Equinix is back online.