We're investigating the loss of a number of TalkTalk connected lines. Updates to follow shortly.
Lines are reconnecting...
Whilst many lines have reconnected successfully, some are failing to stay online. We're still investigating.
The initial cause of this outage was planned work on our TalkTalk interconnect in our Telehouse datacentre by TalkTalk. This would normally be OK, except that our second interconnect in a different datacentre, Equinix, was taken out of service by ourselves last week due to a separate incident: https://aastatus.net/37182 (aggggghhhhhhhhhhhhhhh)
Not all lines have reconnected; we're expecting those that are not back yet to reconnect soon.
A (smaller) number of lines are still failing to reconnect. We're investigating this - at this point we're not sure if this is related to the planned work TalkTalk are doing at the moment or if it's something separate.
There is still a smaller number of TalkTalk lines that have not reconnected - we're still investigating the cause of this. The TalkTalk work is still happening and we'd expect these lines to reconnect once the work has completed (by 6AM) - but in the meantime we're still looking at what is causing these lines not to reconnect.
We're expecting TalkTalk to finish their work any moment; our interface to them is still down, so they have not finished just yet. Currently most lines are working, but about 100 TalkTalk lines are still down.
TalkTalk's work is overrunning and has not completed. We're taking some steps to re-route the affected lines that are still down in an effort to restore their service.
The ~100 affected lines are not connecting successfully.
TalkTalk's work in Telehouse datacentre is still ongoing, so our traffic is going over our interconnect to TalkTalk in Equinix datacentre.
The work we carried out to re-route the remaining ~100 lines involved us making changes to the LACP configuration of some of our LNSs. Tonight's incident has highlighted a problem with one of our core switches, which was also partly involved in the problems last week. We'll be planning some out-of-hours work ourselves in the coming days to diagnose this further.
The remaining ~100 customers are back online, but we'll keep this incident open for the time being. If customers are not online, then please try rebooting your router so as to force a fresh connection, and then get in touch with us.
TalkTalk have raised an incident regarding their work overrunning (https://aastatus.net/37301). However, our services are working over our second interconnect, so this is not affecting them.
There are a handful of customers who still have lines down. We're looking at these on an individual basis - do get in touch if this affects you.
The lines that are still offline are rather odd - most seem not to be relayed on to us by TalkTalk, so we never see the connection coming in to us. TalkTalk are still in the process of rolling back their planned work changes (https://aastatus.net/37301). Once that work has been completed we're expecting these remaining lines to reconnect - in the meantime we're still investigating why they are not connecting now!
TalkTalk have finished their planned work, and our Telehouse interconnect is back up and running. We still have a small number of lines that are failing to connect - in these cases, either TalkTalk are not seeing the end user router attempt to log in, or TalkTalk are not passing the connection on to us. We are compiling a list of these affected lines, so do get in touch if you have not already. We are then passing examples on to TalkTalk to investigate between us.
In some of these cases, powering off the router/modem for 20 minutes has resulted in the line reconnecting successfully.
We have seen some lines reconnect that have been off - maybe not all lines are back, but many have reconnected. There are probably only a small handful of lines that are off still - but do get in touch if you are still off as we are investigating these on an individual basis.
Summary and further work
Here is a brief summary of this incident and what we are planning to do to improve matters.
We do apologise to those affected by this incident, especially those who didn't reconnect successfully early on.