All lines dropped - major connectivity issue affecting many ISPs
Connections are re-establishing - we'll try and get an explanation out of BT.
Seems clear this affected a lot of ISPs using BT backhaul, not just us. Customers with dual BT+TT or BT+BE services through us automatically fell back to just the non-BT lines.
It seems to be taking some time for lines to come back.
We are seeing a second wave of connections. When BT have a major issue like this they "default accept" and give a dummy connection for a short period. As those lines drop and reconnect to us we get a second lot of connections. Even so we only have about half of our lines back yet which suggests that BT may be struggling at a RADIUS level.
People have connected in several waves, and we suspect BT RADIUS servers were struggling. This looks pretty widespread. No more details yet. It looks like almost everyone is back online though.
Details from BT: "Retrospective report. There was a brief loss of service to 100,00 broadband customers. This was caused when the Border gateway protocols (BGP) dropped causing a loss of service to all WBMC customers in the Stepney Green area for 2 minutes. The root cause is under investigation by the operational team. Service was fully restored at 15:04. BT regrets any inconvenience this may have caused."
Since the lines dropped we are seeing apparent latency spikes and packet loss.
Lines have dropped again.
We have spoken to BT. They are well aware of the drops today, and we have given them new information regarding the packet loss that we and other ISPs have been noticing since the initial drop at 3pm.
Since the drop at 17:53, many customers are getting logged in to a default BT service and getting BT IP addresses. We suggest customers wait a short while, and reboot their routers. This incident is still open with BT.
Lines have dropped again.
They are flapping still - all going off, and then coming back over a period of time, rinse, repeat. Still waiting for BT to fix it.
The last big blip was 18:27, but we are seeing smaller blips every 10 minutes or so. Up until 19:16 this was around 10% of lines, and since then it has been around 5% of lines. Clearly this is something BT are having to work on throughout their network somehow. The last small blip was 20:05 and it has been well over 10 minutes now, so looking hopeful that this may finally be sorted.
Sorry, as I typed that, another small blip at 20:20
Last blip was 20:26, so looking like things may finally be sorted.
And all BT lines go again - chasing BT now
BGP sessions still down, over 12 minutes now. BT think they are changing a card, but that it should not have caused a loss in service as they re-routed traffic via a different node while they do it. Hence they are now on conference calls.
From BT: This is what happened yesterday: As promised here is the latest situation regarding the issues seen at Stephney. It had been identified that a ‘supervisor’ card is faulty and half the traffic from Stephney has been diverted to Faraday to lighten the strain. You will see a likelihood of a loss of resilience however hopefully traffic is flowing, albeit at a lowered rate, but customer should have some connection. There is currently a conference call regarding this issue on-going and there are multiple parts of the BT Group endeavouring to resolve this issue. There is a plan to change out the faulty card on an IMT with reference 30031/13 at or around 02:00 in the morning.
So, basically, they found the dodgy card last night, and planned to change it. They diverted traffic, or so they thought, and then started the card change just after 3am. Then they noticed that all the traffic stopped. As usual we were the first to get in touch with BT but other ISPs are on to them as well now. They are continuing with the card change (understandable). I imagine there will be questions to answer as to why the diverting did not work, and how we once again have a "single point of failure". This has been upgraded to a major incident in BT now.
BGP back, sessions coming up now.
Looks like over 95% of lines are back now - some on a default accept will try again and all should be normal shortly. Let's hope that is it sorted for good. We'll post if we get more details from BT - I expect a proper report in a few days.