We're investigating the cause of major stability issues. Update to follow ASAP.
It looks like the flappiness may have been caused by a BT hostlink. Still investigating.
Things are looking better now that we have taken a suspect hostlink out of service.
Just saw another hostlink flap!
In summary, so far, we've seen two separate hostlinks (one of which is scheduled for disruptive maintenance *tomorrow early hours*) go flappy, and just when things were settling another link chucked a load of sessions off. Very much a BT problem.
Our BT hostlinks have been stable for 25 minutes. The odd thing is that the two links to BT that have had problems this evening are in separate datacentres and connect to BT physically at different exchanges!
BT hostlinks have been stable since around 3:30AM. We'll keep this post open for the time being.
The cause of this disruption was BT planned work to carry out 'invasive testing' on our links. They have confirmed that the work has been completed.
They failed to inform us of this. We have already made a formal complaint about previous failures to notify us, and since then BT have been sending us notifications of works manually (e.g. the one for 27th March). This is being followed up with our account manager.
We do apologise to our customers who were affected by this.
We're furious.
We have had further information from BT about their work. The work was on a transmission link between two datacentres, and as part of it, all ports on devices that use that link were also disabled and re-enabled. As a result, we saw one port in each pair of our hostlinks go down and up around 15 times, all at the same time. Because BT did not shut these ports down cleanly, traffic was disrupted and customers dropped and reconnected multiple times between midnight and 3:30AM.