Blip on 1/3 of BT lines
MINOR Closed Broadband and Ethernet
STATUS
Closed
CREATED
Apr 29, 02:43 PM (8¾ years ago)
AFFECTED
Broadband and Ethernet
STARTED
Apr 29, 02:38 PM (8¾ years ago)
CLOSED
Apr 29, 03:23 PM (8¾ years ago)
REFERENCE
2128 / AA2128
INFORMATION
  • INITIAL
    8¾ years ago by Adrian

    A third of our BT lines blipped - this looks to be an issue with routing on one of our LNSs into BT.

  • UPDATE
    8¾ years ago by Andrew

    Many lines are failing to reconnect properly; we are investigating this.

  • UPDATE
    8¾ years ago by Andrew

    Lines are connecting successfully now.

  • UPDATE
    8¾ years ago by Andrew

    The bulk of lines are back online. A small number of lines are still failing to reconnect, and these are being looked into.

  • UPDATE
    8¾ years ago by Andrew

    The remaining lines are reconnecting successfully now.

  • RESOLUTION
    8¾ years ago by Adrian

    I wanted to try to explain more about what happened today, but it is kind of tricky without saying "something crazy in the routing to/from BT".

    We did, in fact, make a change. Something was not working with our test LNS and a customer needed to connect. We spotted that, for some unknown reason, the internal routing for just one of the four LNSs used a static route instead of one announced by BGP, and that, on top of that, the static route was wrong, hence the test LNS not working via that LNS. It made no sense, and as the other three LNSs were configured sensibly, we changed the "A" LNS to match them. After all, this was clearly a config that just worked and was no problem, or so it seemed.

    Things went flappy, but we could not see why. It looks like BGP into BT was flapping, so people connected and disconnected rather a lot. We reverted the config and things seemed to be fixed for most people, but not quite all. This made no sense: some people were connecting and going online, and then falling offline. The "fix" for that was to change the endpoint LNS IP address used by BT to an alias on the same LNS. We have done this in the past when BT have had a faulty link in a LAG. We wonder if this issue was "lurking" and the problem we created showed it up. This shows that there was definitely an issue in BT somehow, as the fix should not have made any difference otherwise.

    What is extra special is that this looks like it has happened before: the logs suggest the bodge of a static route was set up in 2008, and I have a vague recollection of a mystery flappiness like this which was never solved. Obviously I apologise for this. Having corrected the out-of-date static route, this should not need touching again, but it is damn strange.

  • Closed