BT Core Link Drop
MAJOR Closed Broadband and Ethernet
STATUS
Closed
CREATED
Aug 13, 01:33 PM (11½ years ago)
AFFECTED
Broadband and Ethernet
STARTED
Aug 13, 01:23 PM (11½ years ago)
CLOSED
Aug 13, 05:30 PM (11½ years ago)
SCOPE
33%
REFERENCE
1593 / IMT29366 ESC270616 AB44096
INFORMATION
  • INITIAL
    11½ years ago by Andrew

    One of our core links to BT dropped, affecting a third of our customers.

    Lines are recovering and re-logging in.

    It may take a few minutes for all lines to reconnect.

    More details to follow

  • UPDATE
    11½ years ago

    Most lines are back online. There are some which are not though and are 'flapping' (going up and down).

    To restore service we are clearing lines off the LNS (core router) that the faulty BT link is on. This should then fix these remaining lines. A side effect is that some BE lines will also disconnect and reconnect. We apologise for this.

  • UPDATE
    11½ years ago

    We are still seeing some lines 'flapping'; we're investigating the cause of this.

  • UPDATE
    11½ years ago

    More lines are now in this 'flapping' state; we are still working on this.

  • UPDATE
    11½ years ago

    We have restarted the 'A' LNS in an attempt to stabilise connections.

  • UPDATE
    11½ years ago

    Lines on the 'A' LNS are now looking stable. We're working on the other LNSs.

  • UPDATE
    11½ years ago

    We have restarted the 'C' LNS and lines are reconnecting.

  • UPDATE
    11½ years ago

    We have restarted the 'D' LNS and lines are reconnecting.

  • UPDATE
    11½ years ago

    Having restarted our LNSs, lines are reconnecting and are remaining stable.

    (Graphs for today up to ~2:15pm would have been lost though.)

  • UPDATE
    11½ years ago

    ADSL lines are looking stable now.

    We are still in contact with BT about the link that is currently still down, and we will be reviewing how we can cope better with this type of outage in the future.

  • UPDATE
    11½ years ago

    It is not clear exactly why the loss of a BT link caused things to become unstable. Restarting all of the LNSs has resolved this, and we are investigating the cause. We suspect there is an issue with the routing to BT when one of the links is down, and so may be doing some planned work in due course to make changes that could improve matters.

    In the meantime we are trying to get the failed link back up and working.

  • UPDATE
    11½ years ago

    BT have raised an incident and escalated it internally.
    More updates when we hear back from BT.

  • UPDATE
    11½ years ago

    We have seen a couple of knock-on effects with the flapping lines - the LNSs are very fast, and so handled lines flapping far faster than the RADIUS accounting database could keep up with. As a result, things like colours of lines shown on clueless control pages, and text updates for lines flapping, are a tad behind. It is all catching up.

    We also experienced a knock-on effect with database updates causing some secondary servers to get busy. This is something we are fixing properly in the long term, but it meant some issues with telephony, which were a bit unexpected. That has now been resolved, and in the longer term these secondary servers should no longer struggle when there are a lot of database updates.

  • UPDATE
    11½ years ago

    Several people have reported some web pages being slow - it seems some of the sessions have come up with low MTUs and this is affecting a number of customers. We are clearing affected tunnels manually to rectify this, so some people may see a PPP restart.

  • UPDATE
    11½ years ago

    We can see how the timing of the LNS restart we did could have resulted in the MTU settings being wrong on the first few tunnels, and have made config changes for the future. I have cleared the affected sessions and they have reconnected cleanly.

  • UPDATE
    11½ years ago

    A key thing here is ensuring that the niggles of today are permanently fixed - we are, of course, working on that. Most things are fixed for the long term, and the one remaining issue (the flapping) is still being investigated; we have an idea of what to do for that as planned maintenance. Sorry for any inconvenience.

  • UPDATE
    11½ years ago

    Looking at the overall usage graphs for transit links, we can see that even though some lines were flapping a lot, and most had some issues, overall people were still able to use the internet.

  • UPDATE
    11½ years ago

    We have normality, I repeat, we have normality. Anything you still can't cope with is therefore your own problem...

    The accounting has all caught up at last, sorry for the delay, and the various delayed texts and emails.

    Thank you all for your patience.

  • UPDATE
    11½ years ago

    BT have an engineer on site investigating the fault. Customers are connected via the other fibre links we have, though.

  • UPDATE
    11½ years ago

    BT have fixed the broken fibre by replacing the service card.

    This will be monitored.

  • UPDATE
    11½ years ago

    Our monitoring is reporting that the fibre is not stable, and has been dropping. This fibre is not in use by customers so it's not affecting any customers. This has been passed on to BT.

  • UPDATE
    11½ years ago

    BT have been testing this fibre today. They are not seeing a problem, and the drops that we saw this morning have not happened since 9am. We'll continue to monitor it though.

  • UPDATE
    11½ years ago

    BT confirm that they are seeing alarms on equipment; the investigation continues.

  • UPDATE
    11½ years ago

    We are still working with BT to resolve this fibre issue. One of our staff was at the datacentre yesterday and BT have changed the service card yet again. However, we're still seeing the port flapping, and BT see alarms at their side. Further BT engineers are tasked to work on this again today.

    In the meantime, we continue to use the other fibres that are part of this 'resilient' set so that service is unaffected.

  • RESOLUTION
    11½ years ago by Andrew

    BT replaced the service card on the NTE at our side of this circuit, and the fibre has remained stable.

  • Closed
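
As an illustration of the 'flapping' described in the updates above (lines repeatedly going up and down), here is a minimal Python sketch of how such lines might be picked out of a log of state changes. It is not the tooling used during this incident: the function name, the event format and the thresholds are all assumptions made for the example.

```python
from collections import defaultdict


def find_flapping_lines(events, window_seconds=600, min_transitions=4):
    """Return the set of line IDs that changed state at least
    `min_transitions` times within some `window_seconds` window.

    `events` is an iterable of (timestamp, line_id, state) tuples, with
    timestamp in seconds and state either "up" or "down". This format is
    hypothetical and chosen only for the example.
    """
    changes = defaultdict(list)  # line_id -> timestamps of state changes
    last_state = {}              # line_id -> most recent state seen

    for timestamp, line_id, state in sorted(events):
        # Count a change only when a previous state exists and differs.
        if last_state.get(line_id) not in (None, state):
            changes[line_id].append(timestamp)
        last_state[line_id] = state

    flapping = set()
    for line_id, times in changes.items():
        start = 0
        for end in range(len(times)):
            # Shrink the window until it spans at most window_seconds.
            while times[end] - times[start] > window_seconds:
                start += 1
            if end - start + 1 >= min_transitions:
                flapping.add(line_id)
                break
    return flapping


# Hypothetical sample data: line "a" bounces every 10 seconds,
# line "b" drops once and recovers much later.
sample_events = [
    (0, "a", "up"), (10, "a", "down"), (20, "a", "up"),
    (30, "a", "down"), (40, "a", "up"),
    (0, "b", "up"), (1000, "b", "down"), (5000, "b", "up"),
]
flapping_lines = find_flapping_lines(sample_events)
```

With the sample data, only line "a" meets the default threshold of four state changes within ten minutes. Both thresholds are arbitrary and would need tuning against real session logs.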