Yesterday 01:17:44
Details
Monday 21:38:18
We are receiving reports this evening of some lines that are in sync but unable to log in. We are investigating.
Update
Monday 22:00:52
We believe we have identified the problem and are working on a fix.
Update
Monday 22:17:51
Lines are logging in successfully now. If you are still off, please keep trying.
Resolution An issue with authentication on the "C" LNS, and then on the "D" LNS. We have found the issue, and lines are connecting to "D" cleanly now. The underlying issue causing this is being investigated.
Started Monday 21:37:18
Closed Yesterday 01:17:44
Cause BT

10 Jul 20:10:00
Details
10 Jul 19:18:35
We are seeing a problem with BT 21CN ADSL and FTTC circuits being unable to log in since approximately 18:00 today. Existing sessions are working fine but are failing to reconnect when they drop. 20CN ADSL and TalkTalk backhaul circuits are working fine.

BT have raised incident IMT25152/14 which looks to be related, but just says they are investigating a problem.

Update
10 Jul 22:16:28
BT have reported that service should have been restored as of 20:10 this evening.

Customers who are still having problems should attempt to re-connect as they may be stuck on a BT holding session.

Anyone still having problems after doing that should contact tech support.

Started 10 Jul 17:15:00
Closed 10 Jul 20:10:00
Cause BT

19 Nov 2013 23:22:00
Details
19 Nov 2013 21:51:40
It looks like some lines on the Wolverhampton BRAS are showing significant problems. BT have issued an incident report IMT45679/13 but there are no further details at this time.
Resolution Closure Details: Service was restored at 23:22 when redundancy switch over was performed to move the service to standby path. Service stability have been observed. BT regrets any inconvenience this may have caused.
Started 19 Nov 2013 20:30:00 by AAISP Staff
Closed 19 Nov 2013 23:22:00

21 Oct 2013 16:04:00
Details
21 Oct 2013 15:57:52
TalkTalk Wholesale lines are currently off line. We are investigating.
Update
21 Oct 2013 15:59:09
Some lines are coming back online.
Update
21 Oct 2013 16:01:56
Other wholesalers are having similar problems
Update
21 Oct 2013 16:04:56
Most lines are back online.
Update
21 Oct 2013 16:20:42
We notified TalkTalk at the start of this, and they have replied saying that they have identified some network alarms. We'll update when we get more information. For now this fault is closed as lines are back online.
Update
21 Oct 2013 16:30:43
TalkTalk confirmed that this affected many of their other wholesalers.
Update
23 Oct 2013 12:01:23
We've not had the report from TT yet, but we have chased TT for it.
Update
15 Nov 2013 16:50:54
This is the report from TalkTalk regarding this incident:
Problem Management have carried out further investigations. Please see the root cause below and preventative measures:
On Monday 21/10/2013, following some network device reboots, traffic was lost on two aggregate interconnects. We confirmed this using graphs on Cacti. Investigations were carried out by the B2B team and at 3:53pm on Monday 21st October a configuration addition was implemented on the ldn-vc1.thn device to try and load balance traffic over the two interconnects. Unfortunately this change affected the global routing of all traffic on the box (including management and customer data traffic). As part of the standard B2B procedure, the commit confirmed feature was used on the device when the change was made which resulted in the device returning back to normal service (management and data traffic) after 2 minutes. Following a review of this incident by B2B it was determined that the change was high risk. B2B processes have now been changed to always perform risk analysis on any future configuration changes. As well as this, any future configuration changes deemed necessary should be completed out of hours via the TTT PEW change process.
Sincere apologies for any inconvenience this has caused.
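The "commit confirmed" behaviour described in the report can be illustrated with a toy model. This is only a sketch, not router code: the class `ConfigSession`, its methods, and the 120-second window are hypothetical, chosen to mirror the "2 minutes" in the report. A change is applied immediately, but unless it is confirmed before the deadline, the previous configuration is restored automatically.

```python
class ConfigSession:
    """Toy model of a commit that rolls back unless confirmed in time."""

    def __init__(self, active_config):
        self.active = active_config
        self._previous = None   # config to restore on rollback
        self._deadline = None   # time by which confirm() must be called

    def commit_confirmed(self, new_config, now, window_seconds=120):
        # Apply the change straight away, but remember the old config.
        self._previous = self.active
        self.active = new_config
        self._deadline = now + window_seconds

    def confirm(self, now):
        # Confirmation only counts if it arrives before the deadline.
        if self._deadline is not None and now < self._deadline:
            self._previous = None
            self._deadline = None
            return True
        return False

    def tick(self, now):
        # Called periodically; rolls back an unconfirmed change once the window expires.
        if self._deadline is not None and now >= self._deadline:
            self.active = self._previous
            self._previous = None
            self._deadline = None
            return True   # a rollback happened
        return False


session = ConfigSession("known-good")
session.commit_confirmed("load-balance-change", now=0)
# The operator has lost management access and cannot confirm in time.
rolled_back = session.tick(now=120)
```

In this model, a change that cuts off management access cannot be confirmed, so the device recovers on its own once the window expires.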
Resolution Summary of events:
Lines went off at 15:53
They started coming back online at 15:56
Most, if not all lines were back online by 16:04
We will have a report from TalkTalk as to the reason for this later today (22nd)
Started 21 Oct 2013 15:52:37
Closed 21 Oct 2013 16:04:00

14 Jul 2013 04:05:05
Details
13 Jul 2013 15:00:49
All lines dropped - major connectivity issue affecting many ISPs
Update
13 Jul 2013 15:05:16
Connections are re-establishing - we'll try and get an explanation out of BT.
Update
13 Jul 2013 15:09:34
Seems clear this affected a lot of ISPs using BT backhaul, not just us. Customers with dual BT+TT or BT+BE services through us automatically fell back to just the non BT lines.
Update
13 Jul 2013 15:14:35
It seems to be taking some time for lines to come back.
Update
13 Jul 2013 15:24:45
We are seeing a second wave of connections. When BT have a major issue like this they "default accept" and give a dummy connection for a short period. As those lines drop and reconnect to us we get a second lot of connections. Even so we only have about half of our lines back yet which suggests that BT may be struggling at a RADIUS level.
Update
13 Jul 2013 16:00:45
People have connected in several waves, and we suspect BT RADIUS servers were struggling. This looks pretty widespread. No more details yet. It looks like almost everyone is back on line though.
Update
13 Jul 2013 16:41:55
Details from BT: " Retrospective report. There was a brief loss of service to 100,00 broadband customers. This was cause when the Boarder gateway protocols (BGP) dropped causing a loss of service to all WBMC customers in the Stepney Green area for 2 minutes. The root cause is under investigation by the operational team. Service was fully restored at 15:04 BT regrets any inconvenience this may have caused. "
Update
13 Jul 2013 17:17:54
Since the lines dropped we are seeing issues with apparent latency spikes and packet loss.
Update
13 Jul 2013 17:43:28
Lines have dropped again.
Update
13 Jul 2013 17:50:59
We have spoken to BT. They are well aware of the drops today, and we have given them new information regarding the packet loss that we and other ISPs have been noticing since the initial drop at 3pm.
Update
13 Jul 2013 17:52:13
Since the drop at 17:53, many customers are getting logged in to a default BT service and getting BT IP addresses. We suggest customers wait a short while, and reboot their routers. This incident is still open with BT.
Update
13 Jul 2013 18:29:13
Lines have dropped again.
Update
13 Jul 2013 19:04:38
They are flapping still - all going off, and then coming back over a period of time, rinse, repeat. Still waiting for BT to fix it.
Update
13 Jul 2013 20:21:10
The last big blip was 18:27, but we are seeing smaller blips every 10 minutes or so. Up until 19:16 this was around 10% of lines, and since has been around 5% of lines. Clearly this is something BT are having to work on throughout their network somehow. The last small blip was 20:05 and it has been well over 10 minutes now, so looking hopeful that this may finally be sorted.
Update
13 Jul 2013 20:21:54
Sorry, as I typed that, another small blip at 20:20
Update
13 Jul 2013 20:50:38
Last blip was 20:26, so looking like things may finally be sorted.
Update
14 Jul 2013 03:08:58
And all BT lines go again - chasing BT now
Update
14 Jul 2013 03:18:32
BGP sessions still down, over 12 minutes now. BT think they are changing a card, but that it should not have caused a loss in service as they re-routed traffic via a different node while they do it. Hence they are now on conference calls.
Update
14 Jul 2013 03:23:02
From BT: This is what happened yesterday: As promised here is the latest situation regarding the issues seen at Stephney. It had been identified that a ‘supervisor’ card is faulty and half the traffic from Stephney has been diverted to Faraday to lighten the strain. You will see a likelihood of a loss of resilience however hopefully traffic is flowing, albeit at a lowered rate, but customer should have some connection. There is currently a conference call regarding this issue on-going and there are multiple parts of the BT Group endeavouring to resolve this issue. There is a plan to change out the faulty card on an IMT with reference 30031/13 at or around 02:00 in the morning.
Update
14 Jul 2013 03:30:38
So, basically, they found the dodgy card last night, and planned to change it. They diverted traffic, or so they thought, and then started the card change just after 3am. Then they noticed that all the traffic stopped. As usual we were the first to get in touch with BT but other ISPs are on to them as well now. They are continuing with the card change (understandable). I imagine there will be questions to answer as to why the diverting did not work, and once again how we have a "single point of failure" again. This has been upgraded to a major incident in BT now.
Update
14 Jul 2013 03:42:35
BGP back, sessions coming up now.
Update
14 Jul 2013 04:06:01
Looks like over 95% of lines back now - some on a default accept will try again and all should be normal shortly. Let's hope that is it sorted for good. We'll post if we get more details from BT - I expect a proper report in a few days.
Started 13 Jul 2013 14:57:30
Closed 14 Jul 2013 04:05:05

31 May 2013 20:20:00
Details
31 May 2013 18:43:08

Looks like a lot of lines via BT dropped, and a few reconnected, but dropped again.

Looks like BT only, not specific to any one of our LNSs, started with most of Stepney based BRASs going off.

Update
31 May 2013 18:43:59

What is odd is it seemed to be BRASs toppling one after another. Very strange.

Update
31 May 2013 18:44:53

Looks to have stabilised with around half of BT lines being off.

Update
31 May 2013 19:11:19

No update from BT yet.

Update
31 May 2013 19:14:43

Some of our customers with access to other ISPs using BT backhaul have confirmed that this issue is not just A&A/BT lines but is more widespread.

Update
31 May 2013 19:21:25

We still cannot get hold of BT, and still have no update from BT...

Update
31 May 2013 19:32:26

Customers are getting a web page which suggests various possible reasons but does not list BT as a possible cause, not amused.

We also got "21CN Incident Report :21CN WBC : STEPNEY GREEN : LOSS OF SERVICE ( CANCELLED ) : Reference 23981 : Issue FINAL" which makes no sense as they seem to have cancelled the "incident".

Still no reply trying to contact BT

Update
31 May 2013 19:38:55

Finally. BT have confirmed they did get our report, sent within 60 seconds of the incident starting. And they are aware of a major issue. Trying to get more details.

So far "seems to be quite wide-spread, with multiple CPs, not yet got to root cause as we have also lost some management layers"

Update
31 May 2013 20:05:33

It looks like around half of lines are down, but all of RADIUS is down, so anyone reconnecting now will not connect - if you are on-line and working then don't touch anything!

Update
31 May 2013 20:10:59

Finally we get "INITIAL REPORT for IMT 23989/13 : Advanced notice of potential loss of Broadband services."

Yes, they say "advanced notice" for something that started 90 minutes ago, and "potential" for something that has wiped out half our lines!

"Geographical area     : Nationwide - Geographical region unknown at present"

Update
31 May 2013 20:53:38

We are seeing lines coming back.

Update
31 May 2013 20:56:30

I'll leave the status post open for now and will update when we get explanation from BT. Thanks for your patience everyone.

Update
31 May 2013 21:07:53

Please note, on an unrelated note, we are re-mapping LNSs. This was to be done over night but we changed the routing while lines were down to avoid a second blip over night. This means graphs are showing without any history earlier in the day for lots of lines that were off during the outage. The graphs will be re-instated over night as usual and you can then see the outage clearly.

Update
11 Jun 2013 14:42:59

Report from BT regarding this outage:

http://wiki.aaisp.org.uk/index.php/File:IMT23989.pdf

Broadband Users Affected 50%
Closed 31 May 2013 20:20:00
Cause BT

31 May 2013 19:10:31
Details
31 May 2013 18:41:03
Lines: 28% ALL and 33% BT and 39% 20CN and 31% 21CN and 35% FTTC and 100% 21CN-REGION-21CN-BRAS-RED10-LO0-PE and 80% 21CN-REGION-21CN-BRAS-RED11-MQD and 100% 21CN-REGION-21CN-BRAS-RED11-SL and 100% 21CN-REGION-21CN-BRAS-RED12-GI-B and 100% 21CN-REGION-21CN-BRAS-RED12-L-WAT dropped at 2013-05-31 18:39:47
We have advised BT
This is likely to have affected multiple internet providers using BT
Resolution

Duplicate post

Broadband Users Affected 13%
Started 31 May 2013 18:39:47 by AAISP automated checking
Closed 31 May 2013 19:10:31
Cause BT
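
The automated post above reports the share of lines that dropped in each group. A minimal sketch of that kind of summary follows. The function names `drop_percentages` and `format_report`, the input layout, and the sample counts are all invented for illustration and say nothing about how the real checker works.

```python
def drop_percentages(totals, dropped):
    """Return {group: whole-number percent of lines dropped} for groups with any drops."""
    result = {}
    for group, total in totals.items():
        lost = dropped.get(group, 0)
        if total > 0 and lost > 0:
            result[group] = round(100 * lost / total)
    return result


def format_report(percentages, when):
    # Join the groups in the same "N% GROUP and M% GROUP" style as the post.
    parts = [f"{pct}% {group}" for group, pct in percentages.items()]
    return "Lines: " + " and ".join(parts) + f" dropped at {when}"


# Hypothetical counts for illustration only, not real data.
totals = {"ALL": 1000, "BT": 800, "FTTC": 200}
dropped = {"ALL": 280, "BT": 264, "FTTC": 70}
report = format_report(drop_percentages(totals, dropped), "2013-05-31 18:39:47")
```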

25 Mar 2013 04:41:50
Details
25 Mar 2013 02:04:59

Since around 1am it looks like customers are having problems connecting. We have done an LNS upgrade which means around a third of customers are off line. This is being investigated now.

Update
25 Mar 2013 02:12:12

This looks related to the upgrade and seems to be related to platform RADIUS, but we are still investigating now.

Update
25 Mar 2013 02:21:37

Lines are coming back, but we are working to understand the issue as it appears to be RADIUS or BGP related.

Update
25 Mar 2013 02:31:30

The fix to this makes no sense, and will be reviewed later in the morning. The issue appears to relate to platform RADIUS from BT. We changed the way the endpoints are announced to BT and it all sprung in to life, but that makes no sense. The LNS upgrade does not explain it either. There may be some planned work as a result of this to investigate further.

Update
25 Mar 2013 02:50:21

The problem was not just BT, but the fix we found was a result of changing BGP to BT which caused it to flip which RADIUS it was using. We have confirmed the issue is platform RADIUS and caused by LNS upgrades. The platform RADIUS is currently directed to a non-upgraded box and so BT, BE and data SIMs should all be able to connect now.

There will be s/w upgrades over the coming nights to correct this.

Update
25 Mar 2013 02:56:04

It looks like some issues with BE lines still, which we are working on

Update
25 Mar 2013 03:02:23

We have found the underlying issue, and will be re-jigging lines over night, sorry for any inconvenience.

This will mean some loss of graphs.

Update
25 Mar 2013 03:16:06

The error has been corrected, and lines are being moved between the two affected LNSs now. Graphs for around 1/3 of customers will be missing before around 3:30 this morning.

Update
25 Mar 2013 03:34:20

Even with two working platform RADIUS servers, BE lines are still not connecting. We are not sure if their RADIUS servers have somehow blacklisted us because of lack of response, or what. Still working on that.

Update
25 Mar 2013 03:47:15

We have asked Fluidata (who manage the BE connection) to check their RADIUS servers for us. This may come back anyway over night if it is a timeout/blacklist issue on their RADIUS.

Update
25 Mar 2013 04:42:55

We have just had a response from Fluidata and they seem to have given their end a kick and all is well now.

Graphs lost for some lines before around 03:30, and some other LNS updates over night during the week.

Started 25 Mar 2013 00:50:00
Closed 25 Mar 2013 04:41:50

02 Jan 2013 03:45:00
Details
02 Jan 2013 03:44:08

Not sure what yet

Update
02 Jan 2013 03:46:53

Routers and LNSs are all up and did not restart or anything, not clear what happened yet.

Update
02 Jan 2013 03:49:42

The issue lasted a few minutes and seems to be over with services back to normal. So far we can see that a number of BGP sessions internal to the Telehouse rack restarted. It almost looks like a switch restart, but we are still checking.

Update
02 Jan 2013 03:54:47

Checks confirm nothing rebooted, not even one of the network switches, so it is not clear what happened at this stage, we are still investigating, but the problem itself is over at present.

Started 02 Jan 2013 03:39:00
Closed 02 Jan 2013 03:45:00

13 Dec 2012 11:52:33
Details
13 Dec 2012 09:07:30

We have just seen a lot of 20CN and 21CN lines around Scotland drop.

We are reporting this to BT now. 

Update
13 Dec 2012 09:15:24

Looks like the BT metro node in Scotland is down as it is affecting 20CN and 21CN lines.

Update
13 Dec 2012 09:26:56

BT Operate are aware and currently investigating.
BT will update us when they have more news. 

Update
13 Dec 2012 09:52:41

BT have raised an incident against Scotland, still awaiting an update as to what is happening.

Update
13 Dec 2012 10:10:46

BT say they have lost service to 2 10Gb circuits. Diagnostics still ongoing.

Update
13 Dec 2012 10:31:53

BT engineers are on their way to site.

Update
13 Dec 2012 10:35:05

BT have an engineer on site and a second engineer en route to assist.
Should have further news within the hour.

Update
13 Dec 2012 11:04:46

BT are currently working to restore service.

Update
13 Dec 2012 11:09:54

Lines coming back up.

Update
13 Dec 2012 11:22:32

From BT:

The affected router has had stopped process restarted and the controlled recovery of the device is being closely monitored. Service should be available within the next 30 minutes. The geographical area of impact is across Scotland, the North of England and Northern Ireland, on traffic that was terminating on this router. 

Update
13 Dec 2012 11:53:29
Service fully restored at 11:15 after a process restart. If you are still down please reboot your router, if you are still unable to reconnect please contact support.
Update
13 Dec 2012 18:40:00

Some more info about this outage on the ISP Review website: http://www.ispreview.co.uk/index.php/2012/12/bt-20cn-and-21cn-broadband-isp-lines-down-in-scotland-and-north-england.html

Started 13 Dec 2012 09:06:19 by AAISP Staff
Closed 13 Dec 2012 11:52:33
Cause BT

08 Dec 2012 08:49:00
Details
08 Dec 2012 08:47:12

More issues, this time broadband lines.

To summarise - a routine upgrade this morning resulted in a problem, but only when the last box was upgraded. We had upgraded routers during the week, as we don't upgrade everything all at once. However the nature of the problem was such that only when the last box was upgraded did problems happen.

The effect was a few seconds with no routing, and then a period where all worked. This repeated for around 6 minutes until fixes were applied.

We can see the cause, but this effect was not foreseen, so we are looking in to how we can avoid such issues in future.

Sorry for any inconvenience.

Note: This would not have caused broadband lines to drop sync or lose connection at all - this was entirely a routing issue to the wider internet.

Update
08 Dec 2012 08:49:39

The issue is exactly the same as we had in Maidenhead so has been fixed quickly.

Update
08 Dec 2012 09:33:26

There have been reports that some parts of the internet damped flapping routes for some minutes after we applied a fix. These seem to be relatively few and far between as overall traffic levels returned to normal levels immediately. Obviously there is not a lot we can do over such policies in individual ISPs.
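
Route flap damping, as mentioned above, typically works by adding a penalty each time a route flaps and letting that penalty decay exponentially; the route is suppressed while the penalty is high. The sketch below is illustrative only. The figures (penalty of 1000 per flap, suppress above 2000, reuse below 750, 15-minute half-life) are assumed example values, and each network chooses its own.

```python
PENALTY_PER_FLAP = 1000   # assumed example value
SUPPRESS_LIMIT = 2000     # suppress the route once the penalty exceeds this
REUSE_LIMIT = 750         # use the route again once the penalty decays below this
HALF_LIFE_MIN = 15.0      # the penalty halves every 15 minutes


def decayed(penalty, minutes):
    """Exponentially decay a penalty over the given number of minutes."""
    return penalty * 0.5 ** (minutes / HALF_LIFE_MIN)


def simulate(flap_times, end_time):
    """Return the whole minutes (0..end_time) at which the route is suppressed."""
    penalty = 0.0
    suppressed = False
    suppressed_minutes = []
    flaps = set(flap_times)
    for minute in range(end_time + 1):
        if minute > 0:
            penalty = decayed(penalty, 1)
        if minute in flaps:
            penalty += PENALTY_PER_FLAP
        if penalty > SUPPRESS_LIMIT:
            suppressed = True
        elif penalty < REUSE_LIMIT:
            suppressed = False
        if suppressed:
            suppressed_minutes.append(minute)
    return suppressed_minutes


# Three flaps in the first three minutes, then the route stays stable.
quiet = simulate(flap_times=[0, 1, 2], end_time=60)
```

With values like these, a short burst of flaps can keep a route suppressed well after it has stabilised, whereas a single flap never reaches the suppress limit.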

Started 08 Dec 2012 08:43:00
Closed 08 Dec 2012 08:49:00
Previously expected 08 Dec 2012 08:49:00

13 Aug 2012 17:30:28
Details
13 Aug 2012 13:33:01

One of our core links to BT dropped affecting a third of our customers.

Lines are recovering and re-logging in.

It may take a few minutes for all lines to reconnect.

More details to follow

Update
13 Aug 2012 13:48:04

Most lines are back online. There are some which are not though and are 'flapping' (going up and down).

To restore service we are clearing lines off the LNS (core router) that the faulty BT link is on. This should then fix these remaining lines. A side effect is that some BE lines will also disconnect and reconnect. We apologise for this.

Update
13 Aug 2012 14:02:24

We are still seeing some lines 'flapping' we're investigating the cause of this.

Update
13 Aug 2012 14:11:41

More lines are now in this 'flapping' state, we are still working on this.

Update
13 Aug 2012 14:13:39

We have restarted the 'A' LNS in an attempt to stabilise connections.

Update
13 Aug 2012 14:15:36

Lines on the 'A' LNS are now looking stable. We're working on the other LNSs.

Update
13 Aug 2012 14:17:08
We have restarted the 'C' LNS and lines are reconnecting.
Update
13 Aug 2012 14:18:52
We have restarted the 'D' LNS and lines are reconnecting.
Update
13 Aug 2012 14:20:11

Having restarted our LNSs, lines are reconnecting and are remaining stable.

(Graphs for today up to ~2:15pm would have been lost though.)

Update
13 Aug 2012 14:32:14

ADSL lines are looking stable now.

We are still in contact with BT about the link that is currently still down, and we will be reviewing how we can cope better with this type of outage in the future.

Update
13 Aug 2012 14:33:13

It is not clear exactly why the loss of a BT link has caused things to become unstable. Restarting all of the LNSs has resolved this, and we are investigating. We suspect there is an issue with the routing to BT when one of the links is down, and so may be doing some planned work in due course to make changes that could improve matters.

In the mean time we are trying to get the failed link back up and working.

Update
13 Aug 2012 14:35:35

BT have raised an incident and escalated it internally.
More updates when we hear back from BT.

Update
13 Aug 2012 15:41:28

We have seen a couple of knock on effects with the flapping lines - the LNSs are very fast, and so handled lines flapping way faster than the RADIUS accounting database can keep up. As a result, things like colours of lines shown on clueless control pages, and text updates for lines flapping, are a tad behind. It is all catching up.

We also experienced a knock on effect with database updates causing some secondary servers to get busy. This is something we are fixing properly in the long term, but it meant some issues with telephony, which were a bit unexpected. That has now been resolved, and in the longer term these secondary servers should no longer struggle when there are a lot of database updates.

Update
13 Aug 2012 16:33:29

Several people have reported some web pages being slow - it seems some of the sessions have come up with low MTUs and this is affecting a number of customers. We are clearing affected tunnels manually to rectify this so some people may see a PPP restart.

Update
13 Aug 2012 16:50:15

We can see how the timing of the LNS restart we did could have resulted in the MTU settings being wrong on the first few tunnels, and have made config changes for the future. I have cleared the affected sessions and they have reconnected cleanly.

Update
13 Aug 2012 16:54:28

A key thing here is ensuring that the niggles of today are permanently fixed - we are, of course, working on that. Most things are fixed for the long term, and the one remaining issue (the flapping) is still being investigated and we have an idea what to do for that as planned maintenance. Sorry for any inconvenience.

Update
13 Aug 2012 17:06:23

Looking at the overall usage graphs for transit links, we can see that even though some lines were flapping a lot, and most had some issues, overall people were still able to use the internet.

Update
13 Aug 2012 17:25:43

We have normality, I repeat, we have normality. Anything you still can't cope with is therefore your own problem...

The accounting has all caught up at last, sorry for the delay, and the various delayed texts and emails.

Thank you all for your patience.

Update
14 Aug 2012 13:45:51

BT have an engineer on site investigating the fault. Customers are connected via the other fibre links we have though.

Update
14 Aug 2012 13:53:41

BT have fixed the broken fibre by replacing the service card.

This will be monitored.

Update
15 Aug 2012 10:00:35

Our monitoring is reporting that the fibre is not stable, and has been dropping. This fibre is not in use by customers so it's not affecting any customers. This has been passed on to BT.

Update
15 Aug 2012 16:05:27

BT have been testing this fibre today, they are not seeing a problem and the drops that we saw this morning have not happened since 9am. We'll continue to monitor it though.

Update
16 Aug 2012 10:15:58

BT confirm that they are seeing alarms on equipment, the investigation continues.

Update
18 Aug 2012 08:36:52

We are still working with BT to resolve this fibre issue. One of our staff was at the datacentre yesterday and BT have changed the service card yet again. However, we're still seeing the port flapping, and BT see alarms at their side. Further BT engineers are tasked to work on this again today.

In the meantime, we continue to use the other fibres that are part of this 'resilient' set so that service is unaffected.

Resolution

BT replaced the service card on the NTE at our side of this circuit, and the fibre has remained stable.

Broadband Users Affected 33%
Started 13 Aug 2012 13:23:00
Closed 13 Aug 2012 17:30:28

18 Jul 2012 21:18:37
Details
18 Jul 2012 21:13:41

We're seeing odd routing problems at the moment, this will affect internet access for DSL and Ethernet customers in London. Staff are investigating.

Update
18 Jul 2012 21:18:37

This looks to have recovered. We're still investigating the cause though.

Started 18 Jul 2012 21:05:50
Closed 18 Jul 2012 21:18:37

31 May 2012 17:59:08
Details
31 May 2012 16:14:16

We're investigating a packet loss / routing problem at the moment, this will be affecting broadband customers, more details to follow.

Update
31 May 2012 16:54:57

Looks like this was something quite major and country wide affecting various interconnects that the internet uses.

Things are getting back to normal now.

ADSL lines which did go down are re-connecting.

Resolution

Service has been restored. This looks to have been caused by a problem within the LINX peering network which affected other networks and interconnects across the country.

We've re-enabled our LINX peering.

Started 31 May 2012 16:03:32
Closed 31 May 2012 17:59:08

12 May 2012 10:35:45
Details
12 May 2012 08:48:06

At 08:06:58 all 20CN and 21CN lines routed via Manchester dropped and at the same time we stopped seeing any platform RADIUS requests from BT.

This suggests a major routing issue within BT. It means that nobody can re-connect any broadband line.

We are trying to get an answer out of BT on this.

Update
12 May 2012 08:58:33

BT have a string of open issues covering Manchester area. No ETA yet.

Update
12 May 2012 09:08:05

We have confirmation that this is not just affecting us (thanks Thomas). BT are investigating.

Update
12 May 2012 09:42:39

Whilst BT are being a tad inconsistent in telling various ISPs what is happening, we have heard from another ISP that "Engineer is onsite at Manchester with a spare card. Changeout is about to commence with the assistance of vendor  support. Next update will be at 10:20"

Update
12 May 2012 10:20:20

BT are not confirming the statement that a card is to be replaced, and in fact appear to be totally ignoring me now.

Update
12 May 2012 10:42:58

Looks to be working.

Manchester lines reconnecting

Platform RADIUS working

Resolution

We are waiting for more detail from BT on this, and how it is they appear to depend entirely on a single card in Manchester for the entire BTW platform.

Started 12 May 2012 08:06:58
Closed 12 May 2012 10:35:45

11 May 2012 09:08:46
Details
10 May 2012 11:51:17

Lines on our "b.gormless" LNS dropped. We're investigating. Lines are coming back.

Update
10 May 2012 11:57:04

It looks like BT. Still investigating.

Update
10 May 2012 11:59:33

It looks like BT's link to b.gormless has gone down. We've taken b.gormless out of service.

Update
10 May 2012 12:00:24

BT fibre link is down - lines are moving to the backup LNS automatically.

Update
10 May 2012 12:35:49

BT have raised a fault and are investigating their end.
Update due by 13:15 

Update
10 May 2012 13:00:50

Most lines switched very quickly to the alternative LNS automatically.

The status of lines on the control pages took some time to catch up on the RADIUS accounting, but this has now happened.

Graphs may have gaps which will be repaired on the over night status.

Update
10 May 2012 19:55:22

Although broadband lines look fine this link is still down, meaning we have lost resilience on links to BT.

We are still chasing.

Update
10 May 2012 20:11:47

BT are arranging an engineer to go to the Datacentre.

Update
10 May 2012 22:25:45

Still not heard anything more.
Chasing again. 

Update
11 May 2012 09:09:57

Came back around 00:09.
It appears that this was a problem with a card BT side.
Need to confirm though. 

Resolution

Cleared fault on Card.

Closed 11 May 2012 09:08:46

16 Apr 2012 19:04:46
Details
16 Apr 2012 18:58:02

One of the three live LNSs just restarted.

We are investigating.

Update
16 Apr 2012 18:58:50
It looks like the backup systems kicked in within seconds as planned.
Update
16 Apr 2012 19:00:41
Many lines have correctly fallen back to D.gormless. They will be moved over night back to B.
Update
16 Apr 2012 19:05:14

Lines have reconnected cleanly as per the contingency plans. We have to find the cause, but I'll close this major issue for now.

(Graphs would have been lost for the day, sorry)

Broadband Users Affected 33%
Started 16 Apr 2012 18:53:14
Closed 16 Apr 2012 19:04:46

02 Apr 2012 09:03:24
Details
29 Mar 2012 00:24:09
Lines: 100% 20CN-REGION-READING and 100% 21CN-REGION-SL dropped at 2012-03-29 00:22:50
We have advised BT
This is likely to have affected multiple internet providers using BT
Update
30 Mar 2012 12:07:50

Lines came back shortly after, we're chasing BT for information and reasons.

Update
02 Apr 2012 09:03:33

This was caused by a BT PEW.

Broadband Users Affected 7%
Started 29 Mar 2012 00:22:50 by AAISP automated checking
Closed 02 Apr 2012 09:03:24
Cause BT

14 Mar 2012 16:19:53
Details
14 Mar 2012 16:14:21

One of the three LNSs in live use just restarted and we are investigating.

Lines are moving automatically to the backup LNS within seconds.

Update
14 Mar 2012 16:19:53

This is the first test of the three LNS plus fallback configuration.

By all accounts customers reconnected very quickly - within seconds.

We have a clue what triggered the problem and we are investigating now.

Broadband Users Affected 33%
Closed 14 Mar 2012 16:19:53

27 Feb 2012
Details
26 Feb 2012 09:38:55

Something else went wrong this morning and we are investigating.

Update
26 Feb 2012 09:44:02

Another crash - this time somewhere completely different and equally impossible.

Update
26 Feb 2012 09:46:52

It looks like lines reconnected quickly over both LNSs as you would expect from a crash. This is an excellent test but obviously this sort of thing simply should not happen.

Update
26 Feb 2012 12:39:54

OK we are moving people that connected to D after it crashed over to C. It is about a 1/3 of customers that are being moved. The controlled LNS switch is deliberately slow taking one line at a time over and ensuring no load on BT or our RADIUS that could cause default accepts or other delays or problems.
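
The controlled switch described above moves one line at a time so that neither BT nor our RADIUS sees a burst of logins. A rough sketch of that pacing idea follows. `move_lines`, the `kick` and `is_back` callbacks, and the polling limits are hypothetical stand-ins, not the actual scripts.

```python
import time


def move_lines(lines, kick, is_back, pause=0.0, max_checks=10, sleep=time.sleep):
    """Move lines one at a time, waiting for each to reconnect before the next.

    kick(line): hypothetical callback that drops the line's PPP session.
    is_back(line): hypothetical callback that reports whether it has reconnected.
    Returns the list of lines that did not come back within max_checks polls.
    """
    failed = []
    for line in lines:
        kick(line)  # each line is kicked exactly once
        for _ in range(max_checks):
            if is_back(line):
                break
            sleep(pause)
        else:
            # Never reconnected within the allowed number of polls.
            failed.append(line)
    return failed


# Tiny demonstration with in-memory stand-ins for the real systems.
kicked = []
stuck = {"line-b"}
failed = move_lines(
    ["line-a", "line-b", "line-c"],
    kick=kicked.append,
    is_back=lambda line: line not in stuck,
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
```

Waiting for each line before starting the next trades speed for a steady, predictable login rate.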

Update
26 Feb 2012 12:49:06

We are installing new code with additional debugging on D, and will move some lines back to D shortly.

Update
26 Feb 2012 12:49:35

This means we will lose graphs for this morning.

Update
26 Feb 2012 12:57:35

Looks like a few dozen lines got a PPP kill twice. We have updated the scripts now, and we are moving 1000 lines over from C back to D now. The plan is to leave this for this afternoon to confirm all is working and stable.

Update
26 Feb 2012 14:20:34

We are moving some more lines over now.

Update
26 Feb 2012 15:06:16

We are moving the last of the lines over now.

Update
26 Feb 2012 15:50:43

All lines are on the new code on the "D" LNS.

The problem is we have not got to the underlying cause yet.

We'll leave service like this for the rest of the day.

Update
27 Feb 2012 14:01:47

Having been stable for over 24 hours I am closing this major issue but we will continue to monitor carefully and try and find the underlying cause.

Started 26 Feb 2012 09:07:42
Closed 27 Feb 2012

23 Feb 2012 07:28:40
Details
23 Feb 2012 07:12:01

There seems to have been some major issue over night with one of the LNSs in the new rack. Something odd with RADIUS from around 1am, and then lines dropping and reconnecting around 04:52. Graphs before 04:52 on one of the LNSs are lost as well. It seems problems persist with some BE lines though.

We are investigating.

Update
23 Feb 2012 07:15:23

We are switching lines to the right LNS now.

Update
23 Feb 2012 07:25:50

Stuck BE tunnels have been cleared and BE lines are all reconnecting now.

Update
23 Feb 2012 07:29:11

Looks like the issue here resulted in BT lines coming straight back, but BE lines not doing so. We now have all lines back.

Started 23 Feb 2012 01:00:00
Closed 23 Feb 2012 07:28:40

17 Feb 2012 12:06:00
Details
17 Feb 2012 12:07:06

We've just had a blip on some BT ADSL/FTTC lines, most are back online already. We'll investigate the cause.

Update
17 Feb 2012 12:11:53

BT lines on just one of the LNSs blipped - lines on C.Gormless blipped, lines on D.Gormless did not.

Lines are almost all back online now.

Update
17 Feb 2012 12:14:52

This was affecting BT lines. BE lines were not affected.

Update
17 Feb 2012 12:24:32

The network switch port connected to one of our BT fibres was offline between 12:02:36 and 12:04:23.

We're investigating further.

Resolution

Half of our BT circuits dropped at 12:02 and started coming back online at 12:05; most lines were online within a few minutes.

This was caused by a drop in one of our links to BT. A fault has been logged to BT regarding this. We will move customers off this link over the weekend so as to enable us to investigate this with BT in more detail.

Started 17 Feb 2012 12:03:00
Closed 17 Feb 2012 12:06:00

10 Feb 2012 06:27:00
Details
10 Feb 2012 03:04:22

It looks like around half of the BE lines went off around 02:43 and the rest at 02:58.

We were not aware of planned work this morning, though various work has been planned this week.

Update
10 Feb 2012 03:27:35

We have tried routing traffic via our old rack as well, and no joy.

Update
10 Feb 2012 03:30:16

Obviously customers using BE+BT lines as a fallback arrangement are fine, using BT.

Obviously customers using 3G backup are working.

If you only have a BE line you may want to consider some of the fallback arrangements we can offer. Though it is rare for it to be BE that is the carrier that has failed, it is sensible to consider contingencies.

That said, we would hope they have this fixed during the night.

Update
10 Feb 2012 03:39:19

There were unplanned outages on 14 exchanges at approximately those times; we're still waiting to find out why this occurred.

Update
10 Feb 2012 05:30:00

O2 have confirmed that changes made by them to the network overnight resulted in loss of service to large parts of the network, including all wholesale services.

Their engineers are currently rebuilding configurations on all aggregation nodes on their network to reverse the changes. We are beginning to see sessions establish again, about 20% have reconnected in the past few minutes.

Update
10 Feb 2012 06:30:00

O2 engineers have restored configuration on about 40% of core devices. We are seeing roughly that % of sessions back up.

Update
10 Feb 2012 08:03:43

It appears that O2 engineers have now finished restoring configurations, as we can see all ISAMs on our monitoring again and we are seeing the majority of customers back online.

Resolution

This was due to BE engineering work that failed. An initial report of the problem is available at http://aa.net.uk/news-2012-02-10-be.html

Started 10 Feb 2012 02:43:00
Closed 10 Feb 2012 06:27:00

09 Feb 2012 14:50:00
Details
09 Feb 2012 14:55:22

There was a routing problem affecting ADSL lines.

We initially thought this was just our office (hence the previous post)

Routing is now restored and we're investigating the cause.

Started 09 Feb 2012 14:30:00
Closed 09 Feb 2012 14:50:00

01 Feb 2012 02:11:00
Details
01 Feb 2012 02:43:36

There was, quite separate to any other issues, a problem at 02:06 which resulted in the main LNS resetting. All customers quickly reconnected with no problems.

We have detailed logs of the actual problem, but it is baffling us slightly at present. We will be investigating this later in the day. This does, however, appear to be completely unrelated to everything else that has been happening.

We are rebalancing some lines between LNSs over night after this incident.

Started 01 Feb 2012 02:06:07
Closed 01 Feb 2012 02:11:00
Previously expected 01 Feb 2012 02:11:00

31 Jan 2012 19:18:42
Details
31 Jan 2012 18:57:10

Investigating

Update
31 Jan 2012 19:12:55

Something major with our LNS - investigating

Update
31 Jan 2012 19:16:09

Not clear on cause, but looks like our fault not BT this time.

Update
31 Jan 2012 19:16:32

Graphs lost, sorry

Update
31 Jan 2012 19:16:59

Looks like people coming back on line now

Update
31 Jan 2012 19:18:42

Not good - lost graphs - and cause unclear...

Resolution

We will be investigating this in the morning. More details on my blog http://revk.www.me.uk/2012/02/ooops.html

Started 31 Jan 2012 18:56:00
Closed 31 Jan 2012 19:18:42

13 Jan 2012 11:37:13
Details
13 Jan 2012 11:07:52

We're looking in to a network problem affecting broadband customers - More details to follow...

Lines are coming back and we are trying to find what happened.

Update
13 Jan 2012 11:22:05

We have engineers onsite in telehouse looking in to this.

However, lines and routing are recovering now.

Update
13 Jan 2012 11:22:07

Looks like some network issue in our core rack in telehouse - engineers on site investigating now. Lines mostly back up now.

Update
13 Jan 2012 11:25:35

We are still somewhat at a loss as to what happened, but it has meant all lines being dropped off the live LNS. We have switched to the backup anyway.

However, and possibly unrelated, we are seeing issues with BT 20CN lines coming up - i.e. they are not. This was happening before this more major issue, and means 20CN lines are still down. We are chasing with BT.

Update
13 Jan 2012 11:35:49

20CN lines now on - looks like BT did a config change early which resulted in problems. This would have been a non issue if there was not also some sort of major blip which we have yet to identify.

Resolution

Two problems it seems.

1. BT did a configuration change earlier than expected. No idea why, and normally this would have affected a couple of customers at most. We were looking in to this already when...

2. Somehow there was a loss of contact between our live LNS and core network. We can see this must have happened and we are checking the switch logs to see if we can find why. Engineers were on site in the new rack (not connected, so not the cause!) and are checking this out now.

The effect was all lines went off. We have changed the system to be less sensitive in such cases now and not drop all sessions due to a loss of connectivity to the core network.

However, as all lines went off, the BT config change meant none of the 20CN lines came back until we discovered what they had done.

Almost all lines on-line, and all should now be able to connect as normal.

All of the switches are being replaced as part of the upgrade anyway, but we obviously want to understand the underlying issue here.

Update: This is looking increasingly like a switch issue - the switch is working now, and new switches are in the new rack ready to be deployed over the next week or two.

Started 13 Jan 2012 11:03:30
Closed 13 Jan 2012 11:37:13

12 Jan 2012 02:42:51
Details
12 Jan 2012 01:34:38

Seems a major problem affecting all broadband lines - service is restoring automatically, but taking several minutes to get people back on line.

Update
12 Jan 2012 01:39:17

Lines are coming back but much more slowly than we would expect

Update
12 Jan 2012 01:49:24

Both LNSs are seeing people connect constantly and then disconnect, which makes little sense. We are still investigating the issue now.

Update
12 Jan 2012 02:00:00

We have reset equipment and switched LNSs, which has affected lines that were not previously affected. We are trying to see a pattern here.

Update
12 Jan 2012 02:12:24

Logs confirm that at 2am things suddenly started working a lot better, with many more lines coming back. Still investigating what is going on here.

Update
12 Jan 2012 02:19:29

We think, whatever it was, has magically fixed itself. We are going to clear all sessions to one LNS now, so a blip again for some people.

Update
12 Jan 2012 02:33:27

We are not entirely sure of the cause of the original problem, but it is such a long time since we have had a major issue like this it appears that we currently have a problem with the speed with which our systems can recover.

It seems our RADIUS server is struggling and once overloaded it is too slow to handle the connections and so the connections timeout and re-try causing more load and more delay. The result is that while everyone is trying to connect, nobody can, and it basically took well over an hour for lines to connect.
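
The feedback loop described above can be illustrated with a toy simulation. This is a minimal Python sketch under simplifying assumptions that are not taken from the real RADIUS setup: time moves in whole ticks, the server answers a fixed number of requests per tick in arrival order, and a reply to a request that has already timed out is wasted work. All names and numbers are hypothetical.

```python
from collections import deque


def simulate_logins(n_clients: int, capacity: int, timeout: int, ticks: int) -> int:
    """Return how many clients manage to connect within `ticks` ticks.

    Every offline client sends a request, and re-sends it each time
    `timeout` ticks pass without an answer. Old requests stay in the
    server queue and still cost the server a processing slot.
    """
    queue = deque()   # (client, tick the request was sent)
    last_sent = {}    # client -> tick of its most recent request
    connected = set()
    for now in range(ticks):
        # Offline clients send a first request, or retry after a timeout.
        for client in range(n_clients):
            if client in connected:
                continue
            if client not in last_sent or now - last_sent[client] >= timeout:
                queue.append((client, now))
                last_sent[client] = now
        # The server answers up to `capacity` queued requests this tick.
        for _ in range(min(capacity, len(queue))):
            client, sent_at = queue.popleft()
            if client not in connected and now - sent_at < timeout:
                connected.add(client)
            # Otherwise the client gave up on this request already,
            # so the server's effort here was wasted.
    return len(connected)
```

In this toy model, 100 lines against a server handling 10 requests per tick with a 5-tick timeout leaves 50 lines connected and the rest stuck, because the server spends all its time on requests that have already timed out. Doubling either the timeout or the capacity lets all 100 connect.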

More investigation to follow.

Update
12 Jan 2012 02:43:21

I am closing the major incident on this now - but we are investigating the cause and the slow reconnect.

Started 12 Jan 2012 01:08:53
Closed 12 Jan 2012 02:42:51

15 Nov 2011 15:21:16
Details
15 Nov 2011 14:18:51

A significant percentage of our broadband lines blipped briefly, but appear to be coming back.

We're investigating.

Update
15 Nov 2011 14:30:42

A number of lines have ended up on the wrong LNS - we're rebalancing sessions now.

Resolution

This appears to have been an over sensitive test on one of our systems that thought we had lost connectivity and shut down sessions. The test is being changed, along with code changes to adjust timeouts, to avoid this in future.
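
One common way to make such a test less sensitive is to act only after several probes in a row have failed. The following is a minimal Python sketch of that idea, not the actual change made: the function name and the threshold of three are assumptions for illustration.

```python
from typing import Iterable


def connectivity_lost(probe_results: Iterable[bool], failures_needed: int = 3) -> bool:
    """Return True once `failures_needed` probes in a row have failed.

    `probe_results` is a sequence of booleans, True meaning the probe
    got an answer. A single lost probe resets nothing permanent: any
    success clears the failure count.
    """
    consecutive = 0
    for ok in probe_results:
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= failures_needed:
            return True
    return False
```

With `failures_needed=1` this behaves like a test that shuts down sessions on the first missed probe; a higher value trades slower detection for fewer false alarms.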

Started 15 Nov 2011 14:17:22
Closed 15 Nov 2011 15:21:16

21 Oct 2011 09:54:00
Details
21 Oct 2011 08:53:16

BT ADSL lines are down, Ethernet circuits are down too.

Update
21 Oct 2011 08:55:50

There seems to be a BT outage in Stepney Green where we (and others) interconnect in to BT.

Update
21 Oct 2011 08:59:23

Lines are coming back

BT have raised an Incident and are aware of a problem

Update
21 Oct 2011 09:01:25

Lines are still logging in.

If you're not up yet, please do wait a bit longer.

There should be no need to reboot ADSL routers, as they should reconnect. 

Update
21 Oct 2011 09:08:05

Lines are still reconnecting now - it will take a bit more time before all lines are back on.

Update
21 Oct 2011 09:09:53

This does not seem to have affected Ethernet Lines, just BT ADSL. (BE ADSL lines were not affected)

Update
21 Oct 2011 09:16:47

If customers are getting BT 'Service information' webpage coming up then please reboot your router.

This will be caused by BT accepting your login but not passing it on to us. Rebooting will cause the router to relogin, and should get through to us.

Update
21 Oct 2011 09:32:27

It is worth noting that BT have had issues with Stepney all week. We have been relatively lucky up until now not to be affected, but we have been working with a number of ISPs connected to Stepney during the week who have suffered similar issues. We don't have any exact details from BT as to the cause of the problems as yet, but clearly they are working on them.

Update
21 Oct 2011 11:41:33

This incident is now closed. 

We've had confirmation from BT that this was affecting other ISPs too.

BT cleared the fault at 08:54.

By 09:05 we had about 50% of ADSL lines back online.

It was also reported by ThinkBroadband
http://www.thinkbroadband.com/news/4829-major-broadband-outage.html 

Started 21 Oct 2011 08:37:00
Closed 21 Oct 2011 09:54:00
Cause BT

21 Oct 2011 09:02:34
Details
21 Oct 2011 08:39:02
Lines: 20% ALL and 22% BT and 28% 20CN and 19% 21CN and 100% 20CN-BRAS-ERX11-READING3 and 100% 20CN-BRAS-ERX2-FARADAY and 100% 20CN-BRAS-ERX7-READING2 and 100% 20CN-BRAS-ERX8-BIRMINGHAM2 and 100% 20CN-BRAS-ESR1-MILTONKEYNES3 and 100% 20CN-BRAS-ESR1-SHEFFIELD3 and 100% 20CN-BRAS-ESR10-MILTONKEYNES4 and 100% 20CN-BRAS-ESR6-KINGSTON4 and 100% 20CN-BRAS-ESR6-READING4 and 100% 20CN-BRAS-ESR7-KINGSTON4 and 82% 21CN-BRAS-RED1-NT-B and 100% 21CN-BRAS-RED2-PE dropped at 2011-10-21 08:37:06
We have advised BT
This is likely to have affected multiple internet providers using BT
Update
21 Oct 2011 08:56:18

Updates to this incident have moved to: http://status.aa.nu/apost.cgi?incident=1271

Update
21 Oct 2011 09:08:59

Lines coming back - more info on the other status message posted above.

Broadband Users Affected 18%
Started 21 Oct 2011 08:37:06 by AAISP automated checking
Closed 21 Oct 2011 09:02:34
Cause BT

20 Oct 2011 19:31:00
Details
20 Oct 2011 19:30:33

It looks like most/all BE lines are down. We're investigating...

Update
20 Oct 2011 19:35:06

BE lines are reconnecting now.

Update
20 Oct 2011 19:45:45

From the look of the logs, we lost all the L2TP tunnels to BE between 19:24 and 19:31.

We'll post again when we find out what the cause was.

Update
20 Oct 2011 20:01:04

The outage has been confirmed to be caused by a power outage.

Started 20 Oct 2011 19:25:00
Closed 20 Oct 2011 19:31:00

19 Oct 2011 09:33:33
Details
18 Oct 2011 11:33:36

BT have informed us that an:

Emergency PEW is planned to carry out urgent maintenance work at Stepney Green. From 00:01 AM -02:00AM there will be a possible short impact to service while traffic is being rerouted within BT core to wbmc shared service at Stepney Green.
From 02:00 AM to 05:00 AM The impact will be:
Total loss of service to wbmc shared end users who are connected to the Stepney Green 21cn Bras's during the Pew window of 02:00 to 04:00.
Total loss of service to IPSC/wbmc end users who are connected to the Ilford 20cn Bras's during the Pew window of 02:00 to 04:00.
The customer experience will be that they will be unable to connect to their CPs while the planned work is being actioned.

Update
18 Oct 2011 12:16:49

We think this will have little impact for most customers, but may mean a period off line for people on Stepney BRASs directly.

Update
19 Oct 2011 09:33:43

The planned works are now over.

Started 19 Oct 2011 by BT
Closed 19 Oct 2011 09:33:33
Previously expected 19 Oct 2011 06:00:00 (Last Estimated Resolution Time from BT)

12 Oct 2011 23:59:59
Details
12 Oct 2011 20:48:25

We see usage have occasional spikes, and they often have a reason, but this evening we have seen silly high usage since around 18:40, not only on 21CN but also 20CN. Something is clearly "up" and there is some "internet event" happening. No idea what yet, but has caught us unawares.

The system does the usual to manage the traffic - give more to the "premium" paying customers, and generally give small packets for VoIP, DNS, and interactive traffic priority.
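
The small-packet part of that can be pictured as a two-class priority queue. This is a minimal Python sketch only: the 200-byte threshold is a made-up figure, the "premium" weighting is not modelled, and none of it is taken from the real traffic management code.

```python
import heapq
from itertools import count
from typing import Iterable, List

SMALL_PACKET_BYTES = 200  # hypothetical cut-off for "small" packets


def drain_in_priority_order(packet_sizes: Iterable[int]) -> List[int]:
    """Return packet sizes in the order they would be sent.

    Small packets (VoIP, DNS and interactive traffic tend to be small)
    go first. Within each class, arrival order is preserved.
    """
    arrival = count()
    heap = []
    for size in packet_sizes:
        priority = 0 if size <= SMALL_PACKET_BYTES else 1
        # The arrival counter breaks ties so equal priorities stay FIFO.
        heapq.heappush(heap, (priority, next(arrival), size))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

For example, packets arriving as 1500, 80, 1400 and 120 bytes would be sent as 80, 120, 1500, 1400.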

But we'll try and work out why and if necessary get more capacity in shortly.

P.S. it is not just us caught out by this - logs show some lines seeing loss and latency which is within BT's network (as we ensure the LCP echos get sent anyway).

Update
12 Oct 2011 20:53:51

This is worse than the world cup traffic!

Update
12 Oct 2011 20:55:08

Only clue is new Apple IOS5 stuff - if that is the cause I am impressed.

Update
12 Oct 2011 20:57:04

Usage has just reached unprecedented levels - we have not seen anything like this...

Update
12 Oct 2011 20:59:52

Customers are even reporting issues on Be lines, what is this?

Update
12 Oct 2011 21:02:30

We have core links hitting 1Gb/s - something we have been planning for in 6 months time - this is really unprecedented usage levels.

Update
12 Oct 2011 21:34:07

One good thing from this is that it has provided the FireBrick team with a good benchmark of what happens when you push both the LNS and BGP routers to the full gigabit throughput. The answer is that it copes really well and is using CPU levels that are actually very good. But what on earth is causing these unprecedented levels of usage is not entirely clear.

So far we think there is a Windows update and Apple IOS5 release, but is that all it is?

Update
13 Oct 2011 08:45:44

We are guessing this was IOS5 release.

Update
14 Oct 2011 18:15:07

Whilst Thursday night was also somewhat excessively busy it was nothing like the previous night, and caused some congestion mostly on 21CN.

Closed 12 Oct 2011 23:59:59

18 Sep 2011 21:35:00
Details
19 Sep 2011 09:33:10

All our BE ADSL lines lost connection at 21:30 on Sunday night due to a power failure in our supplier's network. The outage was brief and most lines had reconnected by 21:35.

Started 18 Sep 2011 21:30:00
Closed 18 Sep 2011 21:35:00

09 Sep 2011 11:13:49
Details
09 Sep 2011 11:00:09

We are investigating a problem with short periods (seconds) of routing problems which will be affecting BT and BE broadband lines. Customers may be seeing no connectivity for short periods at a time. 

Update
09 Sep 2011 11:13:49

This has been now resolved. 

Started 09 Sep 2011 10:50:00
Closed 09 Sep 2011 11:13:49

23 Aug 2011 03:30:00
Details
22 Aug 2011 11:05:20

As the planned works notice is rather short, we are posting this as a major issue now. We expect all BT lines to go out of service over night from around 2am to 3am. This will not affect BE lines.

Resolution

20CN lines had a short outage (half hour) and 21CN lines longer (90 minutes)

Started 22 Aug 2011 11:00:00
Closed 23 Aug 2011 03:30:00

11 Jul 2011 10:00:00
Details
11 Jul 2011 09:57:56

We're currently investigating a routing problem, this will be affecting internet access for customers. More information to follow shortly.

Update
11 Jul 2011 09:59:50

Routing has been restored, we're currently investigating what the cause was.

Update
11 Jul 2011 10:36:34

The routing problem, which lasted a couple of minutes, looks to have been caused by a network loop at one of our peering points. This also affected other network providers.

Started 11 Jul 2011 09:50:00 by AAISP Staff
Closed 11 Jul 2011 10:00:00

04 Jul 2011 23:59:12
Details
05 Jul 2011 08:24:41

Sorry for the delay posting this, and thanks for all the MSO texts.

Last night a firewall config change was made on the BE side of our link to them. I am not sure we had a planned works notice. Anyway, the effect was to drop all sessions to us, and at least one other ISP. They reversed the change just under 20 minutes later and lines reconnected.

We are in discussions now as to what they were doing and why?

Started 04 Jul 2011 23:42:51
Closed 04 Jul 2011 23:59:12

08 Jun 2011 19:07:09
Details
08 Jun 2011 18:31:06

We're currently looking in to an outage that looks to have affected BT ADSL lines. More info to follow...

Update
08 Jun 2011 18:34:41

Lines are logging back in again now... it may take a while for them all to come up - but looks like there have been a couple of blips that took lines down - more details to follow...

Update
08 Jun 2011 18:48:54

This is one of the two main links we have to BT which is currently just not working - lines are coming up on the other LNS slowly.

Update
08 Jun 2011 18:53:01

Stats show that most lines went sensibly to the B server quite quickly.

The A server link is up and down like a yoyo which is not good for automated fall back, and manual intervention has ensured all lines are preferring the B server for now.
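
A link that keeps changing state can be spotted by counting recent up/down transitions. The following is a minimal Python sketch of that check, offered as an illustration only: the window and threshold are invented, and on this occasion the lines were steered to the B server by hand rather than by anything like this.

```python
from typing import Iterable


def is_flapping(
    change_times: Iterable[float],
    now: float,
    window_seconds: float = 300.0,
    max_changes: int = 4,
) -> bool:
    """Return True if the link changed state too often recently.

    `change_times` holds the times (in seconds) at which the link went
    up or down. The link counts as flapping when at least `max_changes`
    of those fall inside the last `window_seconds`.
    """
    recent = [t for t in change_times if now - window_seconds <= t <= now]
    return len(recent) >= max_changes
```

An automated fallback could use a check like this to keep preferring the stable server until the other link has been quiet for a while.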

Update
08 Jun 2011 19:04:43

Note, whilst unaffected by this issue, we have switched *all* lines to the B server which includes the BE lines. Sorry that we are not really in a position to sensibly switch only the broken BT lines over. This would have been a momentary PPP restart on BE lines and loss of graphs for now.

Update
08 Jun 2011 19:07:59

Note line status on the control page is confused and will update over night. Some lines may PPP restart once on an hour over night.

Update
08 Jun 2011 19:10:40

Odd, and annoying, the A server BT link just stabilised by itself.

This means no BT fault report will get anywhere and we will not get any sensible explanation. Arrrg!

We'll ask anyway.

Update
08 Jun 2011 20:27:58

This has been reported to BT, and they will be looking in to what they saw from their end during the outage. We'll update in the morning.

Started 08 Jun 2011 18:25:11 by AAISP Staff
Closed 08 Jun 2011 19:07:09
Cause BT

01 Jun 2011 23:30:00
Details
01 Jun 2011 23:22:20

Some major BT issues.

Looks like relatively short blips with people coming back on line now.

23:10:36 All 20CN birmingham BRASs!

23:15 most BT lines.

Update
01 Jun 2011 23:27:09

Looking at the traffic levels this looks to have been mostly 20CN lines impacted.

Update
01 Jun 2011 23:31:35

I am pleased to say that one of the first FireBrick FB2700 users making use of a 3G backup for his DSL line reports it worked seamlessly for him.

Closed 01 Jun 2011 23:30:00

19 May 2011 02:11:00
Details
19 May 2011 00:45:34

At 00:27:10 all traffic on both BT links stopped, and BGP timed out. This is after some sort of blip affecting lots of lines at around 00:00. This resulted in all sessions dropping shortly after. The connection resumed at 00:29:08.

This looks almost exactly the same as a few nights ago, which turned out to be a planned work that had been sneaked in. This was the same second on both lines this time, so not exactly the same.

Lines reconnected over several minutes. BE lines were unaffected.

Update
19 May 2011 00:47:11

The rate at which lines are reconnecting seems unusually slow.

Update
19 May 2011 00:51:19

Looks like a lot of people got a "BT default accept" during the outage which means they spend a while on an unconnected link. That then has to time out before they connect again. This could explain the slow connections.

Update
19 May 2011 00:51:43

A few people ended up on the wrong LNS briefly as well.

Update
19 May 2011 00:55:49

Connections are stable now, but low on numbers. If you are off line, try restarting your connection.

I'll leave this MSO open until we have more details in the morning.

Update
19 May 2011 09:47:05

Lines came back in batches after the incident, and there were a number of BRAS specific issues during the night as well.

We'll post any more information we get from BT when we get it.

Started 19 May 2011 00:27:10
Closed 19 May 2011 02:11:00

17 May 2011 02:28:12
Details
17 May 2011 02:19:22

It appears that both of our links to BT dropped at the same time (02:06:51), including BGP with BT. All lines went off line. This lasted several minutes until 02:14:37 before lines started coming back.

BE lines unaffected, of course.

Update
17 May 2011 02:29:02

Lines all back. We'll chase up for an explanation in the morning. Very odd.

I bet it is some sort of planned work they failed to bring to our attention.

I was just about to do the LNS switch over, but that will happen slightly later this morning now.

Update
17 May 2011 02:33:08

Logs confirm this was not a clean shutdown of BGP.

Update
17 May 2011 02:40:04

Hmm, timing different on each link. On the backup it was 02:06:44 to 02:14:21

Update
17 May 2011 09:41:46

It looks like this was caused by a BT PEW. We are chasing BT to confirm this though.

Update
17 May 2011 10:34:10

BT confirm this was due to a PEW.

Broadband Users Affected 100%
Started 17 May 2011 02:06:51
Closed 17 May 2011 02:28:12
Cause BT

26 Apr 2011 22:00:00
Details
26 Apr 2011 21:48:42

Looks like most BT lines went off, possibly not all.

Lines coming back now. Taking a few minutes in some cases.

Some lines will need to be moved to the right LNS.

This means lines will be on the new timeouts from now.

Update
26 Apr 2011 21:51:36

Status suggests that this was slightly more than half the BT lines. This is specific to BT and not BE lines.

Update
26 Apr 2011 22:00:18

Looking stable, but took a couple of attempts to steer sessions to the right LNS which is another oddity, and perhaps another clue in the mix.

 

Update
27 Apr 2011 08:59:02

Oddly, and almost certainly unrelated in any way, at 23:32 many of our BE lines dropped.

This does highlight the benefit of having a BE + BT line as they rarely break at the same time!

Resolution

We'll let you know if we get anything out of BT

In the mean time, most lines will be on new timeouts now.

Started 26 Apr 2011 21:40:20
Closed 26 Apr 2011 22:00:00

22 Apr 2011 22:00:00
Details
25 Apr 2011 21:53:22

Looks like pretty much all BT lines blipped. Investigating.

Update
25 Apr 2011 21:59:42

As expected lines came back quickly, but we need to find why BT blipped.

Reports of one Be line "wobbling" for a few seconds but not losing connection may be a side effect, as Be lines seem generally unaffected by this. Looks BT specific.

Lines come back spread between the LNSs so some lines will have had a second brief blip while moved to the other LNS.

Started 25 Apr 2011 21:44:58
Closed 22 Apr 2011 22:00:00

23 Apr 2011 19:57:03
Details
23 Apr 2011 19:45:22

During some planned maintenance things went wrong and lines have cleared unexpectedly. They will reconnect over the next few seconds and minutes. Sorry for inconvenience.

Update
23 Apr 2011 19:46:28

A chunk of graphs will be lost, sorry.

Update
23 Apr 2011 19:49:40

A diagnostic command caused an issue and this is being investigated so we do not have it happen again!

Update
23 Apr 2011 19:57:49

Sorry about that - checking how that happened now so we can avoid it ever happening again. Thanks for your patience. Lines all back on and on the right LNS apart from the odd straggler.

Started 23 Apr 2011 19:43:33
Closed 23 Apr 2011 19:57:03

14 Apr 2011 18:22:55
Details
14 Apr 2011 13:12:14

Some ADSL lines dropped at 12:52 - it does look like a BT problem, but we've not yet discovered any patterns in the lines affected.

We'll update this shortly.

Update: We are sorry but it does actually appear that this was all of our BT lines.
We will look into why our monitoring gave an incorrect percentage to begin with.

Update
14 Apr 2011 13:13:31

Most lines have logged back in again now. Lines are still logging in and we expect the remaining to log back in during the next few minutes.

Update
14 Apr 2011 13:19:10

Some lines (about half the number of the first drop) have just dropped again, we are seeing these reconnect though.

Update
14 Apr 2011 13:46:23

Some customers have logged into the wrong LNS, we are bouncing them back to the correct LNS now.

Update
14 Apr 2011 14:38:00

Title of this post has been changed to reflect the problem better

from: BT Blip affecting some Customers to: Blip Affecting BT Lines

Update
14 Apr 2011 14:39:47

All remaining lines were back online by 13:30

We're still looking in to what caused this, and will be reporting back.

Update
14 Apr 2011 16:44:41

Sorry for delay - we should have more details on this incident shortly.

Update
14 Apr 2011 16:45:59

Note that the outage duration may be much shorter than shown on the graph due to many lines switching to the backup LNS. The graphs for while on the backup LNS will not show (i.e. show as purple) even though on line.

Update
14 Apr 2011 18:32:23

Interestingly, having seen some BT issues before, this looks to be some sort of glitch in the LNS on the link to BT and not actually a BT fault. Whilst it is remotely possible this was caused by some external factor we think it is a random hardware glitch on this occasion.

The logs suggest this issue lasted 4 to 5 seconds at most but clearly it must have had a knock on effect that lasted the 10 seconds that are normally the timeout. This caused sessions to time out. The result was lines dropping and reconnecting to both main and backup LNS.

Action: From now on, all new sessions will have a 20 second timeout rather than 10 by default. This should help both the BT issues we have seen and anything like this happening.

Whilst lines started reconnecting immediately (within seconds), the time taken for some lines to connect was several minutes.

Action: We are working on multiple authentication servers not just for normal backup usage but for load sharing to ensure lines reconnect much faster in future. This should speed up recovery in the event of any issues like this.

While staff started to clear the sessions from the backup LNS in a controlled way, an unexpected issue caused the backup LNS to fail. This affected a number of lines a second time and they reconnected to the main LNS.

Action: The exact cause is being investigated still and we hope to understand this better soon.

Further actions: We are very concerned at the apparent random issue on the BT link port, and we will be replacing the hardware some time in the next couple of weeks as part of routine maintenance.

Sorry for the inconvenience.

Broadband Users Affected 95%
Started 14 Apr 2011 12:52:10
Closed 14 Apr 2011 18:22:55

05 Apr 2011 03:43:54
Details
05 Apr 2011 03:04:40

We saw most links to BT fail at 02:30 and have since received initial notifications from BT of a major issue. We are trying to get more details.

This looks like it may be related to the previous issues in Slough, and many lines remain disconnected.

Update
05 Apr 2011 03:11:55

Following the BT incident our main LNS stopped responding and the working lines switched to the secondary LNS automatically. This is being investigated separately but the BT issue remains.

Update
05 Apr 2011 03:15:56

Looks like more connections around 3am means most customers are now back on line.

Update
05 Apr 2011 03:44:17

BT have confirmed the incidents closed. We'll try and get more details of what happened as ever.

Broadband Users Affected 10%
Started 05 Apr 2011 02:30:59
Closed 05 Apr 2011 03:43:54

05 Apr 2011 02:48:00
Details
05 Apr 2011 02:40:33

Following the BT outage at 02:30:59, the main LNS stopped responding at 02:33:34. This is likely to be a result of the problems with the link to BT. This should obviously not happen.

The system automatically switched lines to the secondary LNS as designed, which meant some customers took several minutes to reconnect.

We are investigating the cause and may be sending an engineer.

Update
05 Apr 2011 08:51:37

Both LNSs up and running now.

Broadband Users Affected 90%
Started 05 Apr 2011 02:33:34
Closed 05 Apr 2011 02:48:00

31 Mar 2011 06:36:00
Details
30 Mar 2011 17:54:24

All lines on 100% 20CN-REGION-READING and 100% 21CN-REGION-SL dropped at 2011-03-30 17:35:17
We have advised BT
This is likely to have affected multiple internet providers using BT

Update
30 Mar 2011 17:59:58

Two of our ethernet services are down as well, we can only assume routed via the same place. Investigating.

Update
30 Mar 2011 18:13:18

It is very rare for ethernet and broadband services to be affected by a common fault and suggests something major in the openreach back-haul network which serves both services.

Update
30 Mar 2011 18:26:25

Latest is "looks like a major fibre cut near Slough"

Update
30 Mar 2011 19:39:47

No more news yet

Update
30 Mar 2011 20:26:06

Engineers are on site and the next update will be at 21:30, apparently.

Update
30 Mar 2011 21:59:45

Fault still ongoing, and no confirmed reason for outage yet. Engineers are still on site and trying to find out exactly what the problem is.

Update
30 Mar 2011 23:32:56

The outage has been identified by BT as an issue with a 21CN core router in Slough. Engineers are trying to restore connections. Earlier information appears to be incorrect - it is not a cable failure.

Update
31 Mar 2011 02:36:18

Further outages for a number of lines on Guildford BRASs at 01:04

Update
31 Mar 2011 02:58:48

Latest is "There are technical and management bridge calls going on", "engineers are on-site at Slough exchange", and "they had called out CISCO"...

Update
31 Mar 2011 03:00:51

Some broadband lines are coming back on now.

Update
31 Mar 2011 03:03:30

And off they go again... Well, something is clearly happening...

Update
31 Mar 2011 03:09:18

Not wishing to get your hopes up, but some lines trickling back on again...

Update
31 Mar 2011 03:09:49

And off they go... Arrrrg

Update
31 Mar 2011 03:21:27

Many lines back, and seem to be more stable now.

Update
31 Mar 2011 03:29:56

Broadband looks pretty good now. Still two Ethernet circuits down.

Update
31 Mar 2011 03:31:40

Looking in more detail, still quite a few of the BRASes are not back yet, but those that are seem stable.

Update
31 Mar 2011 03:46:06

More lines coming back gradually.

Update
31 Mar 2011 06:30:40

Ethernet back briefly at 6am and off again (this is the 2 affected lines, all others have been fine).

Update
31 Mar 2011 06:32:57

While broadband lines have been gradually coming on, some are still down. It could be that these are lines that need a router reboot, but we have not had confirmation from BT yet as to whether they think the fault is cleared or not. We suspect not quite.

Update
31 Mar 2011 06:35:16

Some broadband lines went off and back on at 6am, and just now (06:33). BT have in fact now advised that they think the broadband side was fixed at 03:50.

Update
31 Mar 2011 06:36:00

Affected Ethernet lines back on now (06:35)

Update
31 Mar 2011 06:37:56

At this stage, we suggest that if your broadband is still down, try power cycling the router first, and if no luck contact support at 9am.

Update
31 Mar 2011 18:26:32

Closing the issue now as no longer service affecting since this morning. We'll post more details of what actually happened when we get them.

Broadband Users Affected 11%
Started 30 Mar 2011 17:35:17 by AAISP automated checking
Closed 31 Mar 2011 06:36:00
Cause BT