Recent posts
Timeline view of events on our network and systems

Events from the AAISP network from the last few months.

MAINTENANCE Planned TalkTalk
AFFECTING
TalkTalk
STARTING
Apr 09, 03:00 AM (in 11½ days)
DESCRIPTION

We have multiple interlinks to TalkTalk that carry our broadband traffic. TalkTalk have scheduled planned work on our links in our Equinix LD8 datacentre for 11th April between 1AM and 6AM.

So as to minimise the impact on our customers, we will move traffic off these links on 9th April at 3AM. This should be seamless, but there is a risk of some customers having a brief interruption to their service.
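As a rough illustration of what moving traffic off a set of links involves, the sketch that follows models interlinks as records tagged with their datacentre, marks every link in the affected datacentre as drained, and spreads sessions over the links that remain. It is a minimal sketch under stated assumptions: the link names, the round-robin placement and the `Interlink`, `drain_datacentre` and `assign_sessions` helpers are all hypothetical, and it does not describe how our network actually steers traffic.

```python
from dataclasses import dataclass


@dataclass
class Interlink:
    """A carrier interlink (hypothetical model, for illustration only)."""
    name: str
    datacentre: str
    drained: bool = False


def drain_datacentre(links: list[Interlink], datacentre: str) -> list[Interlink]:
    """Mark every link in `datacentre` as drained; return the links still in service."""
    for link in links:
        if link.datacentre == datacentre:
            link.drained = True
    remaining = [link for link in links if not link.drained]
    if not remaining:
        # Draining everything would leave nowhere to move the traffic to.
        raise RuntimeError("draining would leave no interlinks in service")
    return remaining


def assign_sessions(sessions: list[str], links: list[Interlink]) -> dict[str, str]:
    """Spread sessions round-robin over links that are not drained (assumed policy)."""
    active = [link for link in links if not link.drained]
    return {s: active[i % len(active)].name for i, s in enumerate(sessions)}
```

Draining ahead of the carrier's window means sessions are moved at a time we choose, rather than dropped when the carrier takes the links down.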


MAINTENANCE Assumed Completed Easter
AFFECTING
Easter
STARTED
Mar 28, 10:00 AM (1¾ hours ago)
DESCRIPTION
We are closed on both Bank Holiday Friday and Bank Holiday Monday. We're open 10AM-2PM on Saturday, as usual, for technical support.

MAINTENANCE Completed CityFibre
AFFECTING
CityFibre
STARTED
Mar 27, 12:01 AM (1¼ days ago)
CLOSED
Mar 27, 09:05 AM (1 day ago)
DESCRIPTION

CityFibre are carrying out work that will affect CityFibre connections in Maidenhead, Luton, Leicester, Kettering, Gloucester, Coventry, Glasgow, Bournemouth, Milton Keynes, Newcastle upon Tyne, Northampton, Norwich, Peterborough, Plymouth, Poole, Reading, Rugby, Solihull, Swindon, Wakefield and Wolverhampton.

Customers may experience several momentary losses of service during the maintenance window, each lasting from a couple of seconds up to a maximum of 30 seconds.


Resolution: We assume this work was carried out; we didn't see any customers affected by it.

MAJOR Closed BT Circuits
AFFECTED
BT Circuits
STARTED
Mar 26, 01:13 AM (2¼ days ago)
CLOSED
Mar 26, 03:30 AM (2¼ days ago)
DESCRIPTION
We're investigating the cause of major stability issues. Update to follow ASAP.
Resolution:

The cause of this disruption was BT planned work to carry out 'invasive testing' on our links. They have confirmed that the work has been completed.

They failed to inform us of this. We have already raised a formal complaint regarding their previous lack of notifications, and since then BT have been manually sending us notifications of works (e.g. the one for 27th March). This is being followed up with our account manager.

We do apologise to our customers who were affected by this.

We're furious.

We have had further information from BT about their work. The work was on a transmission link between two datacentres, and as part of it, all ports on devices that use the link were disabled and re-enabled. As a result, we saw one port in each pair of our host links go down and up around 15 times each, all at the same time. As BT did not shut these ports down cleanly, traffic broke and customers dropped and reconnected multiple times between midnight and 3:30AM.
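To show how flapping like this shows up in link-state logs, here is a minimal sketch that counts how often each port went down within a time window, plus a helper that generates simultaneous down/up events. The event format, the port names and both helpers are hypothetical and for illustration only; the data is simulated, not taken from our logs.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable

# (timestamp, port name, "up" or "down"); an assumed log format
Event = tuple[datetime, str, str]


def count_flaps(events: Iterable[Event], start: datetime, end: datetime) -> Counter:
    """Count, per port, how many times it went down within [start, end)."""
    flaps: Counter = Counter()
    for ts, port, state in events:
        if start <= ts < end and state == "down":
            flaps[port] += 1
    return flaps


def simulate_flaps(ports: list[str], first: datetime, cycles: int,
                   interval: timedelta, outage: timedelta) -> list[Event]:
    """Generate down/up pairs for every port at the same instants (simulated data)."""
    events: list[Event] = []
    for i in range(cycles):
        down_at = first + i * interval
        for port in ports:
            events.append((down_at, port, "down"))
            events.append((down_at + outage, port, "up"))
    return events
```

Seeing one port of every pair go down at the same instants, repeatedly, points to a common cause upstream rather than a fault on any single link.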


MAINTENANCE Completed BT
AFFECTING
BT
STARTED
Mar 25, 03:00 AM (3¼ days ago)
CLOSED
Mar 27, 09:03 AM (1 day ago)
DESCRIPTION

We have multiple interlinks to BT that carry our broadband traffic. BT have scheduled planned work on our links in our Harbour Exchange Square datacentre for 27th March between midnight and 6AM.

So as to minimise the impact on our customers, we will move traffic off these links on 25th March at 3AM. This should be seamless, but last time we attempted this BT had a misconfiguration which caused some customers to drop their connection!


Resolution: Unfortunately BT's migration didn't go to plan and they rolled back their change. No customer circuits were affected. The work will be rescheduled for a later date.

MAINTENANCE Assumed Completed LNS and Routers
AFFECTING
LNS and Routers
STARTED
Mar 23, 03:00 AM (5¼ days ago)
DESCRIPTION

We will be performing software upgrades on our FB9000 LNSs during the early hours of Saturday 23rd, Sunday 24th and Monday 25th this week. This will cause customer lines to drop and reconnect a couple of times between the hours of 3AM and 4:30AM.

Customers who will be affected by this are those with line speeds of 80Mb/s and above.

The software upgrade being applied does include a plausible fix for the CPU hang that we have been seeing. However, if we see any further CPU hangs, we will revert to the seemingly stable version of the software.


MAINTENANCE Completed TalkTalk
AFFECTING
TalkTalk
STARTED
Mar 20, 03:00 AM (8¼ days ago)
CLOSED
Mar 22, 07:58 AM (6 days ago)
DESCRIPTION

We have multiple interlinks to TalkTalk that carry our broadband traffic. TalkTalk have scheduled planned work on our links in our Telehouse datacentre for 21st March between 1AM and 6AM.

So as to minimise the impact on our customers, we will move traffic off these links on 20th March at 3AM. This should be seamless, but there is a risk of some customers having a brief interruption to their service.


Resolution: This was completed without any problems.

MINOR Closed VoIP
AFFECTED
VoIP
STARTED
Mar 19, 03:20 PM (8¾ days ago)
CLOSED
Mar 19, 03:33 PM (8¾ days ago)
DESCRIPTION
We're investigating reports of some incoming calls not arriving.
Resolution: A change to the database on one of our backends caused phone registration information to fail, which in turn caused some incoming calls to fail. The change was reverted at 15:32 and calls are now working. The change had been running successfully on test systems for weeks, but further investigations will be carried out.

MINOR Closed VoIP
AFFECTED
VoIP
STARTED
Mar 15, 11:10 AM (13 days ago)
CLOSED
Mar 15, 11:23 AM (13 days ago)
DESCRIPTION
We're investigating reports of call problems with one of our voice servers.
Resolution:

MINOR Closed LNS
AFFECTED
LNS
STARTED
Mar 15, 07:00 AM (13 days ago)
CLOSED
Mar 15, 07:40 AM (13 days ago)
DESCRIPTION
At 7:30AM, the X.Witless LNS restarted, causing customers on it to drop and reconnect.
Resolution: This was related to https://aastatus.net/42608. This LNS is now out of service and will be analysed by our developers.

MAINTENANCE Completed LNS
AFFECTING
LNS
STARTED
Mar 15, 04:00 AM (13¼ days ago)
CLOSED
Mar 15, 04:00 AM (13¼ days ago)
DESCRIPTION
We will be moving lines off the Z.Witless LNS at 4AM. They will reconnect to a different LNS.
Resolution: This work was completed.

MAINTENANCE Completed L2TP
AFFECTING
L2TP
STARTED
Mar 14, 03:00 AM (14¼ days ago)
CLOSED
Mar 14, 03:10 AM (14¼ days ago)
DESCRIPTION
We will be performing software upgrades on our L2TP routers - l2tp.aa.net.uk. These will be scheduled for between 3AM and 4:30AM on Thursday this week. L2TP customers will see their connection drop and reconnect twice during this period.
Resolution: This was completed at 03:10.

MAINTENANCE Completed LNS
AFFECTING
LNS
STARTED
Mar 13, 03:00 AM (15¼ days ago)
CLOSED
Mar 13, 04:30 AM (15¼ days ago)
DESCRIPTION
We will be performing overnight upgrades of V.Gormless and W.Gormless on 13th and 14th March between 3AM and 4:30AM. Customers on these will see their connection drop and reconnect a few seconds later.
Resolution: This has been completed.

MAINTENANCE Completed Router Upgrades
AFFECTING
Router Upgrades
STARTED
Mar 12, 03:00 AM (16¼ days ago)
CLOSED
Mar 21, 05:00 AM (7¼ days ago)
DESCRIPTION
We will be performing software upgrades on our BGP routers. These will be scheduled for between 3AM and 4:30AM, Tuesday to Saturday this week. This is to bring all our routers up to the same software version, which introduces a few minor feature updates. This work is not expected to impact customers.
Resolution: This has been completed.

MINOR Closed LNS
AFFECTED
LNS
STARTED
Mar 09, 11:35 AM (19 days ago)
CLOSED
Mar 09, 11:30 AM (19 days ago)
DESCRIPTION
Customers on the X.Witless LNS dropped and reconnected at 11:35 today.
Resolution: This is related to the ongoing LNS hangs we've been seeing: https://aastatus.net/42608. We do apologise to customers affected by this. This incident does help towards diagnosing and investigating the root cause.

MINOR Closed SMS
AFFECTED
SMS
STARTED
Mar 07, 10:00 PM (20½ days ago)
CLOSED
Mar 08, 09:38 AM (20 days ago)
DESCRIPTION
Upgrade of the SMSC nodes' firewall to improve QoS and high availability. This work is being carried out together with the vendor's engineers. Expected impact: 4 hours.
Resolution:

MINOR Closed Data SIMs
AFFECTED
Data SIMs
STARTED
Mar 04, 10:45 PM (23½ days ago)
CLOSED
Mar 04, 10:53 PM (23½ days ago)
DESCRIPTION
We saw Data SIMs drop and reconnect at around 22:45 this evening. This looks to be caused by something upstream of us at the carrier.
Resolution: Services are back online. This was planned work by the upstream carrier.

MAINTENANCE Completed LNS
AFFECTING
LNS
STARTED
Mar 01, 03:00 AM (27¼ days ago)
CLOSED
Mar 01, 04:44 AM (27¼ days ago)
DESCRIPTION

We have work planned for the early hours of Friday morning that entails upgrading software on our LNSs and moving CityFibre and higher-speed BT/TalkTalk services on to separate pools of routers (LNSs) at our side.

In practice this will mean that most customers with speeds of 80Mb/s and above will experience a few PPP drops and reconnects between 3AM and 5AM as we carry out the work.
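The split of services onto separate LNS pools can be expressed as a simple selection rule. This is a hypothetical sketch: the pool names are invented, the 80Mb/s cut-off is taken from the figure quoted in this notice (which says "most" such customers, so the real rule may differ), and it assumes CityFibre and higher-speed BT/TalkTalk lines go to pools separate from each other.

```python
from enum import Enum


class Pool(Enum):
    """Hypothetical LNS pool names, for illustration only."""
    GENERAL = "general"
    HIGH_SPEED = "high-speed"
    CITYFIBRE = "cityfibre"


# Assumed cut-off, based on the 80Mb/s figure in the notice.
HIGH_SPEED_THRESHOLD_MBPS = 80


def choose_pool(carrier: str, speed_mbps: float) -> Pool:
    """Pick an LNS pool for a line from its wholesale carrier and line speed."""
    if carrier == "CityFibre":
        return Pool.CITYFIBRE
    if carrier in ("BT", "TalkTalk") and speed_mbps >= HIGH_SPEED_THRESHOLD_MBPS:
        return Pool.HIGH_SPEED
    return Pool.GENERAL
```

Moving a line between pools means its PPP session has to be re-established on a different LNS, which is why the work causes drops and reconnects.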

This is related to the hardware hangs we've been experiencing: https://aastatus.net/42608 and it will help us further investigate this ongoing issue.


Resolution: This work has been completed.

MINOR Closed BT DSL
AFFECTED
BT DSL
STARTED
Mar 01, 01:00 AM (27¼ days ago)
CLOSED
Mar 01, 02:15 AM (27¼ days ago)
DESCRIPTION
BT carried out planned work that affected one of our 4 links to them at 1AM and 2AM. This caused lines to drop and reconnect. BT failed to inform us of this work (again). Had BT informed us, we would have cleanly moved traffic off the affected link.
Resolution: A formal complaint has been raised with BT. (Drops between 3AM and 5AM were A&A planned work)

MINOR Closed VoIP and SIMs
AFFECTED
VoIP and SIMs
STARTED
Feb 29, 02:17 PM (27¾ days ago)
CLOSED
Feb 29, 06:25 PM (27½ days ago)
DESCRIPTION
Our SIP2SIM carrier are investigating a problem with voice and SMS; this is affecting the O2 profile only. We'll update this post as soon as we have further information.
Resolution: Our service provider has advised that the issue was resolved at 18:25.

MINOR Closed Hetzner
AFFECTED
Hetzner
STARTED
Feb 13, 07:00 PM (1½ months ago)
CLOSED
Mar 23, 04:03 PM (4¾ days ago)
DESCRIPTION
Hetzner is a German-based server hosting provider. We have seen intermittent problems routing traffic to them in recent days.
Resolution:

MAINTENANCE Assumed Completed Broadband
AFFECTING
Broadband
STARTED
Jan 19, 03:50 PM (2¼ months ago)
DESCRIPTION

This is a summary and update regarding the problems we've been having with our network, causing line drops for some customers, interrupting their Internet connections for a few minutes at a time. It carries on from the earlier, now out of date, post: https://aastatus.net/42577

We are not only an Internet Service Provider.

We also design and build our own routers under the FireBrick brand. This equipment is what we predominantly use in our own network to provide Internet services to customers. These routers are installed between our wholesale carriers (e.g. BT, CityFibre and TalkTalk) and the A&A core IP network. This type of router is called an "LNS", which stands for L2TP Network Server.

FireBricks are also deployed elsewhere in the core, providing our L2TP and Ethernet services, and facing the rest of the Internet as BGP routers connected to multiple transit feeds, Internet Exchanges and CDNs.

Throughout the entire existence of A&A as an ISP, we have been running various models of FireBrick in our network.

Our newest model is the FB9000. We have been running a mix of prototype, pre-production and production variants of the FB9000 within our network since early 2022.

As can sometimes happen with a new product, at a certain point we started to experience some strange behaviour: essentially, the hardware would lock up and "watchdog" (and reboot) unpredictably.
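For readers unfamiliar with the term, a watchdog is a timer that healthy software must reset ("kick") at regular intervals; if the system locks up and the kicks stop, the timer expires and forces a reboot. The following is a generic model of that behaviour with made-up timings, not the FB9000's actual watchdog implementation.

```python
class Watchdog:
    """Generic watchdog model: if no kick arrives within `timeout` seconds,
    the watchdog fires and the system is reset. Times are passed in
    explicitly (in seconds) so the behaviour is easy to follow and test."""

    def __init__(self, timeout: float, now: float = 0.0) -> None:
        self.timeout = timeout
        self.last_kick = now
        self.resets = 0

    def kick(self, now: float) -> None:
        """Called periodically by healthy software to show it is still running."""
        self.last_kick = now

    def poll(self, now: float) -> bool:
        """Return True (and count a reset) if the deadline has passed."""
        if now - self.last_kick > self.timeout:
            self.resets += 1
            self.last_kick = now  # the system restarts and the timer begins again
            return True
        return False
```

A watchdog gets a locked-up unit running again, but the reset itself records very little about why the hardware stopped, which is what makes this kind of fault hard to diagnose.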

Compared to a software 'crash', a hardware lock-up is very hard to diagnose, as little information is obtainable when it happens. If the FireBrick software ever crashes, a 'core dump' is posted with specific information about where the software problem happened, which makes it a lot easier to find and fix.

After intensive work by our developers, the cause was identified as (unexpectedly) something to do with the NVMe socket on the motherboard. At design time, we had included an NVMe socket connected to the PCIe pins on the CPU, for undecided possible future uses. We did not populate the NVMe socket, though. The hanging issue completely cleared up once an NVMe drive was installed, even though it was not used for anything at all.

As a second approach, the software was then modified to force the PCIe interface to be switched off, so that we would not need to install NVMe drives in all the units.

This certainly did solve the problem in our test rig (which is multiple FB9000s, PCs to generate traffic, switches etc). For several weeks FireBricks which had formerly been hanging often in "artificially worsened" test conditions, literally stopped hanging altogether, becoming extremely stable.

So, we thought the problem was resolved. And, indeed, in our test rig we still have not seen a hang. Not even once, across multiple FB9000s.

However...

We did then start seeing hangs in our live prototype units in production (causing dropouts for our broadband customers).

At the same time, the FB9000s we have elsewhere in our network, not running as LNS routers, are stable.

We are still working on pinpointing the cause of this, which we think is highly likely to be related to the original (now solved) problem.

Further work...

Over the next 1-2 weeks we will be installing several extra FB9000 LNS routers. We are installing these with additional low-level monitoring capabilities in the form of JTAG connections from the main PCB so that in the event of a hardware lock-up we can directly gather more information.

The enlarged pool of LNSs will also reduce the number of customers affected if there is a lock-up of one LNS.
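The benefit of a larger pool is simple arithmetic: if sessions are spread evenly, a single LNS lock-up drops at most the total number of customers divided by the number of LNSs, rounded up. The customer and LNS counts used here are hypothetical, and real sessions will not be spread perfectly evenly.

```python
import math


def customers_per_lns(total_customers: int, lns_count: int) -> int:
    """Worst-case customers dropped by one LNS lock-up, assuming sessions are
    spread as evenly as possible across `lns_count` LNSs."""
    if lns_count <= 0:
        raise ValueError("need at least one LNS")
    return math.ceil(total_customers / lns_count)


# Hypothetical figures: 12,000 lines on 4 LNSs versus the same lines on 6.
print(customers_per_lns(12_000, 4))  # 3000
print(customers_per_lns(12_000, 6))  # 2000
```

Adding units does not make a lock-up less likely, but it shrinks the share of customers each one affects.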

We obviously do apologise for the blips customers have been seeing. We do take this very seriously, and are not happy when customers are inconvenienced.

We can imagine some customers might also be wondering why we bother to make our own routers, and not just do what almost all other ISPs do, and simply buy them from a major manufacturer. This is a fair question. At times like this, it is a question we ask ourselves!

Ultimately, we do still firmly believe the benefits of having the FireBrick technology under our complete control outweigh the disadvantages. CQM graphs are still almost unique to us, and these would simply not be possible without FireBrick. There have also been numerous individual cases where our direct control over the firmware has enabled us to implement individual improvements and changes that have benefitted one or many customers.

Many times over the years we have been able to diagnose problems with our carrier partners, which they themselves could not see or investigate. This level of monitoring is facilitated by having FireBricks.

But in order to have finished FireBricks, we have to develop them. And development involves testing, and testing can sometimes reveal problems, which then affect customers.

We do not feel we were irrationally premature in introducing prototype FireBricks into our network, having had them under test not routing live customer traffic for an appropriate period beforehand.

But some problems can only reveal themselves once a "real world" level and nature of traffic is being passed. This is unavoidable, and whilst we do try hard to minimise disruption, we still feel the long-term benefits of having FireBricks more than offset the short-term problems of late-stage development. We hope our detailed view on this is informative, and even persuasive.