Our Network:
Major Outages


Minor Outages


Happening Now


Future Planned


Open Events:
Jun 05, 03:22 PM (7¾ days ago)
VoIP and SIMs
Jun 05, 03:20 PM (7¾ days ago)
42688 / AA42688
    7¾ days ago by Adrian

    We are again seeing an issue where calls from ONSIM SIMs are not immediately connecting, and may appear to the recipient as several call attempts. We are chasing ONSIM on this.

    2¾ days ago by Adrian

    Sorry for the delay on this - we do not seem to have had any complaints from customers about this for some time, but we have not yet had final confirmation from ONSIM that it is fully resolved, so we are keeping this issue open for now.

Jun 03, 03:13 PM (9¾ days ago)
authoritative DNS
Jun 03, 10:35 AM (10 days ago)
42684 / AA42684
    10 days ago by Andrew

    This is only relevant to customers who run their own authoritative DNS servers and use our secondary-dns.co.uk as an additional nameserver.

    Overview: We run a "secondary" DNS service for customers where they run the master DNS server and we act as the secondary (slave). We have a project underway to migrate all our authoritative DNS services to a new platform. As part of this we need to disable some of the automation that adds and updates the customer's master IP address automatically.

    The change: From June 17th, if you run your own master DNS server for your domain(s) and secondary-dns.co.uk is a slave, and you change the IP address of your master, you will need to contact support@aa.net.uk to ask us to update our side.

    We have more information about our Authoritative DNS project on our Support Site: https://support.aa.net.uk/New_Authoritive_DNS
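    For illustration only, a customer-side master (primary) in this arrangement typically allows the secondary to transfer the zone and notifies it of changes. This BIND-style sketch is our own example, not A&A's documented configuration, and the IP address shown is entirely hypothetical - the real addresses for secondary-dns.co.uk are not given in this post:

    ```conf
    // Hypothetical customer master (BIND named.conf) - example address only.
    zone "example.co.uk" {
        type master;
        file "/etc/bind/db.example.co.uk";
        // allow the secondary-dns.co.uk servers to pull the zone
        allow-transfer { 192.0.2.53; };
        // notify the secondary promptly when the zone changes
        also-notify { 192.0.2.53; };
    };
    ```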


    Due in 3¾ days

Apr 08, 02:16 PM (2 months ago)
Apr 08, 02:13 PM (2 months ago)
42654 / AA42654
    2 months ago by Adrian

    This work has started, but we did not post a planned-work notice as we expected it to be seamless. Sadly that was not quite the case today, so here is more detail on what we are planning over the next few weeks. The main thing is: if you see any problems, please tell us right away.

    • Some cosmetic improvements (nicer-format phone numbers) in emailed or tooted SMS (done)
    • Additional options, such as forcing the email/toots to use E.123 international (+) format numbers (done)
    • Additional options for posting JSON to http/https (TODO)
    • Allowing SMS to be relayed (chargeable) to other numbers (done)
    • We already allow multiple targets for a number for SMS (done)
    • Some improvements for 8-bit SMS, which are rare; we previously treated them as latin1, which is not correct (TODO)
    • Some new features for trialling a new SIP2SIM platform (TODO)
    • Improved "visible" formatting for content in email/toot when special characters are used (e.g. NULL as ␀) (TODO)
    The 8-bit data format changes are likely to be the least backwards-compatible, but should not impact anyone, as 8-bit SMS are not generally encountered. That is, incoming SMS will rarely (if ever) be 8-bit coded, and when they were, we would get special characters wrong. Similarly, sending 8-bit SMS would only show the expected characters on some older phones, and would be wrong on many others, as the specification does not define the character set to use. We will, however, handle NULLs much better, which matters for some special use cases.
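    To illustrate the "visible" formatting idea from the last bullet (this is our own sketch, not A&A's actual implementation): control characters such as NULL can be mapped to the Unicode "Control Pictures" block so they show up readably in an email or toot.

    ```python
    def make_visible(data: bytes) -> str:
        """Render SMS payload bytes so control characters are visible.

        C0 control codes (0x00-0x1F) map to Unicode Control Pictures
        (U+2400..U+241F), so a NULL byte appears as ␀ as in the post.
        Other bytes are decoded as latin1 purely for display purposes.
        """
        out = []
        for b in data:
            if b < 0x20:
                out.append(chr(0x2400 + b))   # e.g. 0x00 -> U+2400 (␀)
            elif b == 0x7F:
                out.append("\u2421")          # DEL -> U+2421 (␡)
            else:
                out.append(bytes([b]).decode("latin1"))
        return "".join(out)
    ```

    For example, a payload containing a NULL byte renders as `␀` instead of silently disappearing.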

Apr 29, 02:13 PM (1¼ months ago)
Recently Closed Events:
Jan 19, 03:55 PM (4¾ months ago)
Jan 19, 03:50 PM (4¾ months ago)
Jun 12, 04:00 PM (20¼ hours ago)
42608 / AA42608
    4¾ months ago by Andrew

    This is a summary and update regarding the problems we've been having with our network, causing line drops for some customers, interrupting their Internet connections for a few minutes at a time. It carries on from the earlier, now out of date, post: https://aastatus.net/42577

    We are not only an Internet Service Provider.

    We also design and build our own routers under the FireBrick brand. This equipment is what we predominantly use in our own network to provide Internet services to customers. These routers are installed between our wholesale carriers (e.g. BT, CityFibre and TalkTalk) and the A&A core IP network. The type of router is called an "LNS", which stands for L2TP Network Server.

    FireBricks are also deployed elsewhere in the core; providing our L2TP and Ethernet services, as well as facing the rest of the Internet as BGP routers to multiple Transit feeds, Internet Exchanges and CDNs.

    Throughout the entire existence of A&A as an ISP, we have been running various models of FireBrick in our network.

    Our newest model is the FB9000. We have been running a mix of prototype, pre-production and production variants of the FB9000 within our network since early 2022.

    As can sometimes happen with a new product, at a certain point we started to experience some strange behaviour; essentially the hardware would lock-up and "watchdog" (and reboot) unpredictably.

    Compared to a software 'crash', a hardware lock-up is very hard to diagnose, as little information is obtainable when it happens. If the FireBrick software ever crashes, a 'core dump' is posted with specific information about where the software problem occurred, which makes it much easier to find and fix.

    After intensive work by our developers, the cause was identified as, unexpectedly, something to do with the NVMe socket on the motherboard. At design time, we had included an NVMe socket connected to the PCIe pins on the CPU, for possible future uses, but we did not populate it. The hanging issue completely cleared up once an NVMe drive was installed, even though it was not used for anything at all.

    As a second approach, the software was then modified to force the PCIe interface off, so that we would not need to install NVMe drives in all the units.

    This did solve the problem in our test rig (multiple FB9000s, PCs to generate traffic, switches, etc.). For several weeks, FireBricks which had formerly been hanging often under "artificially worsened" test conditions stopped hanging altogether, becoming extremely stable.

    So, we thought the problem was resolved. And, indeed, in our test rig we still have not seen a hang. Not even once, across multiple FB9000s.


    However, we did then start seeing hangs in our live prototype units in production (causing dropouts for our broadband customers).

    At the same time, the FB9000s we have elsewhere in our network, not running as LNS routers, are stable.

    We are still working on pinpointing the cause of this, which we think is highly likely to be related to the original (now solved) problem.

    Further work...

    Over the next 1-2 weeks we will be installing several extra FB9000 LNS routers. We are installing these with additional low-level monitoring capabilities in the form of JTAG connections from the main PCB so that in the event of a hardware lock-up we can directly gather more information.

    The enlarged pool of LNSs will also reduce the number of customers affected if there is a lock-up of one LNS.

    We obviously do apologise for the blips customers have been seeing. We do take this very seriously, and are not happy when customers are inconvenienced.

    We can imagine some customers might also be wondering why we bother to make our own routers, and not just do what almost all other ISPs do, and simply buy them from a major manufacturer. This is a fair question. At times like this, it is a question we ask ourselves!

    Ultimately, we do still firmly believe the benefits of having the FireBrick technology under our complete control outweigh the disadvantages. CQM graphs are still almost unique to us, and these would simply not be possible without FireBrick. There have also been numerous individual cases where our direct control over the firmware has enabled us to implement individual improvements and changes that have benefitted one or many customers.

    Many times over the years we have been able to diagnose problems with our carrier partners, which they themselves could not see or investigate. This level of monitoring is facilitated by having FireBricks.

    But in order to have finished FireBricks, we have to develop them. And development involves testing, and testing can sometimes reveal problems, which then affect customers.

    We do not feel we were irrationally premature in introducing prototype FireBricks into our network, having had them under test, not routing live customer traffic, for an appropriate period beforehand.

    But some problems can only reveal themselves once a "real world" level and nature of traffic is being passed. This is unavoidable, and whilst we do try hard to minimise disruption, we still feel the long-term benefits of having FireBricks more than offset the short-term problems in this late stage of development. We hope our detailed view on this is informative, and even persuasive.

    4¼ months ago by Andrew

    5th Feb: Both Z and Y have hung in recent days (Saturday 3rd and Monday 5th) - we are currently analysing the data from the various cache and memory systems that we were able to retrieve from the hardware whilst it was in its hung state.

    4 months ago by Andrew

    Latest Summary, as of 9th February: We now have a larger pool of FB9000 LNSs. Six out of seven of them have been fitted with NVMe drives and JTAG debugging capabilities; if/when they have a hardware lock-up we'll be able to gain a bit more insight into the cause. The seventh has not, but it has been stable with an uptime of 86 days.

    3¼ months ago by Andrew

    Work being carried out:

    2¾ months ago by Andrew

    Just a small update to say that the FireBrick development team are still working on this as their main priority. A lot of work has been done and a lot of factors have been ruled out. Progress is slow, as we have been unable to reproduce the hang in the test lab thus far. The scope of the cause of the hang is now much smaller, and work continues.

    Our priority over the past few weeks has been to minimise the impact this has on you, our customers. To achieve this, most of the customer-facing LNSs have been running 'factory-release' software, which we have not seen hang and which we consider stable. Our efforts have then been focused on our test lab and our test LNSs, trying to reproduce the hang there.

    2½ months ago by Andrew

    Further to the above post, where we explained that we have been trying to reproduce the hang in our test lab and on our test LNSs, we are now in a position to run newer software on our live LNSs. This is being rolled out as part of https://aastatus.net/42647

    2 months ago by Andrew

    General update of the current situation

    We know that our recent reliability for some customers has been unacceptable. We wanted to set out a bit more of the story, mainly for transparency rather than because we expect it to be "mitigation" in most people's minds.

    This is where our two roles, that of an ISP with broadband customers and that of a hardware manufacturer, meet head-on and, unfortunately and uncomfortably, collide.

    To be abundantly clear, we are very sorry for the outages some customers have suffered. This falls below the standards we set ourselves. We are not happy about it, and a lot of effort is going into sorting it.

    The story since

    Several plausible causes have been found, fixed and tested in our testing process (before deploying live). Many of these will have fixed genuine problems, but not solved what appears to be the "main" issue.

    Almost all of these have been at the meeting point between hardware and software. The problem with a hardware hang is that far less diagnostic information is available to assist with debugging.

    On several occasions now, we have genuinely believed that the issue had been found, fixed, and tested in our offline test rig, and we were therefore keen to place the firmware in active use; the thought being that the sooner it was rolled out, the sooner the unreliability would disappear.

    But then, some time after being put live, an FB9000 would suffer another hang. The nature of the hang has been unpredictable (i.e. when it would happen); sometimes taking days or weeks to surface. Meanwhile, until it did hang, we still believed the problem had been solved.

    "Why not Cisco?"

    Some customers have quite reasonably asked why we do not employ (even temporarily) a third-party hardware vendor such as Cisco as our LNS supplier. This is an option, but we still feel the costs of implementation (in time and money) would be better spent on active R&D to resolve this problem.

    We do still believe strongly that the FB9000, when stable, offers us features that distinguish our service from the service of almost all others. Simply, we want bonding, CQM graphs, low power consumption, etc.

    It is part of what makes our ISP offering different and better; our USP.

    Other issues

    Within this same time frame, we have had multiple instances of BT Wholesale carrying out planned work which they had not told us about in advance (and apparently had not told other ISPs either). We could have zeroed the impact of their planned work, had they told us beforehand.

    We have raised this multiple times with our account manager and at higher levels, and we still have not had a satisfactory response. Of course, no wholesale network is 100% reliable; we are not unreasonable about this, but the combined appearance, especially to customers not following matters closely, is that it is "another LNS blip". Unlucky timing, which would be bad at any time, but happens to be far worse just now.

    A change of plan

    Historically, our October "Factory" firmware has been stable. The hangs we have seen have all occurred in releases before or since that one. That release did have at least one major fix in it, addressing a hardware hang (the PCIe/NVMe issue).

    Our immediate decision is to therefore put all "live" production FB9000 hardware back onto the October "Factory" release, except for our test LNS. To this end, we have already rolled back almost all live LNSs.

    Assistance requested if you're willing

    We invite and encourage customers who do want to assist with the process of fixing this to prepend "test-" onto their login, which will steer them to the test LNS, and help the effort to fix the problem. Of course this may be less stable than our regular LNS. Email support for more details.

    Rounding up

    Hopefully this post shows we are listening, that there is a vast amount of work going on, and that we have taken a different approach, recognising that this state of affairs has persisted too long and cannot be allowed to continue.

    We recognise that this level of openness is uncommon, but the situation we are in is uncommon; we doubt any other ISP develops its own core equipment.

    We politely request that this post is taken for what it is: a genuine attempt to explain in more depth, announce a change of direction, and apologise for the outages.

    Nothing we do happens by accident or because of a lack of thought, or a lack of awareness, or a cavalier approach to customer well-being. Decisions sometimes do prove to be wrong, but decisions *are* made, and made with the best of intentions.

    There are human beings writing the code. There are human beings in our Ops and Support teams. And there are human beings managing the business.

    Nobody takes this in any other way than "extremely seriously".

    Thanks for taking the time to read this, and we are happy to answer any questions, of course.

    1¾ months ago by Andrew

    We have not had any hangs since we've been running Factory-Release software on our FB9000 LNSs.

    1¼ months ago by Andrew

    Our fleet of FireBrick FB9000s remain stable.

    29 days ago by Andrew

    Our LNSs remain stable running the 'Factory' software. Work continues in our test lab and on non-customer-affecting parts of our network to track down the problem with the alpha software.

    For the past month or so we have had no incidents of LNS hangs, and most of our FB9000 LNSs have an uptime approaching 100 days. We are confident that the Factory software is stable.

    20¼ hours ago by Andrew

    We are still running the 'Factory' release software on our production LNSs and we consider them stable.

    Work continues away from our LNSs on the cause of the hangs, but due to the nature of the problem it is a time-consuming process.

    Moving forward: over the coming months we plan to migrate our FB6000 LNS pool to FB9000 (running the stable Factory software). Most of our non-LNS routers (e.g. those used for BGP, L2TP and Ethernet services) have already been migrated to FB9000 hardware and have been running, in some cases, for nearly two years.

    We will create new Status Posts regarding the work to migrate our FB6000s to FB9000.

  • Closed
Broadband blip graph

The graph shows the last few hours of logins and logouts of ADSL, VDSL, SIMs and L2TP circuits.

The current time is on the left. Green is login, red is logout.

If there are spikes, then this shows a large number of logouts, which may indicate an outage or planned work happening.

You can click on a spike to search for incidents or maintenance that were open around that time.
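The description above amounts to a simple anomaly check: bucket logout events per minute and flag minutes with an unusually high count. This is a hypothetical sketch of that idea, not how the status page itself is implemented; the event format and threshold are our assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta

def logout_spikes(events, threshold=100):
    """events: iterable of (timestamp, kind) pairs, kind 'login' or 'logout'.

    Returns the minutes (as datetimes) in which more than `threshold`
    logouts occurred - a crude proxy for the graph's red spikes.
    """
    per_minute = Counter(
        ts.replace(second=0, microsecond=0)
        for ts, kind in events
        if kind == "logout"
    )
    return sorted(minute for minute, n in per_minute.items() if n > threshold)
```

A real detector would compare against a rolling baseline rather than a fixed threshold, since normal login/logout churn varies through the day.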

About our status page

This is the status page of Andrews & Arnold Ltd.

Our status page shows outages (problems) and maintenance (planned work) on our own network and systems, and also on those of our suppliers' networks and systems. We try to ensure this site is updated as soon as possible as incidents happen. Live discussion of issues is usually available on IRC.

The last update was Yesterday 16:13:29

Contacting us
Our support number is 033 33 400 999, or you can email support@aa.net.uk or text 01344 400 999 to raise a support ticket.

Spotted a Major Service Outage? (MSO)
A Major Service Outage disrupts the service of multiple customers simultaneously. If you believe a problem affects multiple customers and is not already mentioned here, text the number above, beginning the text with "MSO". This alerts multiple staff immediately, waking them if necessary. False alarms (e.g. raising an MSO for a single line being down) may result in your number being barred from raising MSO alerts in future. More info.

Regular Maintenance
Thursday evenings, from 10pm, are designated as a general maintenance window where we will perform non-service affecting updates.