Intermittent DoH/DoT problem - fixed
MINOR Closed DNS, Email and Web Hosting
STATUS
Closed
CREATED
Apr 04, 11:02 AM (26½ days ago)
AFFECTED
DNS, Email and Web Hosting
STARTED
Apr 03, 10:54 AM (27½ days ago)
CLOSED
Apr 04, 10:54 AM (26½ days ago)
REFERENCE
42650 / AA42650
INFORMATION
  • INITIAL
    27½ days ago by James

    Our DoH/DoT resolvers ( https://support.aa.net.uk/DoH_and_DoT ) were intermittently failing DNS lookups. The problem seems to have started over the Easter weekend. Our DoH/DoT front ends are DNS-aware proxies (dnsdist) in front of back ends running unbound, and dnsdist uses TLS to speak DNS to those back ends. Some of the back ends had failed to reload their TLS certificates after renewal, so although the renewed certificates were valid, unbound was still serving the old ones, which eventually expired. This left broken back ends in the pool, which dnsdist kept trying to bring back into service. Because the failures were intermittent and clients generally retry silently in the background, the problem wasn't obvious to users. Of course our monitoring should have caught this! We've fixed the underlying problem that stopped unbound picking up the renewed certificates, and we've improved monitoring to catch similar problems should they occur in future.

  • Closed
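
The stale-certificate failure described in the initial update suggests checking the certificate a back end actually serves over TLS, rather than the renewed file on disk. The following is a minimal sketch of such a check, not the monitoring that was actually deployed. The function names, the 14-day warning threshold, and the use of port 853 for the back ends are all assumptions made for illustration, and the default trust store is assumed to be able to validate the back-end certificates.

```python
import socket
import ssl
import time


def days_remaining(not_after: str, now: float) -> float:
    """Days until a certificate's 'notAfter' time; negative once expired."""
    expires = ssl.cert_time_to_seconds(not_after)
    return (expires - now) / 86400.0


def classify(not_after: str, now: float, warn_days: float = 14.0) -> str:
    """Map a 'notAfter' string to 'expired', 'expiring' or 'ok'.

    warn_days is an illustrative threshold, not a value from the incident.
    """
    left = days_remaining(not_after, now)
    if left < 0:
        return "expired"
    if left < warn_days:
        return "expiring"
    return "ok"


def check_served_cert(host: str, port: int = 853, timeout: float = 5.0) -> str:
    """Connect to host:port and classify the certificate it presents.

    Port 853 (the standard DoT port) is an assumption about the back ends.
    A certificate that has already expired fails verification during the
    handshake, so that case is reported as 'invalid: <reason>'.
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
    except ssl.SSLCertVerificationError as exc:
        return f"invalid: {exc.verify_message}"
    return classify(cert["notAfter"], time.time())
```

Run against each back end in the pool on a schedule, a result other than "ok" would flag a server that is still presenting an old certificate even though renewal succeeded on disk.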