Yesterday we published BIND 9.16.48, 9.18.24, and 9.19.21. These releases mitigate several vulnerabilities that are described in our announcement. Two of these vulnerabilities were multi-vendor, pan-DNS concerns.
These two CVEs are:
- KeyTrap (CVE-2023-50387)
- NSEC3 closest encloser proof can exhaust CPU (CVE-2023-50868)
ISC would like to thank Elias Heftrig, Haya Schulmann, Niklas Vogel, and Michael Waidner from the German National Research Center for Applied Cybersecurity ATHENE for bringing the KeyTrap vulnerability to our attention and coordinating disclosure. The research team also provided invaluable help testing mitigations for KeyTrap.
Both are ways in which an attacker can abuse standard DNSSEC protocols, which were designed to protect DNS integrity, to make a resolver consume excessive resources, causing a denial of service for legitimate users.
What is the KeyTrap vulnerability?
Essentially, the attacker crafts a DNS zone with many DNSKEY and RRSIG records, and a standards-compliant DNSSEC validator tries all possible combinations of DNSKEY and RRSIG records in the vain hope of finding the one combination which matches and validates. If the validator does not implement an explicit limit on the amount of work it will do, it can spend an outrageous amount of resources doing useless work. This attack is also asymmetric - the attacker expends relatively little effort to cause the resolver to expend a lot of effort.
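To make the combinatorics concrete, here is a minimal Python sketch of an unbounded validator. It rests on simplifying assumptions: the `DnsKey` and `Rrsig` classes, the `naive_validate` function and the `verify` callback are hypothetical stand-ins with no real cryptography, and signatures are paired with candidate keys only by key tag and algorithm, the two fields an RRSIG carries to identify its signing key. Because key tags are only 16 bits, many distinct keys can share one tag, and every pairing then costs a public-key verification.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class DnsKey:
    ident: int       # distinguishes keys that happen to share a key tag
    key_tag: int     # 16-bit tag; different keys can collide on it
    algorithm: int


@dataclass(frozen=True)
class Rrsig:
    key_tag: int     # tag of the key that supposedly made this signature
    algorithm: int


def naive_validate(
    keys: List[DnsKey],
    sigs: List[Rrsig],
    verify: Callable[[DnsKey, Rrsig], bool],
) -> Tuple[bool, int]:
    """Try every RRSIG against every DNSKEY whose tag and algorithm match,
    with no upper bound. Returns (validated, signature_verifications)."""
    attempts = 0
    for sig in sigs:
        for key in keys:
            if key.key_tag != sig.key_tag or key.algorithm != sig.algorithm:
                continue  # cheap pre-filter, no cryptography yet
            attempts += 1  # stands in for one expensive public-key check
            if verify(key, sig):
                return True, attempts
    return False, attempts


# Hypothetical attacker-controlled data: 100 keys sharing one tag and
# 100 signatures claiming that tag, none of which actually verifies.
keys = [DnsKey(ident=i, key_tag=12345, algorithm=13) for i in range(100)]
sigs = [Rrsig(key_tag=12345, algorithm=13) for _ in range(100)]
print(naive_validate(keys, sigs, lambda key, sig: False))  # (False, 10000)
```

Doubling both the keys and the signatures quadruples the work, while the attacker only has to publish the records once.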
This attack is extremely effective against older versions of BIND because DNSSEC validation was historically done in the same processing thread as basically everything else. This design flaw in BIND, together with the unlimited efforts at validation, allowed an attacker to block query processing in BIND for a really long time – on the order of minutes or possibly hours on a slow CPU.
KeyTrap mitigation in BIND
To mitigate the KeyTrap vulnerability we have made two significant changes to BIND:
- BIND now limits the amount of work spent on DNSSEC-validating a single answer.
- BIND now offloads DNSSEC validation into separate threads.
The second change provides defense in depth: DNSSEC validation no longer blocks the processing of other requests. Thanks to this design change, KeyTrap and other similar DNSSEC-related vulnerabilities will have a smaller impact on unrelated queries. It also improves resolver resilience under random subdomain attacks targeting DNSSEC-signed domains.
With these changes, a DNSSEC validation attack which bypasses all other limits will be able to consume no more than approximately half of the CPU capacity on the affected machine, leaving the other half for normal processing.
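The first change, bounding the work spent on one answer, can be sketched by adding a budget to an exhaustive validator. This is a hypothetical illustration, not BIND's code: the names `budgeted_validate` and `ValidationBudgetExceeded`, and the default of 8 verifications, are assumptions chosen for the example. The second change, running validation on separate threads, is an architectural matter this sketch does not model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class DnsKey:
    ident: int
    key_tag: int
    algorithm: int


@dataclass(frozen=True)
class Rrsig:
    key_tag: int
    algorithm: int


class ValidationBudgetExceeded(Exception):
    """The answer needed more signature checks than the budget allows."""


def budgeted_validate(
    keys: List[DnsKey],
    sigs: List[Rrsig],
    verify: Callable[[DnsKey, Rrsig], bool],
    max_verifications: int = 8,  # hypothetical limit, not BIND's actual value
) -> bool:
    """Try matching (RRSIG, DNSKEY) pairs, but give up once the per-answer
    budget of expensive signature verifications is spent."""
    used = 0
    for sig in sigs:
        for key in keys:
            if (key.key_tag, key.algorithm) != (sig.key_tag, sig.algorithm):
                continue
            if used >= max_verifications:
                # Stop instead of grinding through every combination.
                raise ValidationBudgetExceeded(
                    f"gave up after {used} signature verifications"
                )
            used += 1
            if verify(key, sig):
                return True
    return False


keys = [DnsKey(ident=i, key_tag=12345, algorithm=13) for i in range(100)]
sigs = [Rrsig(key_tag=12345, algorithm=13) for _ in range(100)]
try:
    budgeted_validate(keys, sigs, lambda key, sig: False)
except ValidationBudgetExceeded as exc:
    print(exc)  # gave up after 8 signature verifications
```

Reporting an exhausted budget as a distinct exception, rather than returning `False`, lets a caller tell "no valid signature" apart from "too much work", although both would typically end in a failed validation.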
NSEC3 closest encloser proof can exhaust CPU (CVE-2023-50868)
The effectiveness of this design change is demonstrated by the fact that our mitigation for KeyTrap is also effective against another denial-of-service attack published yesterday, CVE-2023-50868.
An attacker either selects or creates a DNSSEC-signed zone whose NSEC3 parameters exceed the recommendations of the Best Current Practice document RFC 9276, primarily by using extra iterations, and then launches a random subdomain attack against this zone. Because RFC 9276 is not yet universally followed, resolvers typically accept the extra iterations and spend CPU cycles on SHA1 hashing.
These extra SHA1 hash iterations serve as another potential denial-of-service attack vector. Again, the relevant standard, RFC 5155 section 8.3, does not warn about this risk, and multiple implementations did not protect against it. Ironically, we discovered this flaw while testing mitigations for KeyTrap!
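The cost of those iterations follows directly from the NSEC3 hash definition in RFC 5155 section 5, where H is SHA1: IH(salt, x, 0) = H(x || salt) and IH(salt, x, k) = H(IH(salt, x, k-1) || salt). The minimal Python sketch below implements that definition; the helper names `to_wire`, `nsec3_hash` and `b32hex` are our own, and it assumes plain ASCII names without escaped characters. The example zone in RFC 5155 Appendix A, which uses salt aabbccdd and 12 iterations, lists hashed owner names that a sketch like this can be checked against.

```python
import base64
import hashlib


def to_wire(name: str) -> bytes:
    """Uncompressed wire format of an ASCII domain name, lowercased to the
    canonical form that NSEC3 hashing uses."""
    out = bytearray()
    for label in name.lower().rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out.append(len(raw))
            out += raw
    out.append(0)  # terminating root label
    return bytes(out)


def nsec3_hash(name: str, salt: bytes, iterations: int) -> bytes:
    """IH(salt, x, 0) = SHA1(x || salt);
    IH(salt, x, k) = SHA1(IH(salt, x, k-1) || salt)."""
    digest = hashlib.sha1(to_wire(name) + salt).digest()
    for _ in range(iterations):  # each extra iteration is one more SHA1 call
        digest = hashlib.sha1(digest + salt).digest()
    return digest


_B32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
_B32HEX = "0123456789ABCDEFGHIJKLMNOPQRSTUV"


def b32hex(digest: bytes) -> str:
    """Base32hex (RFC 4648) encoding, as used in NSEC3 owner names."""
    std = base64.b32encode(digest).decode("ascii")
    return std.translate(str.maketrans(_B32, _B32HEX)).lower()


# Hashing one name costs iterations + 1 SHA1 computations.
print(b32hex(nsec3_hash("example", bytes.fromhex("aabbccdd"), 12)))
```

Every name a resolver has to hash for one response pays this full iterated cost, which is why the iteration count matters so much under a random subdomain attack.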
The novelty of this vulnerability is in the ability to influence not only the zone used, but also the number of retries done by the Closest Encloser Proof algorithm. This allows the attacker to make the attack roughly 125x more effective than previously thought possible.
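The retries come from the search in RFC 5155 section 8.3, which strips one label at a time from the query name until a hashed ancestor matches an NSEC3 record in the response. The Python sketch below is simplified under stated assumptions: the function `find_closest_encloser` and its parameters are hypothetical, the check that the "next closer" name is covered by an NSEC3 record is omitted, and the query name is assumed to lie inside the zone. It counts SHA1 calls to show how a query name with many labels multiplies the per-name hashing cost.

```python
import hashlib
from typing import Optional, Set, Tuple


def to_wire(name: str) -> bytes:
    """Lowercased, uncompressed wire format of an ASCII domain name."""
    out = bytearray()
    for label in name.lower().rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out.append(len(raw))
            out += raw
    out.append(0)
    return bytes(out)


def nsec3_hash(name: str, salt: bytes, iterations: int) -> bytes:
    digest = hashlib.sha1(to_wire(name) + salt).digest()
    for _ in range(iterations):
        digest = hashlib.sha1(digest + salt).digest()
    return digest


def find_closest_encloser(
    qname: str,
    zone: str,
    salt: bytes,
    iterations: int,
    nsec3_owner_hashes: Set[bytes],
) -> Tuple[Optional[str], int]:
    """Strip labels from QNAME until a hashed ancestor matches an NSEC3 owner
    hash from the response (simplified: the 'covers' check is omitted).
    Returns (closest_encloser_or_None, sha1_computations)."""
    labels = qname.lower().rstrip(".").split(".")
    apex_len = len(zone.lower().rstrip(".").split("."))
    sha1_calls = 0
    while len(labels) >= apex_len:  # assumes qname is at or below the apex
        candidate = ".".join(labels)
        sha1_calls += iterations + 1
        if nsec3_hash(candidate, salt, iterations) in nsec3_owner_hashes:
            return candidate, sha1_calls
        labels = labels[1:]  # move one label closer to the zone apex
    return None, sha1_calls


# Hypothetical attack: a 101-label QNAME (209 octets on the wire, within the
# 255-octet limit) where only the zone apex has a matching NSEC3 record.
apex = {nsec3_hash("example", b"", 150)}
qname = "a." * 100 + "example"
print(find_closest_encloser(qname, "example", b"", 150, apex))
# ('example', 15251): 101 names x (150 + 1) SHA1 calls
```

A one-label query name against the same zone would cost at most two names' worth of hashing, so the attacker's choice of query name, not just of zone, drives the total.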
Luckily, all versions of BIND released in 2023 already limited the number of NSEC3 iterations to a maximum of 150, and the SHA1 hash algorithm is fast, so the impact on recent versions of BIND is much milder: an attacker needs hundreds of queries per second to exhaust a resolver CPU.
The sad part of this story is that if all DNS zone operators had followed the Best Current Practice, resolver implementations could have enforced stricter limits, making this attack totally ineffective.
We are not there yet, but we at Internet Systems Consortium are committed to tightening the limits on NSEC3 iterations as soon as practical - and we encourage DNS zone operators to follow the advice. Please read and follow RFC 9276 section 3.1!
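Zone operators can compare a zone's current NSEC3 parameters, for example as printed by `dig +short NSEC3PARAM example.com`, with the advice in RFC 9276 section 3.1: an iterations count of 0 and no salt (shown as "-" in presentation format). The helper below is a hypothetical sketch for that comparison, not part of any ISC tool, and it only inspects the iterations and salt fields.

```python
from typing import List


def check_nsec3param(rdata: str) -> List[str]:
    """Compare NSEC3PARAM rdata in presentation format
    ("<hash-alg> <flags> <iterations> <salt>") with RFC 9276 section 3.1."""
    fields = rdata.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {rdata!r}")
    _hash_alg, _flags, iterations, salt = fields
    problems = []
    if int(iterations) != 0:
        problems.append(f"iterations={iterations}; RFC 9276 asks for 0")
    if salt != "-":
        problems.append(f"salt={salt}; RFC 9276 asks for no salt ('-')")
    return problems


print(check_nsec3param("1 0 0 -"))          # []
print(check_nsec3param("1 0 10 aabbccdd"))  # two findings
```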
DNS scalability: the good, the bad, and the ugly
The fact that it is possible to cause excessive use of resources is not entirely an accident: the DNS protocol specification(s) intentionally do not put explicit limits on many things, including:
- the number of CNAME records in a chain - which led to the DNS Unchained attack
- the number of delegations in a recursion loop - which led to NXNSAttack
- the number of answers to a given source(s) - which led to amplification attacks
- the number of queries in general - which led to the invention of the random subdomain attack
- the number of validations - which led to KeyTrap
- the number of NSEC3 hash iterations - which led to CVE-2023-50868
- the number of answers in an ECS-enabled cache - which led to CVE-2023-5680
- the number of … basically anything.
Now, you might be asking yourself: were the DNS protocol standards completely bonkers?! And the answer is no!
If they had included explicit limits on all of these parameters back in 1987, we would not have been able to scale the DNS all this time without changing the protocol.
Imagine there were a hardcoded limit on the number of CNAME steps in a chain: if the limit was, e.g., “2 CNAMEs at most,” we would not have been able to construct today’s Content Delivery Networks. If there were a limit on the number of DNSKEYs, say “at most 2 DNSKEYs,” we would not have been able to use multi-signer DNSSEC setups. And we could go on. The lack of limits is on one hand dangerous, and on the other hand it has allowed us to use the same protocol and scale it for 37 years in a row!
We did not listen
Of course the DNS protocol designers were not stupid, and they foresaw this class of problems. Back in 1987 they provided some generic guidelines for implementers:
The recommended priorities for the resolver designer are:
1. Bound the amount of work (packets sent, parallel processes
started) so that a request can't get into an infinite loop or
start off a chain reaction of requests or queries with other
implementations EVEN IF SOMEONE HAS INCORRECTLY CONFIGURED
SOME DATA.
Indeed, you read that correctly: even back in 1987, when the original DNS specification was written, the top priority was limiting the amount of work done by the implementation! Again and again, researchers continue to show implementers the dark corners where this simple instruction was not followed.
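That first priority, bounding the work even when someone has misconfigured the data, can be illustrated with CNAME chains. In the minimal sketch below, the function `resolve_cname_chain`, the in-memory dictionary of CNAME records, and the bound of 16 steps are all hypothetical; 16 is an arbitrary illustrative value, not one taken from BIND or from any standard. A single step counter keeps legitimate multi-step chains working while guaranteeing termination, even on a CNAME loop.

```python
from typing import Dict


class ChainTooLong(Exception):
    """Raised when following CNAMEs would exceed the work bound."""


def resolve_cname_chain(name: str, cnames: Dict[str, str], max_steps: int = 16) -> str:
    """Follow CNAME records (owner -> target) from `name` to the end of the
    chain, taking at most `max_steps` steps. The bound alone also stops
    loops caused by misconfigured data, with no separate loop detection."""
    steps = 0
    while name in cnames:
        if steps >= max_steps:
            raise ChainTooLong(f"more than {max_steps} CNAME steps from the query name")
        name = cnames[name]
        steps += 1
    return name


cdn = {
    "www.example.com": "www.example.com.cdn.example.net",
    "www.example.com.cdn.example.net": "edge7.cdn.example.net",
}
print(resolve_cname_chain("www.example.com", cdn))  # edge7.cdn.example.net

looped = {"a.example": "b.example", "b.example": "a.example"}
try:
    resolve_cname_chain("a.example", looped)
except ChainTooLong as exc:
    print(exc)  # more than 16 CNAME steps from the query name
```

The limit lives in the implementation rather than in the protocol, so it can be raised later without changing the DNS itself.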
The KeyTrap (CVE-2023-50387) and the NSEC3 closest encloser proof CPU exhaustion (CVE-2023-50868) vulnerabilities have joined the ranks of similar CVEs based on tricking DNS implementations into doing excessive and unnecessary work.
Behind the scenes
There will be more attacks of this type, because the DNS protocol is notoriously complex. Luckily, the DNS ecosystem has a healthy mix of implementers who (mostly) are able to openly speak to each other and to coordinate orderly remediation and disclosure of these vulnerabilities.
Many DNS implementers publicly participate and share operational and development experience through the DNS Operations, Analysis, and Research Center, or DNS-OARC. We have to thank DNS-OARC for providing a venue for coordination and secure channels which allowed everyone involved to work together on mitigations for these two recent vulnerabilities.
If you are doing serious work in the field of DNS and still do not participate in DNS-OARC, it’s time to reconsider! Join DNS-OARC and their Mattermost chat server, and attend their excellent workshops!
Conclusion
The German National Research Center for Applied Cybersecurity ATHENE research team found an implementation problem in several DNSSEC validators which stems from three things: a lack of imagination on the part of DNS software developers, the absence of any explicit warning in the DNSSEC standard (RFC 4035, section 5.3.3), and a failure to follow decades-old, very generic advice.
Fortunately, these attacks that misuse complexity in the DNS can be fixed without changing the protocol fundamentals. The changes implemented in BIND and other DNS systems to mitigate these two vulnerabilities will improve their resilience in future attack scenarios. DNS is in a very different position than, e.g., PGP’s SKS key server network, which was basically rendered useless by one practical attack based on complexity.
The DNSSEC protocol continues to be secure and provides valuable protection from various attacks on the integrity of the DNS. The best response to these vulnerabilities is for users to update their DNS software to a patched version.
BIND users who are interested in receiving advance notification of security announcements involving BIND are encouraged to contact our sales team for more information.