INTERNET-DRAFT Eric A. Hall
Document: draft-hall-dns-data-04.txt March 2004
Expires: September, 2004
Category: Informational
Considerations for DNS Resource Records
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC 2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as "work in
progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Copyright Notice
Copyright (C) The Internet Society (2003). All Rights Reserved.
Abstract
This document discusses some common design considerations for DNS
resource records and data models.
Internet Draft draft-hall-dns-data-04.txt March 2004
Table of Contents
1. Introduction
2. Prerequisites and Terminology
3. Architectural Principles and Inherent Limitations
3.1. Resource Records
3.2. Hierarchical Partitioning
3.3. Minimalist Messages
3.4. Built-In Record Caching
3.5. Unreliable Hinting
3.6. World-Readable Data
3.7. Implementation Issues
4. Design Conclusion
5. Security Considerations
6. IANA Considerations
7. Normative References
8. Acknowledgments
Author's Address
Full Copyright Statement
1. Introduction
In terms of deployment, the Domain Name System (DNS) [STD13] is an
extremely successful network service, having perhaps the widest
usage of all Internet services. Unfortunately, the omnipresence of
DNS makes it a frequent target for well-intentioned efforts to
extend the service into roles that it is technically unsuited to
provide, or which would impose excessive burdens on the Internet
community as a whole if they were widely adopted.
This document attempts to itemize some of these issues, so that
planners and developers can try to avoid these concerns during
their planning cycles. However, it should also be recognized that
there are several modern DNS usage models which violate more than
one of the considerations listed in this document, but which still
provide significant value for the Internet community. As such,
this document should not be considered as a governing device of
any kind, and should not be used to reject any and all proposals
for new usage models. Instead, this document is intended to be
used to facilitate honest discussion about the kinds of problems
that a particular proposal may be expected to encounter, or the
burdens that it may impose on the Internet community as a whole if
it were to be widely adopted.
Hall I-D Expires: September 2004 [page 2]
2. Prerequisites and Terminology
Readers of this document are expected to be familiar with STD 13
[STD13], STD 3 [STD3], and RFC 2181 [RFC2181].
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
in this document are to be interpreted as described in RFC 2119.
3. Architectural Principles and Inherent Limitations
The current collection of DNS specifications defines a lightweight
and anonymous "lookup-by-name" service, with compact datagrams
being relayed through a structured network of authoritative
servers and caches, each of which provides access to specific
database partitions and/or resource records. The Domain Name
System is able to fulfill its primary responsibility as a fast and
robust distributed naming service directly as a result of these
design principles.
However, DNS has several built-in critical limitations as a direct
result of the highly-optimized lookup model. For example, DNS does
not provide any functions to "search-by-value", nor does it
provide any sort of mechanisms for user authentication, access
control services, cache-validation, nor most of the other features
which are typically associated with general-purpose databases or
directories. Although DNS could be extended to provide some of
these usages, such an effort would require a significant amount of
engineering and deployment effort in order to preserve
compatibility with existing DNS systems. Furthermore, there is a
significant danger inherent in overloading DNS with excessive
features such that the service itself becomes incapable of
performing lightweight lookups quickly and efficiently, thereby
precluding its primary purpose.
Therefore, it is incumbent upon protocol developers and planners
to recognize and accommodate the issues which govern DNS in its
current form. This document describes the most common of those
issues, so that the associated problems can be avoided early in
the design cycle.
3.1. Resource Records
Data stored in DNS uses a common resource record data-structure,
consisting of six fields ["domain name", "type", "class",
"time-to-live", "length" and "data" (the "data" field is further
structured according to the kind of data being provided by the
resource record itself)].
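As an illustration, the following minimal Python sketch shows how
the six fields are laid out on the wire, following the STD 13
format. The names ResourceRecord and encode_name are hypothetical,
the owner name is written without compression, and the "length"
field is derived from the data rather than stored separately.

```python
import struct
from dataclasses import dataclass


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels (no compression)."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        if not label:
            continue
        raw = label.encode("ascii")
        if len(raw) > 63:
            raise ValueError("labels are limited to 63 octets")
        out.append(len(raw))
        out += raw
    out.append(0)  # zero-length root label ends the name
    return bytes(out)


@dataclass(frozen=True)
class ResourceRecord:
    name: str     # owner domain name
    rtype: int    # e.g. 1 = A
    rclass: int   # e.g. 1 = IN
    ttl: int      # time-to-live, in seconds
    rdata: bytes  # type-specific data

    def to_wire(self) -> bytes:
        # NAME | TYPE (16 bits) | CLASS (16) | TTL (32) | RDLENGTH (16) | RDATA
        fixed = struct.pack("!HHIH", self.rtype, self.rclass, self.ttl, len(self.rdata))
        return encode_name(self.name) + fixed + self.rdata


# An A record for a documentation address: 17 + 10 + 4 = 31 octets.
rr = ResourceRecord("www.example.com", 1, 1, 3600, bytes([192, 0, 2, 1]))
wire = rr.to_wire()
```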
The domain name, type and class fields collectively form a unique
identifier for each resource record, and allow clients to
specifically identify the kind of data they want to retrieve for a
specific named resource. Multiple resource records may share the
same domain name, type and/or class values, but those resource
records must have different data values to be considered unique.
At a minimum, all lookup queries must explicitly identify the
domain name of the resource record being requested. Queries may
request all known types and/or classes associated with the named
resource, but typically specify those fields as well. If a query
results in multiple matches, then all of the matching resource
records will be returned.
Furthermore, DNS is only capable of issuing lookup-by-name
queries, and does not provide any queries which would allow a
resolver to search for resource records which contain a particular
data value (although the original DNS specifications did provide a
mechanism for searching a specific server for resource records
with matching data-values, this feature was never widely deployed,
and the query-type has since been officially deprecated).
Meanwhile, DNS has never provided any means to search for all
resource records of a particular type or class, without the client
also specifying an exact domain name to match against, meaning
that it is also not possible to query for all resource records of
a specific type regardless of their name.
In theory, it would be possible to create a super-index of all
zones in the entire distributed database and issue these kinds of
searches against that index, although nobody has ever built such a
system. It is also possible to fake these kinds of searches on a
per-zone basis by transferring the entire zone contents down to
the client and then performing local searches against all of the
resource record data. However, neither of these scenarios
represents a normal DNS lookup-by-name query, and neither is
representative of typical DNS transactions and client processes.
In the absence of these mechanisms, designers must be aware that
they can only issue queries against the name of a resource record
unless they are willing to use something other than DNS.
Since it is not possible for a query to specify anything other
than the domain name, type and class fields, it is also not
possible to explicitly request an exact instance from among a set,
unless only one instance of the requested resource record exists
at the specified domain name. However, it is not possible to
guarantee that a particular resource record will only exist in the
singular form at any given time. Although it is possible to demand
that administrators "MUST NOT" create more than one instance of a
particular resource record for any domain name, such demands are
usually at the mercy of the administrators of those systems, and
are generally unenforceable. Data models which depend on singular
instances of a particular record should be designed with the
recognition that multiple resource records may be returned anyway,
and should be prepared to deal with that scenario.
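A data model that expects a single record can still handle a
multi-record answer defensively. The Python sketch below is
hypothetical: pick_single and its "lowest value wins" rule are
assumptions chosen only so that every client makes the same choice;
a real specification would need to define its own tie-breaking rule.

```python
from typing import Sequence


def pick_single(rdatas: Sequence[str]) -> str:
    """Return one value from an answer set that was expected to hold one."""
    if not rdatas:
        raise LookupError("no records returned")
    if len(rdatas) > 1:
        # Extra instances arrived despite any "MUST NOT" in a specification.
        # Choose deterministically so that independent clients agree.
        return min(rdatas)
    return rdatas[0]
```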
By the same token, it is also not possible to be sure that all of
the resource records from a set will always be returned. For
example, the original DNS specifications allowed each resource
record in a set to have different time-to-live values, and this
allowed (in theory) each resource record to be aged out of a cache
at different times. Furthermore, there have been some secondary
bugs in some implementations which have resulted in incomplete
answer sets being returned and subsequently cached by other nodes.
Although these problems have mostly been addressed over time, it
is still not possible to guarantee with absolute certainty that
all of the records in a set will always be returned. Data models
which depend on spreading component data over multiple resource
records in a set should be designed with this in mind.
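The toy cache below illustrates how per-record time-to-live values
can leave a partial set behind. It is a hypothetical Python sketch,
not a model of any particular implementation, and time is passed in
explicitly so the behavior is deterministic. [RFC2181] later
deprecated differing TTLs within a set, but older data and
implementations may still behave this way.

```python
from typing import Dict, List, Tuple

# (owner name, type, class): the lookup key described above
Key = Tuple[str, int, int]


class PerRecordCache:
    """Ages out each record of a set independently, as the original
    specifications permitted when TTLs within a set differed."""

    def __init__(self) -> None:
        self._entries: Dict[Key, List[Tuple[str, float]]] = {}

    def store(self, key: Key, rdata: str, ttl: int, now: float) -> None:
        self._entries.setdefault(key, []).append((rdata, now + ttl))

    def lookup(self, key: Key, now: float) -> List[str]:
        # Drop expired records, keep the rest: the answer may be a partial set.
        live = [(r, exp) for r, exp in self._entries.get(key, []) if exp > now]
        self._entries[key] = live
        return [r for r, _ in live]


cache = PerRecordCache()
key = ("example.com", 16, 1)  # TXT, IN
cache.store(key, "part-1", ttl=300, now=0)
cache.store(key, "part-2", ttl=60, now=0)
```

At now=30 both parts are returned; at now=120 only "part-1"
remains, so a data model that reassembles both parts would silently
receive half of its data.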
On a related point, many application designers are tempted to
utilize the TXT resource record as a container for structured
data. This is generally a terrible idea. For one thing, allowing
one application to remodel the TXT record means allowing everyone
to do so, resulting in all of the TXT resource records being
returned whenever any of them are requested, with an increased
potential for message overflow (see the discussion in section 3.3
for the problems this can cause). Furthermore, this model
complicates message processing, in that the contents of the TXT
resource record have to be analyzed and string-matched. These
problems are avoided by the use of purposeful, task-specific
resource records.
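The extra client-side work can be seen in a short Python sketch. The
"v=app1 " prefix convention and the select_app_txt helper are
hypothetical; the point is that the query returns every TXT string
at the owner name, and the client must fetch and string-match all of
them to find its own.

```python
from typing import List, Tuple


def select_app_txt(txt_strings: List[str], prefix: str) -> Tuple[List[str], int]:
    """Return the TXT strings that carry `prefix`, plus the number of
    octets of string data transferred only to be discarded."""
    mine = [s for s in txt_strings if s.startswith(prefix)]
    wasted = sum(len(s.encode("utf-8")) for s in txt_strings if not s.startswith(prefix))
    return mine, wasted


answer = ["v=app1 key=abc", "unrelated note", "v=app2 x=1"]
mine, wasted = select_app_txt(answer, "v=app1 ")
```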
3.2. Hierarchical Partitioning
From a high-level perspective, the DNS database is distributed
across multiple partitions (called "zones"), each of which has
ownership for the domain names within that partition. Zones are
linked in a hierarchical tree, with the top-level zones having
zones directly beneath them, and with some of those zones having
subordinate zones, and so forth. Although the zones are arranged
in a hierarchy, each zone acts as an independent partition and is
usually only concerned with the records that it controls directly.
The hierarchical zone structure is traversed by resolvers whenever
a zone which is authoritative for a named resource record needs to
be located (this usually only happens when the answer has not
already been cached). In this regard, the domain name of a
resource record acts like a lookup key, with resolvers matching
the key value against the zone hierarchy until either an answer or
an error is returned.
Unfortunately, domain names are restricted to a maximum length of
255 octets (in wire format, counting the length octet of each label
and the terminating root label). Since a domain name is the primary
identifier for
a resource record, and since the domain name also identifies the
zone where a resource record is stored, the length restrictions of
a domain name can be a significant limitation in some cases. For
example, a domain name for a resource record in a zone that is
nested several layers deep in the global hierarchy could face
significantly tighter space constraints than domain names for
resource records in a top-level zone, simply because there will be
fewer octets left to work with in the lower-level zones.
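The shrinking budget can be computed directly. In this Python sketch
(hypothetical helper names; ASCII labels assumed), a name's wire
length is one length octet plus the label octets for each label,
plus the terminating root octet; whatever the zone name consumes is
unavailable to names beneath it.

```python
MAX_NAME_OCTETS = 255  # wire-format limit from STD 13, root label included


def wire_length(name: str) -> int:
    """Octets a domain name occupies on the wire, without compression."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return sum(1 + len(label.encode("ascii")) for label in labels) + 1


def prefix_budget(zone: str) -> int:
    """Octets left for labels (and their length octets) below `zone`."""
    return MAX_NAME_OCTETS - wire_length(zone)


shallow = prefix_budget("example.com")
deep = prefix_budget("a.very.deeply.nested.zone.example.com")
```

An application-specific label sequence must fit within the smaller
of these budgets in every zone where it is expected to work.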
This can be a significant concern with applications which require
the use of application-specific domain name sequences, especially
when those sequences are relatively long. In some cases, it may
simply be impossible to use those sequences in some zones, given
the space restrictions. As such, the use of application-specific
domain name sequences should generally be avoided.
The use of the per-zone matching system also introduces certain
complexities in feature-negotiation and error-recovery processes
which are generally required in datagram transaction models. In
particular, data-models which depend on issuing "fallback" queries
in those cases where an earlier query has failed can impose
significant burdens on the DNS infrastructure as a whole if the
fallback mechanisms are triggered too frequently or if they are
triggered at heavily-loaded servers.
These problems can be somewhat lessened if the fallback
processing only occurs after the authoritative servers for the
partition have already been learned, since the fallback queries
will be sent directly to those servers, and will not impose
additional burdens on the servers responsible for the root and
top-level zones. However, the downside to this directed-fallback
approach is that fewer systems in the query path will have the
opportunity to cache the resulting answer data, which in turn means
that a larger number of queries will probably be needed in the
common case.
3.3. Minimalist Messages
The DNS protocol uses a highly-compact, binary message format
which is specifically suited for fast and lightweight lookup
transactions. There are very few spurious bits or fields in the
DNS message (there is no "version" field, for example), with the
ultimate objective being very small message sizes.
By default, DNS uses UDP to transfer messages, avoiding the
latency and processing costs that are typically associated with
TCP sessions. However, there are some situations in which UDP
cannot be used, and in those cases, DNS will typically use TCP in
order to ensure that lookups succeed.
Standard DNS messages sent over UDP have a maximum message size of
512 bytes. If a lookup results in a response message that exceeds
the maximum message or datagram sizes, the query process must be
restarted using TCP. Meanwhile, extended DNS (EDNS) [RFC2671] can
carry messages up to 65,535 bytes over UDP, although the actual
payload size is usually limited to 1280 bytes due to limitations
in physical media capacity and problems that arise from
fragmentation. If the size of the EDNS message exceeds the
capacity of the end-to-end link, TCP will again be needed.
However, DNS messages sent over TCP are themselves limited to a
maximum size of 65,535 bytes, and messages which are larger than
that size cannot be transferred over DNS at all. Furthermore, not
all DNS servers support the use of TCP, and in those cases,
messages which overflow the 512-byte limit for UDP will also be
inaccessible. In short, messages which are larger than 512 bytes
always cause performance problems and sometimes trigger
catastrophic failures, while messages which are larger than 65,535
bytes always trigger catastrophic failures.
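The size thresholds above can be summarized as a decision function.
This Python sketch is illustrative only: the names are hypothetical,
it assumes the server supports TCP, and it does not model
fragmentation or path-MTU failures, which can force TCP even when
the advertised EDNS size would nominally suffice.

```python
import struct
from enum import Enum
from typing import Optional

CLASSIC_UDP_LIMIT = 512  # octets, plain DNS over UDP
TCP_LIMIT = 65535        # TCP messages carry a 16-bit length prefix


class Transport(Enum):
    UDP = "plain UDP"
    UDP_EDNS = "UDP with an EDNS payload size"
    TCP = "TCP after a truncated UDP reply"
    IMPOSSIBLE = "cannot be carried by DNS"


def transport_for(size: int, edns_payload: Optional[int] = None) -> Transport:
    """Pick a transport for a response of `size` octets; `edns_payload`
    is the UDP payload size advertised by the client via EDNS, if any."""
    if size <= CLASSIC_UDP_LIMIT:
        return Transport.UDP
    if edns_payload is not None and size <= edns_payload:
        return Transport.UDP_EDNS
    if size <= TCP_LIMIT:
        return Transport.TCP
    return Transport.IMPOSSIBLE


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix a message with the two-octet length used over TCP."""
    if len(message) > TCP_LIMIT:
        raise ValueError("message does not fit the 16-bit TCP length prefix")
    return struct.pack("!H", len(message)) + message
```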
In those cases where TCP works as expected, there can be several
penalties from its use. For example, TCP session management
typically consumes more resources than UDP datagrams, which can
significantly limit the number of queries that a server is able to
process at any given time. For a particularly busy server,
processing a significant number of TCP transactions can mean that
other transactions will have to be rejected. Meanwhile, the use of
TCP also requires more round-trips, which can sometimes cause
timers to expire while the query is still being processed,
resulting in multiple duplicate queries going to that server (each
of which will subsequently require TCP sessions), accelerating the
negative effects.
It's also important to recognize that TCP queries are often
exchanged between the local resolver and the target server
directly, and can bypass some parts of the caching infrastructure.
As a result, answers returned over the TCP connection might not be
cached by intermediary nodes, and the entire process would need to
be repeated for each instance of the same query.
For all of these reasons, planners and developers are strongly
encouraged to limit resource record data to sizes that will not
cause UDP messages to overflow. In those cases where this is
unavoidable, they should be prepared for a variety of problems,
including performance degradation and outright failure.
Note that the DNS message format uses a protocol-specific
compression technique which can be used to substitute label
sequences with offset pointers to previous occurrences of those
sequences, thereby saving some space in the message itself.
However, this mechanism only works in a few instances, and is not
as widely usable as many people hope. For example, older caches
are not aware of newer resource record data-structures, so the
compression mechanism cannot be used in the data fields of those
resource records, but instead can only be used with the owner
domain name of the resource record itself. This is not a problem
for resource records which were defined as part of [STD13], since
those resource records have well-known formats, but newer resource
records are unable to utilize the standard compression mechanism.
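For reference, the standard compression scheme can be sketched in
Python as below. The function name and the offset-table approach are
illustrative assumptions; a pointer is two octets with the top two
bits set and a 14-bit offset from the start of the message. Per the
discussion above, an encoder would apply this only to owner names
and to names inside well-known [STD13] record types; a name embedded
in the data field of a newer record type would be written
uncompressed.

```python
import struct
from typing import Dict


def append_name(message: bytearray, name: str, offsets: Dict[str, int]) -> None:
    """Append `name` to `message`, replacing the longest suffix already
    present with a compression pointer; `offsets` remembers where each
    previously written suffix starts. Matching here is exact (case-sensitive)."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in offsets:
            message += struct.pack("!H", 0xC000 | offsets[suffix])
            return
        if len(message) < 0x4000:  # pointers can only reach the first 16 KiB
            offsets[suffix] = len(message)
        raw = labels[i].encode("ascii")
        message.append(len(raw))
        message += raw
    message.append(0)  # no suffix matched: terminate with the root label


msg = bytearray(12)  # stand-in for the 12-octet DNS header
seen: Dict[str, int] = {}
append_name(msg, "www.example.com", seen)   # 17 octets, written in full
append_name(msg, "mail.example.com", seen)  # 5 label octets + 2-octet pointer
```

The second name costs 7 octets instead of 18, but only because both
names appear where the encoder is permitted to compress them.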
This is an especially important consideration to keep in mind when
considering large data-structures. While it is tempting to believe
that domain names can be compressed to save space in the message,
this simply is not true as often as people would like.
It is entirely feasible for newer resource records to define their
own (record-specific) compression algorithms, although such
schemes must be planned with legacy caches in mind -- those
devices will not be able to expand the contents of those records,
nor will they be able to apply any other kind of logic against
those records.
3.4. Built-In Record Caching
DNS resolvers and servers are allowed to cache resource records
that they have discovered as part of normal query processing. This
allows subsequent queries for that information to be answered
immediately from the cache, without requiring another batch of
transactions for the same information. In turn, this ensures that
lookups are answered in the shortest amount of time, that servers
are not excessively burdened by unnecessary queries, and that the
total number of transactions is kept to a minimum.
Since DNS is optimized for lookups, the use of caching is
generally considered a positive feature. However, caching can also
be somewhat hostile towards certain usage models, especially since
DNS does not provide any mechanisms for forcing a system to flush
its cache of previously discovered records. In particular, caches
prevent data from being validated against an authoritative source,
in that a resolver or application cannot "demand" that a query be
forwarded to an authoritative server (the client can do this on
its own accord, but cannot request a proxy to do this on behalf of
the client). While this is normally beneficial for lookup
activities, it can be a devastating feature for data models that
require data-integrity at all times.
Although DNS servers can dictate the maximum length of time that a
resource record is to be held in a cache, data models which
require the use of low time-to-live settings are generally frowned
upon by the DNS community, as these resource records place a
disproportionate burden on the infrastructure. Furthermore, some
DNS agents are known to apply their own minimum time-to-live
values, regardless of the settings associated with the original
data. As a result, DNS is generally considered to be inappropriate
for data models which require full-time and instantaneous data
integrity, and developers are generally encouraged to look towards
other services if this level of responsiveness is needed,
especially if the application is expected to be widely deployed.
Another issue related to caching limitations is the amount of
memory available to each particular cache. All systems have fixed
amounts of available memory, and when that memory is consumed,
some data will necessarily have to be flushed in order for any new
data to be stored. If the flushed data is subsequently needed
again, the query path will have to be reprocessed, and the cache
will have to flush some other data in order to make room for the
answers. In heavily loaded environments (such as a very busy ISP),
this can result in a constant churning of the memory pool.
This is obviously a good reason to limit the size of each resource
record's data-structures, but it is also a good reason for
limiting the total number of resource records in a set. Since each
entry will have to consume memory in a cache somewhere, large
resource record data blocks and large sets of resource records
will both contribute to the potential for cache churning.
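The churn effect can be illustrated with a toy cache of fixed size.
The Python sketch below is hypothetical and uses least-recently-used
eviction purely as an assumed policy (real resolvers differ); it
counts evictions so that the cost of larger entries is visible.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class BoundedCache:
    """Cache limited to `capacity` octets, evicting least-recently-used entries."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.used = 0
        self.evictions = 0
        self._data: "OrderedDict[Hashable, bytes]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[bytes]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, rrset_wire: bytes) -> None:
        if len(rrset_wire) > self.capacity:
            return  # too large to cache at all
        if key in self._data:
            self.used -= len(self._data.pop(key))
        while self.used + len(rrset_wire) > self.capacity:
            _, old = self._data.popitem(last=False)  # evict the oldest entry
            self.used -= len(old)
            self.evictions += 1
        self._data[key] = rrset_wire
        self.used += len(rrset_wire)


def churn(entry_size: int, capacity: int = 1000, names: int = 10) -> int:
    """Cycle through `names` entries twice and report how many evictions occur."""
    cache = BoundedCache(capacity)
    for _ in range(2):
        for n in range(names):
            if cache.get(n) is None:
                cache.put(n, bytes(entry_size))
    return cache.evictions
```

With 100-octet entries all ten fit and the second pass is served from
the cache; with 400-octet entries every lookup in the second pass
misses and triggers another eviction.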
3.5. Unreliable Hinting
DNS responses provide for the inclusion of hinting data, by way of
the Additional-Data section of the response message. In the usual
case, this section of the response message will contain resource
records that are associated with the originally requested resource
records (such as listing the IP addresses associated with a name
server or mail server). However, due to certain design
considerations, this data is often incomplete, and is almost
always unreliable, and therefore must often be ignored.
Specifically, DNS messages contain a "truncation" flag which
indicates whether or not all of the answer data has been returned
in the message (if the flag is enabled, the recipient system will
need to retry the query via some other transport, as discussed in
section 3.3). However, this flag does not apply to the
Additional-Data section of the response message, and since DNS
defines no equivalent flag for that section, recipient systems
must always assume that the Additional-Data section is incomplete.
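The flags involved can be decoded from the fixed 12-octet header.
This Python sketch follows the STD 13 bit layout; the helper names
are hypothetical. It shows why the TC bit alone cannot be used to
judge the Additional-Data section: TC signals that required answer
data did not fit, and no bit describes whether additional records
were omitted.

```python
import struct
from typing import NamedTuple


class HeaderFlags(NamedTuple):
    qr: bool     # message is a response
    opcode: int
    aa: bool     # Authoritative Answer: applies to the Answer section only
    tc: bool     # TrunCation
    rd: bool     # Recursion Desired
    ra: bool     # Recursion Available
    rcode: int


def parse_flags(header: bytes) -> HeaderFlags:
    """Decode the 16-bit flags word that follows the 16-bit message ID."""
    (flags,) = struct.unpack("!H", header[2:4])
    return HeaderFlags(
        qr=bool(flags & 0x8000),
        opcode=(flags >> 11) & 0xF,
        aa=bool(flags & 0x0400),
        tc=bool(flags & 0x0200),
        rd=bool(flags & 0x0100),
        ra=bool(flags & 0x0080),
        rcode=flags & 0x000F,
    )


def needs_tcp_retry(header: bytes) -> bool:
    # A clear TC bit says nothing about the Additional-Data section,
    # which may still have been trimmed; there is no flag for that.
    return parse_flags(header).tc


# ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT
complete = struct.pack("!6H", 0x1234, 0x8580, 1, 1, 0, 0)
truncated = struct.pack("!6H", 0x1234, 0x8380, 1, 0, 0, 0)
```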
Meanwhile, caches often store resource records gleaned from the
additional-data section, and then provide that data as answers to
subsequent queries. In order to prevent caches from capturing and
relaying incomplete resource record sets, authoritative servers
should only provide them when the full set will fit within the
Additional-Data section. Unfortunately, there are still several
systems in use which do not conform to this behavior, and it is
therefore possible that any given Additional-Data section will not
contain a complete set.
In theory, it is possible to examine the Authoritative Answer flag
of a response message to determine whether or not the full set of
resource records has been provided. Unfortunately, there are
implementations which are known to have inheritance bugs, where
resource records from the Additional-Data section are stored with
the Authoritative Answer flag, even though that flag is only
supposed to apply to the Answer section.
As a result of these considerations, applications are generally
encouraged to avoid the Additional-Data section entirely.
Furthermore, if an application does make use of any resource
records in this section, those applications are generally
encouraged to issue new queries for those resource records if the
data is absolutely critical, thus ensuring that the full answer
set is always retrieved. Unfortunately, even that cannot be
guaranteed, due to the prevalence of bugs.
3.6. World-Readable Data
DNS is optimal for publishing anonymous and world-readable data,
given the interplay between several of the design factors
in the DNS model. For example, the need for small DNS messages
generally precludes the use of data such as access-control lists,
while the use of in-line caching generally precludes any
presumption of privacy.
More specifically, DNS does not provide any mechanisms for
authenticating users during the lookup process, nor does it
provide any mechanisms for linking access controls to a resource
record across the global network of servers and caches. Without
these features, DNS is unsuitable for applications which require
authenticated access to private data.
Furthermore, although some products provide mechanisms for
restricting query-level access to ranges of IP addresses or other
filtered sources, it is important to recognize that once the
resource records get into a cache outside of the protected scope,
the information is only as secure as that system. In this regard,
a cache which resides outside of a firewall will be just as
informative as the DNS servers inside the firewall.
In the end, there is no such thing as "private" data with DNS.
Developers must treat all data as if it will eventually be made
public, and are strongly encouraged to use some other service if
higher levels of security are required.
3.7. Implementation Issues
Most of the DNS resolvers which are provided with operating
systems and TCP/IP stacks are purposefully optimized for the most
common queries, usually only offering APIs for resource record
types such as IPv4 addresses and a couple of others, but with
little or no direct support for other resource record types.
Similarly, some resolvers do not provide the kind of granularity
that an application may require. For example, some resolvers have
been known to provide only one resource record from a set to the
application, and these resolvers can cause problems with
applications which need to see the full response.
Although there are usually several mechanisms that application
developers can pursue to overcome these kinds of limitations, many
developers are loath to do so, especially when their applications
are ported across multiple operating systems. Unfortunately, this
reticence can represent a significant hurdle towards wide
deployment of even the best designed resource records and usage
models, and is frequently the single largest obstacle which must
be overcome by protocol developers and planners. By no means
should this be considered a show-stopper, but it should be
recognized as a potentially significant hurdle: the majority of
applications will likely be unable to adopt sufficiently new
technologies immediately.
4. Design Conclusion
Due to the architectural tradeoffs inherent in the DNS lookup
model, some usage models are better suited to DNS than others. In
particular, DNS is highly efficient at lookups of compact, public
and relatively stable data. Conversely, DNS is unsuitable for
value-based queries or searches, restricted-access data, highly-
dynamic data, or large records and arrays. Applications which
require access to those kinds of data should investigate services
such as LDAP or HTTP as being more appropriate.
Generally speaking, planners and developers can define
their own resource record types as part of a standards-track
specification without interference from the DNS community, as long
as the functional scope is limited to defining data-structures for
those resource record types. However, there are some cases where
it may be useful or necessary for the DNS community to be involved
with the standardization of a particular resource record type.
In particular, if a resource record type requires a server to
perform some kind of extra processing other than piping data from
a database into a message, then the DNS community should be
consulted. Similarly, requiring that servers provide additional
data outside the answer section of the response message should be
vetted with the community. Moreover, if a specification requires
special structuring of the message for the benefit of a single
service, then the DNS community should definitely be involved in
the discussion, since any changes to the highly-optimized message
format could be disastrous in non-obvious ways.
Requests to reserve portions of the namespace for the use of a
single network service should also be brought to the DNS community
for discussion.
Finally, if a particular usage goes against more than two of the
recommendations put forth in this document, then it would probably
be a good idea to consult with the DNS community over any
alternatives which may be available.
In all cases, IANA must be involved in delegating resource record
type codes and mnemonics.
5. Security Considerations
This document does not create any security considerations.
6. IANA Considerations
This document does not create any IANA considerations.
7. Normative References
[RFC2181] Elz, R., and R. Bush, "Clarifications to the
DNS Specification", RFC 2181, July 1997.
[RFC2671] Vixie, P., "Extension Mechanisms for DNS
(EDNS0)", RFC 2671, August 1999.
[STD3] Braden, R., "Requirements for Internet Hosts -
Application and Support", STD 3, RFC 1123,
October 1989.
[STD13] Mockapetris, P., "Domain names - concepts and
facilities", STD 13, RFC 1034 and "Domain
names - implementation and specification", STD
13, RFC 1035, November 1987.
8. Acknowledgments
Funding for the RFC editor function is currently provided by the
Internet Society.
Significant feedback on this document was provided by Edward Lewis
and Walt Howard.
Author's Address
Eric A. Hall
ehall@ehsco.com
Full Copyright Statement
Copyright (C) The Internet Society (2003). All Rights Reserved.
This document and translations of it may be copied and furnished
to others, and derivative works that comment on or otherwise
explain it or assist in its implementation may be prepared,
copied, published and distributed, in whole or in part, without
restriction of any kind, provided that the above copyright notice
and this paragraph are included on all such copies and derivative
works. However, this document itself may not be modified in any
way, such as by removing the copyright notice or references to the
Internet Society or other Internet organizations, except as needed
for the purpose of developing Internet standards in which case the
procedures for copyrights defined in the Internet Standards
process must be followed, or as required to translate it into
languages other than English.
The limited permissions granted above are perpetual and will not
be revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on
an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.