Lead Scientist Responds to Questions on Root Server Queries
cidtoday writes "A CircleID interview
with the lead scientist whose study recently revealed that 98% of the queries to a main root
server are unnecessary shows that spam has little to do with the
issue. In fact, he provides two reasons why anti-spam tools cause more
unnecessary queries to the root servers than spam emails. Many other questions previously
raised by Slashdot readers on the study are also answered."
std::disclaimer<std::legalese> sig=new std::disclaimer; sig->dump(); delete sig;
When you type in a webpage address, say, slashdot.org, your computer needs a way to find the IP address of the server it should send its message to. That way is DNS. Most ISPs host several of their own DNS servers that keep track of which addresses have been recently resolved so that their customers can get faster resolution. If an address hasn't been recently resolved and is no longer (or never was) in the DNS cache, then it's time to hit up one of the 13 root servers with a request.
Internet's Main Root Server Saturated By 98%: Should You Be Concerned?
February 26, 2003
By CircleID
A recent study by researchers at the Cooperative Association for Internet Data Analysis (CAIDA) at the San Diego Supercomputer Center (SDSC) revealed that a staggering 98% of the global Internet queries to one of the main root servers, at the heart of the Internet, were unnecessary. This analysis was conducted on data collected October 4, 2002 from the 'F' root server located in Palo Alto, California.
The findings of the study were originally presented to the North American Network Operators' Group (NANOG) in October 2002 and later discussed with Richard A. Clarke, chairman of the President's Critical Infrastructure Protection Board and Special Advisor to the U.S. President for Cyberspace Security.
In this special CircleID interview with Duane Wessels, president of The Measurement Factory and one of the main scientists who led the root server study, we attempt to gain a better sense of what has been discovered, what can be done about it, and how. But most importantly, why? After all, from an end-user's perspective, the Internet appears to be working just fine! Should a business that fully or partially depends on the Internet be concerned? Read on...
CircleID: Mr. Wessels, could you give us a bit of background about yourself and tell us what initiated the study?
Duane Wessels: I started doing Internet research in 1994. From 1996 to 2000 I worked for the National Laboratory for Applied Network Research (NLANR)/UCSD on a web caching project, including Squid, funded by the National Science Foundation. These days I am president of The Measurement Factory, where we develop tools for testing performance and compliance.
For this study I joined up with my old friends at CAIDA. Funding for this work came from WIDE in response to questions from ICANN's Root Server System Advisory Committee (RSSAC).
CircleID: Could you give us a brief background on the significance of your findings in this study, particularly the unique discoveries that were not already known to the technical and scientific community?
Duane Wessels: Certain facts about root server traffic have been known for a long time. Earlier studies identified certain problems, and some root server operators publish traffic statistics (number of queries, etc). What is unique about our study is that we developed a simple model of the DNS and used that model to categorize each and every query. This allowed us to say, for example, "this query is valid, because we haven't heard from this client before, but this other query is invalid, because the same client sent the same query a short time ago."
We also took a much longer trace than earlier studies and spent more time looking at individual abusers.
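The categorization idea described above can be sketched roughly as follows. This is only an illustration, not the model used in the study: the `classify_queries` function, the tuple layout, and the 60-second `window_seconds` threshold are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Query = Tuple[float, str, str, str]  # (timestamp, client IP, query name, query type)


def classify_queries(queries: List[Query], window_seconds: float = 60.0) -> List[str]:
    """Label each query 'first-seen' or 'repeat'.

    A query counts as a repeat when the same client sent the same
    name/type pair within window_seconds. The threshold is a
    hypothetical value, not one taken from the study.
    """
    last_seen: Dict[Tuple[str, str, str], float] = {}
    labels = []
    for timestamp, client, qname, qtype in queries:
        # DNS names are case-insensitive; ignore the trailing root dot.
        key = (client, qname.lower().rstrip("."), qtype)
        previous = last_seen.get(key)
        if previous is not None and timestamp - previous <= window_seconds:
            labels.append("repeat")
        else:
            labels.append("first-seen")
        last_seen[key] = timestamp
    return labels


# Made-up trace using documentation addresses.
trace = [
    (0.0, "192.0.2.1", "example.com.", "A"),
    (2.0, "192.0.2.1", "example.com.", "A"),
    (3.0, "192.0.2.2", "example.com.", "A"),
    (500.0, "192.0.2.1", "EXAMPLE.com", "A"),
]
labels = classify_queries(trace)
```

Here the second query is flagged as a repeat because the same client asked the same question two seconds earlier, while the last one falls outside the window and is treated as new.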
CircleID: Why the F root server? Is there a particular reason why this root server, located in Palo Alto, California, was selected for the study rather than the other 12 servers?
Duane Wessels: Paul Vixie and the Internet Software Consortium were kind enough to give us access to the query stream. ISC has the infrastructure in place to make this happen easily, and without any chance of disrupting the operation of the server. We are currently working with other operators to get data from additional sites.
CircleID: The report on the study indicates "a detailed analysis of 152 million messages received on Oct4, 2002." In other words, the final results are based on only one set of data collected within 24 hours. What about comparison to other dates? Why are you confident that findings from this particular day, October 4, 2002, are a sufficient indication of what is happening today -- or tomorrow, for that matter?
Duane Wessels: We have no reason to believe that October 4, 2002 is special. It just happens to be the first day that we successfully collected a 24-hour trace. We took shorter traces before and after this date, and they have similar characteristics. For example, our talk and paper mention a particularly large abuser (the Name Registration Company). While writing the paper, we were curious to see whether they had cleaned up their act yet. Indeed, they had not. They were still abusing the F root server months after we had notified them about the problem.
CircleID: Why should end-users be concerned about the findings, given that their Internet browsing experience does not appear to be affected in any noticeable way?
Duane Wessels: It's likely that most end-users are not impacted by root server abusers, for several reasons. One is that most users are going through properly functioning name servers, and their queries rarely reach a root name server. Another is that the root servers are overprovisioned in order to handle the load -- root DNS servers are typically multiple boxes placed behind load balancers, and some are even geographically distributed.
CircleID: What about companies that are running part or all of their business on the web? How are they being affected by this very high -- unnecessarily high -- root server inquiry rate?
Duane Wessels: Again, I would bet that most of them are properly configured and not severely impacted by root server abuse. Our results showed that 50% of the root server traffic comes from only 220 IP addresses. It's possible that some of these 220 addresses are experiencing a negative side-effect, but I believe that most of these problems go unnoticed. For example, some web servers are configured to look up IP addresses in the in-addr.arpa domain so they can log a hostname instead of an address. But if the lookup fails (as in-addr.arpa queries often do), nobody really notices. The web server logs the address anyway after a timeout.
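As a rough illustration of the in-addr.arpa lookup mentioned above, the sketch below builds the reverse-lookup name for an address and falls back to the bare address when the lookup fails. The function names are hypothetical, and the injectable `lookup` parameter is an assumption added so the fallback can be exercised without touching the network.

```python
import ipaddress
import socket


def reverse_lookup_name(address: str) -> str:
    """Return the name a reverse (PTR) lookup would query for this address."""
    return ipaddress.ip_address(address).reverse_pointer


def hostname_for_log(address: str, lookup=socket.gethostbyaddr) -> str:
    """Return a hostname for logging, or the address itself if the lookup fails.

    This mirrors the behaviour described above: the failure is swallowed
    silently, so nobody notices it.
    """
    try:
        return lookup(address)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return address


ptr_name = reverse_lookup_name("192.0.2.10")
```

For the IPv4 documentation address `192.0.2.10`, the octets are reversed and `in-addr.arpa` is appended.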
CircleID: Moving on to possible causes -- at this time, what do you think are the main reasons for such a high (98%) inquiry rate? Is it possible to identify them?
Duane Wessels: The short answer is that we suspect firewalls and packet filters.
When we initially started the study, our assumption was that there must be some broken software out there causing all the root server traffic. Aside from an old bug with Microsoft's resolver [a system to locate records that would answer a query], we didn't really find any implementation-specific problems.
Approximately 75% of the root server's queries were duplicates. Furthermore, we noticed that most of the repeats occurred at sensible intervals. That is, the agents making queries seemed to be following the protocol specifications.
From this, it seems most likely that these agents are just not receiving any DNS replies. To the application, it looks like a network outage, so it keeps on retransmitting. By investigating a few individual abusers, we know that they indeed do not receive replies from the root server.
CircleID: According to the Radicati Group research firm, more than 2.3 billion spam messages are broadcast daily over the Internet, and this number is expected to rise to 15 billion by 2006. How does spam, particularly at such high rates, affect the root servers -- especially when you take into account millions, if not billions, of spam emails floating around in people's inboxes, many of which contain broken links that cause bad DNS lookups?
Duane Wessels: It's entirely possible that spam emails generate an increased load for the root name servers. However, I don't think that simply sending spam increases load. Rather, it's more likely that anti-spam tools do. I can think of two specific examples:
1. Many anti-spam tools verify "From" addresses and perhaps other fields. If the From address has an invalid hostname, such as "spam.my.domain," the root servers will see more requests, because the top level domain does not exist.
2. Anti-spam tools also make various checks on the IP address of the connecting client -- for example, the various "realtime blackhole lists" and basic in-addr.arpa checks. These may be causing an increase in root server load, not simply because of the amount of spam, but also because these tools silently ignore failures.
CircleID: According to the report, "About 12% of the queries received by the root server on October 4 were for nonexistent top-level domains, such as '.elvis,' '.corp,' and '.localhost.'" Many Internet users, in order to avoid spam, are increasingly providing dummy email addresses whenever they are forced to provide personal information on the web. Are all those 'email@lives.elvis'-type fake email addresses triggering part of the 98% problem?
Duane Wessels: I don't believe so, but I can't be sure.
Many of the fake email addresses that I've seen are of the form wessels.NOSPAM@example.com or wessels@nospam.example.com.
Most of the unknown TLD queries probably come from short hostnames. For example, if I set my hostname to "elvis" (instead of "elvis.example.com"), then the root servers are likely to see queries for the short name "elvis."
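The short-hostname effect can be shown with a small sketch. It assumes a tiny, hand-picked `KNOWN_TLDS` set purely for illustration (the real root zone lists many more), and the helper names are hypothetical.

```python
# Tiny illustrative subset of top-level domains; an assumption for this example only.
KNOWN_TLDS = {"com", "net", "org", "edu", "arpa"}


def top_level_label(qname: str) -> str:
    """Return the rightmost label of a query name, ignoring case and the root dot."""
    labels = [label for label in qname.lower().rstrip(".").split(".") if label]
    return labels[-1] if labels else ""


def has_known_tld(qname: str, known=KNOWN_TLDS) -> bool:
    """True if the query name ends in a top-level domain from the known set."""
    return top_level_label(qname) in known


short_name_ok = has_known_tld("elvis")
full_name_ok = has_known_tld("elvis.example.com")
```

A bare hostname like "elvis" has only one label, so that label is treated as the top-level domain and the query lands in the nonexistent-TLD bucket, while "elvis.example.com" ends in a real TLD.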
CircleID: This is a direct quote from SDSC news release:
"Researchers believe that many bad requests occur because organizations have misconfigured packet filters and firewalls, security mechanisms intended to restrict certain types of network traffic."
How far can current unnecessary root server inquiry rates be reduced, considering that organizations such as ISPs will be required to dedicate added time and financial resources to help in the reduction? Do you foresee new regulations and penalties for organizations that are responsible?
Duane Wessels: Regulations and/or penalties are extremely unlikely. They would be impossible to monitor and enforce.
I am, unfortunately, skeptical that ISPs and other network operators will take the initiative to reduce root server traffic, for three reasons:
1. The system works sufficiently well as-is. Many applications use the DNS, but do not depend on it. Unresolved queries go silently unnoticed.
2. A very small number of sources can cause a significant amount of abuse.
3. It's often difficult to get people to recognize they have a problem, and even harder to get them to fix it.
As is often the case with studies such as this, network administrators are left feeling somewhat helpless. That is why we also wrote a tool for examining the DNS traffic leaving a network. People can download our "dnstop" tool from http://dnstop.measurement-factory.com/.
One of the abusers was misusing packet filters to block incoming, but not outgoing, DNS packets. This prompted us to write a technote for ISC that describes how people should be configuring their authoritative-only name servers. You can find it at http://www.isc.org/tn/.
The root servers are responsible for providing the IP addresses of the name servers for the top-level domains such as .com, .edu, .org. If you want the IP address for slashdot.org, you ask the root nameserver for the IP address of the nameserver responsible for .org; you then ask the .org nameserver for the IP address of slashdot.org.
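A toy walk through that delegation chain might look like the sketch below. It does no real networking: the three dictionaries stand in for the root, TLD, and authoritative servers, and every server name and address in them is a made-up placeholder (the addresses come from the 192.0.2.0/24 documentation range).

```python
# Hypothetical delegation data; none of these are real records.
ROOT_ZONE = {"org": "tld-server.example"}
TLD_ZONES = {"tld-server.example": {"slashdot.org": "auth-server.example"}}
AUTH_ZONES = {
    "auth-server.example": {
        "slashdot.org": "192.0.2.80",
        "www.slashdot.org": "192.0.2.80",
    }
}


def resolve(hostname: str):
    """Follow root -> TLD -> authoritative referrals for a hostname.

    Returns (address or None, list of (server asked, question, answer) steps).
    """
    name = hostname.lower().rstrip(".")  # drop the trailing root dot
    labels = name.split(".")
    steps = []

    # Step 1: ask the root which server handles the top-level domain.
    tld = labels[-1]
    tld_server = ROOT_ZONE.get(tld)
    steps.append(("root", tld, tld_server))
    if tld_server is None:
        return None, steps

    # Step 2: ask the TLD server which server is authoritative for the domain.
    domain = ".".join(labels[-2:])
    auth_server = TLD_ZONES[tld_server].get(domain)
    steps.append((tld_server, domain, auth_server))
    if auth_server is None:
        return None, steps

    # Step 3: ask the authoritative server for the address itself.
    address = AUTH_ZONES[auth_server].get(name)
    steps.append((auth_server, name, address))
    return address, steps


address, steps = resolve("www.slashdot.org.")
```

Calling `resolve("elvis")` stops at the first step, since the toy root zone has no entry for that label.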
Mea navis aericumbens anguillis abundat
The root servers are the "invisible" trailing dot in
www.slashdot.org. <- that one at the end
The root DNS servers point to the top-level domains (TLDs) such as the Country Code TLD (ccTLD) and generic TLD (gTLD).
So the root server points to the servers for the 'org' domain, now handled by the Internet Society and the Public Interest Registry, which operate several authoritative DNS servers for the ORG domain. These then point to the authoritative servers for slashdot.org, and we (or our ISP on our behalf) do yet another DNS request, this time to one of the authoritative slashdot.org DNS servers, and look up the IP address of www.slashdot.org or slashdot.org.
To reduce the number of requests, our ISP DNS server will normally cache answers for the TLD servers, for specific subdomains such as slashdot.org, and for specific hostnames such as www.slashdot.org.
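The caching behaviour can be sketched as a small time-to-live cache. This is only a minimal illustration under assumptions: the `TTLCache` class, its method names, and the injectable `clock` are all hypothetical, and real resolvers do considerably more (negative caching, per-record-type entries, and so on).

```python
import time


class TTLCache:
    """Keep answers until their time-to-live runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock    # injectable so expiry can be demonstrated without waiting
        self._entries = {}     # normalized name -> (answer, expiry time)

    @staticmethod
    def _normalize(name: str) -> str:
        return name.lower().rstrip(".")

    def put(self, name: str, answer: str, ttl: float) -> None:
        """Store an answer that stays valid for ttl seconds."""
        self._entries[self._normalize(name)] = (answer, self._clock() + ttl)

    def get(self, name: str):
        """Return the cached answer, or None if it is missing or expired."""
        key = self._normalize(name)
        entry = self._entries.get(key)
        if entry is None:
            return None
        answer, expires = entry
        if self._clock() >= expires:
            del self._entries[key]  # expired: the next lookup must go upstream
            return None
        return answer


# Usage with a fake clock; the address is a documentation placeholder.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("www.slashdot.org.", "192.0.2.80", ttl=300)
fresh = cache.get("WWW.slashdot.org")
now[0] = 301.0
stale = cache.get("www.slashdot.org")
```

While the entry is fresh, repeat lookups never leave the cache. Once the TTL passes, the cache misses and the resolver has to ask upstream again.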
DNS cache.
My company firewall is a Linux host-based box with some custom logging apps, squid and tinydns. Making your network "Internet friendly" is easy:
iptables -t nat -I PREROUTING -p udp --dport 53 -j REDIRECT --to-ports 53
redirects all your outbound UDP DNS queries to your cache. Let users, rogue admins, and anyone else try to resolve from particular nameservers; all they'll get is your own cache.
I want to delete my account but Slashdot doesn't allow it.
Set your DHCP server to assign your company DNS server to the clients.
:) they'll be scratching their heads : )
THEN
iptables -I FORWARD -p udp --dport 53 -j DROP
Let them try to hit any external DNS servers.
Lawyers, MBA's, RIAA? A jedi fears not these things!
Doesn't have anything to do with the root name server stuff; in fact, if you use your IP address instead of your hostname, you'll entirely skip the DNS part. Also, that site doesn't work very well, because all of the tricks to specify the IP address instead of the name point at the wrong IP address.
Availability checks for domain name registrars never hit the root servers. The registrars connect directly to the SRS (Shared Registry System) and look up records there.
It would be silly to use the root servers as a basis for availability, especially since the root servers know nothing about individual domains, only TLDs (the root server zone file is less than 50k, iirc). But even assuming you meant the DNS servers one level down (like the GTLD servers that handle .com, .net, and .org), none of them refresh in real time, so you could be registering a domain that had actually been taken 6 hours previously.
Thanks,
Matt
me@mzi.to