Lead Scientist Responds to Questions on Root Server Queries
cidtoday writes "A CircleID interview
with the lead scientist whose study recently revealed that 98% of the queries to a main root
server are unnecessary reveals that spam has little to do with the
issue. In fact, he provides two reasons why anti-spam tools cause more
unnecessary queries to the root servers than spam emails. Many other questions previously
raised by Slashdot readers on the study are also answered."
--sex
Very popular slashdot journal for adul
Don't go to the article all at once, or those questions will continue unanswered!
Here's a link that lists how some spammers attempt to hide their real identities. This isn't necessarily exactly what the root server query guy was talking about, or maybe it is? Either way, it is very enlightening. Some slashdotters even occasionally try to hide a goatse link this way.
--sex
Very popular slashdot journal for adul
I read it as "Lead" as in "Lead Guitarist", and subsequently wanted to know which band he was in.
http://jesus.everdense.com/
It's BB&N... er, GTEI... er, Genuity that's getting pounded. They provide caching DNS servers to the entire Internet at 4.2.2.1 (.2, ...) and because they're so easily memorizable, I've never met a sysadmin who didn't put them in a host's configuration in a pinch.
If they can identify and quantify explicit networks or IP addresses causing the 'abuse', then why don't they send a warning and then block them? They'll fix the problem real quick.....
Feed the need: Digitaladdiction.net
reveals that spam has little to do with the issue. In fact, he provides two reasons why anti-spam tools cause more unnecessary queries to the root servers than spam emails...
So Spam has little to do with extra traffic, but the wealth of tools fighting against spam are adding to the load, right? But then since spam is the reason anti-spam tools exist, it's fair to say spam is the root cause of the problem!
Code, Hardware, stuff like that.
No... but my first thought on Root Server was a waitress with vegetables.
Rod Taylor
std::disclaimer<std::legalese> sig=new std::disclaimer; sig->dump(); delete sig;
With all the talk that floats around about every household electronic appliance having its own IP, and this leading to companies adding everything as some kind of named host within a home network, i.e. yourhomeaddress.personal.ps2.sony or yourhomeaddress.personal.microwave.bosh: what can root servers actually handle? I'd hate to see someone bring down a root server with a microwave oven, well, without actually putting it in one :)
--+> Life, is there any?
We have enough geeks and articles about geeks who tinker with things to optimize them even though they work just fine the way they are.
The root server engineers are busy explaining why not to tinker with things that are clearly and inherently broken.
Don't complain about useless queries -- FIX THE SYSTEM.
Background: 28/M/Bi-Sexual; Owner of a Linux company; MBA Harvard 2003; B.S. Comp Sci MIT 2000
List, please? Hey Bush, forget about Iraq, let's take these bastards out. [grabs ak-47]
std::disclaimer<std::legalese> sig=new std::disclaimer; sig->dump(); delete sig;
When you type in a webpage address, say, slashdot.org, your computer needs a way to find the IP address of the server it should send a message to. That way is DNS. Most ISPs host several of their own DNS servers that keep track of which addresses have been recently resolved so that their customers can get faster resolution. If an address hasn't been recently resolved and is no longer (or never was) in the DNS cache, then it's time to hit up one of the 13 root servers with a request.
Internet's Main Root Server Saturated By 98%: Should You Be Concerned?
February 26, 2003
By CircleID
A recent study by researchers at the Cooperative Association for Internet Data Analysis (CAIDA) at the San Diego Supercomputer Center (SDSC) revealed that a staggering 98% of the global Internet queries to one of the main root servers, at the heart of the Internet, were unnecessary. This analysis was conducted on data collected October 4, 2002 from the 'F' root server located in Palo Alto, California.
The findings of the study were originally presented to the North American Network Operators' Group (NANOG) in October 2002 and later discussed with Richard A. Clarke, chairman of the President's Critical Infrastructure Protection Board and Special Advisor to the U.S. President for Cyber Space Security.
In this special CircleID interview with Duane Wessels, president of The Measurement Factory and one of the main scientists who led the root server study, we attempt to gain a better sense of what has been discovered, what can be done about it, and how. But most importantly, why? After all, from an end-user's perspective, the Internet appears to be working just fine! Should a business that fully or partially depends on the Internet be concerned? Read on...
CircleID: Mr. Wessels, could you give us a bit of background about yourself and tell us what initiated the study?
Duane Wessels: I started doing Internet research in 1994. From 1996 to 2000 I worked for the National Laboratory for Applied Network Research (NLANR)/UCSD on a web caching project, including Squid, funded by the National Science Foundation. These days I am president of The Measurement Factory, where we develop tools for testing performance and compliance.
For this study I joined up with my old friends at CAIDA. Funding for this work came from WIDE in response to questions from ICANN's Root Server System Advisory Committee (RSSAC).
CircleID: Could you give us a brief background on the significance of your findings in this study, particularly the unique discoveries that were not already known to the technical and scientific community?
Duane Wessels: Certain facts about root server traffic have been known for a long time. Earlier studies identified certain problems, and some root server operators publish traffic statistics (number of queries, etc). What is unique about our study is that we developed a simple model of the DNS and used that model to categorize each and every query. This allowed us to say, for example, "this query is valid, because we haven't heard from this client before, but this other query is invalid, because the same client sent the same query a short time ago."
We also took a much longer trace than earlier studies and spent more time looking at individual abusers.
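As a rough illustration of the kind of per-query categorization described above, here is a minimal Python sketch. It is not the study's actual model: the function name, the tuple layout, and the 60-second repeat window are all assumptions made for the example.

```python
def classify_queries(queries, repeat_window=60.0):
    """Label each (timestamp, client, qname) tuple as 'first-seen' or 'repeat'.

    A query counts as a repeat when the same client asked for the same
    name within `repeat_window` seconds (an assumed threshold).
    """
    last_seen = {}  # (client, qname) -> timestamp of the most recent query
    labels = []
    for ts, client, qname in queries:
        key = (client, qname.lower())
        prev = last_seen.get(key)
        if prev is not None and ts - prev <= repeat_window:
            labels.append("repeat")
        else:
            labels.append("first-seen")
        last_seen[key] = ts
    return labels
```

Queries are assumed to arrive in timestamp order. A real trace analysis would also need to look at the query type and at whether the answer should still have been cached.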
CircleID: Why the F root server? Is there a particular reason why this root server, located in Palo Alto, California, was selected for the study rather than the other 12 servers?
Duane Wessels: Paul Vixie and the Internet Software Consortium were kind enough to give us access to the query stream. ISC has the infrastructure in place to make this happen easily, and without any chance of disrupting the operation of the server. We are currently working with other operators to get data from additional sites.
CircleID: The report on the study indicates "a detailed analysis of 152 million messages received on Oct 4, 2002." In other words, the final results are based on only one set of data collected within 24 hours. What about comparison to other dates? Why are you confident that findings from this particular day, October 4, 2002, are a sufficient indication of what is happening today -- or tomorrow, for that matter?
Duane Wessels: We have no reason to believe that October 4, 2002 is special. It just happens to be the first day that we successfully collected a 24-hour trace. We took shorter traces before and after this date, and they have similar characteristics. For example, our talk and paper mention a particularly large abuser (the Name Registration Company). While writing the paper, we were curious to see whether they had cleaned up their act yet. Indeed, they had not. They were still abusing the F root server months after we had notified them about the problem.
CircleID: Why should end-users be concerned about the findings, given that their Internet browsing experience does not appear to be affected in any noticeable way?
Duane Wessels: It's likely that most end-users are not impacted by root server abusers, for several reasons. One is that most users are going through properly functioning name servers, and their queries rarely reach a root name server. Another is that the root servers are overprovisioned in order to handle the load -- root DNS servers are typically multiple boxes placed behind load balancers, and some are even geographically distributed.
CircleID: What about companies that are running part or all of their business on the web? How are they being affected by this very high -- unnecessarily high -- root server inquiry rate?
Duane Wessels: Again, I would bet that most of them are properly configured and not severely impacted by root server abuse. Our results showed that 50% of the root server traffic comes from only 220 IP addresses. It's possible that some of these 220 addresses are experiencing a negative side-effect, but I believe that most of these problems go unnoticed. For example, some web servers are configured to look up IP addresses in the in-addr.arpa domain so they can log a hostname instead of an address. But if the lookup fails (as in-addr.arpa queries often do), nobody really notices. The web server logs the address anyway after a timeout.
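For readers unfamiliar with the in-addr.arpa lookups mentioned above, the name a server queries is just the address with its octets reversed. A small Python sketch using the standard `ipaddress` module follows; the helper name is hypothetical and the address is a documentation placeholder, not one from the study.

```python
import ipaddress

def reverse_lookup_name(addr):
    """Return the reverse-DNS name that a PTR query for `addr` would use."""
    return ipaddress.ip_address(addr).reverse_pointer

# 192.0.2.10 is a reserved documentation address used purely as an example.
print(reverse_lookup_name("192.0.2.10"))
```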
CircleID: Moving on to possible causes -- at this time, what do you think are the main reasons for such a high (98%) inquiry rate? Is it possible to identify them?
Duane Wessels: The short answer is that we suspect firewalls and packet filters.
When we initially started the study, our assumption was that there must be some broken software out there causing all the root server traffic. Aside from an old bug with Microsoft's resolver [a system to locate records that would answer a query], we didn't really find any implementation-specific problems.
Approximately 75% of the root server's queries were duplicates. Furthermore, we noticed that most of the repeats occurred at sensible intervals. That is, the agents making queries seemed to be following the protocol specifications.
From this, it seems most likely that these agents are just not receiving any DNS replies. To the application, it looks like a network outage, so it keeps on retransmitting. By investigating a few individual abusers, we know that they indeed do not receive replies from the root server.
CircleID: According to Radicati Group research firm, more than 2.3 billion spam messages are broadcast daily over the Internet, and this number is expected to rise to 15 billion by 2006. How does spam, particularly at such high rates, affect the root servers -- especially when you take into account millions, if not billions, of spam emails floating around in people's inboxes, many of which contain broken links that cause bad DNS lookups?
Duane Wessels: It's entirely possible that spam emails generate an increased load for the root name servers. However, I don't think that simply sending spam increases load. Rather, it's more likely that anti-spam tools do. I can think of two specific examples:
1. Many anti-spam tools verify "From" addresses and perhaps other fields. If the From address has an invalid hostname, such as "spam.my.domain," the root servers will see more requests, because the top level domain does not exist.
2. Anti-spam tools also make various checks on the IP address of the connecting client -- for example, the various "realtime blackhole lists" and basic in-addr.arpa checks. These may be causing an increase in root server load, not simply because of the amount of spam, but also because these tools silently ignore failures.
CircleID: According to the report, "About 12% of the queries received by the root server on October 4 were for nonexistent top-level domains, such as '.elvis,' '.corp,' and '.localhost.'" Many Internet users, in order to avoid spam, are increasingly providing dummy email addresses whenever they are forced to provide personal information on the web. Are all those 'email@lives.elvis'-type fake email addresses triggering part of the 98% problem?
Duane Wessels: I don't believe so, but I can't be sure.
Many of the fake email addresses that I've seen are of the form wessels.NOSPAM@example.com or wessels@nospam.example.com.
Most of the unknown TLD queries probably come from short hostnames. For example, if I set my hostname to "elvis" (instead of "elvis.example.com"), then the root servers are likely to see queries for the short name "elvis."
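To make the short-hostname point concrete, here is a hedged Python sketch of how a bare name like "elvis" ends up looking like a query for a nonexistent top-level domain. The `SAMPLE_TLDS` set is a tiny illustrative subset, and both function names are invented for this example.

```python
# Illustrative subset only; the real root zone lists many more TLDs.
SAMPLE_TLDS = {"com", "net", "org", "edu", "arpa"}

def top_level_label(qname):
    """Return the rightmost label of a query name, ignoring a trailing dot."""
    return qname.rstrip(".").rsplit(".", 1)[-1].lower()

def has_unknown_tld(qname, known=SAMPLE_TLDS):
    """True when the rightmost label is not in the set of known TLDs."""
    return top_level_label(qname) not in known
```

With this sketch, the unqualified name "elvis" is treated as the TLD "elvis" and flagged, while "elvis.example.com" falls under "com" and is not.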
CircleID: This is a direct quote from SDSC news release:
"Researchers believe that many bad requests occur because organizations have misconfigured packet filters and firewalls, security mechanisms intended to restrict certain types of network traffic."
How far can current unnecessary root server inquiry rates be reduced, considering that organizations such as ISPs will be required to dedicate added time and financial resources to help in the reduction? Do you foresee new regulations and penalties for organizations that are responsible?
Duane Wessels: Regulations and/or penalties are extremely unlikely. They would be impossible to monitor and enforce.
I am, unfortunately, skeptical that ISPs and other network operators will take the initiative to reduce root server traffic, for three reasons:
1. The system works sufficiently well as-is. Many applications use the DNS, but do not depend on it. Unresolved queries go silently unnoticed.
2. A very small number of sources can cause a significant amount of abuse.
3. It's often difficult to get people to recognize they have a problem, and even harder to get them to fix it.
As is often the case with studies such as this, network administrators are left feeling somewhat helpless. That is why we also wrote a tool for examining the DNS traffic leaving a network. People can download our "dnstop" tool from http://dnstop.measurement-factory.com/.
One of the abusers was misusing packet filters to block incoming, but not outgoing, DNS packets. This prompted us to write a technote for ISC that describes how people should be configuring their authoritative-only name servers. You can find it at http://www.isc.org/tn/.
The root servers are responsible for providing the IP addresses of the name servers for the top-level domains such as .com, .edu, .org. If you want the IP address for slashdot.org, you ask the root nameserver for the IP address of the nameserver responsible for .org; you then ask the .org nameserver for the IP address of slashdot.org.
Mea navis aericumbens anguillis abundat
The root servers are the "invisible" trailing dot in
www.slashdot.org. <- that one at the end
The root DNS servers point to the top-level domains (TLDs) such as the Country Code TLD (ccTLD) and generic TLD (gTLD).
So the root server points to the servers for the 'org' domain (or subdomain), which is now handled by the Internet Society and Public Interest Registry, which operate several authoritative DNS servers for the ORG domain. These then point to the authoritative servers for slashdot.org, and we (or our ISP on our behalf) do yet another DNS request, this time to one of the authoritative slashdot.org DNS servers, and look up the IP address of www.slashdot.org or slashdot.org.
To reduce the number of requests, our ISP DNS server will normally cache answers for the TLD servers, for specific subdomains such as slashdot.org, and for specific hostnames such as www.slashdot.org.
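The delegation chain and caching described above can be sketched as a toy Python model. Everything here is made up for illustration: the server labels, the 192.0.2.80 documentation address, and the class itself. Real resolvers also honor TTLs, which this sketch ignores.

```python
# Toy delegation data; names and addresses are placeholders, not real records.
ROOT = {"org": "tld-org"}
ZONES = {
    "tld-org": {"slashdot.org": "ns-slashdot"},
    "ns-slashdot": {"slashdot.org": "192.0.2.80", "www.slashdot.org": "192.0.2.80"},
}

class CachingResolver:
    """Walks root -> TLD -> authoritative server, caching every step."""

    def __init__(self):
        self.cache = {}
        self.root_queries = 0  # how often the (toy) root was actually asked

    def _lookup(self, key, fetch):
        if key not in self.cache:
            self.cache[key] = fetch()
        return self.cache[key]

    def _ask_root(self, tld):
        self.root_queries += 1
        return ROOT[tld]

    def resolve(self, name):
        name = name.rstrip(".").lower()
        labels = name.split(".")
        tld, domain = labels[-1], ".".join(labels[-2:])
        tld_server = self._lookup(("ns", tld), lambda: self._ask_root(tld))
        auth_server = self._lookup(("ns", domain), lambda: ZONES[tld_server][domain])
        return self._lookup(("a", name), lambda: ZONES[auth_server][name])
```

Resolving www.slashdot.org and then slashdot.org through one resolver instance touches the toy root only once, which is the effect the caching is meant to have.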
Hello Troll,
On what grounds would they win in court? Seems to me they don't have a contract (express or implied) with the root server operators, and therefore no standing to sue.
You can't just randomly block abusers.
"Just watch me."
Either you have to offer DNS services to everyone, or no one, or you have to start charging per lookup.
Not at all. "Management reserves the right to refuse service."
I agree that blocking them is probably too simplistic to be useful, but you're wrong about there being anything legally wrong with it.
DNS cache.
My company firewall is a Linux host-based box with some custom logging apps, squid and tinydns. Making your network "Internet friendly" is easy:
iptables -t nat -I PREROUTING -p udp --dport 53 -j REDIRECT --to-ports 53
directs all your outbound DNS to your cache. Let users, rogue admins, and anyone else try and resolve from particular nameservers, all they'll get is your own cache.
I want to delete my account but Slashdot doesn't allow it.
From this article, we've learned the most important truth of our time - elvis is possibly the most popular hostname on the internet (since some large fraction of 12% of the 98% of the queries to the root server are for the top level domain elvis, probably because of a misconfigured resolver). What could this mean? Elvis was the messiah and we just didn't know it? Are there more machines named elvis than Jesus? Are there more elvis impersonators than jesus impersonators? On the other hand, I wonder how many machines are named Gandalf.
set your dhcp server to assign your company dns server to the clients.
:) they'll be scratching their heads : )
THEN
iptables -I FORWARD -p udp --dport 53 -j DROP
let them try to hit any external dns servers
Lawyers, MBA's, RIAA? A jedi fears not these things!
It's a pun.
There are reasons why democracy does not work nearly as well as capitalism.
-- David D. Friedman
I don't understand why this is news or why it required any level of study.
The root servers handling zone '.' such as F.ROOT-SERVERS.NET put refresh periods of 48 hours on most every query. That means that at most once every 48 hours every name server on the planet should re-ask the root servers where to get answers for each of the gtlds, com, net, org, arpa, etc.
What they should receive the most queries for are domains that don't exist because everything else is cached for such a long period of time. That is the point of the root servers.
If the root servers are having trouble handling the query load then they should be upgraded for goodness sake. These are root servers after all and I think the global internet community could spare a few dollars to add some spare capacity if it is required.
To improve on this, BIND could up the maximum negative RR cache default time to live. Right now I believe it is set to 3 hours and the root servers use a 1 day SOA.MINIMUM instead, so BIND is always lowering it by default.
Of course, other nameservers are different. Some older versions of BIND by default only stored negative RR for 10 minutes.
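To show what a negative cache with a time limit amounts to, here is a minimal Python sketch. It is not BIND's implementation; the class and method names are invented, and the 3-hour figure in the usage note is simply the value quoted from memory above.

```python
import time

class NegativeCache:
    """Remembers names known not to exist until their entry expires."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the behavior can be tested
        self._expires = {}      # lowercased name -> expiry time

    def remember_missing(self, name):
        self._expires[name.lower()] = self.clock() + self.ttl

    def is_known_missing(self, name):
        key = name.lower()
        expiry = self._expires.get(key)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._expires[key]  # entry expired; the server must be asked again
            return False
        return True
```

A cache built as `NegativeCache(3 * 3600)` would answer repeated queries for a bogus name locally for three hours, and a longer limit would push the next upstream query further out.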
The world is neither black nor white nor good nor evil, only many shades of CowboyNeal.
Also, the name servers get a surprising number of queries FROM RFC1918 addresses (10.x, 192.168.x, etc.), and while it may be more efficient to use root server CPU (on big fast computers) than router CPU to dispose of these queries, ISPs have ENTIRELY no business accepting IP packets FROM these addresses, and they should be killing them at the incoming edges of their networks, not carrying them and passing them on to other people.
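For anyone who wants to check their own logs for such sources, a small Python sketch of the test is below. The three ranges are the ones defined by RFC 1918; the function name is just a label for this example.

```python
import ipaddress

# The three private ranges reserved by RFC 1918.
RFC1918_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_rfc1918(addr):
    """True when an IPv4 source address falls inside an RFC 1918 range."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918_NETS)
```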
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Yes, definitely, set your DHCP servers to tell clients about your company's DNS servers, and do a good job of maintaining your DNS servers so they work well. But sometimes people want to ask other servers what's going on, especially if they're trying to track down detailed authoritative information about a name from the real name servers for that name - or if they're spam hunting.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Most DNS queries get handled out of some kind of cache. While it's definitely important to be able to query your favorite root or alternate-root-like server when you really need to, you don't usually need to. If you ask your local vaguely-correctly-configured server for something, then ask it again before the expiration date, the first time it sees it it'll cache it, so the second time it can get it out of cache (unless the cache entry expired or the cache overflowed.) But if the entry's nonexistent, it's not likely to stick around the cache. So there's a need for a standard way to respond to well-known non-existent names, so the cache has something to keep for popular bogus queries. Obviously "localhost" is "127.0.0.1", and "example.com" can be just about anything not in use but might as well be 127.0.0.1, but it'd be nice if there were some other standard value to use. Maybe 127.0.0.0 or 127.255.255.255 (e.g. yell at yourself :-) ?
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Many anti-spam tools verify "From" addresses and perhaps other fields. If the From address has an invalid hostname, such as "spam.my.domain," the root servers will see more requests, because the top level domain does not exist.
DNS lookups on the sender address were common before there was a major spam problem. It makes sense: why would you want to take email from somewhere you cannot reply to? So I don't think you can blame anti-spam tools for this.
Anti-spam tools also make various checks on the IP address of the connecting client -- for example, the various "realtime blackhole lists" and basic in-addr.arpa checks.
in-addr.arpa checks have been a standard practice in networking software, not just email, since they became available. Some FTP servers do it, some web servers do it, your web log analyzer does it, IRC does it. You can't put that one onto anti-spam tools either.
The use of dnsBL lists will, of course, create extra load, when you look up the name servers for the list(s) you are using. But in all likelihood the NS and A records are cached at your local server. You're not hitting the root server with every lookup.
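For context, a DNSBL check is an ordinary DNS lookup on a name built from the client's reversed IPv4 octets plus the list's zone. A minimal Python sketch follows; the zone "dnsbl.example" is a made-up placeholder, not a real list, and the function name is hypothetical.

```python
def dnsbl_query_name(ipv4, zone="dnsbl.example"):
    """Build the name a DNSBL lookup would query for an IPv4 address."""
    octets = ipv4.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted-quad IPv4 address")
    return ".".join(reversed(octets)) + "." + zone
```

Every such name sits under the list's own zone, so once the zone's NS and A records are cached locally, the lookups go to the list's servers and not to the root.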
This guy seems full of bull. Note that he is not a LEAD scientist for the root servers, he's a lead scientist for the company that produced the report.
I really think that one of the very nice things happening in anti-spam these days is the increasing use of local, independent processing power rather than centralized network queries (like realtime blacklists).
A growing number of projects are implementing Bayesian filtering techniques, for example. I personally love spamprobe, but there are many others. Some, like spamprobe, go server-side and others are even client-side. They work equally well, filtering spam based on examples you train them with. In the 4 months I've been using it, I've achieved 97.6% accuracy. And no DNS queries, no load to any other site but my disk & CPU.
Anyway, the advantage of this sort of filtering is that you do all the decision making locally, and no data flies across the internet. Remember, what we have in abundance is processing power. But network resources should be conserved.
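To give a feel for how little such a filter needs beyond local CPU, here is a toy naive Bayes classifier in Python. It is only a sketch of the general technique with add-one smoothing; it is not how spamprobe works internally, and the class name and training phrases are invented.

```python
import math
from collections import Counter

class TinyBayes:
    """Word-count naive Bayes over two labels, 'spam' and 'ham'."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, label, text):
        self.words[label].update(text.lower().split())
        self.docs[label] += 1

    def score(self, label, text):
        vocab = set(self.words["spam"]) | set(self.words["ham"])
        total = sum(self.words[label].values())
        # Log prior plus log likelihood of each word, with add-one smoothing.
        logp = math.log((self.docs[label] + 1) / (sum(self.docs.values()) + 2))
        for word in text.lower().split():
            logp += math.log((self.words[label][word] + 1) / (total + len(vocab) + 1))
        return logp

    def classify(self, text):
        return max(("spam", "ham"), key=lambda label: self.score(label, text))
```

Training and classification touch only in-memory counters, so no network lookups are involved at any point.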
I actively practice encrypted firewall piercing, or, at a minimum, running an external socks server. I can't handle castrated networks. The worst of them don't even allow me to get IMAP traffic. Blech.
If you could be told what you can see or read, then it follows that you could be told what to say or think - BoC
Yes Scientist?
I know this isn't your responsibility, but mop the rest of this shit up.
Availability checks for domain name registrars never hit the root servers. The registrars connect directly to the SRS (Shared Registry System) and look up records there.
It would be silly to use the root servers as a basis for availability, especially since the root servers know nothing about individual domains, only TLDs (the root server zone file is less than 50k, iirc). But even assuming you meant the DNS servers one level down (like the GTLD servers that handle .com, .net, and .org), none of them refresh in real-time, so you could be registering a domain that had actually been taken 6 hours previously.
Thanks,
Matt
me@mzi.to