Lead Scientist Responds to Questions on Root Server Queries
cidtoday writes "A CircleID interview
with the lead scientist whose study recently revealed that 98% of the queries to a main root
server are unnecessary reveals that spam has little to do with the
issue. In fact, he provides two reasons why anti-spam tools cause more
unnecessary queries to the root servers than spam emails. Many other questions previously
raised by Slashdot readers on the study are also answered."
Did anyone else read "Lead" as the metal, and not as "the one in charge"?
"I'm not impatient. I just hate waiting." - My Dad
--sex
Very popular slashdot journal for adul
don't go to the article all at once, or those questions will continue unanswered!
Here's a link that lists how some spammers attempt to hide their real identities. This isn't necessarily exactly what the root server query guy was talking about, or maybe it is? Either way, it is very enlightening. Some slashdotters even occasionally try to hide a goatse link this way.
It's BB&N... er, GTEI... er, Genuity that's getting pounded. They provide caching DNS servers to the entire Internet at 4.2.2.1 (.2, ...) and because they're so easy to memorize, I've never met a sysadmin who didn't put them in a host's configuration in a pinch.
If they can identify and quantify the explicit networks or IP addresses causing the 'abuse', then why don't they send a warning and then block them? They'll fix the problem real quick.....
Feed the need: Digitaladdiction.net
Excellent article on URL obfuscation.
Never, ever lose a file again. Ever.
This sounds interesting but what's a root server query?
for i in a b c d e f g h i j; do ping $i.root-servers.net; done
That really wasn't that hard.
reveals that spam has little to do with the issue. In fact, he provides two reasons why anti-spam tools cause more unnecessary queries to the root servers than spam emails...
So Spam has little to do with extra traffic, but the wealth of tools fighting against spam are adding to the load, right? But then since spam is the reason anti-spam tools exist, it's fair to say spam is the root cause of the problem!
Code, Hardware, stuff like that.
98% of all SUVs are unnecessary. Get a real car!
std::disclaimer<std::legalese> sig=new std::disclaimer; sig->dump(); delete sig;
Not only do we delete legitimate mail when this anti-spam software gets a false positive, we place load on the fragile root servers.
Please stop using spam assassin, for the good of the Internet.
With all the talk that floats around about every household electronic appliance having its own IP address, and companies then adding everything as some kind of named host within a home network (i.e. yourhomeaddress.personal.ps2.sony or yourhomeaddress.personal.microwave.bosh), what can the root servers actually handle? I'd hate to see someone bring down a root server with a microwave oven, well, without actually putting it in one :)
--+> Life, is there any?
We have enough geeks and articles about geeks who tinker with things to optimize them even though they work just fine the way they are.
The root server engineers are busy explaining why not to tinker with things that are clearly and inherently broken.
Don't complain about useless queries -- FIX THE SYSTEM.
Background: 28/M/Bi-Sexual; Owner of a Linux company; MBA Harvard 2003; B.S. Comp Sci MIT 2000
Fine. Firewall those IPs from using the root servers.
List, please? Hey Bush, forget about Iraq, let's take these bastards out. [grabs ak-47]
And they'd take you to court so quick, and they'd win, and rightfully so. You can't just randomly block abusers. If so, then get the hell off the internet. That's like me standing alongside the highway and then getting pissed at drivers that look at me the wrong way. Either you have to offer DNS services to everyone, or no one, or you have to start charging per lookup. Learn a little bit about how the Internet works, jerk off, before making overly simplistic comments about complex problems.
Internet's Main Root Server Saturated By 98%: Should You Be Concerned?
February 26, 2003
By CircleID
A recent study by researchers at the Cooperative Association for Internet Data Analysis (CAIDA) at the San Diego Supercomputer Center (SDSC) revealed that a staggering 98% of the global Internet queries to one of the main root servers, at the heart of the Internet, were unnecessary. This analysis was conducted on data collected October 4, 2002 from the 'F' root server located in Palo Alto, California.
The findings of the study were originally presented to the North American Network Operators' Group (NANOG) in October 2002 and later discussed with Richard A. Clarke, chairman of the President's Critical Infrastructure Protection Board and Special Advisor to the U.S. President for Cyber Space Security.
In this special CircleID interview with Duane Wessels, president of The Measurement Factory and one of the main scientists who led the root server study, we attempt to gain a better sense of what has been discovered, what can be done about it, and how. But most importantly, why? After all, from an end-user's perspective, the Internet appears to be working just fine! Should a business that fully or partially depends on the Internet be concerned? Read on...
CircleID: Mr. Wessels, could you give us a bit of background about yourself and tell us what initiated the study?
Duane Wessels: I started doing Internet research in 1994. From 1996 to 2000 I worked for the National Laboratory for Applied Network Research (NLANR)/UCSD on a web caching project, including Squid, funded by the National Science Foundation. These days I am president of The Measurement Factory, where we develop tools for testing performance and compliance.
For this study I joined up with my old friends at CAIDA. Funding for this work came from WIDE in response to questions from ICANN's Root Server System Advisory Committee (RSSAC).
CircleID: Could you give us a brief background on the significance of your findings in this study, particularly the unique discoveries that were not already known to the technical and scientific community?
Duane Wessels: Certain facts about root server traffic have been known for a long time. Earlier studies identified certain problems, and some root server operators publish traffic statistics (number of queries, etc). What is unique about our study is that we developed a simple model of the DNS and used that model to categorize each and every query. This allowed us to say, for example, "this query is valid, because we haven't heard from this client before, but this other query is invalid, because the same client sent the same query a short time ago."
We also took a much longer trace than earlier studies and spent more time looking at individual abusers.
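The kind of per-query categorization Wessels describes can be sketched roughly as below. This is only an illustration, not the model used in the study: the function name `classify_queries`, the record layout, and the 60-second window are all assumptions made for the example.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout: (timestamp_seconds, client_ip, query_name, query_type).
Query = Tuple[float, str, str, str]


def classify_queries(queries: Iterable[Query], window: float = 60.0) -> List[str]:
    """Label each query 'valid' or 'repeat'.

    A query is a 'repeat' when the same client sent the same name and type
    within the last `window` seconds. The window length is an assumed value,
    not one taken from the study.
    """
    last_seen = {}  # (client, name, type) -> time of the most recent query
    labels = []
    for ts, client, name, qtype in queries:
        key = (client, name.lower(), qtype)
        previous = last_seen.get(key)
        if previous is not None and ts - previous <= window:
            labels.append("repeat")
        else:
            labels.append("valid")
        last_seen[key] = ts
    return labels


trace = [
    (0.0, "192.0.2.10", "example.com.", "A"),
    (2.0, "192.0.2.10", "example.com.", "A"),    # same client, same query
    (3.0, "192.0.2.20", "example.com.", "A"),    # different client
    (500.0, "192.0.2.10", "example.com.", "A"),  # long after the window
]
print(classify_queries(trace))
```

Only the second query is flagged, because it is the only one that repeats an identical question from the same client inside the window.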
CircleID: Why the F root server? Is there a particular reason why this root server, located in Palo Alto, California, was selected for the study rather than the other 12 servers?
Duane Wessels: Paul Vixie and the Internet Software Consortium were kind enough to give us access to the query stream. ISC has the infrastructure in place to make this happen easily, and without any chance of disrupting the operation of the server. We are currently working with other operators to get data from additional sites.
CircleID: The report on the study indicates "a detailed analysis of 152 million messages received on Oct. 4, 2002." In other words, the final results are based on only one set of data collected within 24 hours. What about comparison to other dates? Why are you confident that findings from this particular day, October 4, 2002, are a sufficient indication of what is happening today -- or tomorrow, for that matter?
Duane Wessels: We have no reason to believe that October 4, 2002 is special. It just happens to be the first day that we successfully collected a 24-hour trace. We took shorter traces before and after this date, and they have similar characteristics. For example, our talk and paper mention a particularly large abuser (the Name Registration Company). While writing the paper, we were curious to see whether they had cleaned up their act yet. Indeed, they had not. They were still abusing the F root server months after we had notified them about the problem.
CircleID: Why should end-users be concerned about the findings, given that their Internet browsing experience does not appear to be affected in any noticeable way?
Duane Wessels: It's likely that most end-users are not impacted by root server abusers, for several reasons. One is that most users are going through properly functioning name servers, and their queries rarely reach a root name server. Another is that the root servers are overprovisioned in order to handle the load -- root DNS servers are typically multiple boxes placed behind load balancers, and some are even geographically distributed.
CircleID: What about companies that are running part or all of their business on the web? How are they being affected by this very high -- unnecessarily high -- root server inquiry rate?
Duane Wessels: Again, I would bet that most of them are properly configured and not severely impacted by root server abuse. Our results showed that 50% of the root server traffic comes from only 220 IP addresses. It's possible that some of these 220 addresses are experiencing a negative side-effect, but I believe that most of these problems go unnoticed. For example, some web servers are configured to look up IP addresses in the in-addr.arpa domain so they can log a hostname instead of an address. But if the lookup fails (as in-addr.arpa queries often do), nobody really notices. The web server logs the address anyway after a timeout.
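The web server behaviour mentioned here (try a reverse lookup, fall back to the address) might look something like the following sketch. The helper names `reverse_name` and `hostname_or_address` are made up for illustration; the standard-library calls are real, but no actual web server is claimed to work exactly this way.

```python
import ipaddress
import socket


def reverse_name(address: str) -> str:
    """Return the in-addr.arpa (or ip6.arpa) name used for a PTR lookup."""
    return ipaddress.ip_address(address).reverse_pointer


def hostname_or_address(address: str) -> str:
    """Log-style lookup: use the hostname if one resolves, else the address.

    When the reverse lookup fails, the caller still gets something to log,
    which is why such failures tend to go unnoticed.
    """
    try:
        return socket.gethostbyaddr(address)[0]
    except OSError:
        return address


print(reverse_name("192.0.2.1"))
```

In practice the fallback only happens after the resolver gives up, so the cost is a delay rather than a visible error.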
CircleID: Moving on to possible causes -- at this time, what do you think are the main reasons for such a high (98%) inquiry rate? Is it possible to identify them?
Duane Wessels: The short answer is that we suspect firewalls and packet filters.
When we initially started the study, our assumption was that there must be some broken software out there causing all the root server traffic. Aside from an old bug with Microsoft's resolver [a system to locate records that would answer a query], we didn't really find any implementation-specific problems.
Approximately 75% of the root server's queries were duplicates. Furthermore, we noticed that most of the repeats occurred at sensible intervals. That is, the agents making queries seemed to be following the protocol specifications.
From this, it seems most likely that these agents are just not receiving any DNS replies. To the application, it looks like a network outage, so it keeps on retransmitting. By investigating a few individual abusers, we know that they indeed do not receive replies from the root server.
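To see how a client that never receives replies turns one lookup into several queries, consider the sketch below. The function name `send_times`, the 2-second initial timeout, the doubling backoff, and the limit of four tries are all assumptions chosen for the example; real resolvers use their own values.

```python
from typing import List


def send_times(reply_arrives: bool, timeout: float = 2.0, max_tries: int = 4) -> List[float]:
    """Return the times (in seconds) at which one lookup transmits a query.

    If a reply arrives, the first query is enough. If replies are blocked,
    the client retransmits after each timeout, doubling the wait each time.
    """
    times = []
    now = 0.0
    for _ in range(max_tries):
        times.append(now)
        if reply_arrives:
            break
        now += timeout
        timeout *= 2
    return times


print(send_times(reply_arrives=True))
print(send_times(reply_arrives=False))
```

From the server's side, the second case looks like a well-behaved client following the protocol while asking the same question over and over.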
CircleID: According to the Radicati Group research firm, more than 2.3 billion spam messages are broadcast daily over the Internet, and this number is expected to rise to 15 billion by 2006. How does spam, particularly at such high rates, affect the root servers -- especially when you take into account millions, if not billions, of spam emails floating around in people's inboxes, many of which contain broken links that cause bad DNS lookups?
Duane Wessels: It's entirely possible that spam emails generate an increased load for the root name servers. However, I don't think that simply sending spam increases load. Rather, it's more likely that anti-spam tools do. I can think of two specific examples:
1. Many anti-spam tools verify "From" addresses and perhaps other fields. If the From address has an invalid hostname, such as "spam.my.domain," the root servers will see more requests, because the top level domain does not exist.
2. Anti-spam tools also make various checks on the IP address of the connecting client -- for example, the various "realtime blackhole lists" and basic in-addr.arpa checks. These may be causing an increase in root server load, not simply because of the amount of spam, but also because these tools silently ignore failures.
CircleID: According to the report, "About 12% of the queries received by the root server on October 4 were for nonexistent top-level domains, such as '.elvis,' '.corp,' and '.localhost.'" Many Internet users, in order to avoid spam, are increasingly providing dummy email addresses whenever they are forced to provide personal information on the web. Are all those 'email@lives.elvis'-type fake email addresses triggering part of the 98% problem?
Duane Wessels: I don't believe so, but I can't be sure.
Many of the fake email addresses that I've seen are of the form wessels.NOSPAM@example.com or wessels@nospam.example.com.
Most of the unknown TLD queries probably come from short hostnames. For example, if I set my hostname to "elvis" (instead of "elvis.example.com"), then the root servers are likely to see queries for the short name "elvis."
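A rough sketch of why a short hostname shows up as an unknown-TLD query: the last label of the name is all a root server has to go on. The names `KNOWN_TLDS`, `top_level_label`, and `is_unknown_tld` are hypothetical, and the TLD set is a small illustrative subset rather than the real root zone.

```python
# Illustrative subset only; the real root zone contains many more TLDs.
KNOWN_TLDS = {"com", "net", "org", "edu", "arpa"}


def top_level_label(qname: str) -> str:
    """Return the rightmost label of a query name, lowercased ('' for the root)."""
    labels = [label for label in qname.rstrip(".").lower().split(".") if label]
    return labels[-1] if labels else ""


def is_unknown_tld(qname: str) -> bool:
    """True when the name ends in a label that is not in KNOWN_TLDS."""
    tld = top_level_label(qname)
    return tld != "" and tld not in KNOWN_TLDS


for name in ("elvis", "elvis.example.com", "localhost."):
    print(name, is_unknown_tld(name))
```

A bare "elvis" is treated as its own top-level domain, while "elvis.example.com" is answered with a referral to the "com" servers and then cached.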
CircleID: This is a direct quote from SDSC news release:
"Researchers believe that many bad requests occur because organizations have misconfigured packet filters and firewalls, security mechanisms intended to restrict certain types of network traffic."
How far can current unnecessary root server inquiry rates be reduced, considering that organizations such as ISPs will be required to dedicate added time and financial resources to help in the reduction? Do you foresee new regulations and penalties for organizations that are responsible?
Duane Wessels: Regulations and/or penalties are extremely unlikely. They would be impossible to monitor and enforce.
I am, unfortunately, skeptical that ISPs and other network operators will take the initiative to reduce root server traffic, for three reasons:
1. The system works sufficiently well as-is. Many applications use the DNS, but do not depend on it. Unresolved queries go silently unnoticed.
2. A very small number of sources can cause a significant amount of abuse.
3. It's often difficult to get people to recognize they have a problem, and even harder to get them to fix it.
As is often the case with studies such as this, network administrators are left feeling somewhat helpless. That is why we also wrote a tool for examining the DNS traffic leaving a network. People can download our "dnstop" tool from http://dnstop.measurement-factory.com/.
One of the abusers was misusing packet filters to block incoming, but not outgoing, DNS packets. This prompted us to write a technote for ISC that describes how people should be configuring their authoritative-only name servers. You can find it at http://www.isc.org/tn/.
WOW, I'm glad I didn't click that link after you asked those questions!
I'm in the library with the screen facing a whole damn study wing!
That would have been embarrassing.
That's also why I'm afraid to see what this goatse stuff is all about.
yeah, im thinking that too..
is sexygal one of those fat balding guys that mascarade (sp?) slashdot as a chick? and somehow live out a fantasy online and maybe try to pick up some young boys?
kinda like those fruit cakes you hear about on AOL.
icky icky icky
That's really getting old.
Every one of your excellent questions can be explained with "she's Japanese." They're a bunch a sexual scat freaks over there. I suspect that uberdave is Japanese.
hey!!
i was wondering where that michael hating guy went! i remember he likes making that ascii art that usually fucks up really bad.
She's just had an enema. That's probably water mixed with the contents of her lower intestine there, hence the yellow (=very dilute brown) stuff.
As for the rest of it, ya got me.
nice effort but...
YOU FAIL IT!!!!!
(and screw the lameness filter, im only shouting a little bit)
Hello Troll,
On what grounds would they win in court? Seems to me they don't have a contract (express or implied) with the root server operators, and therefore no standing to sue.
You can't just randomly block abusers.
"Just watch me."
Either you have to offer DNS services to everyone, or no one, or you have to start charging per lookup.
Not at all. "Management reserves the right to refuse service."
I agree that blocking them is probably too simplistic to be useful, but you're wrong about there being anything legally wrong with it.
not to troll or flamebait or offtopic more...
but that was actually funny, especially on a day where nothing seems to go right.
as much as people hate it, i applaud you ACs for adding a little humor to my depressing existence.
$$$$exygal is totally a fucking male pervert.
i think we can all tell that, and if your gaydar didn't go directally to defcon 3, your a queer.
He's not - he's Caucasian. But his breath smells as bad as if he were Japanese. Probably worse!
-n
http://www.remix.net/
DNS cache.
My company firewall is a Linux host-based box with some custom logging apps, squid and tinydns. Making your network "Internet friendly" is easy:
iptables -t nat -I PREROUTING -p udp --dport 53 -j REDIRECT --to-ports 53
directs all your outbound DNS to your cache. Let users, rogue admins, and anyone else try and resolve from particular nameservers, all they'll get is your own cache.
I want to delete my account but Slashdot doesn't allow it.
Hey man, why are you so down?
Most people secretly love it.
gay people on "canal street" is funny even with the C left in place!
Interesting! Have you ever had an enema? And if so, what did it feel like?
Really I'm not trolling. I WANT TO KNOW!
http://windows.scares.us
Hah! I thought I was the only one. I've probably spread that to 3 or 4 other admins too. It's easy to remember to set up on a box for testing, and it's always live so it's a good ping test.
Funny....
From this article, we've learned the most important truth of our time - elvis is possibly the most popular hostname on the internet (since some large fraction of 12% of the 98% of the queries to the root server are for the top level domain elvis, probably because of a misconfigured resolver). What could this mean? Elvis was the messiah and we just didn't know it? Are there more machines named elvis than Jesus? Are there more elvis impersonators than jesus impersonators? On the other hand, I wonder how many machines are named Gandalf.
set your dhcp server to assign your company dns server to the clients.
:) they'll be scratching their heads : )
THEN
iptables -I FORWARD -p udp --dport 53 -j DROP
let them try to hit any external dns servers
Lawyers, MBA's, RIAA? A jedi fears not these things!
i thought i was the only one who used 4.2.2.1 and 4.2.2.2
easiest ip's in the world to remember, great ping times. I have them set as the secondary and tertiary dns servers for my company network.
Was the original post ontopic? -1 for the spelling alone. Lets here it for the Mods. Yay! They all like a bit of Ladyboy action.
I'm sure he is right about the 98%. One of my ISP's DNS servers went down so often that I just left my machine permanently pointed at the nearest root server. Hey, the mail must go through!
Manipulate the moderator system! Mod someone as "overrated" today.
How is this offtopic but the parent funny?
I don't understand why this is news or why it required any level of study.
The root servers handling zone '.' such as F.ROOT-SERVERS.NET put refresh periods of 48 hours on most every query. That means that at most once every 48 hours every name server on the planet should re-ask the root servers where to get answers for each of the gtlds, com, net, org, arpa, etc.
What they should receive the most queries for are domains that don't exist because everything else is cached for such a long period of time. That is the point of the root servers.
If the root servers are having trouble handling the query load then they should be upgraded for goodness sake. These are root servers after all and I think the global internet community could spare a few dollars to add some spare capacity if it is required.
To improve on this, BIND could up the maximum negative RR cache default time to live. Right now I believe it is set to 3 hours and the root servers use a 1 day SOA.MINIMUM instead, so BIND is always lowering it by default.
Of course, other nameservers are different. Some older versions of BIND by default only stored negative RR for 10 minutes.
The world is neither black nor white nor good nor evil, only many shades of CowboyNeal.
When you fuck up the DNS server, and I try to use someone else's server, I'm sure you'll come up with some dumb excuse about why it isn't your fault.
Also, the name servers get a surprising number of queries FROM RFC1918 addresses (10.x, 192.168.x, etc.), and while it may be more efficient to use root server CPU (on big fast computers) than router CPU to dispose of these queries, ISPs have ENTIRELY no business accepting IP packets FROM these addresses, and they should be killing them at the incoming edges of their networks, not carrying them and passing them on to other people.
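The edge check suggested above amounts to testing a packet's source address against the three RFC 1918 blocks. A minimal sketch, with the hypothetical helper name `is_rfc1918_source` (a real router would do this in its filter rules, not in Python):

```python
import ipaddress

# The three private IPv4 blocks reserved by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918_source(address: str) -> bool:
    """True when an IPv4 source address falls inside a private block."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


for source in ("10.1.2.3", "172.20.0.5", "192.168.1.1", "4.2.2.1"):
    action = "drop at the edge" if is_rfc1918_source(source) else "forward"
    print(source, action)
```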
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Yes, definitely, set your DHCP servers to tell clients about your company's DNS servers, and do a good job of maintaining your DNS servers so they work well. But sometimes people want to ask other servers what's going on, especially if they're trying to track down detailed authoritative information about a name from the real name servers for that name - or if they're spam hunting.
Bill Stewart
Simple minded me. I thought they probably had a darn good reason for putting in that blackhole thing into BIND.
Guess *I* was completely fooled. While blocking may exclude some useful traffic, on the whole it generally avoids a host (maybe a bit of a pun intended) of problems.
If there wasn't spam, we wouldn't need anti-spam utilities...so wouldn't you say that the excessive queries are, IN FACT, caused by SPAM?
Goals are deceptive - the unaimed arrow never misses.
Most DNS queries get handled out of some kind of cache. While it's definitely important to be able to query your favorite root or alternate-root-like server when you really need to, you don't usually need to. If you ask your local vaguely-correctly-configured server for something, then ask it again before the expiration date, the first time it sees it it'll cache it, so the second time it can get it out of cache (unless the cache entry expired or the cache overflowed.) But if the entry's nonexistent, it's not likely to stick around the cache. So there's a need for a standard way to respond to well-known non-existent names, so the cache has something to keep for popular bogus queries. Obviously "localhost" is "127.0.0.1", and "example.com" can be just about anything not in use but might as well be 127.0.0.1, but it'd be nice if there were some other standard value to use. Maybe 127.0.0.0 or 127.255.255.255 (e.g. yell at yourself :-) ?
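One way to picture a cache that keeps something for popular bogus names is a TTL cache that stores "no such name" answers alongside real ones. The class `DnsCache`, its method names, and the 3-hour default negative TTL are assumptions for this sketch and do not describe how any particular name server implements it.

```python
import time


class DnsCache:
    """Tiny TTL cache that also remembers 'no such name' answers."""

    NXDOMAIN = object()  # marker stored for names known not to exist

    def __init__(self, negative_ttl: float = 3 * 3600, clock=time.monotonic):
        self._entries = {}  # name -> (value, expiry time)
        self._negative_ttl = negative_ttl  # assumed default, in seconds
        self._clock = clock

    def put(self, name: str, address: str, ttl: float) -> None:
        """Cache a positive answer for `ttl` seconds."""
        self._entries[name.lower()] = (address, self._clock() + ttl)

    def put_nonexistent(self, name: str) -> None:
        """Cache the fact that `name` does not exist."""
        self._entries[name.lower()] = (self.NXDOMAIN, self._clock() + self._negative_ttl)

    def get(self, name: str):
        """Return an address, NXDOMAIN, or None when the upstream must be asked."""
        entry = self._entries.get(name.lower())
        if entry is None:
            return None
        value, expires = entry
        if self._clock() >= expires:
            del self._entries[name.lower()]
            return None
        return value


cache = DnsCache()
cache.put_nonexistent("elvis")
print(cache.get("elvis") is DnsCache.NXDOMAIN)
```

While the negative entry is alive, repeated lookups for "elvis" are answered locally and never reach a root server.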
Bill Stewart
Many anti-spam tools verify "From" addresses and perhaps other fields. If the From address has an invalid hostname, such as "spam.my.domain," the root servers will see more requests, because the top level domain does not exist.
DNS lookups on the sender address were common before there was a major spam problem. It makes sense: why would you want to take email from somewhere you cannot reply to? So I don't think you can blame anti-spam tools for this.
Anti-spam tools also make various checks on the IP address of the connecting client -- for example, the various "realtime blackhole lists" and basic in-addr.arpa checks.
in-addr.arpa checks have been a standard practice in networking software, not just email, since they became available. Some FTP servers do it, some web servers do it, your web log analyzer does it, IRC does it. You can't put that one onto anti-spam tools either.
The use of dnsBL lists will, of course, create extra load, when you look up the name servers for the list(s) you are using. But in all likelihood the NS and A records are cached at your local server. You're not hitting the root server with every lookup.
This guy seems full of bull. Note that he is not a LEAD scientist for the root servers, he's a lead scientist for the company that produced the report.
I seriously doubt extraneous DNS queries rate in the top 10, or hell, even top 100, of culprits of network inefficiency. The fact that it only takes 13 of these servers to keep the entire internet afloat should be a testament to the efficiency of the protocol.
so obviously it is critical to totally reform the DNS implementation as it exists today. maybe if we free up some traffic, we can look towards more important things... like defending the right for some little prick to be KaZaaing half of the music released in the last 15 years across 2 oceans with it ending up in some 3rd world chinese province where it is pressed into 2 gazillion cds and sold to some guy who has never paid more than 5 cents for something in his whole damn life. geez, I gotta get off this site
I really think that one of the very nice things happening in anti-spam these days is the increasing use of local, independent processing power rather than centralized network queries (like realtime blacklists).
A growing number of projects are implementing Bayesian filtering techniques, for example. I personally love spamprobe, but there are many others. Some, like spamprobe, run server-side and others run client-side. They work equally well, filtering spam based on examples you train them with. In the 4 months I've been using it, I've achieved 97.6% accuracy. And no DNS queries, no load on any other site, just my disk & CPU.
Anyway, the advantage of this sort of filtering is that you do all the decision making locally, and no data flies across the internet. Remember, what we have in abundance is processing power. But network resources should be conserved.
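For readers who have not seen this style of filter, here is a toy naive Bayes classifier trained on a handful of made-up messages. It is not how spamprobe or any other named tool works internally; the class `TinyBayesFilter`, the whitespace tokenizer, and the add-one smoothing are all choices made for the sketch.

```python
import math
from collections import Counter


class TinyBayesFilter:
    """Toy naive Bayes spam filter; everything runs locally, with no lookups."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Record the words of one message labelled 'spam' or 'ham'."""
        self.message_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def spam_log_odds(self, text: str) -> float:
        """Return log(P(spam | text) / P(ham | text)) with add-one smoothing."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        totals = {label: sum(c.values()) for label, c in self.word_counts.items()}
        score = math.log((self.message_counts["spam"] + 1) / (self.message_counts["ham"] + 1))
        for word in text.lower().split():
            p_spam = (self.word_counts["spam"][word] + 1) / (totals["spam"] + len(vocabulary) + 1)
            p_ham = (self.word_counts["ham"][word] + 1) / (totals["ham"] + len(vocabulary) + 1)
            score += math.log(p_spam / p_ham)
        return score

    def is_spam(self, text: str) -> bool:
        return self.spam_log_odds(text) > 0


spam_filter = TinyBayesFilter()
spam_filter.train("buy cheap pills now", "spam")
spam_filter.train("cheap domain names buy now", "spam")
spam_filter.train("meeting notes for the dns study", "ham")
spam_filter.train("root server trace attached", "ham")
print(spam_filter.is_spam("buy cheap pills"))
print(spam_filter.is_spam("dns trace notes"))
```

The decision uses only word counts held in memory, which is the property the comment above is praising: no network traffic is needed to classify a message.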
I actively practice encrypted firewall piercing, or, at a minimum, running an external socks server. I can't handle castrated networks. The worst of them don't even allow me to get IMAP traffic. Blech.
If you could be told what you can see or read, then it follows that you could be told what to say or think - BoC
Things could actually be operating to spec (except for the few abusing the root servers to do dictionary searches etc).
I see the RFC suggests minimum values of 2 to 5 seconds for retransmissions. What values do implementations pick?
In many situations the round trip time between the querying host and the root server could be more than the retransmission timeout, that's why the root server gets more than one request.
In other cases there could be packet loss.
And if the reply takes too long (delays etc), firewalls could time out the stateful filtering rules for the returning DNS reply, requiring yet another query.
It may be that some DNS implementations go to the root servers more often. Does djbdns's dnscache do that?
for i in a b c d e f g h i j; do ping $i.root-servers.net; done
WTF does that accomplish?? You just ping Verisign's server until you Ctrl-C it, and then it pings the next server. Why not at least 'ping -f' it?? Do something useful at least....
The default behaviour for ping varies from one OS to another.
The Linux one (on the distros I've used, anyway) sends packets until you kill it, printing out the round trip time and TTL for each packet that comes back. It prints nothing for lost packets, except in the summary at the end. So yes, if you run the command on Linux, you've asked for 10 infinite loops.
The Solaris ping, though, sends just one packet, and prints "host is alive" or "no answer from host".
The Windows one (shudder) sends four packets and prints the round trip and TTL for all of them.
Having said all of that, I'm as puzzled as you are about what the grandparent post is trying to accomplish ;-)
Just another wannabe fantasy novelist...
Think about it. How many new domain registration sites have popped up over the last year or two? For 7.95, you can have your own domain.
What does this lead to? Millions of people doing searches on Go-Daddy, Verisign, etc for their vanity domain name.....
And then, there is the spam email about owning your own domain, and spam about increasing traffic to your site, and spam about blocking spam to your site, etc....
I really hope my tax dollars did not pay this guy. Traffic on the root name servers is way down on my priority list, right under voluntary castration.....
All I can think of while reading this is Chekov saying, "Nuclear wessels.. we're looking for the nuclear wessels!"
As if anti-spam tools would exist without spam. Spam is the cause, period.
If you think about it, creating a lot of unneeded DNS queries does the internet a favor. When everyone wastes resources, that means that the systems are designed to handle so much traffic that it will be extremely difficult to initiate a DOS attack. Your thousand boxes will simply drown in the noise from the rest. At least that's a theory :)
Stop the brainwash
I had a firewall once configured to use my ISP's name servers. It would boot up and ask for its host name, but would drop the DNS replies as they came in. Since the internal connections were properly NATted, there were no ill effects to my programs, from inside the firewall...
As a result, I was getting *thousands* of replies that were being dropped every day. Funny-- seems the exact scenario described in the article.
LedgerSMB: Open source Accounting/ERP
Primary the root zone for yourself. Then you don't care if the legacy root servers all get unplugged, your dns will still work just fine. This is a recording... this is a recording... this is a recording...
Need Mercedes parts ?
Gee, and I thought they sounded like pretty useful instructions:
1. Learn a little bit about how the Internet works,
2. Jerk off, before...
3. Making overly simplistic comments about complex problems.
All of them sound like fun to me!
and of course...
5. Profit!