Map the Internet... In One Day?
rjbrown99 writes "There have been numerous stories over the past few years on Bill Cheswick's Internet Mapping Project. The Lumeta folks even created a company out of it. Well, now there is a competitor. A single guy with a single computer is working to accomplish the same feat - within ONE DAY and using open-source tools to do it. The new project is called Opte and can be found at www.opte.org." He's made some progress and is looking for volunteers.
Who
This project was started by me (Barrett Lyon) in response to a conversation with my colleagues at Network Presence. Over lunch we were discussing William Cheswick and Hal Burch's Internet Mapping Project. I was not very impressed with the results of their project: they produce beautiful maps, but the maps don't seem to be very useful, and they do not release their code freely. Their mapping also takes nearly six months to generate a single map. My comment was, "I can write a program that can map the entire net in a single day." The comment was met with some hostility. Thus, this project was born.
What
The goal of this project is to use a single computer and a single Internet connection to map the location of every class C network on the Internet. The Internet is obviously not routed as a collection of class C networks, but treating the Internet IP space as a collection of class C networks makes it possible to build a detailed map of the entire Internet. The global Internet address space currently offers 32 bits' worth of unique host addresses, a theoretical maximum of 2^32 = 4,294,967,296 hosts. In reality, the address space has been allocated in fairly large contiguous blocks, which makes strictly optimal utilization difficult. The smallest block that is logically routed via BGP or allocated by ARIN is a class C network (CIDR /24).
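Treating the address space as a list of class C networks amounts to walking every /24. A minimal sketch using Python's standard `ipaddress` module; the helper name `slash24_targets` and the choice of probing the .1 address of each /24 are hypothetical, not taken from Opte's code:

```python
import ipaddress
from itertools import islice
from typing import Iterator, Tuple

TOTAL_SLASH24 = 2 ** 24  # number of /24 (class C sized) networks in IPv4

def slash24_targets(
    prefix: str, limit: int
) -> Iterator[Tuple[ipaddress.IPv4Network, ipaddress.IPv4Address]]:
    """Yield up to `limit` (network, probe address) pairs for the /24s in `prefix`.

    Probing the .1 address of each /24 is a hypothetical choice for this sketch.
    """
    net = ipaddress.ip_network(prefix)
    # subnets() is a lazy generator, so even 0.0.0.0/0 can be streamed.
    for subnet in islice(net.subnets(new_prefix=24), limit):
        yield subnet, subnet.network_address + 1

print(TOTAL_SLASH24)  # 16777216
for subnet, probe in slash24_targets("10.0.0.0/22", 10):
    print(subnet, probe)
```

Because the generator is lazy, the same loop over "0.0.0.0/0" would produce all 16,777,216 targets one at a time without building a list in memory.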
At a rate of 194 traceroutes per second it is possible to scan the entire theoretical space of 2^24 class C networks within a single day. Thus about 16,777,216 class C networks could be processed by a single computer in a single day. Yet huge portions of the address space are no longer used: many network blocks fall within the private ranges defined by RFC 1918, and other blocks are reserved by ARIN.
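The 194-per-second figure is simply the number of /24s divided by the seconds in a day. A quick check, assuming one traceroute per class C network:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
SLASH24_NETWORKS = 2 ** 24      # 16,777,216 class C networks

def required_rate(targets: int, seconds: int) -> float:
    """Traceroutes per second needed to finish `targets` within `seconds`."""
    return targets / seconds

rate = required_rate(SLASH24_NETWORKS, SECONDS_PER_DAY)
print(f"{rate:.1f} traceroutes/second")  # 194.2 traceroutes/second
```

A single traceroute takes far longer than 1/194 of a second, so reaching this rate in practice means running many traces concurrently.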
According to ARIN, about 47 class A networks are in reserved status (search ARIN for the OrgName "Internet Assigned Numbers Authority"). Each class A network contains 65,536 class C blocks, so removing them takes 3,080,192 class C blocks off the scan list, leaving a theoretical list of 13,697,024 blocks.
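The same subtraction, spelled out. The count of 47 reserved class A networks is the figure quoted from ARIN above:

```python
SLASH24_PER_CLASS_A = 2 ** 16  # a /8 holds 65,536 /24s
TOTAL_SLASH24 = 2 ** 24
RESERVED_CLASS_A = 47          # figure quoted from ARIN in the text

removed = RESERVED_CLASS_A * SLASH24_PER_CLASS_A
remaining = TOTAL_SLASH24 - removed
print(f"{removed:,} removed, {remaining:,} left")
# 3,080,192 removed, 13,697,024 left
```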
Applying some additional thought, large portions of the 13.7 million blocks may route to the same place. By tracerouting to about 20 random destinations within a class B and comparing the results, it is possible to see whether there are multiple routes worth investigating or whether the entire block goes to the same place. Applying that logic speeds up the scanning considerably.
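The sampling shortcut can be sketched as follows. This is an illustration under stated assumptions, not Opte's implementation: `trace` is a hypothetical callback returning the hop list for a destination, and the final hop (the destination itself) is dropped so that only the routers along the way are compared.

```python
import ipaddress
import random
from typing import Callable, List, Optional, Sequence

def class_b_needs_full_scan(
    class_b: str,
    trace: Callable[[str], Sequence[str]],
    samples: int = 20,
    rng: Optional[random.Random] = None,
) -> bool:
    """Return True if random /24s inside a /16 take more than one route."""
    if rng is None:
        rng = random.Random()
    subnets: List[ipaddress.IPv4Network] = list(
        ipaddress.ip_network(class_b).subnets(new_prefix=24)
    )
    chosen = rng.sample(subnets, min(samples, len(subnets)))
    # Compare router hops only; the destination hop always differs.
    routes = {tuple(trace(str(net.network_address + 1))[:-1]) for net in chosen}
    return len(routes) > 1

def fake_trace(addr: str) -> List[str]:
    # Hypothetical network: every destination shares the same two routers.
    return ["192.0.2.1", "192.0.2.9", addr]

print(class_b_needs_full_scan("10.1.0.0/16", fake_trace, rng=random.Random(0)))  # False
```

If the function returns False, the whole class B can be recorded from the sample; if it returns True, the individual /24s are worth tracing.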
After some testing and beta code, I proved that with enough bandwidth it is possible to scan the entire Internet with a single computer. Mapping one fifth of the Internet took only about 2 hours, yet it generated nearly 200k/sec of traffic and put my machine at a load of 60+ while scanning. Extrapolating (five times 2 hours), the entire Internet would take about 10 hours to scan, plus another hour or two for the visual map output.
I found a lot of value in the project, so after the proof of concept was completed I continued to program. I turned the entire system into a distributed client/server model. Each client requests a chunk of random IP space from the server, and when the client finishes scanning it, the chunk is registered with the server as completed. This continues until all of the IP space has been scanned. I'm also working on a stats system so I can monitor the productivity of the different scanning nodes and users involved in the project.
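A minimal in-memory sketch of the request/register cycle described above, including a per-client counter for the stats system. The class and method names are hypothetical; the text does not describe Opte's actual protocol, and this sketch does not reassign chunks that a crashed node never reports back (a real server would need a timeout for that).

```python
import random
from collections import Counter
from typing import Dict, List, Optional

class ChunkServer:
    """Hands out random chunks of IP space and records who finished each one."""

    def __init__(self, chunks: List[str], seed: Optional[int] = None):
        self._pending: List[str] = list(chunks)
        random.Random(seed).shuffle(self._pending)  # hand chunks out in random order
        self._assigned: Dict[str, str] = {}          # chunk -> client id
        self.completed_by: Counter = Counter()       # client id -> chunks finished

    def request_chunk(self, client_id: str) -> Optional[str]:
        """Give the next unscanned chunk to a client, or None when none are left."""
        if not self._pending:
            return None
        chunk = self._pending.pop()
        self._assigned[chunk] = client_id
        return chunk

    def register_completed(self, chunk: str) -> None:
        """Mark a chunk as scanned and credit the client it was assigned to."""
        client_id = self._assigned.pop(chunk, None)
        if client_id is not None:
            self.completed_by[client_id] += 1

    def finished(self) -> bool:
        """True once every chunk has been handed out and reported back."""
        return not self._pending and not self._assigned

server = ChunkServer([f"10.{i}.0.0/16" for i in range(4)], seed=1)
while (chunk := server.request_chunk("node-a")) is not None:
    # A real client would traceroute the chunk here.
    server.register_completed(chunk)
print(server.finished(), server.completed_by["node-a"])  # True 4
```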
By taking a more distributed approach, the data will look more like the real Internet. It will show more of the backup routes, more of the smaller links in different countries, etc. When the first version of the code is done, I should have about 5 to 10 different scanning nodes running on the Internet. If you would like to donate a computer and some bandwidth to this project, please contact me. I can give credit where credit is due!
When
The first scanning tests began in late October 2003, and I hope to have the project generate a new map every week.
Where
Currently the project is hosted in San Francisco on a multi-homed fiber ba
Several maps of the internet right here
Life is the leading cause of death in America.
I am in serious need of more bandwidth and hardware power. If anyone has a co-located system on a nice network to donate to this project for a few months, I would be very happy!
Slashdotting was never easier!
Go past the burnt-out Cray and then right at the Commodore64 Contiki server - you'll see my drive lights.
...his web server is already unavailable within minutes of it being posted on Slashdot...
Mapping...Slashdot.org......
IP Address: 127.0.0.1
Computer: The one from Microsoft with the Start button in the bottom left hand corner.
Location: my bedroom.
The surprise isn't how often we make bad choices; the surprise is how seldom they defeat us.
SCO IPs are in the Mordor address space.
"You can't make money with computers anymore because some jackass is always trying to give away the same thing you're doing."
Don't feel too bad, the government here (USA) is mainly on your side. I would disagree with you, though: there is always good money to be made here, but you have to be creative. The idea is to push each other further to create new ideas and technologies where you can make money.
"If you are a dreamer, a wisher, a liar, A hope-er, a pray-er, a magic bean buyer
Okay, yes, I fully admit that it's cool to map the internet in one day. Regardless... I think I hear about some internet mapping project every other day.
There's John Quarterman who's been doing it for years, and then the CAIDA visualization tools, and Cybergeography and the Internet weather report and damn maps and more maps.
Note to everyone: please stop mapping the internet.
Well, there's one less server to map...
Tired of being "punished" by the Slashdot $rtbl since 2002. I'm now over at http://soylentnews.org/ .
Mapping the Internet weekly will allow us to see major disasters in different parts of the world. The Internet is a huge disaster sensor. If I had maps of pre-war Iraq and compared them with today's, I could see how badly Iraq was destroyed. The idea of a metaphysical representation of the real world is very interesting to me.
The project can show the Internet growth.
The project is art.
"Time is long and life is short, so begin to live while you still can." -EV
Top that!
I think he means that the program will take less than one day to completely map the internet, not less than one day to write, compile and run.
Not everything is analogous to cars. Car analogies rarely work.
Forgive me if I'm wrong, but if we need the internet to tell us when a major disaster or war happens in a certain part of the world something is wrong.
A real-world map could have many uses. First, it's neat to see, and we can learn from it the real structure of this inter-network of computers. Second, graph theorists could use it for research, because it is a real (as opposed to theoretical) graph. From that graph theory, we could think of new ways to make the internet more reliable, faster and more secure. Many things can come from looking at what we have put together and then using our analytic skills to hypothesize about it. I'm sure I'm missing 100 other reasons why this is good.
the Internet came to him! And he was no more.
When I first saw the image on the right it looked like a human brain. It would be creepy if the Internet had a sort of fractal self-similarity to our physiology.
Now map the people mapping the internet in one day.
If this article confuses you, don't worry. It was posted yesterday in a much clearer fashion.
A single guy with a single computer...
He's mapping the Internet. Why am I not surprised he's single?
Why can't somebody just rsync the Google search cluster? Wouldn't it have the same results this guy is looking for?
This is a test. This is a test of the emergency sig system. This has been only a test.
That's the whole point. Existing methods take months while he claims it can be done in a single day with a single computer.
Please explain how one pings a web page. Is this a feature of AOL?
Web pages are NOT internet hosts.
Web servers are relatively few compared with other types of hosts on the internet.
The World Wide Web is NOT the internet.
As a side comment, now I understand why my connection got so slow.
[Internet Mapping Project's] mapping also takes nearly six months to generate a single map. My comment was that, "I can write a program that can map the entire net in a single day."
The Internet Mapping Project maps the Internet in under two hours (105 minutes for this morning's run). I'm not certain where the six months came from. The limiting factor is the packet rate limit we set (500 packets per second).
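A packet rate limit like the 500 packets per second mentioned above is commonly enforced by spacing sends at a fixed interval. A minimal pacing sketch; the class is hypothetical and not the Internet Mapping Project's code. The clock and sleep functions are injectable so the example can run on simulated time:

```python
import time
from typing import Callable

class PacketPacer:
    """Block before each send so that sends never exceed `rate` per second."""

    def __init__(self, rate: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._next_send = clock()

    def wait(self) -> None:
        """Sleep until the next send slot, then reserve the slot after it."""
        now = self._clock()
        if now < self._next_send:
            self._sleep(self._next_send - now)
            now = self._next_send
        self._next_send = now + self.interval

class FakeClock:
    """Simulated time so the example runs instantly."""
    def __init__(self) -> None:
        self.t = 0.0
    def now(self) -> float:
        return self.t
    def sleep(self, seconds: float) -> None:
        self.t += seconds

fake = FakeClock()
pacer = PacketPacer(500, clock=fake.now, sleep=fake.sleep)
for _ in range(501):
    pacer.wait()
print(round(fake.t, 6))  # 1.0
```

In real use one would call `pacer.wait()` before each probe is sent, with the default clock and sleep; 501 sends then span at least one second.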
Map layout time is not included in that time, but layout is not done on a daily basis. A map layout takes about six hours, as I recall. It only took a couple of weeks to produce all the layouts necessary for a movie of the Internet from Aug 1998 to Jan 2001 based on the daily runs.
CAIDA also creates daily maps of the Internet as part of their Skitter project. Their schedule varies between measurement points. In addition, other projects, such as the Mercator project and the RocketFuel projects, also map or did map the Internet.
Each project has slightly different goals. Skitter focuses on paths to major web and DNS servers. Mercator attempted to discover networks with limited pre-knowledge. RocketFuel wants a very accurate map of a particular ISP. The Internet Mapping Project is focused on the router connectivity within and between public backbones.
This is nothing new, you can find free software to solve just about any problem. People buy commercial software because in some cases free versions aren't advanced enough or easy enough to use or they want to buy support.
"Thanks to the remote control I have the attention span of a gerbil."
Then I came to my senses and decided to work on more practical and less controversial projects such as Nmap Version Detection. But the subversive in me still hasn't given up entirely on Nmapster :).
-Fyodor
You want to map the internet?
1 Setup a site saying you want to map the internet.
2 Get posted on slashdot.
3 Parse the referer logs.
4 ???
5 Profit!
You kids are so spoiled today. Back in the 60s we used to be able to map the entire internet using nothing more than a piece of string and 2 pushpins.
Huh? 2 nodes? Why the hell should that matter?
Endless arguments over trivial contradictions in books written by ignorant savages to explain thunder in the dark.
Maybe people already take this into consideration, but won't this impact web hosting? Won't people try to get their webpage/company closer to the main trunk / center of the map? When you look for a hosting service (basically an IP address) right now, most people don't consider where in the map the host is.
I mean with this tool, I would look up where my new IP would land me and try to find a host closer to the main backbones. Is this already done now by most people?
(on another subject the maps remind me of the species origin stuff)
Has anyone noticed that nearly all of the maps have a more or less tree-shaped structure?
This means a concentration of power. So the real, failure-tolerant internet is gone, or at least it seems to be.
So if we assume the electrical signals of his packets travel at the speed of light (186,000 miles per second) across the internet (which they don't really, but we'll ignore that for this argument), then logic tells us that the internet must have less than 16,070,400,000 miles of cable in order for this to work, because his data cannot travel any faster along the pipes.
And that's only one way... Assuming query and response, his packets effectively have to travel double the existing cable length.
So do all the (public) networks in all the world total less than 16 billion miles of cable?
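Taking the comment's premise at face value (a single packet covering every mile of cable in sequence, which real parallel probing does not do), the bound works out as follows:

```python
LIGHT_MILES_PER_SECOND = 186_000  # the comment's figure for the speed of light
SECONDS_PER_DAY = 86_400

one_way = LIGHT_MILES_PER_SECOND * SECONDS_PER_DAY
round_trip_limit = one_way // 2   # probe out and reply back

print(f"{one_way:,} miles one way")              # 16,070,400,000 miles one way
print(f"{round_trip_limit:,} miles round trip")  # 8,035,200,000 miles round trip
```

The one-way figure is about 16 billion miles, and halving it for query and response gives about 8 billion.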
No unauthorized use. Trespassers will be shot. Survivors will be shot again.
You don't need a map of the Internet... just watch for your spam to drop by about 90% and you know there's something wrong with the Chinese internet.
Don't blame me; I'm never given mod points.
I have followed various projects related to mapping cyberspace through the years and have always found An Atlas of Cyberspaces to be fascinating.
Mapping by Lumeta is one such methodology and I even have a poster of theirs printed by Peacock Maps (server down just now) in my office.
I have noticed that these mappings take a long time to complete and being able to map in a short time frame could be beneficial in much the same way that Internet Traffic Report can be to visualize traffic patterns or disruptions.
Taco
No. They are limited by the speed of IP, which is not only slower, but its speed is random within a fairly large range. So to be safe, we have to assume the total cabling on the internet is (think, think, think) less than 3 meters.
Why not map Autonomous Systems instead? Routes to AS are being advertised by BGP, and a set of well placed looking glasses would be all it takes to get a big picture. I never saw anything like an AS mapping, with the ASes as nodes and the (BGP announced) routes between them as links.
Of course, some ASes span multiple geographical areas, but this is also true of class C networks.
The big advantage of mapping ASes is that there are not so many of them compared to class C nets, which results in much simpler graphs. Moreover, the graphs would nicely show the boundaries between institutions/organizations, rather than artificial boundaries based on numerical addresses.
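A sketch of the AS-level graph the comment proposes, with ASes as nodes and links wherever two ASes appear next to each other in a BGP AS_PATH. It assumes the paths have already been collected (for example from looking glasses) as space-separated strings of plain AS numbers with no AS_SET entries; the sample paths are hypothetical and use the documentation ASN range 64496-64511.

```python
from typing import Dict, Iterable, List, Set

def as_graph(as_paths: Iterable[str]) -> Dict[int, Set[int]]:
    """Build an undirected AS adjacency map from AS_PATH strings.

    Repeated ASNs from path prepending are collapsed so they do not
    turn into self-loops.
    """
    graph: Dict[int, Set[int]] = {}
    for path in as_paths:
        hops: List[int] = []
        for token in path.split():
            asn = int(token)
            if not hops or hops[-1] != asn:  # skip prepended repeats
                hops.append(asn)
        for asn in hops:
            graph.setdefault(asn, set())
        for a, b in zip(hops, hops[1:]):     # adjacent ASes in a path are BGP neighbors
            graph[a].add(b)
            graph[b].add(a)
    return graph

paths = ["64496 64497 64497 64498", "64496 64499 64498"]
g = as_graph(paths)
print(sorted(g[64498]))  # [64497, 64499]
```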
cpghost at Cordula's Web.
I see his trick already. Post on /. that you plan to map the entire net and then wait till the entire net maps its way to you.
P.S.
Is there such a thing as trecart ?
Maybe you live in interesting times
Notice that he maps the paths from his computer to the rest of the world. That is not the same as a map of the entire Internet.
To illustrate, if I map routes from, say, Chicago, I'm likely to miss the direct connection between Seattle and San Francisco, as there is no traffic I could generate that would take that path.
Of course, this could simply be a matter of traffic using the fastest route available. If there's an information superhighway and an information dirt path, then as long as the superhighway stays up, it's going to be used.
In other words, the low-level interconnects probably wouldn't show up in a scan like this, because the backbone nodes are faster. That doesn't mean they aren't there, just that data prefers the faster routes as long as they are available. There could be a million paths that don't include the backbone nodes, but traceroute only shows one (fastest) path per trace, and thus they would never show up as long as the backbone stays up. To interpret this to mean they don't exist is analogous to taking the same route to work each day and saying there's no other possible route, since you've never used any other route. But as soon as there's an accident that makes that main road useless, traffic will simply use alternative paths, slowing it down but not stopping it entirely.
To properly map the Internet, you would need millions of volunteer nodes, making traceroutes near and far. You can _not_ map the Net from a single point of view, because that's exactly what you get: a single viewpoint, which might show some detail nearby, but only the major traffic points at the far side of the Net. To get truly accurate results, you'd need to run this program from every single one of the class C networks, and then combine the results.
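Combining results from several vantage points amounts to taking the union of the links each one observes. A minimal sketch using hypothetical router names borrowed from the Chicago/Seattle/San Francisco example above:

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]

def path_edges(path: List[str]) -> Set[Edge]:
    """Undirected links between consecutive hops of one traceroute."""
    return {(a, b) if a <= b else (b, a) for a, b in zip(path, path[1:])}

def merge_vantage_points(traces: Dict[str, Iterable[List[str]]]) -> Set[Edge]:
    """Union of all links observed from every vantage point."""
    links: Set[Edge] = set()
    for paths in traces.values():
        for path in paths:
            links |= path_edges(path)
    return links

traces = {
    "chicago": [["chi", "sea"], ["chi", "sfo"]],
    "seattle": [["sea", "sfo"]],
}
print(("sea", "sfo") in merge_vantage_points({"chicago": traces["chicago"]}))  # False
print(("sea", "sfo") in merge_vantage_points(traces))                          # True
```

Each trace still records only the path actually taken, so backup links stay invisible until some vantage point's traffic crosses them, which is the limitation the comment describes.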
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.