Map the Internet... In One Day?
rjbrown99 writes "There have been numerous stories over the past few years on Bill Cheswick's Internet Mapping Project. The Lumeta folks even created a company out of it. Well, now there is a competitor. A single guy with a single computer is working to accomplish the same feat - within ONE DAY and using open-source tools to do it. The new project is called Opte and can be found at www.opte.org." He's made some progress and is looking for volunteers.
Who
This project was started by me (Barrett Lyon) as a response to a conversation with my colleagues at Network Presence. Over lunch we were discussing William Cheswick and Hal Burch's Internet Mapping Project. I was not very impressed with the results of their project: they produce beautiful maps, but the maps don't seem to be very useful, nor do they release their code freely. Their mapping also takes nearly six months to generate a single map. My comment was that, "I can write a program that can map the entire net in a single day." The comment was met with some hostility. Thus, this project was born.
What
The goal of this project is to use a single computer and a single Internet connection to map the location of every single class C network on the Internet. It is obvious that the Internet is not routed as a bunch of class C networks, but it is easy to see that by treating the Internet IP space as a bunch of class C networks, it will be possible to make a detailed map of the entire Internet. The global Internet address space currently offers 32 bits worth of unique host addresses, or a theoretical maximum of 2^32=4,294,967,296 hosts. In reality, the address space has been allocated in fairly large contiguous blocks, which renders strictly optimal utilization difficult. The smallest block that is logically routed via BGP or allocated by ARIN is a class C network (CIDR /24).
At a rate of 194 traceroutes per second, it is possible to scan the entire theoretical 2^24 space within a single day. Thus all 16,777,216 class C networks could be processed by a single computer in a single day. Yet huge portions of the network blocks are no longer used, many blocks fall under the RFC 1918 private address ranges, and other blocks are reserved by ARIN.
According to ARIN there are about 47 class A networks in the reserved status (search ARIN for OrgName "Internet Assigned Numbers Authority"). Doing the math removes 3,080,192 class C blocks from the scan list, leaving us with a theoretical list of 13,697,024 blocks.
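The figures above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch; the variable names are illustrative, and the count of 47 reserved class A networks is taken from the text as given.

```python
# Back-of-the-envelope check of the figures quoted above.
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400

total_blocks = 2 ** 24                    # every possible /24 in a 32-bit space
rate_needed = total_blocks / SECONDS_PER_DAY

reserved_class_a = 47                     # figure quoted from the ARIN search
blocks_per_class_a = 2 ** 16              # one /8 contains 65,536 /24 blocks
reserved_blocks = reserved_class_a * blocks_per_class_a
remaining_blocks = total_blocks - reserved_blocks

print(f"traceroutes per second needed: {rate_needed:.1f}")
print(f"reserved /24 blocks: {reserved_blocks:,}")
print(f"blocks left to scan: {remaining_blocks:,}")
```

The required rate comes out just above 194 per second, and the two block counts match the numbers in the text.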
With some additional thought, it becomes clear that large portions of the 13.7 million blocks may route to the same place. By testing about 20 routes at random within a class B and comparing the results, it is possible to see if there are multiple routes worth investigating or if the entire thing goes to the same place. Applying that logic speeds up the scanning.
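The sampling idea could look roughly like the sketch below. It is not the project's code: `sample_class_b`, `needs_full_scan`, and the `trace` callback are hypothetical names, the tracer is assumed to return a list of hop names, and comparing whole paths for equality is just one possible way of "comparing the results".

```python
import ipaddress
import random
from typing import Callable, Optional, Sequence, Set, Tuple

# Hypothetical callback: takes a destination address and returns the hops
# seen on the way there (for example, parsed traceroute output).
TraceFn = Callable[[str], Sequence[str]]

def sample_class_b(network: str, trace: TraceFn, samples: int = 20,
                   rng: Optional[random.Random] = None) -> Set[Tuple[str, ...]]:
    """Trace to `samples` random /24s inside a /16 and return the distinct paths."""
    rng = rng or random.Random()
    net = ipaddress.ip_network(network)
    if net.prefixlen != 16:
        raise ValueError("expected a /16 (class B sized) network")
    subnets = list(net.subnets(new_prefix=24))       # 256 /24 blocks
    paths = set()
    for subnet in rng.sample(subnets, samples):
        target = str(subnet.network_address + 1)     # first host in the /24
        paths.add(tuple(trace(target)))
    return paths

def needs_full_scan(paths: Set[Tuple[str, ...]]) -> bool:
    """More than one distinct path suggests the /16 is worth scanning in full."""
    return len(paths) > 1

def fake_trace(dest: str) -> Sequence[str]:
    # Stand-in tracer: the lower half of the /16 exits via one router,
    # the upper half via another.
    third_octet = int(dest.split(".")[2])
    return ["gw", "core", "edge-a" if third_octet < 128 else "edge-b"]

paths = sample_class_b("10.1.0.0/16", fake_trace)
print(len(paths), "distinct path(s); full scan needed:", needs_full_scan(paths))
```

In a real scanner the callback would wrap an actual traceroute, and the comparison would probably ignore the last hop or two so that per-host differences do not count as separate routes.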
After some testing and beta code I proved that with enough bandwidth it is possible to scan the entire Internet with a single computer. The map of 1/5th of the Internet only took about 2 hours to create, yet it generated nearly 200k/sec of traffic and put my machine at a load of 60+ while scanning. If you apply the math, the entire Internet would take about 10 hours to scan and another hour or two for the visual map output.
I found a lot of value in the project, so after the proof of concept was completed I continued to program. I turned the entire system into a distributed client/server model. The clients request a chunk of random IP space from the server and when it is completed the IP space is registered with the server. This is done until all of the IP space has been scanned. I'm also working on a stats system so I can monitor the productivity of the different scanning nodes and users involved in the project.
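A minimal sketch of the chunk hand-out described above, kept in memory and without any networking. The class name `ChunkServer`, its methods, and the chunk size are assumptions made for illustration; the text does not say how the real server tracks its work.

```python
import random

class ChunkServer:
    """Hands out blocks in random order and records which chunks are finished."""

    def __init__(self, blocks, chunk_size=256, seed=None):
        self._pending = list(blocks)
        random.Random(seed).shuffle(self._pending)    # "random IP space"
        self._chunk_size = chunk_size
        self._in_progress = {}                        # chunk id -> list of blocks
        self._done = set()
        self._next_id = 0

    def request_chunk(self):
        """Return (chunk_id, blocks), or None when nothing is left to hand out."""
        if not self._pending:
            return None
        chunk = self._pending[:self._chunk_size]
        del self._pending[:self._chunk_size]
        chunk_id = self._next_id
        self._next_id += 1
        self._in_progress[chunk_id] = chunk
        return chunk_id, chunk

    def register_done(self, chunk_id):
        """Mark a chunk as scanned once the client reports back."""
        self._done.update(self._in_progress.pop(chunk_id))

    def finished(self):
        """True when every block has been handed out and reported as done."""
        return not self._pending and not self._in_progress

# A client loop: keep asking for work until the server runs dry.
server = ChunkServer(range(1000), chunk_size=256, seed=1)
scanned = []
job = server.request_chunk()
while job is not None:
    chunk_id, blocks = job
    scanned.extend(blocks)            # a real client would traceroute each block here
    server.register_done(chunk_id)
    job = server.request_chunk()
print("all blocks scanned:", server.finished())
```

A production version would also need to reissue chunks that a client never reports back, which this sketch leaves out.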
By taking a more distributed approach the data will look more like the real Internet. It will show more of the backup routes, more of the smaller links in different countries, etc. When the first version of the code is done I should have about 5 to 10 different scanning nodes running on the Internet. If you would like to donate a computer and some bandwidth to this project, please contact me. I can give credit where credit is due!
When
The first scanning tests began in late October 2003 and I wish to have the project generate a new map every week.
Where
Currently the project is hosted in San Francisco on a multi-homed fiber ba
Several maps of the internet right here
Life is the leading cause of death in America.
I am in serious need of more bandwidth and hardware power. If anyone has a Co-Located system on a nice network to donate to this project for a few months, I would be very happy!
Slashdotting was never easier!
Go past the burnt-out Cray and then right at the Commodore64 Contiki server - you'll see my drive lights.
So he's made progress and needs volunteers, so, uh, forgive me if I sound stupid, but, uh, it's been more than ONE DAY!!
This is a test. This is a test of the emergency sig system. This has been only a test.
...his web server is already unavailable within minutes of it being posted on Slashdot...
Mapping...Slashdot.org......
Exactly why do we need a "map" of the Internet?
Life in Orange County
IP Address: 127.0.0.1
Computer: The one from Microsoft with the Start button in the bottom left hand corner.
Location: my bedroom.
The surprise isn't how often we make bad choices; the surprise is how seldom they defeat us.
SCO IPs are in the Mordor address space.
But with the always evolving nature of the Internet, this would need to be updated every day.
"You can't make money with computers anymore because some jackass is always trying to give away the same thing you're doing."
Don't feel too bad, the government here (USA) is mainly on your side. I would disagree with you, as there is always good money to be made here, but you have to be creative. The idea is to push each other further to create new ideas and technologies where you can make money.
"If you are a dreamer, a wisher, a liar, A hope-er, a pray-er, a magic bean buyer
I think that last one is either wrong or way way in the future.
"Not knowing when the dawn will come, I open every door." - Emily Dickinson
Someone said that hell is the impossibility of reason. Slashdot is hell.
Okay, yes, I fully admit that it's cool to map the internet in one day. Regardless...I think I hear about some internet map every other day.
There's John Quarterman who's been doing it for years, and then the CAIDA visualization tools, and Cybergeography and the Internet weather report and damn maps and more maps.
Note to everyone: please stop mapping the internet.
#/usr/sbin/traceroute *.*.*.*
Well, there's one less server to map...
Tired of being "punished" by the Slashdot $rtbl since 2002. I'm now over at http://soylentnews.org/ .
...*cough* Like handling the readership size of a /. news story :)
Mad Hatter
What do you do with it ?
Top that!
Well, I guess that map of the Internet has one less location to worry about now.
Where's my lobbyist? Right here.
So why doesn't he write a downloadable client that can map the Internet? It would then be possible to let a computer map a specific IP range close to its location. That would be faster than doing it from one location.
[o]_O
In soviet russia, internet maps you!
"'Yrch!' said Legolas, falling into his own tongue."
Do any of you moderators notice that the word "anal" shows up in that so-called "mirror" comment? Only the poster (called TrollBridge) knows how much else that has been changed from the original...
the Internet came to him! And he was no more.
When I first saw the image on the right it looked like a human brain. It would be creepy if the Internet had a sort of fractal self-similarity to our physiology.
is more about geolocation than mapping, but I guess I deserve at least a passing mention :-)
Simon.
Physicists get Hadrons!
We just made his job easier. There is one less web server to map now!
Now map the people mapping the internet in one day.
If this article confuses you, don't worry. It was posted yesterday in a much clearer fashion.
Assume 1,000,000,000 web pages.
Assume average ping time is one millisecond (10^-3)
1,000,000,000*(0.01) = 1,000,000 (seconds)
1,000,000,000/60 = 16666.6 (minutes)
16666.6/60 = 277.7 (hours)
277.7/24 = 11.5 (days)
Remember, this is only to PING every page, not transfer/parse each page to find sub-pages.
Too bad he changed the text of the article. I doubt the original page really said this:
This mapping consists of frequent traceroute-style anal probes, one to each registered Internet entity. From this, we build a tree showing the paths to most of the nets on the Internet.
It'd be even better if he could overlay a map of the world so that we could easily identify regions.
I had maps of pre-war Iraq and then compared them to today, one could see how badly Iraq was destroyed. They have the Internet in Iraq? I thought the only network they had was Al-Qaida. Attack Iraq? YES
A single guy with a single computer...
He's mapping the Internet. Why am I not surprised he's single?
It would be creepy if the Internet had a sort of fractal self-similarity to our physiology.
....
Agreed.
Good material for an X-Files episode
-kgj
Why can't somebody just rsync the Google search cluster? Wouldn't it have the same results this guy is looking for?
This is a test. This is a test of the emergency sig system. This has been only a test.
How is it possible to map something that is always changing, and what use is such a map, if it can be created?
What about the reality that all nodes are no longer created "equal," so to speak?
Oh gosh, just one semicolon out of place...
Uh... is that 21st Century Math? Crap. My kids are going to come home from school and I won't be able to help them with their homework.
Read the EFF's Fair Use FAQ
"I knew I should have taken that left at Albuquerque." -- Bugs Bunny
This is a test. This is a test of the emergency sig system. This has been only a test.
You can't make money with computers anymore because some jackass is always trying to give away the same thing you're doing.
You: Mam, can I help you over the street?
Old Woman: Oh that's so nice from you.
I'mTheAmericanDream: Hey wait, you can't help that woman for free. I've built my business plan on this. THIEVE!
Patent this asap before Amazon gets their grubby little fingers on it! =)
Good luck finding me! Even my boss doesn't have a clue where I am!
MMORPG Fan? Prove your worth!
gotta bookmark that one!
It sounds like a William Gibson novel, the one with the guy obsessed over the "form" or "shape" of cyberspace being a "snapshot" of the universe. I can't seem to remember the name anymore.
M$ Internet MapPoint 2003
Exactly why do we need a "map" of the Internet?
Because it is there.
http://www.techweb.com/printableArticle?doc_id=TWB19991013S0007
When he finishes the map it will already be outdated and not representative of the truth. However, this is not a real issue: one day (or ten hours) is better than anything else.
__
Sig: Marine Stock Photos
Why bother mapping it, just post a link on /. and we've already sent a majority of the internet straight to him.
We should just stand in line, take a number, and tell him the path we took to get there.
He doesn't have to wait for one to respond to send another request. It's called parallelism, and computers are good at it these days... well, some.
"Thanks to the remote control I have the attention span of a gerbil."
Please explain how one pings a web page. Is this a feature of AOL?
Web pages are NOT internet hosts.
Web servers are relatively few compared with other types of hosts on the internet.
The World Wide Web is NOT the internet.
The World Wide Web is NOT the internet.
The World Wide Web is NOT the internet.
The World Wide Web is NOT the internet.
As a side comment, now I understand why my connection got so slow.
[Internet Mapping Project's] mapping also takes nearly six months to generate a single map. My comment was that, "I can write a program that can map the entire net in a single day."
The Internet Mapping Project maps the Internet in under two hours (105 minutes for this morning's run). I'm not certain where the six months came from. The rate limitation is the packet rate limit we set (500 packets per second).
Map layout time is not included in that time, but that is not done on a daily basis. A map layout takes about six hours, as I recall. It only took a couple of weeks to produce all the layouts necessary for a movie of the Internet from Aug 1998 to Jan 2001 based on the daily runs.
CAIDA also creates daily maps of the Internet as part of their Skitter project. Their schedule varies between measurement points. In addition, other projects, such as the Mercator project and the RocketFuel projects, also map or did map the Internet.
Each project has slightly different goals. Skitter focuses on paths to major web and DNS servers. Mercator attempted to discover networks with limited pre-knowledge. RocketFuel wants a very accurate map of a particular ISP. The Internet Mapping Project is focused on the router connectivity within and between public backbones.
You don't have to help the old lady across the street. Adam Smith's Invisible Hand of the marketplace will reach out and take care of it.
Because we can.
You sure you're on the right website?
Endless arguments over trivial contradictions in books written by ignorant savages to explain thunder in the dark.
...when featured on Slashdot, now, is he? =^^=
This sig no verb.
> He's made some progress and is looking for volunteers.
Yeah, I'm always looking for additional ways to waste my time on pointless projects for free.
This is nothing new, you can find free software to solve just about any problem. People buy commercial software because in some cases free versions aren't advanced enough or easy enough to use or they want to buy support.
"Thanks to the remote control I have the attention span of a gerbil."
Then I came to my senses and decided to work on more practical and less controversial projects such as Nmap Version Detection. But the subversive in me still hasn't given up entirely on Nmapster :).
-Fyodor
You want to map the internet?
1 Setup a site saying you want to map the internet.
2 Get posted on slashdot.
3 Parse the referer logs.
4 ???
5 Profit!
You kids are so spoiled today. Back in the 60s we used to be able to map the entire internet using nothing more than a piece of string and 2 pushpins.
Huh? 2 nodes? Why the hell should that matter?
Endless arguments over trivial contradictions in books written by ignorant savages to explain thunder in the dark.
(tm) devise, fails again?
what a surprise?
Does anyone else worry that if he is successful and the code is released, this will significantly slow network traffic? Just think about 100,000 people (maybe a conservative number) all trying to map the internet at the same time. Would this effectively result in a huge DoS attack? Plus none of these maps would be complete, because they are all competing with each other and it will make it even harder to access some sites.
Is this possible???
If it is, then I think that this is one instance where it will be in everyone's interest not to have any kind of release of this product and naturally keep the source closed.
Not everything is analogous to cars. Car analogies rarely work.
http://slushdot.org/mirror/opte/
I think it's an outstanding idea. Time and time again people come up with breakthroughs in technology after they discover a different way of thinking. Seeing something in a different way is often the first step in that process. This kind of thing would be an ideal candidate for a SETI@home-like solution where people use their screen savers to map their local area. Imagine how cool it would be to see the throughput over that map as well.
aka mapping@home
4Z5TX
Maybe people already take this into consideration, but won't this impact webhosting? Won't people try to get their webpage/company closer to the main trunk / center of map? When you look for a hosting service (basically an IP address) right now most people don't consider where in the map the host is.
I mean with this tool, I would look up where my new IP would land me and try to find a host closer to the main backbones. Is this already done now by most people?
(on another subject the maps remind me of the species origin stuff)
a regular wolverine in a penguin suit, we'll bet.
he doesn't appear to be afraid to speak his mind, either.
none so dedicated as volunteers, they say?
Here's a link to a page which links to these and other similar projects
I thought we were /.ing sco every day, sometimes even twice
Actually, it's kind of interesting. It would let us see into traditionally restrictive places like China.
It would be very interesting to know that a major portion of the Chinese Internet infrastructure went down, when it happened.
tasks(723) drafts(105) languages(484) examples(29106)
PARENT IS POSIBLE TROLL
how can this guy map all the porn in on day?
I am working on this for years now.
The code for this is distributed, then anyone on the internet can scan the entire internet for some nuance on this purpose.
(shiver)
Perhaps a centralized open database would be a good idea.
People who disagree with you are not automatically evil, greedy, or stupid.
Has anyone noticed that nearly all of the maps have a more or less tree-shaped structure?
This means concentration of power. So, the real, failure-tolerant internet is gone, at least it seems to be.
You've uncovered SkyNet!
--
But then again I thought VCR+ was a stupid idea and would die a quick death--so what do I know?
subject sess all..
This Picture has two hosts out of place, one on the far right and the other far bottom, with the billions? of others all next to each other..
Why?
Browse at -1, because trolls are often the most creative part of
I'm mapping teenkelly.com right now!
Quick, aren't you?
So if we assume the electrical signals of his packets travel at the speed of light (186 000 miles per second) across the internet (which they don't really, but we'll ignore that for this argument), then logic tells us that the internet must have less than 16,070,400,000 miles of cable in order for this to work. Because his data cannot travel any faster along the pipes.
And that's only one way... Assuming query and response, his packets have to effectively travel double the existing cable lengths.
So do all the (public) networks in all the world total less than 16 billion miles of cable?
No unauthorized use. Trespassers will be shot. Survivors will be shot again.
The Helpful Invisible Hand of Capitalism
Happy Halloween everyone!
Sounds like a waste of bandwidth, hurrah...
MD5 (gnupg-1.2.3.tar.bz2) = cdca1282d7901f9ddb52f9725b001af2
indeed!
And the muscular cyborg German dudes dance with sexy French Canadians
I have followed various projects related to mapping cyberspace through the years and have always found An Atlas of Cyberspaces to be fascinating.
Mapping by Lumeta is one such methodology and I even have a poster of theirs printed by Peacock Maps (server down just now) in my office.
I have noticed that these mappings take a long time to complete and being able to map in a short time frame could be beneficial in much the same way that Internet Traffic Report can be to visualize traffic patterns or disruptions.
Taco
OK, so you map the ever-changing net in a day.
A week later, you map it again. Eventually, you're mapping it every day. After a year or two of that, you have a cool little animation of how the internet changed. You project it on the wall of a dark room, and watch it loop, and go "wow".
We know the real reason you didn't do it. The RIAA scared you, didn't they, when they showed you their copyright on the IP address scheme...
They should sell their data to DoubleClick. They could serve geography-sensitive banner ads! If they know you live in San Francisco and you are visiting a food web site, they could serve up banner ads for local San Francisco restaurants.
I think there's a company called MaxMind GeoIP that already does this.
cpeterso
No. They are limited by the speed of IP, which is not only slower, but its speed is random within a fairly large range. So to be safe, we have to assume the total cabling on the internet is (think, think, think) less than 3 meters.
You have to be one humorless ignorant moderator to consider this flamebait. The guy made a type of joke that is all too common around slashdot, which is to misinterpret the title's meaning on purpose.
OK, the joke is not that funny, but flamebait?! More like "moderator is a moron."
I'd be interested in seeing a real global world map with the locations of servers pinpointed on the map to show the density of computer equipment around the globe. Actually, it wouldn't even need the real map to exist: if all the points of light representing computer servers were placed in their proper geographic locations, I bet you'd get a very good mapping of the world. In fact, it would probably look similar to the famous map of the world at night, where the lights from industrialized countries create a spectacular image of the developed world.
Does such a map exist? Is somebody working on one?
--
RumorsDaily
What's disturbing about the current map thus far is that it clearly shows how CENTRALIZED the internet really is. The old idea of traffic routing around damage turns out to rest on a rather fragile network of a handful of backbone nodes. I would have expected more lower-level nodes crisscrossing the network, forming more of a spiderweb system, rather than everything going across 3 or 4 nodes.
www.enthea.org
Why not map Autonomous Systems instead? Routes to AS are being advertised by BGP, and a set of well placed looking glasses would be all it takes to get a big picture. I never saw anything like an AS mapping, with the ASes as nodes and the (BGP announced) routes between them as links.
Of course, some AS span multiple geographical areas, but this is also true of class C networks.
The big advantage of mapping ASes is that there are not so many of them compared to class C nets, which results in much simpler graphs. Moreover, the graphs would nicely show the boundaries between institutions/organizations, rather than artificial boundaries based on numerical addresses.
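As a rough sketch of what such an AS-level map involves: given AS paths collected from looking glasses, each pair of neighbouring ASes on a path becomes a link. The function name `as_graph` is made up for illustration, the input is assumed to be plain space-separated AS numbers (no AS sets), and the AS numbers in the example are from the range reserved for documentation.

```python
from collections import defaultdict

def as_graph(as_paths):
    """Build an undirected adjacency map from AS path strings like '64496 64497 64498'."""
    graph = defaultdict(set)
    for path in as_paths:
        hops = [int(token) for token in path.split()]
        # Collapse prepending (the same AS repeated) before linking neighbours.
        deduped = [asn for i, asn in enumerate(hops) if i == 0 or asn != hops[i - 1]]
        for left, right in zip(deduped, deduped[1:]):
            graph[left].add(right)
            graph[right].add(left)
    return dict(graph)

# Two paths from the same vantage AS to the same origin AS.
graph = as_graph([
    "64496 64497 64497 64498",
    "64496 64499 64498",
])
for asn in sorted(graph):
    print(asn, "->", sorted(graph[asn]))
```

Paths seen from several well-placed looking glasses would simply be fed into the same function, and links that only one vantage point can see would then show up in the merged graph.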
cpghost at Cordula's Web.
The problem you allude to is believed to be responsible for the power-law behavior of the Internet. If you look at the distribution of degrees, there are more highly-connected nodes than there should be if the graph was random. The distribution can be explained if people are more likely to connect to nodes that have high degree already.
On the other hand, these maps are not the cause for either of the behaviors above. These maps generally only show IP-level connectivity, ignoring link-layer tunneling, which can be very important. In addition, you have to additionally consider latency, loss rates, and bandwidth at least to some extent. Pure hop-count is what these maps show, and that is only a decent prediction of performance, not a great one (like clock rates for processor performance, if that helps at all).
There are other factors that go into location selection. One such factor is which machines you will talk with. You do not care much about your connectivity to hosts in Norway if you are running a US-only business. Another example factor is price. Few people are willing to pay for a T1 across North America to improve their speed by 10%. While you could put your computers in a co-lo instead, that only helps for servers and incurs yet more costs.
The maps are nice representations, but, generally, more analysis is necessary before useful data can be extracted from them, including computing the best location to connect to the network.
All that said, yes, most people connect to highly-connected nodes. They just generally estimate those nodes, rather than doing direct measurements.
I see his trick already. Post on /. that you plan to map the entire net and then wait till the entire net maps its way to you.
P.S.
Is there such a thing as trecart ?
Maybe you live in interesting times
"Got to see the whole net
From Yahoo on down to eBay--
In just one day!"
Notice that he maps the paths from his computer to the rest of the world. That is not the same as a map of the entire Internet.
To illustrate, if I map routes from, say Chicago, I'm likely to miss the direct connection between Seattle and San Francisco, as there is no traffic I could generate that would take that path.
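The Chicago example can be made concrete with a toy graph. This is a deliberate simplification: it assumes traffic follows a shortest hop-count path, which real routing does not guarantee, and the three-city topology and the function name `links_seen_from` are invented for illustration.

```python
from collections import deque

def links_seen_from(graph, source):
    """Return the links lying on one shortest path from `source` to every other node."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in sorted(graph[node]):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return {frozenset((child, par)) for child, par in parent.items() if par is not None}

# Toy topology: three cities, all directly connected to each other.
topology = {
    "Chicago": {"Seattle", "San Francisco"},
    "Seattle": {"Chicago", "San Francisco"},
    "San Francisco": {"Chicago", "Seattle"},
}
all_links = {frozenset((a, b)) for a, nbrs in topology.items() for b in nbrs}
missed = all_links - links_seen_from(topology, "Chicago")
print("links never seen from Chicago:", [sorted(link) for link in missed])
```

From Chicago, both other cities are one hop away, so the Seattle to San Francisco link never appears on any traced path and is the only one missing from the resulting map.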
Until it features a big arrow that says "You are here!" I'm not interested.
Since we all like pretty graph pictures go over to: http://networkviz.sourceforge.net/ and look at the packages out there. Many of these need help, so don't hesitate to offer your services if you like graphing. Most of these would be able to view these internet graphs interactively, which would be far more exciting than just pictures.
If he's "already made progress" doesn't that defeat the "map the Internet in one day" promise... just think about that ;).
his day is up!
Conformity is the jailer of freedom and enemy of growth. -JFK
A somewhat similar, semi-private Internet Auditing Project by Liraz Siri (for which BASS was written) five years ago (only 36,431,374 hosts, mind you) took twenty days with five scanning nodes. I highly doubt today's Internet could be scanned in one day with a single host. Remember that this single host will be attacked, as Liraz Siri's hosts were:
The keyword here is "backups." Remember that scanning the entire Internet you will step on someone's toes.
(By the way, it's good that this story was posted on Slashdot, since I could be the one counterattacking them and making an idiot out of myself --- not that it has ever happened before...)
Sincerely,
Pan Tarhei Hosé, PhD.
"Homo sum et cogito ergo odi profanum vulgus et libido."
Web servers are relatively few compared with other types of hosts on the internet?
Really? Compared to what? Routers and switches? Access servers? Storage devices? Perhaps you mean that web servers account for a relatively small amount of IP address space? A single access server can accommodate an awful lot (maybe the equivalent of a /20's worth) of distributed transient users, but for the purpose of this particular mapping exercise that's irrelevant -- the map shows (network) centrality rather than geographic location...
In a sense, the results of the project do seem to match earlier research on the topology of the web; at a glance, the graph arrived at, does seem to be scale-free in nature.
Which, actually raises an interesting question. Scale free networks, by their nature, are supposed to have certain highly connected nodes, the connectivity of which, is extremely critical to the network as a whole.
In particular, look at the resultant graph for one-third of the net. Note the single link in the middle between two nodes that seems to connect all four sub-trees together. Now imagine that link being, say, DDoS'ed. (You can see it in the one-fifth-of-the-net graph as well; only, it's more clear here)
(Additional points for all you neurologists out there:- we've been comparing the structure of the human brain with that of the Internet, do you know of any such neurons?)
[Even more points:- Will you tell the world if you've found one? :-) ]
More than mere navel gazing.
By the time he finds us, the 24 hours will be up.
The support that the world provides to projects like this makes me feel better as a human.
Idiot. Who's paying for this nonsense? Taxpayers as usual?
The stuff's not even pretty.
Academic masturbation.
the Internet makes a map out of you.
Bush is on fire and its not good for my lungs.
...I'd be more busy fapping the whole Internet.
In fact, BRB.
The maps at http://idl.net/MAP seem different from the brain-tree displays. Lots of squares from DNS connections, I guess. There's some tracert ability. Anyone know more about this method?
> When I first saw the image on the right it looked
> like a human brain. It would be creepy if the
> Internet had a sort of fractal self-similarity to
> our physiology.
Oh, God, no! I don't want to know what the GAPING HOLE of unused address blocks looks like!
"If anyone has a Co-Located system on a nice network to donate to this project for a few months, I would be very happy [because I am mapping the Internet in one day as a single guy with a single computer without any help from anyone!]" --- WTF?
I can see you have a whole class A network on 127.0.0.0/8 ---
*runs "nmap -v --randomize_hosts -p1- -O -T Insane 127.0.0.0/8" and goes to make an espresso*
It's not clear to me where the idea came from that it takes
us 6 months to map the Internet. Our daily run takes
an hour or two. We do not "expand" the search to
/24s on the Internet to limit consternation
of the scannees.
I'd be interested in seeing the layouts. The last
time I looked, Steve North's stuff couldn't handle
a dataset of this size, but that was a long time ago.
Others are collecting data that is probably more useful
than ours on the Internet. Check out CAIDA's work
and especially Rocketfuel.
Our bread-and-butter is scans of intranets, which tend to
be smaller, but need to have the data from several points
integrated into one data set.
We are still collecting the IMP data, and now have
about five years' worth of nearly continuous data.
ches
We need an 'Admin Appreciation Day', where all the admins pull the plug on the main systems and let the redundancies do their work. That way we can get maps of those too :).
GPLv2: I want my rights, I want my phone call! DRM: What use is a phone call, if you are unable to speak?