A New Way to Look at Networking
Van Jacobson gave a Google Tech Talk on his ideas about how a modern, global network could work more effectively, and with more trust in the data, which passes through many hands on its journey to its final destination.
Watch the talk on Google's site
The man is very smart and his ideas are fascinating. He has the experience and knowledge to see the big picture and what can be done to solve some of the new problems we have. He starts with the beginning of the phone networks and then goes on to briefly explain the origins of the ARPAnet and its evolution into the Internet we use today.
He explains the problems that were faced while using the phone networks for data, and how they were solved by realizing that a new problem had arisen and needed a new, different solution. He then goes on to explain how the Internet has changed significantly from the time it started off in research centres, schools, and government offices into what it is today (lots of identical bytes being redundantly pushed to many consumers, where broadcast would be more appropriate and efficient).
The talk was held on Aug 30, 2006.
Lars T.
To the guy who modded me down from perfect to terrible Karma - Apple haters still suck
"...where broadcast would be more appropriate and efficient"
If this means airwaves, same as TV, sure. Why not, since the whole thing is one big infomercial swamp already. Otherwise, it also means guaranteed next-packet delivery, without any pauses, resends, or spinning cursors. And the internet is not going to deliver that, sorry.
How is this anything new? Everything in that summary was already covered by Tanenbaum in his excellent book on networks - of course it's easier to hear it from someone if you're too lazy to read :p
I have spoken'eth.
There is no reason you can't multicast across a large segmented network, i.e. the internet, and get good delivery. Radio, television, audio, phone, movies are all latency sensitive but not particularly bit sensitive so you can drop some packets here and there. That also means that some things would need QoS (VoIP) while others would need intelligent caching and buffering (movies, etc.).
[RIAA] says its concern is artists. That's true, in just the sense that a cattle rancher is concerned about its cattle.
"(lots of identical bytes being redundantly pushed to many consumers, where broadcast would be more appropriate and efficient)"
The first part is true, but does not necessarily lead to the conclusion in the second. There is a huge, very important IF that belongs between them. Specifically, "if the recipients are all prepared to receive those bytes at the same time". The problem with the conclusion is that the evaluation of the "if" part is nearly always "they're not". This is yet another case of "if the internet were like television, it'd be more efficient". Yes, but it would then no longer be the internet people like. The great promise of the internet is information on demand. All this bullcrap about broadcast, push, and the like, it's all the efforts of 20th century throwbacks trying to fit the internet into their outdated worldview of "producers" and "consumers". They need to quit it. Broadcast is a square peg and the internet is a round hole. Every time anyone suggests putting the two together, they simply look like a bloody idiot.
If a job's not worth doing, it's not worth doing right.
I 'browsed' some of this video and bookmarked it for later: Van Jacobson's background is awesome.
A bit off topic, but there are two things that I want to see happen: a complete upgrade to IPv6 and the creation of an alternative 'public Internet' based on emerging long distance wifi and software that lets people volunteer to be part of this new open grid, and optionally share some bandwidth bridging the 'real' Internet.
It may seem pointless to want both higher performance (multi-casting UDP, essentially infinite IP address space) and low performance and ad-hoc systems, but please consider: the UK and USA seem to be going down the wrong path of surveillance and citizen control, and the Internet may someday be viewed as something that the public just should not have because it is too free a source of information. I hope that I am wrong about this, but this unpleasant, repressive future is a possibility.
I think bittorrent is the internet answer to the broadcast problem. Bittorrent is intrinsically adapted to the way the internet works. Data which is most sought by people will be found on more nodes around the net, less popular data can be downloaded directly from the primary servers.
This is something that's been on my mind for a long time. In fact, I thought that's how streaming was done because I couldn't understand why the load would increase so much as more people watched. This should be especially true for internet radio/TV, and for ads that really suck up the bandwidth, which is why I block them (the ads that is). I didn't know that each stream was being fed to only one viewer/listener. It always seemed kind of odd to do it that way.
What?
Based on all the measurements I'm aware of, Linux has the fastest & most complete stack of any OS (source)
You know I have to agree with that. I wish there was a text transcript. I won't sit through a video either. It's so obnoxious. I quit watching news on the TV for that reason. I don't want to sit through twenty minutes of tripe to hear an interesting story or wait for the weather segment. Text is the way...maybe with a pretty picture or two.
What?
You missed fart => part. (Yes, I'm serious, at 7 minutes).
So, sending identical packets to everyone is somehow more bandwidth efficient than sending packets to only those who want them? Doesn't that seem backwards to anyone else? Furthermore, couldn't you define broadcasting as precisely the act of sending identical bytes to many consumers?! I'm teh confused.
TLF
I do not respond to cowards. Especially anonymous ones.
Is there a transcript of the video available (e.g. just the subtitles pulled out)? It's a bit tedious watching this when reading it will take 1/10th of the time of the video...
haven't we solved that with proxies?
Everything he says applies to server>client. Producers>consumers. And he proposes a change from the current model of conversation to a model of multicasting the same data to many consumers. And he supports this with findings that 99% of data is structured that way. I guess that's wishful thinking though, because he works at Google, the massive meta-producer of data we all consume.
What about bittorrent though? Uses the same TCP/IP protocol, trusts the data, not the source (like he says we must do), answers to the contemporary question "Who has the time" (ditto) AND DELIVERS.
There are also other decentralization options for bittorrent, like trackerless torrents (DHT) and torrentless torrents (Magnet links)
Ditto there are implementations for p2p phones, p2p tv, etc.
I think google and other big companies are just pushing a way to change the Internet to suit their business models and nothing more.
The summary was posted on May 06, 2007 at 08:20.
I just read Slashdot for the articles.
- Only people actively downloading or seeding content are available to redistribute it, so there's a built-in time-dependence which makes it unsuitable for small pieces of data.
- It has no knowledge of topology, so it can't take advantage of topology to cache data closer to endpoints.
I wish someone had asked about freenet, since that seems much closer to what he describes.
"If you look 'round the table and can't tell who the sucker is, it's you." -- Quiz Show
Wow, some great subtitling here. The guy programs in oc and apparently speaks about item potent data packets...
I enjoyed this talk very much. It was more than just a statement of Van Jacobson's thoughts on data dissemination. It showed his analysis of the relationship between infrastructure and application across two generations of networking, and it pointed out very nicely why it's time now for phase 3: we've moved our usage goalposts compared to when the IP network was designed. Great stuff, and I agree completely.
:-)
:-)
The article submitter didn't seem to "get" what Van Jacobson was saying though, as the talk had almost nothing to do with broadcasting or multicasting. Indeed, Van Jacobson actually pointed out why multicasting and broadcasting were inappropriate in most situations in this new world (they carry implicit time sync), so only use them as accelerators on LANs or in other special cases. The slightly wrong article description may have misdirected some of the posts here since not everybody reads TFA, and even fewer sit through an extended talk. It wasn't about broadcast or multicast at all, except in passing.
Maybe it'll help to summarize his thrust briefly.
What he said was that the network underneath doesn't actually matter, and that the wires and fibre underneath don't actually matter either -- TCP/IP has abstracted away from them. However, the client-server model on which TCP/IP is based is no longer strictly relevant either, because it is founded on a somewhat obsolete concept, the "conversation". The vast bulk of our Internet traffic is no longer "conversations", but "data dissemination" (the migration of identified data objects from place to place), and actual conversations are just a special case of that.
Data dissemination is utterly different to conversation as a communications paradigm, and that's what he's getting at. Fully identified, self-validating items of data as discrete entities are really where our focus needs to be, and how they get to us is rather immaterial, or abstracted away. *Where* they come from (ie. the actual server to which we connect) is quite immaterial too --- getting it from a passing plane would be as good as from a known server, when you can rely on data identity. Furthermore, if the data items were fully self-descriptive then many of the current problems like spam would go away as well. What's more, the nodes of the network would be able to work more intelligently too (and hence efficiently), if they were aware of data identity rather than just treat everything as a conversation.
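To make "self-validating items of data" concrete, here is a minimal sketch in Python. It assumes a name is simply the SHA-256 hash of the content; that naming scheme and the function names are illustrative assumptions, not something specified in the talk.

```python
import hashlib


def name_for(data: bytes) -> str:
    """Derive a name from the content itself (assumed scheme: SHA-256 hex digest)."""
    return hashlib.sha256(data).hexdigest()


def verify(name: str, data: bytes) -> bool:
    """Accept data from any source as long as it hashes to the name we asked for."""
    return hashlib.sha256(data).hexdigest() == name


original = b"front page, 2007-05-06"
name = name_for(original)

# The same bytes are acceptable whether they came from the origin server,
# a neighbour's cache, or a passing plane.
from_cache = b"front page, 2007-05-06"
tampered = b"front page, 2007-05-06 (buy pills)"

print(verify(name, from_cache))  # True
print(verify(name, tampered))    # False
```

The receiver never has to ask where the bytes came from; it only checks them against the name it requested.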
That's a very brief summary and can't hope to do the talk justice. Go listen! He's dead right.
"The question of whether machines can think is no more interesting than [] whether submarines can swim" - Dijkstra
You give a pretty good short summary of a long and interesting talk.
The thing I got most out of it was the analogy to 60s and 70s networking and how it is only after technology has been adopted that we see what it's used for.
When the telephone was invented Bell didn't know what it would be used for. It's a strange concept, but he really didn't know what a "phone call" was. He just knew he could transmit voice. Not only that but you had to have wires to connect people, so there was this very expensive business of putting wires everywhere. What happened was that people used those wires to make conversations. To establish a conversation you had to have a path between two nodes. This encouraged a monopoly because the best known way to make paths was to have control of all the wires.
When the idea of what TCP/IP was to become was introduced people thought it was lunacy. What they were proposing was adding all this crap onto your data to explicitly name your destination so that it could travel any path to get to its conversation partner. All the networking researchers didn't get it because they already had implicit addresses by way of making the path. Turns out that the supposed inefficiency solved several problems simply by construction. Being able to take any path meant not caring about the underlying topology.
What Van Jacobson is proposing is another abstraction. Essentially adding another layer of "crap" that will allow us to ignore the underlying network. He mentions how several technologies are working towards these ends to some degree like bittorrent and the Akamai CDN, but I think he is advocating for something like a new protocol. This new protocol would then end up solving some of our current problems simply by construction. Broadcast and one-to-one will become the same thing. Whether you are sending a secure email (pgp signed and named) or downloading the front page of the nytimes you could rely on the nature of the new protocol to deliver you authentic data, no matter where it comes from.
Personally I think it's genius, I'd like to follow the progress of such a protocol if it exists. I just got done watching the talk so I'll be googling around for a little I suppose.
"how can they call it a MINE if everything here is THEIRS?!?!" -Straight Jacket
...but the more he talked, the more it reminded me of some halfbreed between Akamai and Freenet.
Basically, he's speaking of named resources, where a URL would be a key, like KSKs in Freenet
Content would self-verify, that's basically CHKs in Freenet
Then you need to add security to it, which pretty much amounts to SSKs
Only in his case, he wasn't talking about making the end nodes treat information this way but rather the core of the internet, and it didn't involve anonymity. But the general idea was the same, to grab content from a persistent swarm of hosts that don't need a connection to the original source. Unfortunately, most of the examples he gives are simply false, like the NY Times front page. If I want up-to-the-minute news, everybody needs to pull fresh copies off the original source all the time, reducing it down to a caching proxy. Any sort of user-specific content, or interactive content won't work. For example take slashdot. I've got my reading preferences set up, which means my content isn't the same as yours. Also my front page contains a link to my home page, which is not the same as yours. Getting a posting form and making a comment wouldn't be possible. Making any kind of feedback like digg, youtube, article feedback etc. isn't possible. Counters wouldn't be possible. The only thing where it'd work is reasonably static fire-and-forget content, and even then there's the problem of knowing what junk to keep. Notice that when asked about BT he said that only worked for big files, so the idea is that everyone will have some datastore where they keep small files until someone needs them. The only good example is the Olympic broadcast, which is exactly the same content at exactly the same time. Oh wait, that's classic broadcast. Classic broadcast works best in a broadcast model? Who'd think that.
Live today, because you never know what tomorrow brings
I see one problem with his idea of ignoring where data comes from.
Corporations make money by restricting access to information.
It doesn't seem that it will be possible for them to continue to do that with this model, so I don't think any of this will come to pass any time in the near future.
I thought I was going to skim through that video when I first saw it a while ago.
Then I started watching, and at some point noticed I watched the whole thing, without skipping anything.
I think he gives a good talk, and it kept me interested the whole way.
It's a very nice insight he has there, too bad it flies way over Slashdotters' heads (well, it's just that almost all of them probably didn't even read the whole thing).
By the way, I summarized his ideas (as I understood them, which may not be the same as he explained them).
1) Optical only routing at the backbone, and traffic monitoring to ensure Tier1's don't
let major choke points choke due to not wanting to spend money on equipment,
share the costs like a Co-op if necessary amongst Tier1's to keep the backbones
and choke points scaled up. People who run trace routes see tier1 choke points now.
2) In the Tier1 core make sure it is DWDM folding many layers of Sonet/ATM into
a multi-channel frequency spectrum, at some point plan on phasing out asynchronous
communications and go with on time in order delivery of packets, aka Sonet and its like protocols.
*** VoIP/streams works best with synchronous***
3) Each Metro Area Network has its own Squid box cluster for reducing repetitive traffic,
filling all identical requests for data vs. pulling it long haul a 2nd time,
a lot of ISPs have their own squid like system now, but unify and optimize it.
4) Create a spam registry, like ACLs for routers, don't blacklist by single IP,
blacklist by ISP, then Inclusively activate by white list single IP's within that range.
Offenders would essentially end up shut off from most of the world.
5) Create a modified Torrent protocol that can multicast via VPN like hamachi.cc,
so that a single peer can send the same data at 50kb to a million ppl,
make it like tuning into net radio audio stream in progress picking up the
data in progress, and getting the first to middle bits on the 2nd "play" of the data stream,
set a threshold for the modified torrent protocol to switch to this multicast mode.
for ppl with privacy concerns make it virtual ip's on a VPN network like hamachi.cc
6) All monetary transaction sites require end to end encryption and authentication,
and some means of client IDS that verifies the client isn't compromised,
a cursory one would be patch level and malware/virus client acknowledgement.
CERN could maintain a central footprint repository or another agency could
do it, but have open peer review and input.
Verifying legitimacy of all in memory processes would get rid of the obvious
intruders, but the ones that can 'checksum' mimic known registered processes
would be more difficult to detect. Verifying client integrity particularly for
Windows OS would require some brilliance beyond what we have now.
7) Zombie / IDS Net Spiders that roam the internet looking for compromised systems
and notifying the owner, and the ISP, and reports it to all other ISP's and
is added to a blacklist at ISP of origination. Systems routing to known
zombie network reporting IP's or security threat sites would be placed on watch lists
and client warnings of possibly being compromised.
8) Global and Nation based NOC's that monitor for network issues,
and traffic control system that allows truly functioning
reroutes via higher layer routing protocols for alternate routes.
Just a few ideas I had, some of it may not be workable with what we have now.
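Item 4 in the list above (blacklist a whole ISP range, then whitelist single IPs inside it) can be sketched with Python's standard `ipaddress` module. The registry contents here are made-up placeholders using documentation address ranges, and `is_allowed` is a hypothetical helper, not part of any real spam registry.

```python
import ipaddress

# Hypothetical registry entries; these are reserved documentation ranges.
BLACKLISTED_RANGES = [ipaddress.ip_network("203.0.113.0/24")]
WHITELISTED_HOSTS = {ipaddress.ip_address("203.0.113.25")}


def is_allowed(addr: str) -> bool:
    """Whitelisted single hosts win; otherwise reject anything in a blacklisted range."""
    ip = ipaddress.ip_address(addr)
    if ip in WHITELISTED_HOSTS:
        return True
    return not any(ip in net for net in BLACKLISTED_RANGES)


print(is_allowed("203.0.113.25"))  # True: explicitly whitelisted inside the range
print(is_allowed("203.0.113.7"))   # False: inside the blacklisted range
print(is_allowed("198.51.100.1"))  # True: not in any blacklisted range
```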
google "32 trillion offshore needs IRS attention"
The data contains all the information, so why not an authentication system. DRM for data. If you can validate if the data is from Company X then the data should be able to validate that you are allowed to see it. You still can have a copy of the data on your server, just can't use it.
What power has law where only money rules.
I think the idea of having a fully transparent networking paradigm is what is of paramount importance to both data and software that manages that data. Adding application layer logic to routers that will effectively cause data I want (or need) to be available everywhere would (in my mind) restrict the things that we can do with the net. I'm a bottom up kind of guy.
We need not only the data but the intelligence to manage that data spread around. I can't think of any news or information site that doesn't require direct interaction with the user of that data. I mean, take Slashdot for example, it tracks your usage of the site and gives you moderator points when you meet certain criteria. How is that going to work if somewhere along the line you don't have some kind of interaction with Slashdot's software? How does that get distributed as well? Are we going to allow server algorithms to run everywhere as well? I guess if we solve the security problem, that would actually be quite nice.
I agree there needs to be a better mechanism for delivering content everyone wants more efficiently. I'm reminded of this as well when I go to watch a live event and the response is very slow (not to mention 30 to 45 seconds behind), or I try to download a large file. Also with VOIP routing headers taking up 30% of wireless bandwidth alone, something has to be done with our protocols so they are lighter weight.
Just as mechanical switches had to be programmed and were part of the problem, I think TCP/IP is part of the problem here as well. There are still (as he mentioned) all kinds of barriers that have to be known about and planned for. As he said, it is a very static system requiring a day or at least minutes to pass before you can be seen by someone else on the net. I use a networking system that is tightly integrated into an operating system (QNX) that I like very much (still has its weaknesses) but it goes a long way to making the entire computing platform very transparent with the ability to add security.
Content can be very static or it can be extremely dynamic (sensor data in the dynamic example). I think his ideas seem to focus on managing very static data or data that can be "late" and making it more readily available. When it comes to dynamic content (such as live video or VOIP) he didn't seem to offer much in the way of substance, or I may have just missed his point. I guess I see the need for direct "conversation" like connections as well as data "dissemination" type connections.
In fact, I'm not sure things like the NY Times daily is a good example, as dynamic advertising is very much a part of what they sell, and they need to have their individual subscribers connect to their servers so they can analyze what advertising to focus on the poor unsuspecting reader.
I personally think that the dynamics of the internet are moving towards very dynamic content and this requires direct connections to the servers which manage that content. Unless this content management (software) can be distributed to other devices along with the data, I'm not sure it's going to fly. Microsoft already tried that and now most interactive content is blocked (unless specifically enabled) because of security and other concerns.
However, these are all application layer problems, do they really belong at the bit level of data being bounced around? I don't know and it is an interesting question.
We can add layers that do all kinds of fancy things with data to get it to places that were harder for it to get to previously (mirrors, caching, etc). However, making what he describes work makes the internet look like nothing more than a big informational resource (file versions, news, video, etc). Like a big file system or database, like a big static storage system. However, the internet is much more dynamic than that and I don't have the feeling that he addressed that aspect well enough.
What we need is something that will enable us to make "conversation" connections
I got it, I just posted this late at night with a poor description. In any case people are watching and talking about it, which was my goal.
this space intentionally left blank
That was one of very few useful talks I've *ever* seen on shortcomings in the Internet.
Akamai Technologies is really very much in the business of solving the main problem Jacobson describes. Yes, lots of people want the same information. Jacobson is a very bright man, and he got pretty much everything right except: "You can't Akamize dynamic content." Yes, you can -- unless live feeds of sporting events (NCAA March Madness) aren't considered dynamic enough.
That said, there probably is room for a truly open platform in this case. If it can do half of what the Akamai network can, I wholeheartedly congratulate the creators on their amazing feat of engineering. To my knowledge, the closest thing these days is the coral cache, which actually is restricted to static content.
-Amalcon
At one point in his talk, Van Jacobson talks about segmenting. He mentions that segmenting solves a problem that is metaphorically like having trains and cars on the same city streets; one doesn't want to wait around in a car for a train to clear the intersection. Then he says something that I've heard before, but I had never made the connection:
"It would be nice here to not have big trucks, just little cars."
And suddenly I realized that Senator Stevens had gotten a lecture that he completely misunderstood.
Granted, the man shouldn't have tried to give a speech on something about which he was completely clueless, but now I think I know where the heck that truck he mentioned came from.
When I'm logged into slashdot and browsing news, there's only a very small part of the page that is customized specifically to me. Everything else is the exact same content that everyone else is seeing. Currently the web browser does separate queries to pull down the images in the page, which are mostly the same for everyone. Perhaps under VJ's scheme the text parts that are specific to me would have to come pretty much directly from slashdot's site, but it could contain references to all the common content on the site which would be fully cacheable by his dissemination model.
Instead of slashdot's content authors writing templates that are processed on their servers before giving the web page to my browser, something similar to their template would be shipped to my browser, which would then ask for all the common (cacheable) content to fill out the page. Image-heavy pages would already work well with his scheme if the browser just requested the images using the new semantic.
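A rough sketch of that split, in Python: the personal part comes straight from the site, while the common parts are fetched by content hash from a shared store. The `shared_cache` dict, `publish`, and `render` are hypothetical names for illustration, and hashing with SHA-256 is an assumption rather than anything from the talk.

```python
import hashlib
from typing import Dict, List

# Stand-in for "anywhere in the network that happens to hold the bytes".
shared_cache: Dict[str, bytes] = {}


def publish(data: bytes) -> str:
    """Store common content under its SHA-256 hash and return that reference."""
    ref = hashlib.sha256(data).hexdigest()
    shared_cache[ref] = data
    return ref


def render(personal: str, refs: List[str]) -> str:
    """Fill a page from the per-user part plus verified shared parts."""
    parts = [personal]
    for ref in refs:
        data = shared_cache[ref]
        if hashlib.sha256(data).hexdigest() != ref:
            raise ValueError("shared content does not match its reference")
        parts.append(data.decode())
    return "\n".join(parts)


logo = publish(b"[site logo]")
story = publish(b"Story: A New Way to Look at Networking")

page_a = render("Welcome back, alice", [logo, story])
page_b = render("Welcome back, bob", [logo, story])
```

Both pages differ only in their first line; everything after it is the same two shared objects, which any cache along the way could have supplied.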
I don't think he plans for the "request/respond" protocols to be used everywhere in a network. The issue is when many users are requesting the same data. Take for example when a site gets "slashdotted," it'd be more efficient if there weren't several thousand requests for the same exact index.html from some host being individually transmitted through the network.
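The "slashdotted" case can be illustrated with a toy node that remembers what it has already relayed. This is only the caching approximation of the idea, and `CachingNode` and `origin` are hypothetical names invented for this sketch.

```python
from typing import Callable, Dict


class CachingNode:
    """Hypothetical network node that keeps a copy of responses it has relayed."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.origin_requests = 0  # how many requests actually reached the origin

    def get(self, name: str) -> bytes:
        # Only the first request for a name travels to the origin host.
        if name not in self._store:
            self.origin_requests += 1
            self._store[name] = self._fetch(name)
        return self._store[name]


def origin(name: str) -> bytes:
    """Stand-in for the overloaded web server."""
    return b"<html>contents of " + name.encode() + b"</html>"


node = CachingNode(origin)
for _ in range(5000):
    page = node.get("index.html")

print(node.origin_requests)  # 1
```

Five thousand readers ask for the same index.html, but the origin host sees a single request.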
In your example of tracking users on Slashdot, parts of that could fall back on the current conversational protocols. It should probably be possible for clients to decide on whether they want to request data by means of the broadcast system or the conversational system. Then I guess the issue is how does the client decide when it should use one or when it should use the other?
Or, maybe custom homepages and whatnot would have to start being initiated by the client sending a request indicating it needs a specific set of data, instead of the server side realizing that a specific client is requesting data and that it should send the modified data. So in other words, you'd still have to know you want to receive a certain set of data.
There's definitely a lot of finer details to this...
This is the water system...translated for data...that's the only way to make it work...All the information (non-specific-user-defined) would have to be in the system at all times...
If you want a drink you don't specify where the water has to come from...you open the tap and out it comes...but when you want to make lemonade or distilled water lets say...you have to have software on your end (pot and fire) to get out of it what you want...
But that presents the problem of too much bandwidth usage at old usage methods...unless we go to completely over the air data transfer which takes the phone system out of the loop and allows a more radio style internet...where the data is already being disseminated just waiting for you to tune to that station...you're not asking for the information you're allowing your device to consume it...
hrmmm
"Helping to keep you two steps ahead of the Thought Police!"
Wait, isn't this basically what freenet 0.5 was doing? I have no idea how much of the issue with that was the anonymity part, and how much was the distributed server part, but it was painfully slow. Maybe if everyone was using it it would be faster...
But doesn't this have several major issues?
1) all the freenet problems - that is:
1a)what if people don't want to share bandwidth or just some specific content? At one end, you have the freenet solution where you either share or don't use it (with that driving people away), at the other, there is another huge administrative issue for anyone who wants to manage what they share (which also will serve to drive people away).
1b) slow as hell - even non anonymous protocols that handle the search like gnutella are pretty slow at finding stuff, and getting the actual transfer set up. bittorrent is slightly better, but there's still the overhead time of the out of band/protocol search. Metalinks might solve this however. I just worry that for most data, by the time a torrent got started, you'll have loaded the page over HTTP... Unless he's just suggesting squid style proxies, which have all of their own caching problems.
2) Do people want to share their resources? Sometimes, but not everyone.
3) Why is copyright considered a bogus question? I'm sorry, but with the current way the MAFIAA is working, I can already smell the lawsuits.
Opera, Proxomitron-Grypen,GPG 0x0A1C6EE3
The idea is that the "network stack" would implement this not just in end-points but in the network infrastructure as well.
Routers and peers alike.
Today you already have questions such as: What if somebody will modify his TCP stack to use more aggressive retransmitting and waste everybody's bandwidth?
There's already an issue of being a "good netizen" and this makes it a bit more of an issue.
Routers/etc are already trusted to be good netizens and are paid to be so.
If I didn't make it clear in my summary, he does make it clear: He is not proposing "caching". Caching is merely an approximation for what he is suggesting. "Searches" would still be the domain of search engines and such. The mapping between "keywords" or "generic names" and a specific "label" (which is a uniquely identified immutable piece of data you can get from anywhere) can be done by querying central search servers, asking nearby routers, or a "domain name server" equivalent. It does not have to be a "search network" like gnutella. The only thing it must be, is crypto-signed to verify the link between the query and the data is indeed done by those who made the data, or someone you trust with categorizing it.
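A sketch of that name-to-label mapping in Python. The label is assumed to be a SHA-256 hash of the data, and the "signature" uses HMAC with a shared demo key purely as a standard-library stand-in; a real system would use a public-key signature so anyone could verify without holding the secret. All names here are hypothetical.

```python
import hashlib
import hmac

# Assumption: stand-in for the publisher's real signing key.
PUBLISHER_KEY = b"demo-key"


def label_for(data: bytes) -> str:
    """Immutable label for a piece of data (assumed: SHA-256 hex digest)."""
    return hashlib.sha256(data).hexdigest()


def sign_mapping(name: str, label: str, key: bytes) -> str:
    """Bind a generic name to a specific label."""
    message = f"{name}\n{label}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def check_mapping(name: str, label: str, signature: str, key: bytes) -> bool:
    """Verify that the name-to-label link was made by the key holder."""
    expected = sign_mapping(name, label, key)
    return hmac.compare_digest(expected, signature)


data = b"today's front page"
label = label_for(data)
signature = sign_mapping("nytimes/front-page", label, PUBLISHER_KEY)

# Any resolver (search server, nearby router, DNS-like service) may hand back
# the (name, label, signature) record; the client checks it before trusting it.
print(check_mapping("nytimes/front-page", label, signature, PUBLISHER_KEY))  # True

forged_label = label_for(b"something else entirely")
print(check_mapping("nytimes/front-page", forged_label, signature, PUBLISHER_KEY))  # False
```

Once the mapping checks out, the data itself can come from anywhere and be verified against the label.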
See reply about good netizens.
The "Bogus questions" is simply my addition (for purposes of the Enough project), and wasn't at all mentioned in the talk.
I believe it is bogus because it is legal and not technical, and because routers already duplicate information. If they also store it, it's a mere technical difference.
Some of these issues seem to be addressed (or are being attempted, at least in the early stages) by metalink which was discussed at http://slashdot.org/article.pl?sid=07/02/25/144209 a few months ago, but I don't think people really understood what it was.
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> /* <![CDATA[ */
/* ]]> */
<html>
<head>
<style type="text/css">
<!--
@import "/branding/css/tigris.css";
@import "/inst.css";
-->