You hit the nail on the head with this one. This is the bit where the Cisco "better together" argument actually makes sense. It's also the part of the puzzle that many of Cisco's biggest customers (ISPs, Fortune 500 companies, governments) really care about. Slashdotters may be able to keep their ten-node home networks clean easily, but these guys really struggle to keep their 10,000-node networks clean.
It's not just malware: it's spam, phishing, spyware, botnet C&C traffic, basically anything bad on the net. The amount of data Cisco has on this stuff is huge, both from telemetry in their routing and switching business and, more importantly, from the earlier Ironport, SourceFire, ThreatGrid and ScanSafe acquisitions. It is arguably the richest set of security-related data in the business. Simply adding the WebRep domain-level blocks from Ironport's data to OpenDNS would improve the overall protection massively.
Of course, Cisco's ability to successfully integrate all of this stuff without falling over themselves is another story - one of the reasons why I left.
This is exactly how it is done on many commercial services (including the one I work for). It works pretty well apart from a few gotchas:
- Blocking of elements within a page (such as images hosted on third-party servers, JavaScript, AJAX calls). In this case, the end user doesn't get to see a block page because it's incredibly difficult to get the browser to display anything sensible. This is particularly true of AJAX calls to third-party websites. For example, if the censor blocked a post on Facebook by altering the returned JavaScript, they could put up a message, but then they're in a constant race to keep up with every subsequent change that Facebook makes to how those messages are sent.
- Blocking HTTPS requests. Some browsers follow the redirect, some don't. This also seems to change from version to version.
- Blocking HTTP calls when the end client isn't a real Web browser.
- Blocking of files where the censor has already started to send the content. This is typically done for large files and streaming media. For example, if an ISO image is blocked for containing malware, the scan can't be done until well into the download. However, if a pure store-and-forward model is applied, the browser will have given up and timed out long before the censor has finished downloading and scanning.
A censorship code might help the first three cases in that the browsers could display (sort-of) sensible messages. I don't think anyone has a good answer for the last case.
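To make the first three gotchas a bit more concrete, here is a rough Python sketch of the decision a filter has to make. It is not how any particular commercial service works. The blocklist, the host names and the function name are all made up for illustration, and I'm assuming the "censorship code" would be something along the lines of HTTP 451. The point is that the filter can only send a human-readable block page when it believes a browser is asking for HTML; for scripts, AJAX calls and non-browser clients, all it can really send is a status code and a short reason.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist; a real service would consult a reputation feed.
BLOCKED_HOSTS = {"malware.example", "phish.example"}

# Assumption: the "censorship code" is something like HTTP 451.
BLOCK_STATUS = 451


def filter_request(url: str, accept: str = "text/html"):
    """Return (status, headers, body) for a blocked URL, or None to pass it through."""
    host = urlsplit(url).hostname or ""
    if host not in BLOCKED_HOSTS:
        return None  # not blocked: forward the request unchanged

    if "text/html" in accept:
        # Top-level page load: a block page can actually be rendered.
        body = (
            "<html><body><h1>Blocked</h1>"
            f"<p>{host} is blocked by policy.</p></body></html>"
        )
        content_type = "text/html; charset=utf-8"
    else:
        # Image, script, AJAX call or non-browser client: nothing sensible
        # can be displayed, so only the status code carries the message.
        body = f"blocked: {host}"
        content_type = "text/plain; charset=utf-8"

    headers = {
        "Content-Type": content_type,
        "Content-Length": str(len(body.encode("utf-8"))),
    }
    return BLOCK_STATUS, headers, body
```

Note that this does nothing for the last case: once the first bytes of a large download have gone out, there is no status line left to change.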
Also relevant is RiscOS's lack of a conventional file requester. The article says "We have the technology. So why do we still make people use filepickers at all? Cruft." RiscOS has never required these. The user can simply go to save, and drag the icon to the appropriate place (be that a folder or another application). Similarly, dragging files into applications is the standard way to load them. Most Microsoft applications will now also load files in this way, but don't provide the same facilities for saving.
I haven't seen anyone mention this, but has anyone considered the effects of a wormhole in a computer? Let's say we have an iterative function, for example x1 = 0.5 * (x0 + 2 / x0) (which will find the square root of 2). Given a wormhole computer, we can send the result of x1 back in time to be used as x0. Thus we could compute values for any iterative function in the time it takes to compute a single iteration. This would work even with very small (subatomic) wormholes. A usable wormhole need only be large enough to send a few electrons through. Perhaps even electrons are larger than we need? I can't believe I'm the only one who's thought of the implications of this, so does anyone know of any work in this area?
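For comparison, this is what the ordinary, wormhole-free version of that iteration looks like in Python. The function names, the tolerance and the starting value are my own choices for illustration. Each pass feeds x1 back in as the next x0, which is exactly the step the wormhole is supposed to short-circuit.

```python
def newton_sqrt2_step(x0: float) -> float:
    """One iteration of x1 = 0.5 * (x0 + 2 / x0)."""
    return 0.5 * (x0 + 2.0 / x0)


def iterate_to_fixed_point(x0: float, tol: float = 1e-12, max_iter: int = 50):
    """Feed each result back in as the next input until it stops changing.

    Returns the final value and the number of iterations used.
    """
    x = x0
    for n in range(1, max_iter + 1):
        x_next = newton_sqrt2_step(x)
        if abs(x_next - x) <= tol:
            return x_next, n
        x = x_next
    return x, max_iter


if __name__ == "__main__":
    value, steps = iterate_to_fixed_point(1.0)
    print(value, steps)
```

The value the loop settles on is a fixed point of the step function, i.e. an x for which x1 equals x0.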
"AltaVista... doesn't try to eliminate redundant or very similar pages (or subpages) like Google does."
Given that Andrei Broder, CTO of AltaVista, has written at least 4 papers (some dating back to his days at Compaq SRC) describing the algorithms used by AltaVista to detect near-duplicate documents, I think you need to do some more research. FYI, the four I know of are:
- Clustering the Web (Broder, Glassman, Manasse)
- Mirror Mirror on the Web: A Study of Host Pairs with Replicated Content (Bharat and Broder)
- Identifying and Filtering Near-Duplicate Documents (Broder)
- On the Resemblance and Containment of Documents (Broder)
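If you want a feel for what "resemblance" and "containment" mean in those papers, here is a small Python sketch. It is only an illustration and not AltaVista's production code: it splits each document into overlapping word sequences (shingles) and compares the exact sets, whereas the papers work with small sampled sketches of the shingle sets so that the comparison scales to the whole web. The function names and the shingle width of 4 are my choices.

```python
def shingles(text: str, w: int = 4) -> set:
    """Return the set of contiguous w-word sequences in the text."""
    words = text.lower().split()
    if len(words) < w:
        # Too short for a full shingle: treat the whole text as one.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a: str, b: str, w: int = 4) -> float:
    """Shared shingles divided by all shingles in either document."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0  # two empty documents are treated as identical
    return len(sa & sb) / len(sa | sb)


def containment(a: str, b: str, w: int = 4) -> float:
    """Fraction of a's shingles that also appear in b."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa:
        return 1.0  # an empty document is trivially contained
    return len(sa & sb) / len(sa)
```

Two pages with resemblance close to 1 are near-duplicates; a page with containment close to 1 in another is roughly a subpage of it.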
You and half the rest of the net's users
http://www.useit.com/alertbox/9707b.html
quote: "half of all users are search-dominant, about a fifth of the users are link-dominant,
and the rest exhibit mixed behavior."
Some papers:
http://citeseer.nj.nec.com/page98pagerank.html
The original definition of pagerank.
http://www.google.com/technology/
The layman's definition.
http://dbpubs.stanford.edu:8090/pub/1999-31
How to compute it quickly.
http://citeseer.nj.nec.com/haveliwala02topicsensitive.html
How to make it topic-specific.
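For anyone who would rather see the basic idea in code than read the papers, here is a minimal Python sketch of PageRank computed by plain power iteration. It is a toy, not Google's implementation and not the fast or topic-specific variants from the papers above. The function name, the 0.85 damping factor and the way pages without out-links are handled (their rank is spread evenly over all pages) are conventional choices I'm assuming here.

```python
def pagerank(links: dict, damping: float = 0.85,
             tol: float = 1e-9, max_iter: int = 200) -> dict:
    """Power-iteration PageRank over a {page: [pages it links to]} graph."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from a uniform distribution

    for _ in range(max_iter):
        # Rank held by pages with no out-links is shared among all pages.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}

        # Every page splits its current rank evenly across its out-links.
        for v in nodes:
            outs = links.get(v, [])
            for target in outs:
                new[target] += damping * rank[v] / len(outs)

        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


if __name__ == "__main__":
    print(pagerank({"a": ["b"], "b": ["a"], "c": ["a"]}))
```

In the little three-page graph at the bottom, "a" comes out on top because both other pages link to it, and "c" ends up with only the baseline share because nothing links to it.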
Yes, the NEC!
I still have my original Amiga A590 hard drive.
That's a WD 20MB XT-IDE drive from around 1989.
Still works, although at 150 k/s it's a bit slow!
This research is from the same people who created CiteSeer