Tim Berners-Lee Is Sorry About the Slashes
Stony Stevenson writes "A light has been shone on one of the great mysteries of the internet. What is the point of the two forward slashes that sit directly in front of the 'www' in every internet website address? The answer, according to Tim Berners-Lee, who had an important role in the creation of the web, is that there isn't one. Berners-Lee revisited that design decision during a recent talk with Paul Mohr of the NY Times when Mohr asked if he would do anything differently, given the chance. 'Look at all the paper and trees, he said, that could have been saved if people had not had to write or type out those slashes on paper over the years — not to mention the human labor and time spent typing those two keystrokes countless millions of times in browser address boxes.'"
The structure of a URL is:
protocol://domain/path
When you use the 'file' protocol, there is no domain, there is only a path. Thus the domain part of the URL is omitted and you get a triple-slash.
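A quick way to see the empty domain part is to split a couple of URLs with Python's standard urllib.parse module. This is only a sketch: the helper name split_url and the example URLs are made up for illustration.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return (protocol, domain, path) for a URL string."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path


# In the file URL the domain between "//" and the next "/" is empty,
# which is why three slashes appear in a row.
for url in ("http://example.com/some/dir/foo.txt", "file:///etc/hosts"):
    print(url, "->", split_url(url))
```

The second URL comes back with an empty string as its domain, so the third slash is simply the first character of the path.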
So, what would you regexp for if all you had was a ":"? Normal text quite often does contain colons....
Back in the early 1980s, before the folks at CERN gave us the first browser, there was another notation that was implemented by an assortment of networking software. It originated, as far as I can tell, with The Newcastle Connection (TNC, from the U of Newcastle-upon-Tyne in England), one of the first fully distributed unix file systems. Conceptually, it defined a network directory one level above your root directory, named "/../". So to reference a file on machine X.Y.Z, you'd use a path like "/../X.Y.Z/...". The actual server on each machine typically wouldn't export its "/" directory, but rather would do what web servers do, and supply only a server-root directory (which could also be mounted by other machines by the unix mount command). So if you tried to access the file /../X.Y.Z/some/dir/foo.txt, you'd get the file that the remote machine had at /server-root/some/dir/foo.txt, and files outside the /server-root/ directory would be invisible to outsiders.
This is, of course, merely another syntax for what the WWW calls "http://X.Y.Z/some/dir/foo.txt", but without the protocol field. The TNC implementation made the file readable or writable, depending on what the permission module allowed, via the usual open(), read(), write(), etc. library routines. This meant that all of the software on your machine was automatically able to use accessible files on other machines without any special coding. As with the Web, you just needed the machine name and the file's location relative to the server-root directory.
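To make the mapping concrete, here is a rough Python sketch of how a "/../" path could be split into a machine name and a path under the server root. It is not TNC's actual code, which lived in the kernel's file-system layer; the function names and the "/server-root" default are assumptions taken from the example above.

```python
import posixpath


def split_tnc_path(path):
    """Split '/../HOST/rest' into (HOST, '/rest'). Hypothetical helper."""
    prefix = "/../"
    if not path.startswith(prefix):
        raise ValueError("not a /../ network path: %r" % path)
    host, _, rest = path[len(prefix):].partition("/")
    if not host:
        raise ValueError("missing machine name: %r" % path)
    return host, "/" + rest


def to_server_path(remote_path, server_root="/server-root"):
    """Map the path part onto the remote machine's server-root directory."""
    # Normalising an absolute path drops any ".." that would climb above
    # "/", so the result always stays under server_root.
    clean = posixpath.normpath("/" + remote_path.lstrip("/"))
    return server_root + clean


host, rest = split_tnc_path("/../X.Y.Z/some/dir/foo.txt")
print(host, to_server_path(rest))
```

For the path in the example this yields the machine X.Y.Z and the file /server-root/some/dir/foo.txt; a path that tries to climb out with ".." still lands under /server-root.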
The advantage of the Web's "http://" notation, of course, is that it allows the explicit use of different protocols. TNC's "/../" notation doesn't do that; the implementation gives direct access via the usual file-system routines, and hides the comm protocol inside the kernel's file-system code just as is done with local file I/O.
Note that the "/../" notation isn't any more difficult to match than "http://", and it's a string that's equally unlikely to occur anywhere but in a TNC-style file reference. And note that there's no problem with adding a ":port" to the machine name with either notation.
I've sometimes wondered why various browsers, especially the Mozilla suite, haven't quietly implemented TNC notation and invited users to start using it. You don't need permission from any standards body to do this. It would only take a few lines of new code, wherever the software parses URLs. You'd have to add "/\.\./" as an alternative to "(\w*)://" at the start of the match, and make 'HTTP' the default protocol if omitted. While you're at it, add another * after the //, giving "(\w*)://*", so omitting the second / will also work. But that's probably too user-friendly for any real web developer to bother implementing. ;-)
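For the curious, the change might look something like this in Python. It is a minimal sketch, not code from any browser: the pattern, the parse_ref name and the example hosts are invented here, and the host and path parts are deliberately simplistic.

```python
import re

# Either "scheme:/" followed by any number of extra slashes, or the
# TNC-style "/../" prefix, then host, optional ":port", optional path.
REF = re.compile(
    r"^(?:(?P<scheme>\w*)://*|/\.\./)"
    r"(?P<host>[^/:]+)"
    r"(?::(?P<port>\d+))?"
    r"(?P<path>/.*)?$"
)


def parse_ref(text):
    """Return (scheme, host, port, path), or None if text doesn't match."""
    m = REF.match(text)
    if m is None:
        return None
    scheme = (m.group("scheme") or "http").lower()  # HTTP if omitted
    port = int(m.group("port")) if m.group("port") else None
    return scheme, m.group("host"), port, m.group("path") or "/"


for ref in ("/../X.Y.Z/some/dir/foo.txt",
            "http:/example.com:8080/a",
            "ftp://example.com"):
    print(ref, "->", parse_ref(ref))
```

The first reference has no protocol and falls back to http, the second shows that a single slash and a ":port" are both accepted, and the third is an ordinary URL with no path.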
(Actually, I've done this in a few projects that I've worked on. It doesn't break anything, and when people see that notation, they usually really like it and the new conceptual model of the Net that it puts into their mind. The Net becomes just a large, slow bus connecting millions of machines and their disks, joining them into one huge virtual computer. Replacing a big, messy communication protocol with a big, tree-structured file system gives a major reduction in complexity and points to a much easier way to do things.)
Those who do study history are doomed to stand helplessly by while everyone else repeats it.