Can rev="canonical" Replace URL-Shortening Services?
Chris Shiflett writes "There's a new proposal ('URL shortening that doesn't hurt the Internet') floating around for using rev="canonical" to help put a stop to the URL-shortening madness. In order to avoid the great linkrot apocalypse, we can opt to specify short URLs for our own pages, so that compliant services (adoption is still low, because the idea is pretty fresh) will use our short URLs instead of TinyURL.com (or some other third-party alternative) replacements."
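For anyone wondering what a "compliant service" would actually do: the page advertises its own short URL in a link element with rev="canonical", and the client looks for that before reaching for a third-party shortener. Here is a rough sketch of the lookup step in Python. The function name, class name, and example.org URL are made up for illustration, and it only inspects link elements in HTML that has already been fetched.

```python
from html.parser import HTMLParser


class RevCanonicalFinder(HTMLParser):
    """Collects href values of <link> tags whose rev attribute includes 'canonical'."""

    def __init__(self):
        super().__init__()
        self.short_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rev holds a space-separated list of link types; compare case-insensitively.
        rev_types = (attr_map.get("rev") or "").lower().split()
        href = attr_map.get("href")
        if "canonical" in rev_types and href:
            self.short_urls.append(href)


def find_short_url(html_text):
    """Return the first short URL the page advertises, or None if it has none."""
    finder = RevCanonicalFinder()
    finder.feed(html_text)
    return finder.short_urls[0] if finder.short_urls else None


# Hypothetical page that publishes its own short URL.
page = '<html><head><link rev="canonical" href="http://example.org/s/abc"></head></html>'
print(find_short_url(page))
```

A service would fall back to its usual shortener only when this returns None.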
On the Twitter /. feed, this of course shows as:
slashdot Can rev="canonical" Replace URL-Shortening Services? http://tinyurl.com/c3j4n8
P.S. Now if you want a really short URL, try http://tinyarro.ws/ (no affiliation; just impressed by the idea)
How about Twitter just stops arbitrarily limiting characters? Go by word count, perhaps?
I know some avid Twitter users, and the majority of them apparently use the idiotic SMS message system to 'tweet' each other throughout the day on their phones. Twitter can't abandon the 140-character limit for this reason.
For the record, I am against anything that keeps the SMS system relevant in this day and age. It should have been abandoned long ago in favor of standard data packets on the internet, rather than control packets on a proprietary wireless system. There's no good reason to keep this system alive when it either forces you to pay $X per month for it, or pay $.15 per 140 characters when one of your idiot friends 'texts' you. There's no way (that I know of) to force incoming SMS to route through GPRS, so you are hit with SMS fees even when you already pay for unlimited data. It also invites spam that you actually DO pay for, quite literally, and from which the wireless carrier profits as well. It should be illegal for the carrier to charge you for incoming SMS messages. Anyone who agrees with me should call their congressperson to protest this policy and call their wireless carrier to block all SMS messages.
Unfortunately, URL shortening is not yet an integral part of any web framework I have seen, so I am adding it to a new web site I'm building. That means I have to add the feature to the web server.
It works like this. Every part of the web site code that builds URLs for the same site first passes them through the mapping logic, which computes an SHA1 checksum of the canonicalized URL string and looks that checksum up in a fast database (I'll be using Berkeley DB for this). If the checksum is already there and maps to the same URL, the code generates a new URL that references the checksum. If it maps to a different URL, the code notifies me that it found an SHA1 collision. If the checksum is not there yet, the code adds it. The original URL is thus replaced with the mapping URL.
Code added to the web server will detect checksum URLs. If a requested URL looks like one, the server looks it up in the database to get the original URL and proceeds with the request using that URL. Original URLs are still processed as usual, in case they leak out or are intentionally used to bypass the mapping for special purposes. Basically it's like a tiny URL service, but integrated, so no redirect is needed.
One thing I am looking at is shortening even these URLs, though they should be short enough already. Truncating the checksum raises the chance of a collision to the point where I'll need logic to deal with it. I would handle it much like a collision in a hash data structure, but resolve it by adding back digits of the SHA1 checksum that were removed to shorten it.
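Roughly, the mapping step could look like the sketch below. This is only an illustration: a plain dict stands in for the Berkeley DB store, the "/u/" prefix is something I picked for the example, and the canonicalization rule (lowercase scheme and host, drop the fragment) is an assumption, since the real rule depends on the site.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


class ChecksumCollision(Exception):
    """Raised when two different URLs produce the same SHA1 checksum."""


def canonicalize(url):
    # Assumed rule: lowercase scheme and host, default the path to "/", drop the fragment.
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


def map_url(url, store, prefix="/u/"):
    """Return the mapping URL for `url`, recording checksum -> URL in `store`."""
    canonical = canonicalize(url)
    checksum = hashlib.sha1(canonical.encode("utf-8")).hexdigest()
    existing = store.get(checksum)
    if existing is None:
        store[checksum] = canonical  # first time this URL is seen: add it
    elif existing != canonical:
        # Same checksum, different URL: report it rather than overwrite.
        raise ChecksumCollision(f"{canonical!r} collides with {existing!r}")
    return prefix + checksum


store = {}  # stand-in for the Berkeley DB table
print(map_url("/articles/2009/04/a-very-long-title?page=2", store))
```

Calling map_url twice with the same URL returns the same mapping URL and leaves a single entry in the store.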
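The server-side half might be sketched like this. Again, the "/u/" prefix and the dict standing in for the database are assumptions for the example, and a real server would hook this in wherever it rewrites request paths.

```python
import hashlib
import re

# Assumed shape of a checksum URL: a fixed prefix followed by 40 hex digits.
CHECKSUM_PATH = re.compile(r"^/u/([0-9a-f]{40})$")


def resolve_path(path, store):
    """Translate a checksum URL back to the original path, without a redirect.

    Paths that do not look like checksum URLs are returned unchanged, so
    original URLs keep working. An unknown checksum yields None, which the
    caller can turn into a 404.
    """
    match = CHECKSUM_PATH.match(path)
    if match is None:
        return path
    return store.get(match.group(1))


# Stand-in for the database, seeded with one mapping.
original = "/articles/2009/04/a-very-long-title?page=2"
checksum = hashlib.sha1(original.encode("utf-8")).hexdigest()
store = {checksum: original}

print(resolve_path("/u/" + checksum, store))
print(resolve_path("/about", store))
```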
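The "add back digits" idea can be sketched as follows. It is a sketch only: the starting length of 8 hex digits is an arbitrary choice for the example, and a dict again stands in for the database. Keys are never removed, so a URL always gets the same key back once one has been assigned.

```python
import hashlib


def short_key(url, store, min_len=8):
    """Return the shortest SHA1 prefix of `url` not already taken by another URL.

    Starts at `min_len` hex digits and adds back one digit at a time
    whenever the current prefix already maps to a different URL.
    """
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    for length in range(min_len, len(digest) + 1):
        key = digest[:length]
        existing = store.get(key)
        if existing is None:
            store[key] = url  # free slot: claim it
            return key
        if existing == url:
            return key  # already assigned earlier
        # Otherwise the prefix belongs to another URL: try one more digit.
    raise RuntimeError("full SHA1 collision for " + repr(url))


store = {}
# min_len=1 forces prefix collisions so the extension logic is exercised.
keys = [short_key(f"/page/{i}", store, min_len=1) for i in range(50)]
print(sorted(set(len(k) for k in keys)))
```

Looking a key up is still a single fetch from the store, since each key maps to exactly one URL.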
External URLs to other sites can be handled the same way, though that does add an extra redirection. I could limit this to long external links only, although a web interface should handle long external links OK anyway. It could be an option.
now we need to go OSS in diesel cars
A couple of good questions I have seen, and my best attempt to answer them:
1. Don't you mean rel? No, I mean rev. It indicates a reverse link.
2. Why not make your URLs short in the first place? I happen to like my URLs and have made them as short as I want them. They're only too long in some very specific use cases, like Twitter. I could just complain about Twitter, or I could support an idea that makes URL shortening suck less. I chose the latter.
Thanks for reading, and please do feel free to criticize whatever you think is wrong with this idea. I'd like a way to indicate a preferred short URL for my own stuff, and this seems like a pretty good way to do it that makes sense semantically and is easy to implement. For an ongoing discussion about adding an HTTP header to do the same thing (so that only a HEAD request is required), read here:
http://shiflett.org/blog/2009/apr/a-rev-canonical-http-header