"They log on as a normal user. They see your computer. They see files. They have your IP..."
Wrong. Some P2P programs do not reveal anyone's IP address when you browse their shared files. Instead, the network tells you which files that person is supposed to have. You don't get their IP address at all; you only get the IP address of whichever supernode you are connected to.
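Here is a minimal sketch of the idea. It assumes a hypothetical supernode that keeps a private map from peer IP addresses to the files each peer claims to share. Search results carry only file metadata plus the supernode's own address. The class and field names are made up for illustration and don't describe any particular P2P protocol.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SearchHit:
    """What a searching peer sees: file metadata plus the supernode that answered."""
    filename: str
    size: int
    content_hash: str
    answered_by: str  # address of the supernode, not of the peer sharing the file


@dataclass
class Supernode:
    """Hypothetical supernode that indexes its leaf peers' shared files."""
    address: str
    # Private index: peer IP -> list of (filename, size, content_hash) tuples.
    _shares: dict = field(default_factory=dict, repr=False)

    def register(self, peer_ip, files):
        """Record the files a connected peer says it is sharing."""
        self._shares[peer_ip] = list(files)

    def search(self, term):
        """Answer a search without revealing which peer holds each file."""
        hits = []
        for files in self._shares.values():
            for name, size, content_hash in files:
                if term.lower() in name.lower():
                    hits.append(SearchHit(name, size, content_hash, self.address))
        return hits
```

A searcher only ever learns `answered_by`. To reach the sharing peer's IP, it would have to ask the supernode to broker an actual transfer.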
If you want the IP address of the person actually sharing those files, you'll have to download something from them (or upload something to them). Only then will you have a "direct" connection to them and get the IP address you are interested in. Downloading for "30 seconds" won't work either: it's not that hard to devise a P2P app that uploads files in bits and pieces, in such a way that the file can't be identified without a significant amount of data being transferred (in fact, some P2P apps already do this).
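This sketch illustrates why a short download reveals little. It uses a hypothetical upload schedule that sends fixed-size chunks in shuffled order; it is a sketch under that assumption, not a description of how any specific P2P app works. After a few chunks, an observer holds scattered pieces of the file rather than a contiguous, playable stretch.

```python
import random


def chunk_schedule(file_size, chunk_size, seed=None):
    """Offsets of every chunk in the file, in a shuffled (non-sequential) order."""
    offsets = list(range(0, file_size, chunk_size))
    random.Random(seed).shuffle(offsets)
    return offsets


def coverage_after(schedule, chunks_sent, chunk_size, file_size):
    """Fraction of the file an observer holds after the first `chunks_sent` chunks."""
    received = schedule[:chunks_sent]
    # The last chunk may be shorter than chunk_size.
    return sum(min(chunk_size, file_size - off) for off in received) / file_size


def longest_contiguous_run(schedule, chunks_sent, chunk_size):
    """Length, in chunks, of the longest unbroken stretch the observer holds."""
    held = sorted(schedule[:chunks_sent])
    best = run = 0
    prev = None
    for off in held:
        run = run + 1 if prev is not None and off == prev + chunk_size else 1
        best = max(best, run)
        prev = off
    return best
```

Comparing `coverage_after` with `longest_contiguous_run` for a small `chunks_sent` shows the point. A fair share of the data may have arrived, but the longest usable stretch stays short.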
A few years from now (when bandwidth costs decrease further, and they will), supernodes could probably start acting as intermediaries even for complete uploads and downloads. It's just a matter of time, really; bandwidth will only get cheaper :)
"The IP address is unique at the given point of time that they're connected to the internet."
Please prove the above statement. I find it rather hard to believe that a spoofed IP address couldn't exist on a decentralised network of millions of machines, each with little knowledge of what else is out there.
Sure, if two machines had the same IP address, some packets would take the wrong routes. But I'm sure that "locally", within your own part of the network, it would work well enough to use P2P apps.
As a news site or portal, I'd be FAR more worried about news.google.com :) When Google first introduced the feature, I found it quite amazing that most news sites cooperated and allowed(?) Google to index all their news and present it that way. It has a lot of potential to become the main portal for news. All the other major news sites would then be reduced to mere content providers, once people start picking their article from Google out of the dozens available on the same topic...
As for me, I already use only news.google.com, Slashdot, and a few small, more specialized local news sites for my daily news. If news.google.com added local news (not just US news) and perhaps a nice commenting system, I'd probably use it exclusively...
The problem I really have with even a free registration is that it is yet another hoop I have to jump through. The same content is also available (albeit in a slightly different form) at other sites that do not have these policies. A thousand other news sites are willing to serve me their news without the registration hoop, so I really don't see what makes the NYT so special. As for their image: I think of them as the news site that is too damn stubborn to drop the registration and just display the articles like all the other sites do.
Registration, imho, is just silly. Nobody fills out these registrations with real information anyway (it gets tiring after the first dozen forms or so), so the data is probably so wrong that you might as well be anonymous. If you're assuming the information is bogus anyway, why not put a cookie with a unique number on the visitor's machine and use that to track the user's actions? (I have no problem with that (yet), as it doesn't annoy me or cost me any extra time.) You can find out quite a lot that way: which articles he or she likes, how they navigate the site, roughly where they come from, etc. That would be MORE information than they get from me now, which is none. Judging by the huge list of cookies on my machine, I'm sure the other sites are already doing this.
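The cookie approach can be sketched with Python's standard `http.cookies` and `secrets` modules. The cookie name `visitor_id`, the one-year lifetime and the in-memory `page_views` log are all assumptions for illustration. The ID is 128 random bits and says nothing about the person, yet it still ties page views together.

```python
import secrets
from collections import Counter, defaultdict
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"   # hypothetical cookie name
ONE_YEAR = 365 * 24 * 3600   # assumed lifetime, in seconds

# Hypothetical in-memory log: visitor ID -> Counter of articles viewed.
page_views = defaultdict(Counter)


def ensure_visitor_id(cookie_header):
    """Return (visitor_id, Set-Cookie value or None) for an incoming Cookie header."""
    if cookie_header:
        jar = SimpleCookie()
        jar.load(cookie_header)
        if COOKIE_NAME in jar:
            return jar[COOKIE_NAME].value, None  # returning visitor, nothing to set
    new_id = secrets.token_hex(16)  # 128 random bits, no personal data
    jar = SimpleCookie()
    jar[COOKIE_NAME] = new_id
    jar[COOKIE_NAME]["max-age"] = ONE_YEAR
    jar[COOKIE_NAME]["path"] = "/"
    return new_id, jar[COOKIE_NAME].OutputString()


def record_view(visitor_id, article):
    """Log one page view; over time this shows what each visitor reads."""
    page_views[visitor_id][article] += 1
```

A server would send the second return value as a `Set-Cookie` header on the first visit, then call `record_view` on every request.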
I like Google's caching quite a lot. These days I use it almost exclusively before visiting the actual page (if I even get that far). Using Google's cached link has several advantages:
1) Speed... Google's cache is fast. If there's one thing that annoys the heck out of me, it's websites that take more than 5 seconds to load. It's especially annoying when the delay is caused by JavaScript, slow servers or popup ads, when Google can serve me effectively the same page in under a second. That matters even more when I'm not even sure it's the page I'm looking for.
2) Nice highlighting, so I can quickly page down to whatever I was looking for (now if only Google blocked those Tripod background pictures, which make their cached pages unreadable...). Sometimes I wish Google made the highlighted keywords listed at the top clickable, so that clicking one jumped straight to its first appearance on the page.
3) Using Google's cached links usually blocks the silly popups and other annoying stuff that too many websites seem to incorporate these days.
Perhaps I'll make a proxy server that browses the web exclusively through Google's cache... word highlighting on all pages, fast browsing everywhere and working links to more cached pages... It should work fine for any webpage under 100 kB :)
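The word-highlighting part of such a proxy might look like this. It assumes plain page text rather than full HTML parsing. The helper name `highlight` and the `first-match` anchor id are hypothetical; the anchor is what would let a link at the top jump straight to the first hit.

```python
import html
import re


def highlight(text, terms):
    """HTML-escape `text`, wrapping case-insensitive matches of `terms` in <mark>."""
    # Longest terms first, so "cached" wins over its prefix "cache".
    terms = sorted({t for t in terms if t}, key=len, reverse=True)
    if not terms:
        return html.escape(text)
    pattern = re.compile("|".join(map(re.escape, terms)), re.IGNORECASE)
    out = []
    pos = 0
    for i, match in enumerate(pattern.finditer(text)):
        # Escape the unmatched text between hits, then the hit itself.
        out.append(html.escape(text[pos:match.start()]))
        anchor = ' id="first-match"' if i == 0 else ""
        out.append(f"<mark{anchor}>{html.escape(match.group(0))}</mark>")
        pos = match.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

Matching on the raw text and escaping afterwards avoids accidentally highlighting pieces of entities such as `&amp;`.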
As for the NY Times being annoyed with Google's cache, they can easily fix that themselves. Either that, or Google's spiders are a lot smarter than I thought and register themselves with the NY Times automatically. Furthermore, as far as I'm concerned, everything that's publicly accessible on the web without some form of password protection (which would of course also block robots) should be cacheable and archivable in whatever form you see fit. Respecting robots.txt is no more than a courtesy. If you don't want your pages to be archived or cached or whatever, then by all means protect them, or do not put up a webpage in the first place (I'm sure a thousand others will leap at the chance to fill the void).
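Honouring robots.txt is something the crawler chooses to do. Python's standard `urllib.robotparser` shows the check a polite crawler would run before caching a page. The user-agent name and URLs in the example are made up, and nothing stops a crawler from simply skipping this call.

```python
from urllib.robotparser import RobotFileParser


def allowed_to_cache(robots_txt, user_agent, url):
    """Return True if robots.txt permits `user_agent` to fetch `url`.

    Purely advisory: a crawler that never calls this is not stopped by anything.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, with `"User-agent: *\nDisallow: /archive/\n"` as the robots.txt, pages under `/archive/` would be skipped by a well-behaved `HypotheticalCacheBot`. Everything else stays fair game.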
Assuming the hashing algorithm becomes public knowledge at some point (reverse-engineered or otherwise), you could then doctor your binary to produce any hash key you want.
For example, if you had an approved MediaPlayer program, you could subsequently modify it: say, to stream the decrypted data to a file instead of displaying it. That would involve tweaking the binary a bit so that it still produces the correct hash key.
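Here is a minimal sketch of the approval check this argument assumes. SHA-256 stands in for whatever hash a real DRM scheme would use, since the comment doesn't say. A binary is approved if its digest is on an allow-list, so patching even one byte of the player gets it rejected unless the hash can be forged.

```python
import hashlib


def digest(binary: bytes) -> str:
    """SHA-256 of a program image (assumed stand-in for the DRM scheme's hash)."""
    return hashlib.sha256(binary).hexdigest()


def is_approved(binary: bytes, approved_digests: set) -> bool:
    """A binary is trusted only if its exact digest is on the allow-list."""
    return digest(binary) in approved_digests
```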
Given a system that generates, say, 128-bit hash keys, you could produce any given hash key for your binary just by altering 128 bits at the end of the program (or in some unused string) until you hit the right one. This technique is already used to fool P2P programs into thinking that a specific file served by someone is the file you are really after, even though it's protected by a hash key.
The only problem I can see is that finding the combination that produces the correct hash key might be too much work. That depends on the algorithm used, and on how easy it is to predict what impact changes in the program have on the hash key. Against a strong 128-bit hash, blind trial and error would take on the order of 2^128 attempts, so in practice this relies on a weakness in the algorithm.
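To see how the effort scales, here is a toy brute force against a deliberately weakened hash. Only the top `bits` bits of SHA-256 are compared, and an 8-byte counter is appended to the patched binary until they match. Expected work is about 2^bits attempts, so a 12- or 16-bit target falls in a fraction of a second. The function names and the truncation are assumptions for illustration, not a real attack on any DRM product.

```python
import hashlib


def truncated_hash(data: bytes, bits: int) -> int:
    """Top `bits` bits of SHA-256(data): a deliberately weak stand-in hash."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)


def forge_suffix(body: bytes, target: int, bits: int, max_tries: int = 1 << 24):
    """Append an 8-byte counter to `body` until its truncated hash equals `target`.

    Expected work is about 2**bits attempts; returns None if max_tries runs out.
    """
    for counter in range(max_tries):
        candidate = body + counter.to_bytes(8, "big")
        if truncated_hash(candidate, bits) == target:
            return candidate
    return None
```

Each extra bit doubles the expected number of attempts. That is why the trailing-bytes trick is practical against a short or broken hash but not against a full-strength one.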
In principle, I don't believe such a system can be made hacker-proof. There will always be a point where you can either fool the system into thinking you are running signed software (by forging the hash key somewhere along the way) or capture the data unencrypted. And once the data is stored unencrypted, the DRM falls apart.
--Swilver