There are techniques to do this, but none have made it out of academia. Most are quite inefficient and support very restricted querying models. Here's one paper that claims their methods are "practical" (but keep in mind that academic claims of practicality should always be taken with a grain of salt):
Google wants an auto updater so badly because it allows them to gather more information on you. Why else would it have ever included a unique identifier?
The purpose of the ID is described here. But you may need to take off the tin foil hat before you can understand it.
GoogleUpdate also uses its own, randomly-generated unique ID number to accurately count total users. This information includes version numbers, languages, operating system, and other install or update-related details, such as whether or not the applications have been run. This information is not associated with you or your Google Account.
FAST?! You mean that company that was a front for a bunch of Norwegian scammers that tricked MS into buying it? Also known as the "Enron of Norway"?
Surely you jest.
Wow, someone actually read the article! The article is not about the outcomes of the various cases discussed, but the tactics that are being used. It's a good read.
That ID is only ever sent if you opt in to sending usage stats and crash reports. And if you were dumb enough to opt in when you're paranoid about these sorts of things, you can opt out with the "Under the Hood" menu.
RTFA. It's evil if revenue_from("business.com") > ad_revenue_from("sourcetool.com"). business.com is a partner, whereas sourcetool is not. Isn't this the definition of monopoly abuse?
You've been misled by the article into assuming that business.com is being treated better purely because of its partner status. Oh, the eeeevilness of it all!
I'll grant you that after a quick glance it does seem that business.com is an equally useless site, but where is the evidence its partner status has anything to do with this? Perhaps sourcetool got the boot because of misleading creatives or poorly targeted keywords. There could be many other explanations, each a lot more plausible to me than "monopoly abuse". But I guess the article's author correctly figured that an honest and well-researched article might not have attracted as much attention.
More "oh look google is becoming evil!" nonsense.
How exactly is it evil or "acting like Microsoft" to refuse over a half million dollars in revenue every month in order to prevent some lame ass site from annoying real users: the people who actually use the search engine to find information? People should try to use SourceTool before they draw any conclusions. I'm sure NOBODY would visit that site unless tricked into clicking on one of their ads. Don't you think if the site actually provided any real value, they could get plenty of visits through other means such as organic search listings?
And guess where live.maps.com is on Google's search?
Go look... no it's not on the first page....
Go to the second page of results... Ah yes half way down.... HMMMM
I think Google has a case to answer here, I simply don't believe Microsoft maps can possibly legitimately be ranked where it is.
Because you are an idiot. Go back to live.com and see where it shows up in the *search* results for maps (sponsored links DO NOT count, duh!). I tried, and the site appears nowhere in the top 50 results.
Hilarious, come on all you Google fanboys/MS anti-fanboys.... try and spin this one into yet another Microsoft bashing session I dare you, then I can see something truly imaginative.
Personally though, I'm happy that Google employees choose to spend more time building excellent products. Let's face it, most academic papers are pointless intellectual masturbation anyway.
Something related to iTunes - a study of the randomness of party shuffle in iTunes. This article does a bit of research and comes up with a function!
Yep, and there was even a previous Slashdot story about this very article.
The function from the article to describe play probabilities grossly overfits the data. A much simpler formula is more likely correct, as I demonstrated in a comment submitted to this previous story.
It's the algorithm. It's straight complexity theory; C/C++ is not a panacea. If you write a 2^n or n! algorithm in C, it'll have its doors blown off by an n log n algorithm in Python.
A programmer who knows how to choose the right algorithm will do so regardless of the language being used. So given the correct algorithm, it boils down to the BIG FAT CONSTANTS that determine better performance. Lower-level languages like C++ can make those constants smaller.
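To put rough numbers on the algorithm-versus-constants point, here is a small Python sketch. Fibonacci is just a stand-in problem picked for illustration (nobody in the thread mentioned it), and the function names are made up. It counts how much work an exponential-time version does compared with a linear-time one for the same answer.

```python
def fib_naive(n, counter):
    """Exponential-time Fibonacci; counter[0] tallies every call made."""
    counter[0] += 1
    if n < 2:
        return n
    return fib_naive(n - 1, counter) + fib_naive(n - 2, counter)


def fib_linear(n):
    """Linear-time Fibonacci; returns (value, number of loop steps)."""
    a, b = 0, 1
    steps = 0
    for _ in range(n):
        a, b = b, a + b
        steps += 1
    return a, steps


calls = [0]
naive_value = fib_naive(20, calls)
linear_value, linear_steps = fib_linear(20)
print("naive calls:", calls[0], "linear steps:", linear_steps)
```

Both versions return the same value, but the naive one needs thousands of calls where the loop needs 20 steps. A faster language shrinks the cost of each step by a constant factor, which only decides the winner once both sides are running an algorithm of the same complexity.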
From the article:
"AdCenter will give advertisers sophisticated information about consumers, including their location, age, gender and sometimes, their level of wealth."
Could MS be misusing all of that registration data they have been collecting? Or have they silently added another few hundred lines to their EULA / TOS?
Yep, especially considering that 87% of U.S. citizens can be uniquely identified by Zip+Gender+date of birth (see Sweeney, Uniqueness of Simple Demographics in the U.S. Population, 2000). They may as well be handing over your full name too.
Funny how they tout their privacy-invasive demographic targeting stuff as a distinguishing feature of their system compared to Google. It's one thing for MS to know a lot about you, but by affecting the display of ads based on your personal information, some of it is being leaked to advertisers each time you click. No thanks, MS.
Which is reasonably close to what the author found.
Your numbers may seem "reasonably close" from a casual glance, but if you do the math, even for 99% confidence intervals his numbers are accurate within +-.005. Your approximation falls well outside this. Note he had a 52,000+ sample size -- that's pretty big!
Yours is a nice guess though, perhaps only slightly off in some way. I agree iTunes must use some simple formula or algorithm to derive the probabilities, and not the crazy equation in the linked story.
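For anyone who wants to "do the math" themselves, here is a rough Python sketch of the interval half-width using the usual normal approximation for a proportion. Assumptions: the sample size is taken as exactly 52,000 (the comment only says "52,000+"), the proportion 0.27 is just an illustrative value near the top of the observed range, and `ci_half_width` is a made-up helper name.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(p, n, level):
    """Half-width of a normal-approximation confidence interval for a proportion."""
    # Two-sided critical value, e.g. about 1.96 for level=0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * sqrt(p * (1 - p) / n)


N = 52_000  # assumed sample size; the comment only says "52,000+"
P = 0.27    # illustrative proportion, not a figure from the article

for level in (0.95, 0.99):
    print(f"{level:.0%} interval: +- {ci_half_width(P, N, level):.4f}")
```

With these assumptions the 99% half-width comes out at roughly .005, which is in line with the figure quoted above. Smaller proportions give even narrower intervals.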
I looked at the powerpoint presentation and the paper itself, and no, I would not call the work BS. The powerpoint presentation shows some graphs based on the outcome of their simulations, which are well documented in their paper. The BitTorrent model used in their simulations might be flawed (as Bram has accused), which indeed brings some of their claims into question, but it certainly doesn't invalidate them. Models used in simulations are necessarily simplified. The fact that their experiments are well documented allows anyone to repeat them, possibly with corrections to their BitTorrent model, in order to confirm or contradict their findings. This is the sign of good research.
I suggest you read the paper -- it's a nice idea, even if it has not yet been perfectly evaluated.
People around here seem to share Dvorak's gross misunderstanding of what research papers are all about. They are NOT product announcements!
Azureus + the Safepeer/PeerGuardian plugin specifically blocks much nasty stuff out.
All that does is block bad IPs. That won't do squat if you're downloading and running an application with malware inside. The real solution is to use something like bitzi which lets you check if a given file/app you are downloading is known to have "issues."
Getting the server to ack each page is going to be very costly, plus it doesn't actually solve the fundamental "man in the middle" vulnerability of HTTP, which is the basis of this and many other attacks on HTTP and its implementations.
There are already some simple proposals that go a long way toward solving the man in the middle issue, without resorting to grossly inefficient schemes such as yours (or HTTPS). The problem is in getting them adopted.
Agreed. The parent poster obviously hasn't actually listened to these internet streams if he/she thinks they are equivalent to 64/128k MP3s in quality. Even XM's satellite streams don't achieve 128k MP3 quality. I'm no audiophile, but I find the quality of XM audio very disappointing (and yes, I am a subscriber). I've heard Sirius is no better. It really bothers me when I hear them claim CD quality sound....
Doh! Just noticed you already know about that particular work. Anyway, congrats, you're already aware of the state of the art!
http://www.cs.berkeley.edu/~dawnsong/papers/se.pdf
Bait and switch would be just like these guys!
Google is actually pretty open about what they log for Google Suggest. http://googleblog.blogspot.com/2008/09/update-to-google-suggest.html
You've already succeeded all on your own.
Seems they have plenty of freedom to publish papers to me:
Google Publication List
Right, and it's called Orkut, currently used by just about everybody in Brazil.
Not odd at all... many common TLS/SSL modes involve RSA based session key establishment. There is no man in the middle risk if implemented properly.
points(0 stars)=1
points(1 stars)=3
points(2 stars)=4
points(3 stars)=5
points(4 stars)=6
points(5 stars)=7
probability(X stars) = points(X stars) / 26
This yields the following probabilities, listed alongside the observed values from the article with their 95% confidence intervals.
p(5 star)=.2692 [.270 +- .0038]
p(4 star)=.2308 [.230 +- .0036]
p(3 star)=.1923 [.189 +- .0033]
p(2 star)=.1538 [.154 +- .0031]
p(1 star)=.1154 [.118 +- .0027]
p(0 star)=.0385 [.039 +- .0016]
As you can see, each computed probability falls within the 95% confidence interval, so there's a good chance this is the correct formula.
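If you want to check the arithmetic, here is a minimal Python sketch of the points scheme above. The names `POINTS` and `probabilities` are just labels I picked. This is my guess at the formula, not anything confirmed to be what iTunes actually does.

```python
# Points per star rating, as listed above (a guess, not a documented iTunes rule).
POINTS = {0: 1, 1: 3, 2: 4, 3: 5, 4: 6, 5: 7}

# The points sum to 26, which is where the divisor in the formula comes from.
TOTAL = sum(POINTS.values())

# probability(X stars) = points(X stars) / 26
probabilities = {stars: pts / TOTAL for stars, pts in POINTS.items()}

for stars in sorted(probabilities, reverse=True):
    print(f"p({stars} star)={probabilities[stars]:.4f}")
```

Swapping in a different points table makes it easy to try other candidate formulas against the observed values.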
Boy do I have too much time on my hands today.
Heheh... yes, too funny. I've been getting what is basically that same e-mail EVERY YEAR since 1999.... yet STILL my spam filter doesn't stop it!
Does for me, anyway.