Data Mining Amazon.com Wish Lists
Dr. Webster writes "In his article "Data Mining 101: Finding Subversives with Amazon Wishlists," Tom Owad of Applefritter outlines a way in which one could build detailed personal profiles of hundreds of thousands of U.S. citizens in a matter of hours. Reading habits, personal tastes and even political party affiliation could be inferred from the results, and through the use of Yahoo! People and Google Maps, one could even map out geographically where people with certain interests or affiliations live, down to their address. Most surprisingly, the process of doing this is completely legal, and doesn't even violate Amazon's Conditions of Use."
Mining voluntary information on a public website? Come back and tell us when you can mine the info as easily from, say, real Amazon sales records of what I actually did buy, not what I might want the public to think I am buying.
Obligatory music whenever data mining is mentioned... Privacy Song...
... Throw a monkey wrench right up their database.
Lie,Lie,lie... Lie about your age, your gender and your race.
Breaking news! People conducting surveys report other people freely giving away personal information! That could be an article from http://www.theonion.com/. Shocking. Call it a "wishlist" not a "voluntary survey about what you like" and it's an amazing invasion of privacy.
7h3$3 4r3n'7 7h3 Ðr01Ð$ ¥0 4r3 £00|{1n9 f0r. M0v3 4£0n9. --OB1
So THAT'S why I'm on the no-fly list
I was gonna say the same thing. I'm really for people's rights online (I'm like insane about freedom of speech and stuff, if you ask my friends), but I honestly don't see what people expect. If you put information online for the public, this is what happens: no conspiracy, no illegal or suspicious activity. This information is voluntarily released and up for grabs; if people want to use it for that, I don't see a problem.
Actually, if you had read the article, you would have seen that Mr. Owad does not reference "invasion of privacy" at all. What he does do is help people understand how information they share online can be used to create a general profile, and even to link them to others. The point of the article is to educate people. But, like others, you were probably just going on the snippet... I mean, why read the whole story when you can see the headlines via RSS, right?
... elipses...
...that I know freely subscribe to Amazon.com wish lists. They are like "let's overthrow the government that wants to jail us" but they are also all over "Let's let everyone know how we feel about corporations and the government by making wish lists that not only incriminate us but play into the hands of the very corporate droogs we hate... makes sense, right?" Anyone thinking they will get useful information about truly dangerous groups from Google Maps or Amazon Wish Lists needs to take a breather and sit down for a minute.
Did you RTFA?
He maps out (using Google Maps) the locations of the people who read certain books.
A lot of these wishlists have a city, state, full name and birthdate attached to them... which is more than enough for Google to give you a street address (though not always with 100% accuracy).
Just to test it, I randomly picked a 'sarah' who had a wishlist. Turns out there's only one Sarah Johnson in Portland, OR.
[Fuck Beta]
o0t!
From the article:
On a final note, the FBI is now hiring computer scientists to implement a project that sounds very similar to what I just did:
"Currently, the FBI is strengthening systems engineering in order to tie new systems together architecturally and ensure that standards for custom and packaged applications are enforced, and it needs engineers to accomplish this goal, the agency said.
(etc...)
Where does he read data mining into this? I read that the FBI wants to update their computers to make their databases better. Their databases.
This article strikes me as scare mongering, and until I hear that the government plans on breaking the knuckles of people who read Aldous Huxley, I don't care about what's merely possible.
Even his crude filtering techniques can yield worthwhile leads for police/FBI. He says that the first result for bible is "The Cannabis Grow Bible: The Definitive Guide to Growing Marijuana for Recreational and Medical Use".
Is it so hard to imagine that a certain fraction of people with that book on their wishlist may either be growing weed, or have it in their possession? Or that a percentage of people 'wishing' for the Improvised Munitions Handbook (printed by our favorite Uncle Sam @ the DoD) aren't chemists or demolitionists?
/doesn't have an Amazon wishlist and never will
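For what it's worth, the "crude filtering" described above amounts to little more than a substring search over titles. Here is a minimal sketch, assuming the wish lists have already been collected into a dictionary; the `wishlists` data, the owner names, and the `find_matches` helper are all hypothetical, and this is not the code from the article.

```python
# Hypothetical wish-list records: the titles are ones mentioned in this
# thread, the owner names are made up for illustration.
wishlists = {
    "user_a": [
        "The Cannabis Grow Bible: The Definitive Guide to Growing "
        "Marijuana for Recreational and Medical Use"
    ],
    "user_b": ["Harry Potter and the Goblet of Fire"],
    "user_c": ["Improvised Munitions Handbook"],
}


def find_matches(wishlists, keywords):
    """Return {owner: [matching titles]} using case-insensitive substring matching."""
    lowered = [k.lower() for k in keywords]
    hits = {}
    for owner, titles in wishlists.items():
        matched = [t for t in titles if any(k in t.lower() for k in lowered)]
        if matched:
            hits[owner] = matched
    return hits


bible_hits = find_matches(wishlists, ["bible"])
munitions_hits = find_matches(wishlists, ["munitions"])
print(sorted(bible_hits))
print(sorted(munitions_hits))
```

Note that a search for "bible" matches the grow guide, which is exactly the kind of surprise a plain keyword filter produces: it finds strings, not intent.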
Profile for Jaish al Ashurah ] Wishlist
Wishlist
This list is for: Jaish Al Ashurah
Birthday: None Entered
Shipping Address: Private
Unique Facts: A shadu la ilaha illah Allah
Total items: 10
"The Anarchist's Cookbook" by William Powell
"Improvised Explosives: How To Make Your Own" by Seymour Lecker
"Ultimate Sniper: An Advanced Training Manual For Military And Police Snipers" by John Plaster
"Crusades Through Arab Eyes" by Amin Maalouf
"The Protocols of the Meetings of the Learned Elders of Zion With Preface and Explanatory Notes" by Sergius Nilus, Henry Ford, and Victor E. Marsden.
"Explosive Dusts: Advanced Improvised Explosives" by Seymour Lecker
"Creative Cloth Doll Making: New Approaches for Using Fibers, Beads, Dyes, And Other Exciting Techniques" by Patti Medaris Culea.
"The Tragedy of Karbala" by M.A. Naquvi
"51 Documents: Zionist Collaboration With the Nazis" by Lenni Brenner
"How to Build a Nuclear Bomb: And Other Weapons Of Mass Destruction" by Frank Barbaby
South Park pokes fun at sacred cows to make a point. Family guy pokes cows to hear them moo.
In my county in NC, if you want a party affiliation all you need to do is look it up on the public records website:
http://www.co.durham.nc.us/common/PublRecordsdB.cfm
You can also figure out how much someone's house is worth, what they paid in taxes, etc.
It starts to get a little scary though when your search for public records reveals mortgage applications with the individual's SS# listed on the sheet. All available online, and provided for by your very own government!
Look at a dozen random wishlists and you'll find the same pattern. Customer tried wishlist on December 11, 2002. Added Harry Potter and the Goblet of Fire. Never used wishlist function again.
...when you put that inflatable nun and bottle of baby lotion on your wishlist. woops...
Next time you see a recommendation like that, you can click the "Why was I recommended this?" link under it and then uncheck the "Use this for recommendations" checkbox by the items that you don't want to be used as sources for your recommendations. Alternatively, you can go into "Your Store" through the tabs at the top and then go to the Improve Your Recommendations section and find the items and uncheck the same checkbox.
It may not be 'real news' but I don't think it should be dismissed as completely irrelevant. (Like 95% of current commentators have done).
First, on relevance of wishlists:
Granted that wishlists are not the most accurate estimates of your preferences, what is? My list contains over 50 books, and for the most part they are all related to each other. In fact, I would say that by looking at my list you would have a pretty accurate gauge to measure my interests. Am I an anomaly? Possibly. (Though I doubt it)
But it still makes you wonder how then does Amazon produce dozens and dozens of relevant suggestions to each of your books. For example, I often add a book to my shopping cart just to see the "what other people have bought when they bought this book x". Click, Wishlist, click, Wishlist. I think it's naive to dismiss wishlists completely. In fact, I'm sure that you will be able to successfully data mine data obtained from the wishlists and extract interesting and useful information.
Now, the actual experiment:
An interesting observation that I've recently read about developments in AI: "It stops being AI once it hits the mainstream". It's true, and it's happening here. The idea does not capture anything new, but the application is interesting. You can find out what people are reading and where. (And that's a powerful tool!) It opens a big can of worms: advertising, targeting social groups, other 'moral' and 'immoral' uses. To those who know how to utilize it, this might prove to be a goldmine.
When they first started the idea, they gave it some PR, but now it's sort of a low man on the totem pole, relegated to the backwaters. When I checked 6400+ cities, only 2800 of them were recording enough activity to warrant a bestseller or "uniquely popular" list.
They generate the 2 types of lists for 5 classes of items: books, CDs, DVDs, toys, and consumer electronics. Now this might not be as potentially compromising as finding out a single person was ordering subversive books. Yet finding out a small town in Alabama's bestselling genre is showtunes is definitely something interesting.
- Greg
Start a happiness pandemic
a) I only want to share my wish list with people I trust;
b) I only want to share certain sublists with certain people.
They do! Go to "edit wishlist" and the second item after you name the list is "This list will be viewable by:" and it gives three choices: "Anyone who searches for me," "Only people I have invited with the 'Share this list' feature," or "Only me."
Very simple principle. Lots of data is individually acceptable, but when compiled or processed, is unacceptable.
For example, say you maintain a Slashdot identity that you don't link to your real name. While no one post of yours may be sufficient to tie your identity to your name, the sum total may be sufficient.
Or security cameras. Most people don't worry about *one* security camera, but a lot of people get concerned when they are constantly being monitored by cameras which are tied together by computer to monitor where they go each day.
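The aggregation principle above can be shown with a toy example. This is a minimal sketch under stated assumptions: the `people` records, the field names, and the `narrow` helper are all invented for illustration. Each clue on its own matches several people; applied together they leave one.

```python
# Hypothetical population: every record here is invented for illustration.
people = [
    {"name": "P1", "city": "Portland", "hobby": "sailing", "os": "Linux"},
    {"name": "P2", "city": "Portland", "hobby": "sailing", "os": "Mac"},
    {"name": "P3", "city": "Portland", "hobby": "chess", "os": "Linux"},
    {"name": "P4", "city": "Durham", "hobby": "sailing", "os": "Linux"},
    {"name": "P5", "city": "Durham", "hobby": "chess", "os": "Mac"},
]


def narrow(candidates, clues):
    """Apply clues one at a time.

    Returns the number of candidates left after each clue, plus the survivors.
    """
    counts = []
    for field, value in clues:
        candidates = [p for p in candidates if p[field] == value]
        counts.append(len(candidates))
    return counts, candidates


# Three facts that might each leak from a separate, harmless-looking post.
clues = [("city", "Portland"), ("hobby", "sailing"), ("os", "Linux")]
counts, remaining = narrow(people, clues)
print(counts)
print([p["name"] for p in remaining])
```

No single clue identifies anyone here, but the candidate pool shrinks with each one until a single record is left.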
Any program relying on (nontrivial) preemptive multithreading will be buggy.
Is it considered bad if I recognize and have read about half of those books?
If this guy links Amazon Wish Lists, Google Maps, the yellow pages, and personality typing using Ruby on Rails, he can call it a Web 2.0 Mashup and make millions when Google, Yahoo!, or Microsoft buys him out.
I smell a fully monetized eyeball!
- "When you want something with all your heart, the entire universe conspires to give it to you" -Paulo Coelho
No, the FBI or anyone else would never bother with Amazon wish lists. They would simply get the sales records. This guy does not have access to those, so he uses what he can to prove his point.
Yes it is scary. Especially for those of us who have family (or more to the point do not have family) killed for expressing the wrong ideas.
I however don't think we should blame the FBI or similar agencies; they are the instruments of us, the people. It is we who have voted the current governments into power. Corruption, you say? Well then it is you and me that have allowed that to happen. I do not believe in the mythical innocent citizen. Others have died for freedom. No reason we should be allowed to sit on our backsides and complain our freedoms are taken away. FIGHT
Not that I will, of course. I know deep down that what is happening is wrong and also know that I am one of the cattle. Perhaps it will make it easier when I am put in a cattle wagon to be gassed.
The problem with fighting for your freedom is that one person's freedom fighter is another person's terrorist.
I ain't got an answer or a solution except to suggest "PAY CASH". Even if you're part of the herd there is no reason to make it any easier for them to send you off to the slaughterhouse.
Will it happen? It has happened countless times before. Check the McCarthy trials. The treatment of Japanese Americans vs German Americans. The gunning down of American citizens by police during peace protests. The way England handled the IRA and labor strikes. All of them pretty recent.
Something scary might happen in our lifetimes. Or not. This is one tiny example to prove that it won't be hard on the technical side. Now all we need to do is elect leaders crazy enough to do it. /me looks at the current leaders of the "free" west. Too late.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
You're a fool. And yes, the company you work for is also an idiot for using john.smith@megacorp.com as your mandatory email address. All you're doing is making an index for yourself into the biggest rolodex on the planet. People argue that some names are so regular no one could possibly narrow it down, but a simple whois can help narrow things down to a particular state. Public legal records from there can make things more interesting.
Join the Slashcott! Feb 10 thru Feb 17!
Using data mining to catch criminals is nothing new and there is nothing wrong with it. Many white-collar criminals have been caught "cooking the books" using this kind of process. Having said that, I also have to say that there is a point where this practice can go too far. It can become an invasion of privacy that could cast the shadow of suspicion on to ordinary, law-abiding people.
Suppose you were a person who likes surfing the net to read things like "The Anarchist's Cookbook" (an entertaining read), who is also curious about Muslim extremism (because it is so often in the news) and is planning a car trip with your family to New York City and Washington D.C. Perhaps you have downloaded maps and driving directions to the Capitol, the White House and the United Nations Building from MapQuest. Maybe you have visited eBay and bought some reloading equipment (because you are a sport-clay shooter).
Now imagine some data mining application at fbi.gov puts all of this information together and concludes that you are an extremist who is about to embark on a trip where you plan on bombing the United Nations building in New York City and the Capitol and the White House in Washington DC!
Separate and disparate pieces of data aren't always able to fit nicely into a simple formula. This is where the danger of this kind of information comes in. Taken separately and considered without an adequate foundation, these "facts" tend to support a totally erroneous conclusion. Next thing you know, someone is quietly asking questions about you and you have no idea why.
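To make the "simple formula" problem concrete, here is a minimal sketch of a naive additive scoring rule applied to the hypothetical traveler described above. The indicator names, the weights, and the threshold are all invented for illustration; nothing here describes a real system.

```python
# Hypothetical indicators and weights, invented for illustration only.
WEIGHTS = {
    "read_anarchists_cookbook": 2,
    "researched_extremism": 2,
    "downloaded_maps_of_landmarks": 3,
    "bought_reloading_equipment": 3,
}
THRESHOLD = 8  # arbitrary cut-off, also an assumption


def suspicion_score(observed):
    """Add up the weights of every observed indicator; context is ignored."""
    return sum(WEIGHTS[flag] for flag in observed if flag in WEIGHTS)


def is_flagged(observed):
    """Flag anyone whose score reaches the threshold."""
    return suspicion_score(observed) >= THRESHOLD


# The curious reader planning a family trip who also shoots sport clays.
family_tourist = {
    "read_anarchists_cookbook",
    "researched_extremism",
    "downloaded_maps_of_landmarks",
    "bought_reloading_equipment",
}
# Someone who only read about the news.
news_reader = {"researched_extremism"}

print(suspicion_score(family_tourist), is_flagged(family_tourist))
print(suspicion_score(news_reader), is_flagged(news_reader))
```

The rule has no way to represent "family vacation" or "sport shooter", so the innocent explanations never enter the sum and the tourist is flagged.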
These kinds of things have happened to innocent people before. Someone I know faced scrutiny years ago shortly after the Oklahoma City bombing. There was no real reason for his being suspect and it took a long time to figure out why they looked at him. The FBI questioned his neighbors, they followed him, photographed his home, and in general made life uncomfortable for him.
It took time to figure it out, but we finally concluded that there were reasons why he came to their attention. They were:
- He was a gun collector
- He bought gunpowder by the pound (he was a re-loader)
- He worked at a facility where he may possibly have had access to ammonium nitrate
- He lived alone
- He lived in the wrong place (outside of town in an area linked to suspects)
- He had several 55 gallon oil drums on his property
- He was a member of the NRA
To the FBI all this information seemed to indicate that he could possibly be linked as the third man in the Oklahoma City bombing. Nothing could have been further from the truth, but for a few tense weeks he was the focus of enough attention that he felt like he could not visit friends, go target practicing, or do much of anything. He got paranoid and asked us to not call him because he thought he may be wiretapped. It really ate him up inside and he had done nothing wrong. The truth of the matter is that he is one of the most law-abiding people around. He had not done one illegal thing to draw this suspicion on him. Literally, he was just in the wrong place at the wrong time. He is just a kind of quiet guy who likes to keep to himself.
I don't think that data mining brought this investigation on him. I think his name simply popped up on too many lists (which is, in a way, a form of manual data mining). Still, with computers and access to hundreds or thousands of different data sources, the possibilities have compounded themselves, making this kind of process likely to impact too many people. Innocent people.
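The worry about scale in the last paragraph is the familiar base-rate problem, and a rough calculation shows why. This is only a sketch: the population size, the number of real targets, and both accuracy rates below are made-up numbers chosen for illustration, and `flagged_breakdown` is a hypothetical helper.

```python
def flagged_breakdown(population, actual_targets, true_positive_rate, false_positive_rate):
    """Estimate how many flagged people are real targets versus innocent.

    Returns (true_hits, false_hits, precision), where precision is the
    share of flagged people who are actually targets.
    """
    innocent = population - actual_targets
    true_hits = actual_targets * true_positive_rate
    false_hits = innocent * false_positive_rate
    precision = true_hits / (true_hits + false_hits)
    return true_hits, false_hits, precision


# Assumed numbers: 1,000,000 people screened, 10 real targets, and a
# screen that catches 99% of targets while wrongly flagging 1% of everyone else.
true_hits, false_hits, precision = flagged_breakdown(1_000_000, 10, 0.99, 0.01)
print(f"real targets flagged:    {true_hits:.1f}")
print(f"innocent people flagged: {false_hits:.1f}")
print(f"share of flags that are real: {precision:.4%}")
```

Under those assumptions roughly ten thousand innocent people are flagged for about ten real targets, so nearly every knock on the door lands on someone like the man described above.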