Slashdot Mirror


Data Mining Amazon.com Wish Lists

Dr. Webster writes "In his article "Data Mining 101: Finding Subversives with Amazon Wishlists," Tom Owad of Applefritter outlines a way in which one could build detailed personal profiles of hundreds of thousands of U.S. citizens in a matter of hours. Reading habits, personal tastes and even political party affiliation could be inferred from the results, and through the use of Yahoo! People and Google Maps, one could even map out geographically where people with certain interests or affiliations live, down to their address. Most surprisingly, the process of doing this is completely legal, and doesn't even violate Amazon's Conditions of Use."

33 of 183 comments

  1. Mining voluntary information on a public website? by Saven+Marek · · Score: 3, Insightful

    Mining voluntary information on a public website? Come back and tell us when you can mine the info as easily from, say, real Amazon sales records of what I actually did buy, not what I might want the public to think I am buying.

  2. 3 Dead Trolls in a Baggie by Anonymous Coward · · Score: 2, Funny

    Obligatory music whenever data mining is mentioned... Privacy Song...

    Lie,Lie,lie... Lie about your age, your gender and your race. ... Throw a monkey wrench right up their database.

  3. This is not a story. This is not news that matters by Doomedsnowball · · Score: 3, Insightful

    Breaking news! People conducting surveys report other people freely giving away personal information! That could be an article from http://www.theonion.com/. Shocking. Call it a "wishlist" not a "voluntary survey about what you like" and it's an amazing invasion of privacy.

    --
    7h3$3 4r3n'7 7h3 Ðr01Ð$ ¥0 4r3 £00|{1n9 f0r. M0v3 4£0n9. --OB1
  4. I see by Anonymous Coward · · Score: 3, Funny

    So THAT'S why I'm on the no-fly list

    1. Re:I see by psykocrime · · Score: 4, Funny

      So THAT'S why I'm on the no-fly list

      No, that's because you ordered those Paladin Press, Delta Press, and IMS catalogs.

      --
      // TODO: Insert Cool Sig
  5. Re:Mining voluntary information on a public websit by insertwackynamehere · · Score: 3, Insightful

    I was gonna say the same thing. I'm really for people's rights online (I'm like insane about freedom of speech and stuff, if you ask my friends), but I honestly don't see what people expect. If you put information online for the public, this is what happens: no conspiracy, no illegal or suspicious activity. This information is voluntarily released and up for grabs; if people want to use it for that, I don't see a problem.

  6. Re:This is not a story. This is not news that matt by Reverend+Darkness · · Score: 5, Insightful

    Actually, if you had read the article, you would have seen that Mr. Owad does not reference "invasion of privacy" at all. What he does do is help people understand how information they share online can be used to create a general profile, and even to link them to others. The point of the article is to educate people. But, like others, you were probably just going on the snippet... I mean, why read the whole story when you can see the headlines via RSS, right?

    --
    ... elipses...
  7. Point of the article by terradyn · · Score: 4, Insightful
    Most of the comments seem to be along the lines of: "What use is it to mine wishlists?" You're missing the point of the article. His main idea is from this section of the article:
    This is what's possible with publicly available information, but imagine if one had access to Amazon's entire database - which still contains every sale dating back to 1999 by the way. Under Section 215 of the Patriot Act, the FBI can require Amazon to turn over its records, without probable cause, for an "authorized investigation . . . to protect against international terrorism or clandestine intelligence activities." Amazon is forbidden to disclose that they have turned over any records, so that you would never know that the government is keeping records of your book purchases. And obviously it is quite simple to cross-reference this info with data available in other databases. On a final note, the FBI is now hiring computer scientists to implement a project that sounds very similar to what I just did.
    1. Re:Point of the article by Tassach · · Score: 2, Interesting
      the FBI can require Amazon to turn over its records, without probable cause
      This needs to be repeated loudly and often.
      --
      Why is it that the proponents of "one nation under God" are so eager to get rid of "liberty and justice for all"?
    2. Re:Point of the article by zCyl · · Score: 2, Insightful

      I still don't get it.

      The FBI can find out where I live without Amazon or Google.


      Yeah, but the FBI isn't supposed to know that you bought a book called "Why Bush is a Tyrant." The point being, it becomes quite dangerous if the government is allowed to keep tabs on what you read, because the political freedom which comes from freedom of speech requires that ideas can be exchanged and learned without fear of consequences.

    3. Re:Point of the article by houghi · · Score: 2, Funny

      the FBI can require Amazon to turn over its records, without probable cause

      HTH, HAND

      --
      Don't fight for your country, if your country does not fight for you.
    4. Re:Point of the article by crazyphilman · · Score: 2, Insightful

      I believe that what most of us find irritating about the Patriot Act is that many of the powers Bush asked for after 9/11 were not used to pursue terrorists, but rather political groups that the president disliked. For example, the airport no-fly list was mostly not used to prevent terrorists from flying (they all had false ID anyway) but rather, to prevent hippies and other malcontents from attending protests.

      Now, I believe you and I both completely agree that asshole fundamentalists with bombs should be repeatedly forced to endure body cavity searches performed by a very large man hopped up on speed, with his choice of implements (no lube).

      HOWEVER, I also believe that Bush should be prevented from abusing the powers he has received. I believe the FBI should be absolutely barred from using these powers against American citizens unless they can document, to a judge's satisfaction, that the person is in fact up to no good and has actually committed a crime (or is about to).

      Now, what's wrong with that? Most of the fundamentalists with bombs aren't in fact Americans. So my little restriction of government power won't affect your desire to not be blown up in the slightest.

      I await your reply...

      --
      Farewell! It's been a fine buncha years!
    5. Re:Point of the article by NoseSocks · · Score: 2, Interesting

      Oddly enough, how many of the previous attacks on our children in our schools have been made by these tens of thousands of trained Islamofascist terrorists or those who support them? It usually seems to be done by a fellow American that has no ties to Islam.
      The problem I and many others have is that we are pretty sure that even if the government had all the data mining capabilities in the world, a large terrorist organization would still find a flaw in the system and abuse it. The issue is not that we don't have enough information. The issue is that most of our governmental systems and security as a whole are lacking even the most basic competence. The overall end result? There are fights over what should be held private and what should not be, while a larger issue appears to be ignored.
      For every possible scenario you can produce on what might happen if we don't infringe more on everyone's privacy, I can produce actual situations where security (be it government or private) is particularly negligent and can let through a terrorist attack without issue right now.

  8. Most subversive anarchists... by CupBeEmpty · · Score: 4, Insightful

    ...that I know freely subscribe to Amazon.com wish lists. They are like "let's overthrow the government that wants to jail us" but they are also all over "Let's let everyone know how we feel about corporations and the government by making wish lists that not only incriminate us but play into the hands of the very corporate droogs we hate... makes sense, right?" Anyone thinking they will get useful information about truly dangerous groups from Google Maps or Amazon Wish Lists needs to take a breather and sit down for a minute.

  9. You didn't RTFA by TubeSteak · · Score: 3, Interesting

    Did you RTFA?

    He maps out (using google maps) the locations of the people who read certain books.

    A lot of these wishlists have a city, state, full name and birthdate attached to them... which is more than enough for Google to give you a street address (though not always with 100% accuracy).

    Just to test it, I randomly picked a 'sarah' who had a wishlist. Turns out there's only one Sarah Johnson in Portland, OR.

    --
    [Fuck Beta]
    o0t!
  10. What? by Perseid · · Score: 3, Insightful

    From the article:
    On a final note, the FBI is now hiring computer scientists to implement a project that sounds very similar to what I just did:

    "Currently, the FBI is strengthening systems engineering in order to tie new systems together architecturally and ensure that standards for custom and packaged applications are enforced, and it needs engineers to accomplish this goal, the agency said.

    (etc...)

    Where does he read data mining into this? I read that the FBI wants to update their computers to make their databases better. Their databases.

    This article strikes me as scare mongering, and until I hear that the government plans on breaking the knuckles of people who read Aldous Huxley, I don't care about what's merely possible.

    1. Re:What? by surprise_audit · · Score: 2, Informative
      the FBI wants to update their computers to make their databases better. Their databases.

      These days, it wouldn't even take an Act of Congress for Amazon's databases to become FBI databases...

  11. Just to point out by TubeSteak · · Score: 4, Insightful
    It is one thing to 'mine' information from Amazon, it is another thing entirely, to mine useful information.

    Even his crude filtering techniques can yield worthwhile leads for police/FBI. He says that the first result for bible is "The Cannabis Grow Bible: The Definitive Guide to Growing Marijuana for Recreational and Medical Use".

    Is it so hard to imagine that a certain fraction of people with that book on their wishlist may either be growing weed, or have it in their possession? Or that a percentage of people 'wishing' for the Improvised Munitions Handbook (printed by our favorite Uncle Sam @ the DoD) aren't chemists or demolitionists?
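
A minimal sketch of the kind of crude keyword filtering described above, assuming the wishlist data has already been collected as (owner, title) pairs. The sample records and the `filter_by_keyword` helper are hypothetical and are not taken from the article. The sketch shows why a bare keyword such as "bible" is a blunt instrument: it matches a grow guide and a religious text alike.

```python
# Hypothetical sample data: (wishlist owner, book title) pairs.
WISHLIST_ITEMS = [
    ("alice", "The Cannabis Grow Bible: The Definitive Guide to Growing "
              "Marijuana for Recreational and Medical Use"),
    ("bob", "The Holy Bible: King James Version"),
    ("carol", "Improvised Munitions Handbook"),
    ("dave", "Harry Potter and the Goblet of Fire"),
]


def filter_by_keyword(items, keyword):
    """Return (owner, title) pairs whose title contains keyword (case-insensitive)."""
    needle = keyword.lower()
    return [(owner, title) for owner, title in items if needle in title.lower()]


# A single keyword pulls in unrelated titles; a human still has to review the hits.
bible_hits = filter_by_keyword(WISHLIST_ITEMS, "bible")
for owner, title in bible_hits:
    print(owner, "->", title)
```

Any real use would need far more than substring matching, which is the point of the paragraph above: a hit is a lead to check, not a conclusion.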

    /doesn't have an Amazon wishlist and never will

    --
    [Fuck Beta]
    o0t!
  12. Re:Mining voluntary information on a public websit by Rei · · Score: 5, Funny

    Profile for Jaish al Ashurah ] Wishlist

    Wishlist

    This list is for: Jaish Al Ashurah
    Birthday: None Entered
    Shipping Address: Private
    Unique Facts: A shadu la ilaha illah Allah

    Total items: 10

    "The Anarchist's Cookbook" by William Powell
    "Improvised Explosives: How To Make Your Own" by Seymour Lecker
    "Ultimate Sniper: An Advanced Training Manual For Military And Police Snipers" by John Plaster
    "Crusades Through Arab Eyes" by Amin Maalouf
    "The Protocols of the Meetings of the Learned Elders of Zion With Preface and Explanatory Notes" by Sergius Nilus, Henry Ford, and Victor E. Marsden.
    "Explosive Dusts: Advanced Improvised Explosives" by Seymour Lecker
    "Creative Cloth Doll Making: New Approaches for Using Fibers, Beads, Dyes, And Other Exciting Techniques" by Patti Medaris Culea.
    "The Tragedy of Karbala" by M.A. Naquvi
    "51 Documents: Zionist Collaboration With the Nazis" by Lenni Brenner
    "How to Build a Nuclear Bomb: And Other Weapons Of Mass Destruction" by Frank Barnaby

    --
    South Park pokes fun at sacred cows to make a point. Family guy pokes cows to hear them moo.
  13. well by Nutty_Irishman · · Score: 4, Interesting

    In my county in NC, if you want a party affiliation all you need to do is look it up on the public records website:
    http://www.co.durham.nc.us/common/PublRecordsdB.cfm

    You can also figure out how much someone's house is worth, what they paid in taxes, etc.

    It starts to get a little scary though when your search for public records reveals mortgage applications with the individual's SS# listed on the sheet. All available online, and provided for by your very own government!

  14. Most people use wishlist once and then never again by loggia · · Score: 2, Interesting

    Look at a dozen random wishlists and you'll find the same pattern. Customer tried wishlist on December 11, 2002. Added Harry Potter and the Goblet of Fire. Never used wishlist function again.

  15. And you thought you were funny... by know1 · · Score: 2, Funny

    ...when you put that inflatable nun and bottle of baby lotion on your wishlist. woops...

  16. Re:Mining voluntary information on a public websit by Skidge · · Score: 3, Informative

    Next time you see a recommendation like that, you can click the "Why was I recommended this?" link under it and then uncheck the "Use this for recommendations" checkbox by the items that you don't want to be used as sources for your recommendations. Alternatively, you can go into "Your Store" through the tabs at the top and then go to the Improve Your Recommendations section and find the items and uncheck the same checkbox.

  17. It may be more relevant than it may appear. by igrigorik · · Score: 2, Insightful

    It may not be 'real news' but I don't think it should be dismissed as completely irrelevant. (Like 95% of current commentators have done).

    First, on relevance of wishlists:
    Granted that wishlists are not the most accurate estimates of your preferences, what is? My list contains over 50 books, and for the most part they are all related to each other. In fact, I would say that by looking at my list you would have a pretty accurate gauge to measure my interests. Am I an anomaly? Possibly. (Though I doubt it)

    But it still makes you wonder how then does Amazon produce dozens and dozens of relevant suggestions to each of your books. For example, I often add a book to my shopping cart just to see the "what other people have bought when they bought this book x". Click, Wishlist, click, Wishlist. I think it's naive to dismiss wishlists completely. In fact, I'm sure that you will be able to successfully data mine data obtained from the wishlists and extract interesting and useful information.

    Now, the actual experiment:
    An interesting observation that I've recently read about developments in AI: "It stops being AI once it hits the mainstream". It's true, and it's happening here. The idea does not capture anything new, but the application is interesting. You can find out what people are reading and where. (And that's a powerful tool!) It opens a big can of worms: advertising, targeting social groups, other 'moral' and 'immoral' uses. To those who know how to utilize it, this might prove to be a goldmine.

  18. What could you do with Purchase Circles? by gbulmash · · Score: 2, Interesting
    Amazon already catalogs bestsellers and "uniquely popular" items for thousands of U.S. cities in their Purchase Circles section.

    When they first started the idea, they gave it some PR, but now it's sort of a low man on the totem pole, relegated to the backwaters. When I checked 6400+ cities, only 2800 of them were recording enough activity to warrant a bestseller or "uniquely popular" list.

    They generate the 2 types of lists for 5 classes of items: books, CDs, DVDs, toys, and consumer electronics. Now this might not be as potentially compromising as finding out a single person was ordering subversive books. Yet finding out a small town in Alabama's bestselling genre is showtunes is definitely something interesting.

    - Greg

  19. They have this already! by Derling+Whirvish · · Score: 3, Informative
    I've always wondered why Amazon didn't take a more 'social networking' approach to this since:
    a) I only want to share my wish list with people I trust;
    b) I only want to share certain sublists with certain people.

    They do! Go to "edit wishlist" and the second item after you name the list is "This list will be viewable by:" and it gives three choices: "Anyone who searches for me," "Only people I have invited with the 'Share this list' feature," or "Only me."

  20. Re:WTF is wrong with you? by typical · · Score: 4, Insightful

    Very simple principle. Lots of data is individually acceptable, but when compiled or processed, is unacceptable.

    For example, say you maintain a Slashdot identity that you don't link to your real name. While no one post of yours may be sufficient to tie your identity to your name, the sum total may be sufficient.

    Or security cameras. Most people don't worry about *one* security camera, but a lot of people get concerned when they are constantly being monitored by cameras which are tied together by computer to monitor where they go each day.
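
A minimal sketch of that aggregation effect, assuming a tiny made-up population. Every record, attribute name, and the `matching` helper below is hypothetical. Each clue on its own leaves several candidates, but the clues combined leave one.

```python
# Hypothetical records; all names and attributes are invented for illustration.
PEOPLE = [
    {"name": "A", "city": "Portland", "birth_year": 1980, "interest": "linux"},
    {"name": "B", "city": "Portland", "birth_year": 1980, "interest": "cooking"},
    {"name": "C", "city": "Portland", "birth_year": 1975, "interest": "linux"},
    {"name": "D", "city": "Austin", "birth_year": 1980, "interest": "linux"},
]


def matching(people, **known):
    """Return the records consistent with every known attribute."""
    return [p for p in people if all(p[k] == v for k, v in known.items())]


# Clues that might leak one at a time across many individually harmless posts.
clues = {"city": "Portland", "birth_year": 1980, "interest": "linux"}

known = {}
candidate_counts = []
for key, value in clues.items():
    known[key] = value  # accumulate clues in the order they were learned
    candidate_counts.append(len(matching(PEOPLE, **known)))
    print(f"after learning {key}: {candidate_counts[-1]} candidate(s)")
```

No single attribute here identifies anyone; the intersection does, which is the concern with compiled data.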

    --
    Any program relying on (nontrivial) preemptive multithreading will be buggy.
  21. Re:Mining voluntary information on a public websit by AndreiK · · Score: 2, Insightful

    Is it considered bad if I recognize and have read about half of those books?

  22. Everyone is missing the point by Twid · · Score: 3, Funny

    If this guy links Amazon Wish Lists, Google Maps, the yellow pages, and personality typing using Ruby on Rails, he can call it a Web 2.0 Mashup and make millions when Google, Yahoo!, or Microsoft buys him out.

    I smell a fully monetized eyeball!

    --
    - "When you want something with all your heart, the entire universe conspires to give it to you" -Paulo Coelho
  23. Well 99% of the people here don't get it by SmallFurryCreature · · Score: 3, Insightful
    Even comments to your posts don't get it. All this guy did was prove just how easy it is to use a seemingly harmless database to prove you're a commie. Oh wait, get my mind out of the 50's, a terrorist. Or did the bogeyman change name again? Pedos are an eternal favorite and you can't really defend the rights of pedos unless you wanna be lynched.

    No, the FBI or anyone else would never bother with Amazon wish lists. They would simply get the sales records. This guy does not have access to those, so he uses what he can to prove his point.

    Yes it is scary. Especially for those of us who have family (or more to the point do not have family) killed for expressing the wrong ideas.

    I, however, don't think we should blame the FBI or similar agencies; they are the instruments of us, the people. It is we who have voted the current governments into power. Corruption, you say? Well then it is you and me that have allowed that to happen. I do not believe in the mythical innocent citizen. Others have died for freedom. No reason we should be allowed to sit on our backsides and complain our freedoms are taken away. FIGHT

    Not that I will of course. I know deepdown that what is happening is wrong and also know that I am one of the cattle. Perhaps it will make it easier when I am put in a cattle wagon to be gassed.

    The problem with fighting for your freedom is that one person's freedom fighter is another person's terrorist.

    I ain't got an answer or a solution except to suggest "PAY CASH". Even if you're part of the herd there is no reason to make it any easier for them to send you off to the slaughterhouse.

    Will it happen? It has happened countless times before. Check the McCarthy trials. The treatment of Japanese Americans vs German Americans. The gunning down of American citizens by police during peace protests. The way England handled the IRA and labor strikes. All of them pretty recent.

    Something scary might happen in our lifetimes. Or not. This is one tiny example to prove that it won't be hard on the technical side. Now all we need to do is elect leaders crazy enough to do it. /me looks at the current leaders of the "free" west. Too late.

    --

    MMO Quests are like orgasms:

    You may solo them, I prefer them in a group.

    1. Re:Well 99% of the people here don't get it by mesocyclone · · Score: 2, Interesting

      Okay, I'm counting for the last 60 years - long enough for you?

      Number of Americans killed in the last 50 years by *the federal government* for expressing the wrong views: 0. This leads me to wonder who in your family you are referring to.

      Number of Americans killed by government for expressing the wrong views:0
      Number of Americans killed by government accidentally during protests: 4 that I know of, at Kent State.

      Number of Americans killed by or on behalf of Joe McCarthy: 0 (but Joe McCarthy was certainly a person who injured people by abusing his office)

      Number of Americans killed by right wing terrorists: about 200 (Oklahoma City, a few at abortion clinics, one at Atlanta Olympics)

      Number of Americans killed by left wing and eco-terrorists: a few (Ted Kaczynski, a few during the Vietnam war years; an acquaintance of mine was permanently injured, for the sin of being a university computer operator, by a left-wing bombing).

      Number of Americans killed by Islamofascist terrorists: approximately 3000, most on 9-11; a few at the same WTC in 1993; a few in aircraft or cruise ship hijackings.

      In other words, if one deconstructs your examples, they are best described as nonsense. The bad guys killed a whole lot more than the government. The *abuse* of government power was even less. If you look closely at the events leading up to 9-11, it was privacy rights absolutism that, in at least two events, prevented the attack from being stopped (there is a very good chance it would have been without that extremism).

      If you studied World War II history, you would know that during THAT war, privacy rights and some other civil rights vanished. After the war they came back and became stronger than at any time in the nation's history. We are in a war in which counterintelligence is more important than any other we have fought, with the possible exception of the civil war (in which not only privacy rights disappeared, but so did habeas corpus). War requires sacrifice, and one of the things we have to do is wisely and carefully sacrifice some of our privacy rights.

      The treatment of the Japanese, while certainly scooping up a lot of innocent people and detaining them, had nothing to do with privacy rights. If you want to see it repeated, just continue to advocate pro-terrorist policies such as privacy fundamentalism, and see what Americans do to Muslims and "Muslim-looking" people after a nuke goes off in one of our cities, which was my point. We already had a Sikh killed here in Arizona, just after 9-11, by a citizen, not the much-feared government, because they *thought* he was a Muslim. Don't you think that maybe the government would use some acquired personal information to prevent 9-11's in the future, if possible, thus reducing the likelihood of such attacks, and the probability of a generally agreed upon set of measures that make the Patriot Act look utterly trivial?

      Finally, let's apply a little reason to government. If you are a libertarian and not completely loony, you know that the most important reason to have any government at all is to protect us against other citizens and foreign powers (in this case, stateless or state-backed terrorists). That is the FIRST purpose of government. We do this knowing that placing any power in the hands of government is a risk.

      So the rational person tries to weigh the risks. I see that rarely in internet debate; rather, what appears is knee-jerk civil liberties absolutists.

      Now, please tell me of *actual* cases where the government abused you, a family member, or someone you know. Note that abuse does NOT mean accidentally detained or surveilled, but does mean intentionally used its powers for purposes of gaining power or money undemocratically.

      --

      The only good weather is bad weather.

  24. If you use your full name for an email address by sl4shd0rk · · Score: 2, Interesting

    You're a fool. And yes, the company you work for is also an idiot for using john.smith@megacorp.com as your mandatory email address. All you're doing is making an index for yourself into the biggest rolodex on the planet. People argue that some names are so regular no one could possibly narrow it down, but a simple whois can help narrow things down to a particular state. Public legal records from there can make things more interesting.

    --
    Join the Slashcott! Feb 10 thru Feb 17!
  25. Data Mining vs Privacy by gone.fishing · · Score: 2, Informative

    Using data mining to catch criminals is nothing new and there is nothing wrong with it. Many white-collar criminals have been caught "cooking the books" using this kind of process. Having said that, I also have to say that there is a point where this practice can go too far. It can become an invasion of privacy that could cast the shadow of suspicion on to ordinary, law-abiding people.

    Suppose you were a person who likes surfing the net to read things like "The Anarchist's Cookbook" (an entertaining read) who is also curious about Muslim Extremism (because it is so often in the news) and is planning a car trip with your family to New York City and Washington D.C. Perhaps you have downloaded maps and driving directions to the Capitol, the White House and the United Nations Building from MapQuest. Maybe you have visited eBay and bought some reloading equipment (because you are a sport-clay shooter).

    Now imagine some data mining application at fbi.gov puts all of this information together and concludes that you are an extremist who is about to embark on a trip where you plan on bombing the United Nations building in New York City and the Capitol and the White House in Washington DC!

    Separate and disparate pieces of data aren't always able to fit nicely into a simple formula. This is where the danger of this kind of information comes in. Taken separately and considered without an adequate foundation, these "facts" tend to support a totally erroneous conclusion. Next thing you know, someone is quietly asking questions about you and you have no idea why.

    These kinds of things have happened to innocent people before. Someone I know faced scrutiny years ago shortly after the Oklahoma City bombing. There was no real reason for his being suspect and it took a long time to figure out why they looked at him. The FBI questioned his neighbors, they followed him, photographed his home, and in general made life uncomfortable for him.

    It took time to figure it out but, we finally concluded that there were reasons why he came to their attention. They were:

      - He was a gun collector
      - He bought gunpowder by the pound (he was a re-loader)
      - He worked at a facility where he may possibly have had access to ammonium nitrate
      - He lived alone
      - He lived in the wrong place (outside of town in an area linked to suspects)
      - He had several 55 gallon oil drums on his property
      - He was a member of the NRA

    To the FBI all this information seemed to indicate that he could possibly be linked as the third man in the Oklahoma City bombing. Nothing could have been further from the truth but for a few tense weeks, he was the focus of enough attention so that he felt like he could not visit friends, go target practicing, or do much of anything. He got paranoid and asked us to not call him because he thought he may be wiretapped. It really ate him up inside and he had done nothing wrong. The truth of the matter is that he is one of the most law-abiding people around. He had not done one illegal thing to draw this suspicion on him. Literally, he was just in the wrong place at the wrong time. He is just a kind of quiet guy who likes to keep to himself.

    I don't think that data mining brought this investigation on him. I think his name simply popped up on too many lists (which is, in a way, a form of manual data mining). Still, with computers and access to hundreds or thousands of different data sources, the possibilities have compounded themselves, making this kind of process likely to impact too many people. Innocent people.
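
A minimal sketch of the "popped up on too many lists" effect, assuming each data source is just a set of names. The list labels, the names, and the `flag_by_overlap` helper are all hypothetical and are not a description of any real FBI process. It shows how a simple overlap count flags someone whose every individual entry is perfectly legal.

```python
from collections import Counter

# Hypothetical lists; the labels and names are invented for illustration.
LISTS = {
    "gun_collectors": {"quiet_guy", "hunter_1", "hunter_2"},
    "bulk_gunpowder_buyers": {"quiet_guy", "hunter_1"},
    "possible_fertilizer_access": {"quiet_guy", "farmer_1", "farmer_2"},
    "owns_oil_drums": {"quiet_guy", "farmer_1"},
}


def flag_by_overlap(lists, threshold):
    """Return names that appear on at least `threshold` lists, with their counts."""
    counts = Counter(name for members in lists.values() for name in members)
    return {name: n for name, n in counts.items() if n >= threshold}


# Nothing on any single list is illegal, yet the overlap alone singles one person out.
flagged = flag_by_overlap(LISTS, threshold=3)
print(flagged)
```

Lowering the threshold flags more innocent people, and adding more lists raises everyone's count, which is how the problem compounds as the number of data sources grows.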