Xeroxing Personal Data From Your Browsing History
grease_boy writes "Xerox has filed a patent covering a technique to recover demographic information like your age, sex and perhaps even your income by analysing the pattern of web pages you browse. They want to license the technique to online advertisers and shops. Read the full patent here."
Because nobody could have ever thought of this before.
the NPG electrode was replaced with carbon blac
Wow, great, another patent covering something completely obvious, like analyzing my browser history to find out what sorts of things I might like.
ZuluPad, the wiki notepad on crack
...I must download more lesbian pr0n.
Get real. This is worthy of a patent? Just by the fact that you're reading this post you're most likely male, some college, etc.
...pay attention to those tracking cookies.
"A great democracy must be progressive or it will soon cease to be a great democracy." --Theodore Roosevelt
I think Gator might have beat them to it.
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
XXX#######
Abstract of patent: Demographic information of an Internet user is predicted based on an analysis of accessed web pages. Web pages accessed by the Internet user are detected and mapped to a user path vector which is converted to a normalized weighted user path vector. A centroid vector identifies web page access patterns of users with a shared user profile attribute. The user profile attribute is assigned to the Internet user based on a comparison of the vectors. Bias values are also assigned to a set of web pages and a user profile attribute can be predicted for an Internet user based on the bias values of web pages accessed by the user. User attributes can also be predicted based on the results of an expectation maximization process. Demographic information can be predicted based on the combined results of a vector comparison, bias determination, or expectation maximization process.
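For anyone who wants to see how little is going on in the "centroid vector" part of that abstract, here is a minimal Python sketch. It is my own toy reading of the abstract, not the patent's actual procedure: the page names, the plain visit-count weighting, the L2 normalization and the use of cosine similarity are all assumptions, and the function names are made up for illustration.

```python
import math
from collections import Counter

def path_vector(pages_visited, vocabulary):
    """Map a list of visited pages to a unit-length count vector.

    Assumption: weights are raw visit counts; pages outside
    `vocabulary` are simply ignored.
    """
    counts = Counter(pages_visited)
    vec = [float(counts[page]) for page in vocabulary]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict(user_pages, labelled_paths, vocabulary):
    """Return the label whose centroid is closest to the user's vector."""
    groups = {}
    for label, pages in labelled_paths:
        groups.setdefault(label, []).append(path_vector(pages, vocabulary))
    centroids = {label: centroid(vecs) for label, vecs in groups.items()}
    user_vec = path_vector(user_pages, vocabulary)
    return max(centroids, key=lambda label: cosine(user_vec, centroids[label]))

# Hypothetical pages and labels, purely for illustration.
VOCABULARY = ["doughnuts", "movies", "news", "gardening"]
KNOWN_USERS = [
    ("young", ["movies", "movies", "doughnuts"]),
    ("young", ["movies", "doughnuts"]),
    ("old", ["news", "gardening"]),
    ("old", ["news", "news", "gardening"]),
]

if __name__ == "__main__":
    print(predict(["movies", "doughnuts", "movies", "news"], KNOWN_USERS, VOCABULARY))
```

With the made-up data above, a user who mostly hits the movie and doughnut pages lands nearest the "young" centroid, which is about all the method amounts to in this simplified form.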
.... how do you classify that?!
this should help with the spam wars going on.
The patent may well have merit but to be used it would have to break the law. Notwithstanding that governments may keep them for national security reasons, if the law in a country prevents a third party using or selling browsing habits for commercial purposes, is it possible to take out a patent that presumes illegal behaviour? Such as a method of extracting money from a bank using a shotgun? Aren't they getting a little ahead of themselves in their race to the bottom of corporatist fascism? Or is this very revealing patent application telling us that they consider buying the necessary laws to use it a mere formality?
They used to come up with new and innovative ideas such as the Xerox copier, a graphical user interface using windows, and a host of other innovative technologies.
Now they've reduced themselves to patent trolling in order to pander to marketing scum. Just, wow.
...and that's the way the cookie crumbles.
.. not yet a patent. Look for it as a patent in 2-3 years. Maybe never.
**snort** teehee!
They've also given name to a photocopy of my buttcheeks.
PC LOAD TP.
Is the USPTO going to grant a patent even on this? No wonder the PTO gave a patent on "a method to add two numbers"...
How is statistical analysis patentable? I was under the impression that most patent systems excluded the possibility of patenting a mathematical method - which is exactly what statistical analysis is. You could probably extend that to all computer programming.
Patent in zanza form:
It is like setting up a 2D space, one axis being the weight and the other the age. Then if person X hangs around the doughnut shop all day, we can assume that (s)he is fat, if (s)he spends the rest of the time at movies watching Teenage Mutant Ninja Turtles and Spiderman and the like then we can also assume that (s)he is young, so we can assign a vector pointing somewhere in the (fat,young) quadrant. Wow. If the pattern is food, food, movie, food, food, food, movie then we can assume that (s)he is rather fat. Wait, it will get even better!
Now, if we don't know how to guess who's fat and who's young, we can get a handful of test subjects of whom we know how fat/skinny/young/old they are and watch where they go. So we then can have a reference to the preferences of various kinds of people and we can base our decisions about further subjects on that. Wow ^ 2. What's more, we can implement this using any method, including but not limited to, a computer which may have a processor, disk, memory and any other peripheral units but maybe it doesn't; or a mobile phone or application specific hardware or any other device which can be used to add numbers together and store them, such as abacuses, pieces of corn or beans, superhumans, humans, subhumans and in general everything and anything under the Sun. This is *so* original that the mind just boggles. What'll they think of next?!
It's sad to see Xerox PARC fade away; they used to be so cool.
Very well put.
You know, I'm really sick of the whole "Guess your personal needs based on browsing habits". I get this enough from Amazon, recommending crap to me that I don't want, but that I sell to others.
I run a website which sells stuff. Now, it may not be stuff I personally want, but obviously other people do. So, I go through Amazon looking for products to sell. Of course, the advantage is that Amazon recommends items to me that I might sell to the other people reading my site, so it works out, but still, Amazon has a screwed up image of what I want as an individual.
Now imagine all these people who do searches online to find crap to feed their blogs. All the people who scour the internet in search of material for websites, stuff they are going to mention in passing, and then move on.
All the marketing people are going to get is that 50% of the people who surf the web want to see dismemberments via locomotive accidents on YouTube. That's the "vector".
The point I'm trying to make is that only half the people on the internet are the passive surfers this technology would work with. The other half are people who create the content online via looking for content online. (and then there's a small percentage who actually create content, but they don't surf as much).
So, the entire concept to start with is screwed because it assumes that the web is TV.
If telephones are outlawed, then only outlaws will have telephones.
... to protect Xerox's IP by not using their trademarked name as a verb! Because it hasn't become generic or anything...
We figured out a long time ago that it's easier to elect seven judges than to elect 132 legislators.
Patents turn ordinary developers with ideas into criminals. Check it out, this really is one of the best articles about patents I have ever read: http://philsalin.com/patents.html
Just reading this patent is chilling. How can the Patent Office even consider this as new and innovative when the NSA does this to us already? Hell, Xerox might have been the contractor for them using this for all we know, and now want to patent this technique.
This should not be patentable here or anywhere else.
No software should be patentable.
I wait to see the patent for chewing your food thoroughly before swallowing.
Cheers
* Carthago Delenda Est *
...so we CANSPAM you some more... just a little bit more. It's not bad enough that 90% of all email traffic is spam... let's add to that figure.
Relocating to San Francisco / Palo Alto... Hire me?
I use Firefox's NoScript and AdBlockPlus. Between them both, I've blocked out all of the Yahoo!'s and Google's ad-tracking capabilities (and nearly all ads in general). Without cookies to track from one site to the next, I don't see how this will work unless every site does URL redirection. Also, don't log into Yahoo! or Google, then surf. They can record all of your search queries and tracking via your logged in cookie.
Well, they aren't trying to patent statistical analysis or tracking cookies in general, but they are grabbing for a lot. The claims don't look too impressive either. All the claims are too broad without a single super narrow claim that might just get accepted. Maybe the person who wrote the claims didn't understand the math. DoubleClick, some of the old shopping sites, and similar companies must have had this well before the 11/2/2001 priority date. There is nothing in the claims or the application saying it only applies across domains. The examiner is probably going to reject every claim and love it because it won't take too long.
1. A machine-implemented method for extrapolating user profile information from user web page access patterns, comprising:
computing bias values for a plurality of web pages; ancient and general
assigning said bias values to the plurality of web pages; ancient and general
detecting at least a subset of said web pages accessed by a user having an unknown user profile attribute; ancient and general - tracking cookies?
combining said bias values of said subset of web pages to obtain a combination result; and ancient and "combining" is way general
assigning a selected user profile attribute to said user in response to said combination result indicating a positive bias of the selected user profile attribute; ancient
wherein computing said bias values for the plurality of web pages further comprises determining a fraction of users with the selected user profile attribute who visit a selected web page as measured over the plurality of web pages. Hmmm, interesting limitation. Calculating a demographic from the tracked pages. Pretty ancient too.
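To make the bias-value steps of claim 1 concrete, here is a rough Python sketch of one possible reading. The claim language is ambiguous, so everything specific here is an assumption on my part: the bias of a page is taken as the share of its known visitors who have the attribute minus the base rate among all known users, "combining" is taken as a plain sum, and the function names, attribute and page names are invented for illustration.

```python
def compute_bias(known_users, attribute):
    """Compute a bias value per page for one profile attribute.

    known_users: list of (attributes, pages) pairs, both sets of strings.
    Assumed bias = (share of the page's visitors who have the attribute)
                   - (share of all known users who have the attribute).
    """
    if not known_users:
        return {}
    base_rate = sum(1 for attrs, _ in known_users if attribute in attrs) / len(known_users)
    visitors = {}
    holders = {}
    for attrs, pages in known_users:
        for page in pages:
            visitors[page] = visitors.get(page, 0) + 1
            if attribute in attrs:
                holders[page] = holders.get(page, 0) + 1
    return {page: holders.get(page, 0) / count - base_rate
            for page, count in visitors.items()}

def has_attribute(pages_visited, bias):
    """Assign the attribute if the summed bias is positive.

    Pages with no recorded bias contribute nothing (an assumption).
    """
    return sum(bias.get(page, 0.0) for page in pages_visited) > 0

# Hypothetical training data, purely for illustration.
KNOWN = [
    ({"student"}, {"slashdot", "campus-news"}),
    ({"student"}, {"slashdot"}),
    (set(), {"gardening", "slashdot"}),
    (set(), {"gardening"}),
]

if __name__ == "__main__":
    bias = compute_bias(KNOWN, "student")
    print(has_attribute({"slashdot", "campus-news"}, bias))
```

In this toy data half the known users are students, so a page visited only by students gets a bias of +0.5 and a page visited only by non-students gets -0.5. The whole "method" is a lookup table and an addition.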
Eventually, this app is going to land on a junior associate's desk who will try to rescue it. In the long run, there's a decent chance that Xerox will get a patent on a very specific algorithm applied to limited data that is gathered in a few different ways. Hey, inventions like electrostatic duplication don't come along every day.
I am a lawyer, but not yours. Anything I tell you might be a total lie intended to benefit my clients at your expense.
Agreed, it is stupid, if your purpose is actually to do marketing with it and sell more products. But my guess is that that's not how it'll get used.
Thing is, if you think about it... it fits neatly into the eternal 3-way total war, where the ad provider tries to shaft both the advertising company and the web master, and in most cases the two try to shaft the ad provider too. Tons of useless metrics exist just so the ad provider can tell some company "here's why you owe us a big pile of money for serving your ads", or so they can tell the web master "here's why we owe you a pittance."
(And just so it doesn't sound like the ad provider is the only scumbag, the whole dot-com bubble was based on the "hey, look, we can rip off the ad providers" idea. Ad rates in the beginning were based on sites which had one banner on the whole site. It tended to be somewhat targeted too, since if the web master chose just one, it tended to be somewhat related to what the site was about. And people used to even click on them occasionally. And they were worth decent money. Then some people discovered, basically, "woo, but if we put 10 ads per page, now we're owed gazillions of dollars." Whole companies went to IPO with that as their only business plan. But I digress)
The fact is, the _only_ real criterion of whether a marketing campaign was successful is whether you sold more stuff as a result. Everything else, eyeballs, clicks, etc, is just smoke and mirrors. It's just some useless metrics that get gamed all the time.
E.g., it may sound like "clicks" is a relevant number, but it is only in a world where everyone clicked only because you got them equally interested in the product. Once you figure out other ways to fake it, comparing numbers of clicks becomes apples to oranges. And once you give someone a criterion like "number of clicks" to justify their salary, we've already seen the result: fake UI ads, punch the monkey ads, and outright redirects served as ads. It doesn't mean that those people became more interested in a company's product just because they got hijacked, but on a "look how many people we got to click your ads" statistic it looks the same.
So my take is that this is how it will be used. Some ad provider will make up some scientific-sounding "how well we matched your ad to a target demographic" metric and use it to justify why you should pay a premium to advertise through them. Never mind that the demographic was a wild guess, and it actually lost information in the process... twice. It will look neat in the marketing materials anyway.
A polar bear is a cartesian bear after a coordinate transform.
OK Xerox, so I'm a wanker. Like many of us. I confess. But how the hell are you able to deduce my age from the 2% non-porn related clicks? You'd conclude I have the stamina of a 16 year old and you'd be wrong. Unfortunately.
I hadn't the slightest objection to his spending his time planning massacres for the bourgeoisie... (P.G. Wodehouse)
My name is Ranko, I'm a girl and in desperate need for Viagra. Now it's Xerox's turn. Let's see, I visit youtube, wikipedia, slashdot, but not myspace. What am I?
I have a Squid proxy server set up to block all advertisements. If a site sends me an advertisement, I add it to an ACL which blocks it. The full Squid is a bit overkill for the purpose, though, and I do plan to write my own "mini" proxy server just for advert-blocking (and I'll even add options to have it download the advert and just not display it, so blocking advert-blocker-blockers; and maybe even falsify clicks to distort their figures -- though I'm not sure if this is likely to make them show me more adverts).
When watching TV, I make sure to change the channel on my Sky Plus box when the programme starts; then go and do something else for ten minutes or so (basically just enough time to account for all the advert breaks in the show, minus a little bit to account for any pausing and rewinding I'm likely to do). Then I rewind it to the beginning, watch the first part (pausing and rewinding wherever appropriate) and fast-forward through the advert breaks. By the end of the programme, I've caught up with the live transmission and can flick to another channel (for obvious reasons, you can only rewind as far as the point where you changed the channel).
Je fume. Tu fumes. Nous fûmes!
Many companies continue to use airbrushed pictures, so-called subliminal, in their ads despite the fact that the very concept has been debunked. They do it Just In Case it'll work, and they pay big money for it. The only difference is nowadays they deny they do it.
Same with this. It's probabilistic, meaning it does statistics and guesses. It has more holes in it than I'd allow any of my research methods undergrads to get away with on homework: besides multiple people using the same browser, how about people who use different browsers? In particular, those who use one for some things, another for other things, etc.?
If there's going to be Yet Another Invasion of Privacy based on cookies, it's time to reconsider testing all those content providers' machines for overflow bugs by merging the cookies they send into DVD .iso files for them to read. Hey, I'm just providing more data for their demographics by showing them what I'm watching. I wonder what's going to happen to the people running these spy machines when I notify the companies that they just downloaded a shitload of pr0n.
And what are they going to do with it? "Targeted" "marketing". Read: try to guess better what spam you'll respond to.
If you think this isn't already done, you haven't been going to unusual web sites and then watching what ads pop up on subsequently viewed sites, and persist in popping up on those subsequent sites weeks or months later.
They're using "probabilistic latent variables" on the "vectors", meaning they look at where you go to from where rather than just where you've been. That means they're going to need to suck down a pile of your cookies. Are they going to pay for the bandwidth? This is upload, which ISPs cap at around 1/4 of download. Gee, thanks for slowing down my work guys, I only transfer gigabytes of data with other researchers. I could use the time off you're giving me by making it take that much longer.
If anyone wants to see one way latent variables are extracted, look into the noise reduction and especially the stereo-from-mixed-mono separation applications of continuous wavelet transform. There's ample freeware and free scripts for commercial software that'll do this analysis. And that's just the more recent way. Bell Labs came up with an analog to this analysis many years ago when they invented "voice prints" based on multiple parallel analyses using the concept that Fourier borrowed from the arabs 200 years ago. I can simulate the same extraction by fooling my software into thinking the data I'm feeding it is an accepted standard for EEG recording. I've done it with heart rate data, sun spot data, climatological data, seismograph readings and so on. I could do it with these "vectors" they're describing, and don't even have to understand very well what those "vectors" are supposed to mean. The latent variables will pop out as abrupt changes in the complexity of the data as the dimensions of the analyses (of the matrices, or similar) increases. I'll see it as blobs of color that weren't there before.
If a bunch of statisticians get ahold of this patent, they can easily describe how the analysis is just another implementation of analyses that have been used for many years and is essentially the same result from a different calculation, leaving the only novel contribution to be the use to which it's put and the specific code written to do it.
"I may be synthetic, but I'm not stupid." -- Bishop 341-B
News like this make you appreciate techniques like TrackMeNot.
Singularity: a belief in the "God" idea with the "demiurge" relation inverted.
Like they once used umbrellas to keep a white skin they do not browse or sit in front of a monitor today. The 80% owned by the top 20% isn't covered in that search method.
I thought they were already doing this! Why else am I getting all this spam that assumes I'm a college dropout with a small penis who likes to help out Nigerian businessmen while scarfing down untested homeopathic supplements?
What job are you after? Details can be sent to listed email address.
InfoSec that matters, when it counts.
That might mean that unless Xerox license this tech, only they could use it --meaning no-one else could harvest information this way. Hurrah! Here's hoping, anyway...
I can see it now a Mozilla Plug-in that creates
random cookies to confuse these systems or hey
even create artificial patterns that might work
to your advantage.
(Yes sir, you can see from my profile that I'm the
right person for test driving the new Ferrari,
just leave the keys in the mailbox)
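The "confuse these systems" idea can be simulated in a few lines of Python. This is only a toy sketch of padding a browsing history with random decoy visits. It is not how any real browser extension works, and the function name, the decoy pool and the `ratio` parameter are all made up for illustration.

```python
import random

def add_decoys(history, decoy_pool, ratio=1.0, seed=None):
    """Return a shuffled copy of `history` padded with random decoy pages.

    ratio: number of decoys to add per real visit (assumed knob).
    seed:  optional seed so a run can be reproduced.
    """
    if not decoy_pool:
        return list(history)
    rng = random.Random(seed)
    n_decoys = int(len(history) * ratio)
    decoys = [rng.choice(decoy_pool) for _ in range(n_decoys)]
    mixed = list(history) + decoys
    rng.shuffle(mixed)
    return mixed

if __name__ == "__main__":
    real = ["slashdot", "slashdot", "wikipedia", "youtube"]
    pool = ["ferrari-dealer", "yacht-broker"]  # hypothetical decoy pages
    print(add_decoys(real, pool, ratio=0.5, seed=1))
```

Any profiler that just counts page visits would see the decoys as part of the pattern, so the more decoys you add the less the real visits count for.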
I was just going to recommend the Redirect Remover extension for Firefox but it seems to have disappeared from the public site and into the sandbox.
Don't leave your mind so open that your brain falls out. Don't close it so much that you cut off the blood.
I expect you to die, Mr. Bond.
I tried this and it told me that I was an 87 year old, divorced Jewish woman living in Florida.
Dude...
It is your personal duty to fight for what is right on a daily basis. Ignoring injustice is identical to approving
...that the right to make a profit is higher than my rights to freedom and privacy? Sorry folks, but business has absolutely no right to my money. I earned it, I have the right to choose how I spend it with absolutely no influence from the seller. This is why advertising fails with someone like me. I ignore all advertising. If I need or want something I have the following criteria:
1. If it's something I need frequently, I find a brand that I like (defined as: does what I need, or has properties that I find to be important) and stick with it
2. If it's something I only purchase every so often, like a car or a digital camera, I do heavy research into the deepest technical aspects of the product to make sure it meets my needs and is reasonable in price for what it offers vs. my income and what I can afford to spend. Then I buy the product that fits my requirements. In this case it's not likely that I will stick with the same brand as these properties may change over time
3. For anything I purchase, I try to make sure that I'm buying from a company or supplier who is less supportive of evil practices. Since we live in a capitalist society and the profit motive is worshiped as the highest practice, I know that it's impossible to expect any business to be less than 90% evil (defined as: willing to cause harm to customers, employees or other by-standers in the name of profit) so I try to find those companies that are the least evil. Hint: Walmart fails (and no... the high number of jobs they provide does not make them less evil since they treat their employees like shit and take advantage of their employees' ignorance to boot. Another hint: Walmart is a bad place to work)
So, anyone who thinks that they have a right to my money is sadly mistaken. To all businesses of the world: you only get my money if you do the right things. I am your master. Not the other way around. The same applies to any other potential customer with a brain. The rest of the sheeple can fuck the hell off.
-"...bad old ideas look confusingly fresh when they are packaged as technology" - Jaron Lanier (Digital Maoism on Edge.o
Call it "identifying characteristics" and you get a patent. Call it "profiling" and you get all the bed wetters crying together... again.
It's like they said said some said thousand said times.
Not to mention the 10% or so of the patent that describes a COMPUTER in its constituent parts. It seems to me, if this is necessary, then perhaps the patent is describing an algorithm. But, of course, IANAL.
I said,
Randy.
Can it tell penis size? I'm not asking for myself. Just out of scientific curiosity.
"I'm not good. I'm not nice. I'm just right."
Start surf random plugin Load trophy wife theme
I'm sure they will be much happier when it becomes a generic term for spam, spyware, and identity theft instead, as in "that f'ing virus xeroxed all my personal data to the Russian Mafia and now my bank accounts are empty" or "those damn pill peddlers keep xeroxing me the same tired crap, trying to sell me Viagra".
"Time is an abstract concept devised by carbon-based lifeforms to monitor their ongoing decay." - Thundercleese
What I find interesting here is that they think their extrapolated information will be more accurate than the user-supplied data.