'Scrapers' Dig Deep For Data On Web

Like Google? by bonch · 2011-04-13 04:32 · Score: 3, Interesting

Firms offer to harvest online conversations and collect personal details from social-networking sites, résumé sites and online forums where people might discuss their lives.

You mean like Google already does for its advertisers? In fact, one of the related links in the article is a story about Google titled Google Agonizes on Privacy as Ad World Vaults Ahead, discussing their plans for utilizing their vast archive of valuable user data. The battle for online privacy was lost long ago.

Re:Like Google? by blair1q · 2011-04-13 04:43 · Score: 0

This is a new form of privacy of which the news has not come to Harvard.
I'm pretty sure information posted for the entire planet to read is not private.
Out on the street, a huckster can size you up in about ten seconds, with 90% accuracy. Online, in text, you're not wearing that tribal-armband tattoo, so it might take a few minutes to figure out you're a joiner with delusions of individuality.
Time to revise my motto: The Internet is not secure, and open forums are not private.
Re:Like Google? by Anonymous Coward · 2011-04-13 04:59 · Score: 1

> I'm pretty sure information posted for the entire planet to read is not private
Well, that's what I think too, but amazingly, about 98% of humanity doesn't seem to agree. It seems to me that they're insane if they expect something posted to the whole world to be private, but there are SO many who think that way, I'm not sure what to make of it.
Re:Like Google? by betterunixthanunix · 2011-04-13 05:16 · Score: 4, Insightful

The battle for online privacy was lost long ago.
Only because one side of the battle never bothered to fight. Nobody was forced to go to social networking websites and post their life story, anyone could encrypt their email and IM conversations, and ad blocking software is widely available. Large amounts of the information that these companies are aggregating could have been made far more difficult to obtain if the majority of computer users could have been bothered.

Sadly, the Internet has become more of an adversarial game than a way to unite people.

--
Palm trees and 8
Re:Like Google? by locofungus · 2011-04-13 05:22 · Score: 1

The majority of humanity probably think posting something to facebook or whatever is similar to writing "Got totally plastered on holiday" on the back of a postcard and posting it to their local (something that people do)
Sure, it's public but after a few years it will have vanished without trace.
Tim.

--
God said, "div D = rho, div B = 0, curl E = -@B/@t, curl H = J + @D/@t," and there was light.
Re:Like Google? by hoggoth · 2011-04-13 06:15 · Score: 2

/ sheepishly pulls sleeve over tribal armband tattoo...

--
- For the complete works of Shakespeare: cat /dev/random (may take some time)
Re:Like Google? by VolciMaster · 2011-04-13 07:16 · Score: 2

The battle for online privacy was lost long ago.
Only because one side of the battle never bothered to fight. Nobody was forced to go to social networking websites and post their life story, anyone could encrypt their email and IM conversations, and ad blocking software is widely available. Large amounts of the information that these companies are aggregating could have been made far more difficult to obtain if the majority of computer users could have been bothered. Sadly, the Internet has become more of an adversarial game than a way to unite people.
forced to use social tools? no.
encryption available? yes
understood by anyone in the general public? nope

--
antipaucity
Re:Like Google? by Americium · 2011-04-13 07:22 · Score: 1

The battle for online privacy was lost long ago.
So if I post to a public forum I should expect privacy?
What about CC companies selling data? That was going on before the internet, and it seems more intrusive than many of these situations.

Sadly, the Internet has become more of an adversarial game than a way to unite people.
I think all those countries having revolutions in the Middle East might disagree with you.
Re:Like Google? by jd · 2011-04-13 07:46 · Score: 2

There's that and there's the fact that the US (one of the largest consumers of data) has no data privacy laws and has been pressuring places that do (such as the EU) to violate their own laws. The laws don't solve the problem in and of themselves, what they do is make the public more* aware that the problem even exists. (*You can have more than nothing.)
The older ITAR laws and RSA patents didn't help - they effectively criminalized any effort to produce a product, since you'd need to sell the product in the US to be able to generate enough interest.
The problem now is that the legacy protocols are too widely used to be easily replaced and legacy products have so much staying power that a backwards-compatible system would remain effectively insecure for decades.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Re:Like Google? by Anonymous Coward · 2011-04-13 09:09 · Score: 0

Speaking from experience: That tattoo means nothing. And anyone good will realize that. I've surprised a lot of people when it became evident that I'm not a sheep to be herded with the rest of the crowd. Now yes, you did say 90% accuracy, but in some of the circles I've traveled 90% ain't good enough. It's the 10% that will literally kill you. Yes, it's been a rough life. But a good one.
Re:Like Google? by Anonymous Coward · 2011-04-13 12:11 · Score: 0

This isn't really about "privacy" any more than when you go to the mall, or a rock concert, or spring break. You don't really have privacy there and you don't EXPECT it.
What this is really about is AGGREGATING data in an unprecedented way. We tolerate the invasive news collected online because we're used to "investigative" reporting. Except even "investigative" reporting went way over the line in the Geraldo and Hansen days... it won't really be dealt with until we get really drastic libel laws like the UK has... where just because somebody can find facebook pics from 3 years ago, doesn't mean that turning them in to your employer isn't "libel" in the sense that it's none of their business and "badmouthing" for no good reason. One could argue that owners of the bots are doing much the same thing in that there's no real oversight in what they collect, other than it can't be "made up". As long as they have one website link they can display, it's not libel... but that doesn't mean they conducted a "fair and balanced" search on you.
Re:Like Google? by Anonymous Coward · 2011-04-14 04:38 · Score: 0

This:

legacy products have so much staying power that a backwards-compatible system would remain effectively insecure for decades.
Few things have made me wish I had created an account when I started frequenting /. 8 years ago more than wanting to mod this insightful. Well put.

the darker side of grey by Anonymous Coward · 2011-04-13 04:37 · Score: 0

Is it legal to USE information gathered in this way to discriminate against someone if it was gathered with methods contrary to a site's TOS?

Re:the darker side of grey by Loether · 2011-04-13 05:30 · Score: 2

I think they are 2 distinct issues that do not combine the way you suggest.
1. If you violate a website's TOS the website can come after you.
2. The info they gain spidering a website is pretty much free for them to use to discriminate against you.
Anything I post on slashdot/FB/any online forum I treat like it is viewable by every future and past employer, insurer, lender, ex girlfriend etc. Anything online will exist forever and if it's not already permanently linked to you, it will be before you die. Whether that's right or wrong, legal or illegal, is really beside the point IMHO.

--
TODO create witty sig.
Re:the darker side of grey by Anonymous Coward · 2011-04-13 05:33 · Score: 0

The TOS violation and the use of the data aren't really related in the legal sense; really the only thing that matters is what you're discriminating on. If you're discriminating based on race, gender, ethnicity, etc. it really doesn't matter where you got the information, it's illegal either way. If you're not discriminating against a protected class (e.g. you don't hire someone because their facebook account is full of them getting shitfaced), the person you discriminated against has no basis for complaint. The site you scraped the data from could complain that you violated their TOS, and the user of that site could complain to the site that their data was used in a manner contrary to their TOS, but the end user and the data scraper have no legally binding agreement between them.
Re:the darker side of grey by Americium · 2011-04-13 07:28 · Score: 2

I don't know how good of a comparison this is.
So if I write a book, can I include TOS that makes it illegal for anyone to use the information within the book? If I write a book about how much my boss sucks, and how I slack off at work, can I include TOS so that nobody is allowed to relay that information to him? Even if I only sell my book to members of a book club, I wouldn't think this changes anything.
If you intentionally post information about yourself on a widely viewable forum, I would expect other people might read it.
Re:the darker side of grey by jd · 2011-04-13 07:50 · Score: 2

Well, the problem with (1) is that a TOS is an agreement with no signature, no confirmation of acceptance (implicit is unlikely to hold up in court) and no proof that the TOS was even visible by the user (since what is visible to the user is a function of the browser and cannot be established at the server-side).

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Re:the darker side of grey by plover · 2011-04-13 14:25 · Score: 1

Certain kinds of discrimination are illegal in specific cases, of course, and remain illegal regardless of how you obtained the information.

--
John

They won't get me by Tigger's+Pet · 2011-04-13 04:38 · Score: 2

I'm not on FB, Twitter, MyCloud or whatever else, so there's no data out there about me. If there's nothing to harvest then they can't harvest it - I'd rather be classified as 'boring' or 'not with it' (whatever the fuck 'It' is), than have stuff out there that might come back to bite me in the ass in 10 or 20 years time.

Re:They won't get me by yog · 2011-04-13 04:47 · Score: 2

Definitely avoid using a real or traceable name in online discussion forums and social sites. Also, avoid embedding your real name into your email address, such as "JohnSurfer@cox.net" or the like.
Unfortunately, my real name is embedded in one of my email addresses, and it's all over the web by now. I guess I can eventually switch to a different address, but the damage is done.
If you have someone's name, you can now obtain their current and past addresses, their age, their schools, possibly where they work, possibly their political party affiliation, and possibly a ton of other information if they have used their real name in online activities. It's not rocket science to do this; the information is just sitting out there waiting to be grabbed.
I suppose if you have nothing to hide and have avoided getting too controversial in your online discussions, or too outrageous in your social network photos and statuses, you're probably safe from major problems. Employers are going to be looking for extreme behavior, not slightly out of the ordinary behavior. If an employer doesn't like some minor thing about you, e.g. a picture of you on Facebook wearing green antennas at a Halloween party, then probably they're not someone you'd want to work for anyway.

--
it's = "it is"; its = possessive. E.g., it's flapping its wings.
Re:They won't get me by Anonymous Coward · 2011-04-13 05:02 · Score: 2, Funny

I suppose if you have nothing to hide and have avoided getting too controversial in your online discussions, or too outrageous in your social network photos and statuses, you're probably safe from major problems.
Yep. That's why my pic on chatroulette is an exact average size penis.
Re:They won't get me by Anonymous Coward · 2011-04-13 05:03 · Score: 0

The downside of that is that the info can be used against one come a political battle, or just a statement.
I will give a good example of that. When I graduated college in December 2008, I didn't have a Myspace/FB/Twitter account, nor did I care about possessing one. However, in repeated interviews the interviewer asked for my account name to friend/follow. When I said that I didn't have one (nor did I care to spill my guts for all and sundry on the Internet to see my private stuff), I was told that I was a fossil who wasn't with the times, and not having a FB ID was compared to not having a telephone or E-mail address. After that, I put up some token accounts with some sanitized stuff on them.
The idea of companies grabbing data is not new. I'm sure it will become more and more common as time goes on, and eventually a "desirability score" will be made of individuals, similar to a credit score as another factor for employers to screen on. Pretty much, the further one is personality wise from Snooki, the less chance of being able to find work.
Re:They won't get me by Anonymous Coward · 2011-04-13 05:23 · Score: 2, Funny

That's OK, Phillip Wilkerson of Midland, MI. We still know all about you. Tell Donna and the kids hi for us. Don't forget to pick up dog food on your way home from the tanning salon.
Sincerely,
Google
Re:They won't get me by jshackney · 2011-04-13 05:24 · Score: 1

Definitely avoid using a real or traceable name in online discussion forums and social sites. Also, avoid embedding your real name into your email address, such as "JohnSurfer@cox.net" or the like.
That's unlikely to help. I'm afraid this fight is already lost
Re:They won't get me by sakti · 2011-04-13 05:29 · Score: 3, Insightful

IMO it's better to have an easy to find public 'you' online for these people to track. You use that for everything 'safe'. You then use multiple anonymous accounts for anything you don't want tracked.
If you have nothing tracking online I think it might start looking more suspicious than not. Plus having nothing might encourage 'them' to dig in and try to relate you to your anonymous account(s).

--
"It is better to die on one's feet than to live on one's knees." - Albert Camus
Re:They won't get me by Anonymous Coward · 2011-04-13 06:12 · Score: 0

What that means is that when they manage to tie your slashdot ID in with something else they'll put you down as a "mild tin foil hat" personality type.
Re:They won't get me by hoggoth · 2011-04-13 06:21 · Score: 3, Insightful

Wow, that's pretty inappropriate for an interviewer to require you to open your personal family or friends circle to him. What if my family is discussing my alcoholic father, my pregnant niece, my HIV+ friend, and my habit of killing interviewers and burying them in my backyard?

--
- For the complete works of Shakespeare: cat /dev/random (may take some time)
Re:They won't get me by Anonymous Coward · 2011-04-13 06:36 · Score: 0

"If an employer doesn't like some minor thing about you, e.g. a picture of you on Facebook wearing green antennas at a Halloween party, then probably they're not someone you'd want to work for anyway."
While true, having a job is essential, and it's a privileged few who can always pick and choose their employment so willy-nilly, especially in this economy.
Re:They won't get me by TaoPhoenix · 2011-04-13 06:39 · Score: 1

A real pro would be able to do it based on this comment of yours.
http://slashdot.org/comments.pl?sid=2031640&cid=35457796

--
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
Re:They won't get me by networkBoy · 2011-04-13 06:43 · Score: 2

fundamentally that's what I do.
There is a real me on FB. Then there is me here (and this ID is shared across multiple sites) which would not be too hard to link to the real me.
For stuff I really don't want tied to me in re. job interviews, non-gov't background checks etc. I use other identities. For something that I would be afraid of coming out in a relatively thorough discovery && || government background check I simply don't post it on line. At all.
-nB

--
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
Re:They won't get me by Anonymous Coward · 2011-04-13 06:51 · Score: 0

You probably don't like social networks. You own a computer running Linux, you prefer Gnome to KDE and you think "stable, simple, clean, easy to use, easy to configure and gadgets" are important things. You encrypt some data. You think Google is maybe evil, et cetera... Just by reading your /. profile.
Re:They won't get me by SuricouRaven · 2011-04-13 06:54 · Score: 2

There are many applicants for each job, so employers can be picky. If they have a set of candidates who are all qualified and of similar levels of experience, they'll pick the one who is most 'normal' in their personal life, and thus least likely to somehow embarrass the company or to just not get on with other employees.
Re:They won't get me by ceoyoyo · 2011-04-13 07:33 · Score: 1

In eight years on Slashdot I wonder if you've ever accidentally posted something that might link to you. I can't be bothered to find out, but I'm sure that information might be valuable to someone.
Of course, you probably drag cookies around like everyone else anyway.
Re:They won't get me by Tigger's+Pet · 2011-04-13 09:15 · Score: 1

Well done - you can track my previous postings on /. Do you want a prize? I'm now accepted as one of the 6.5 million people in the UK who have their DNA on record because this country stores DNA samples from everyone convicted (and many who are not convicted). Assuming of course that I'm not just posting things to try and make a point and gain Karma points - just like all the people on here who post about "My wife had this happen to her..." - we know that they haven't got a wife or they wouldn't be on here ;=P
Re:They won't get me by Anonymous Coward · 2011-04-13 09:22 · Score: 0

It also helps to post as Anonymous Coward.
Re:They won't get me by TaoPhoenix · 2011-04-13 09:56 · Score: 1

I was trying to be polite.
I was halfway to a contextual analysis based on some of your more creative phrases but I ran out of time to rule out false positives. At a minimum I think you post on at least five sites and cross referencing those is almost enough. The last trick requires one of the web admins (for ease's sake start with slashdot) to use the new geolocation trick based on public nets to narrow it down. The point is that it's a When-Not-If world out there so plan your future expecting to be tracked and deciding what to do about it.
I'm a 3/10 grade cautionary futurist, practicing reworking my habits now, before a couple of ugly laws floating around Congress go live, so that reflexes do the rest.
P.s. It's not just the DNA database bit, but the *rough timeline of conviction plus sentence length* I was trying to draw your attention to as a tracking factor. Right now that takes two high powered phone calls at the end of the data chain, but it's a Leaks World, so we are learning obscurity is growing short.

--
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
Re:They won't get me by Jane+Q.+Public · 2011-04-13 18:58 · Score: 1

Check out my name. I have several email addresses under that name with different providers, and under different names as well. I have for years. And none of those email accounts are attached to my "real" name or personal information, in any way. And most of them were established from different IP addresses. Also: other people use that name. That is one of the reasons I chose it.

I fully believe (because history clearly demonstrates as much) that the ability to communicate privately and anonymously is essential to a free society. I do not, however, expect others to hand that to me on a silver platter; we must all take pains to exercise our rights, lest they be taken from us.

The people who take such things for granted (or worse, argue against them) do not understand or appreciate what others have given so that it may be possible. As has been said before: no, you are really not "paranoid" enough for your own good.
Re:They won't get me by Jane+Q.+Public · 2011-04-13 19:11 · Score: 1

Bollocks. Utter nonsense. The people who have "lost" this "fight" are only the ones who were never "fighting" in the first place!

They weren't using different information (or even names and locations) on different sites. They weren't using different IP addresses and MAC addresses. They weren't... doing ANYTHING. Because they didn't even know they had to. That's a pretty weird definition of a "fight".

Pardon me, but (as is probably the case with most internet users in the US today) getting repeatedly sodomized in such a way as you don't even know it or feel it -- at first -- is NOT a "fight". It's forcible rape of the worst kind. It's like dissecting a frog that has been pithed.

Your government not only allowed that to happen, they cooperated with it and still are.

No thanks to people who think a nonexistent "fight" has already been "lost". What a bizarre outlook.
Re:They won't get me by cayenne8 · 2011-04-14 02:28 · Score: 1

What if my family is discussing my alcoholic father, my pregnant niece, my HIV+ friend, and my habit of killing interviewers and burying them in my backyard?
I'd hope to God that you all weren't discussing such things on a public forum like Facebook?!?!?!?
Geez, use the phone, or meet in person...I'd never put any discussions like that on an internet forum. Bad for you and your father if they searched for info on him for a job....

--
Light travels faster than sound. This is why some people appear bright until you hear them speak.........

They're coming for you, AC by blair1q · 2011-04-13 04:38 · Score: 2

That Anonymous Coward guy is going to have a mailbox full of goatse spam.

Re:They're coming for you, AC by dev.null.matt · 2011-04-13 04:53 · Score: 1

That Anonymous Coward guy is going to have a mailbox full of goatse spam.
With the kinds of responses he's posted to some of my posts, let me assure you... he already does!

Now lets see by Grindalf · 2011-04-13 04:40 · Score: 1

Now what kind of individual stands to gain from generating this rumour? Let's see now ...

--
The purpose of existence is to make money.

Bravo by swanzilla · 2011-04-13 04:40 · Score: 2

Example 'scrape' FTA:

He used a pseudonym on the message boards, but his PatientsLikeMe profile linked to his blog, which contains his real name.

I don't think we need to dig any deeper to come to the conclusion that this guy is an idiot.

--
0 = 1 + e^(Alt something)

Re:Bravo by TypoNAM · 2011-04-13 05:04 · Score: 4, Funny

He used a pseudonym on the message boards, but his PatientsLikeMe profile linked to his blog, which contains his real name.
I don't think we need to dig any deeper to come to the conclusion that this guy is an idiot.
Indeed, Joseph Swanson.

--
This space is not for rent.
Re:Bravo by swanzilla · 2011-04-13 05:18 · Score: 1

Indeed, Joseph Swanson.
SEO on a budget. Take notes.

--
0 = 1 + e^(Alt something)

The link in the summary is a dupe by Nero+Nimbus · 2011-04-13 04:47 · Score: 5, Informative

This was talked about back in October:

http://yro.slashdot.org/story/10/10/15/1340244/Data-Miners-Scraping-Away-Our-Privacy?from=rss

I thought the guy in the picture looked familiar...

"We (/.) ban scrapers..." LOL by billrp · 2011-04-13 04:50 · Score: 2, Insightful

"We ban scrapers like this regularly here simply for not adhering to the rules spelled out in robots.txt." Hah! robots.txt doesn't stop any decent crawler

Anyone up for making a few new DNSBLs? by mysidia · 2011-04-13 04:52 · Score: 1

Known robots, and scrapers

IP addresses that do not honor /robots.txt.

and IP addresses that robotically submit spam on robots.txt disallowed HTML feedback forms

Much web scraping can be automatically detected.

Sites like Facebook/social networking sites are perfect places to trap/detect scrapers, if they would be willing to contribute to a DNSBL
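
For anyone unfamiliar with how a DNSBL is queried, here is a minimal Python sketch. By convention, an IPv4 address is looked up by reversing its octets and prepending them to the list's zone; any answer means "listed", and a failed lookup means "not listed". The zone `scrapers.dnsbl.example` is hypothetical (no such list is named in this thread), and the function names are made up for illustration.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to ask a DNSBL zone about an IPv4 address."""
    addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    if addr.version != 4:
        raise ValueError("this sketch handles IPv4 only")
    # 192.0.2.99 is queried as 99.2.0.192.<zone>
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone answers for this IP (requires network access)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        # Resolution failed. Note that this sketch also treats timeouts
        # and other resolver errors as "not listed".
        return False


# Hypothetical zone name, for illustration only.
print(dnsbl_query_name("192.0.2.99", "scrapers.dnsbl.example"))
```

A site contributing to such a list would publish the offending addresses as records in that zone; consumers only need the lookup side shown here.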

Re:Anyone up for making a few new DNSBLs? by Rizimar · 2011-04-13 06:17 · Score: 1

A good place to begin would be to examine the robots.txt of large sites to see what they're blocking. Sometimes they leave helpful comments in the text files as well. The most interesting I've come across so far is Wikipedia's robots.txt file which has comments for every disallow or series of disallows.
Re:Anyone up for making a few new DNSBLs? by mysidia · 2011-04-13 07:05 · Score: 1

The most interesting I've come across so far is Wikipedia's robots.txt file [wikipedia.org] which has comments for every disallow or series of disallows.
Well.. it bothers the hell out of me that I can't Google VfD/Afd/Page for deletion Articles on Wikipedia, because a few people were annoyed there were VfD articles about their nonnotable vanity page on WP. Wtf are the Wiki people thinking? Sometimes interesting points arise in a discussion, and it would be useful to be able to search those discussions in the future, since they're so massive.....
That's great for the user-agent fields of known bots. Unfortunately, it doesn't contain an IP address banlist. Something tells me they don't bother too much about IPs of bots that don't honor and use generic user agents.
I wonder if anyone's tried listing Firefox/MSIE in robots.txt Disallow entries... does that hurt any bots without impacting human navigation?
Re:Anyone up for making a few new DNSBLs? by BillX · 2011-04-13 14:26 · Score: 1

There are a few specialist blacklists popping up. Here is one specifically for listing spam robots that attack the most popular forum softwares (phpBB, SMF, etc). What I would really like to see is one that lists all the latest "scrapers to detect when people say negative things about your company/product and C&D them" services. I'd sign onto that in a minute - a no-brainer security measure for yourself, your blog and your forum users.

--
Caveat Emptor is not a business model.
Re:Anyone up for making a few new DNSBLs? by Rizimar · 2011-04-13 14:36 · Score: 1

Sometimes, bots can be detected by their patterns or behavior. If a bot doesn't want to comply with robots.txt and ends up sucking a site's bandwidth, the site may ban it automatically if it's configured to do so. Not sure if Wiki does this, though
Listing Firefox/MSIE in robots.txt also wouldn't do anything because those are browsers, not web crawlers, so they don't have to even acknowledge the robots.txt standard. Though, that's not to say that it wouldn't be fun, let alone downright tempting, to disallow users of IE6 from accessing various sites in hopes that they'd switch to something more relevant :P
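
To illustrate why robots.txt only constrains clients that choose to consult it, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content, the bot names `BadBot` and `GoodBot`, the helper `load_rules`, and the example.com URLs are all made up for illustration. The check happens entirely on the client side; nothing here stops a client that simply skips it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot is banned outright,
# everyone else is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# A well-behaved crawler asks before every fetch and honors the answer.
for agent, url in [
    ("BadBot", "http://example.com/index.html"),
    ("GoodBot", "http://example.com/index.html"),
    ("GoodBot", "http://example.com/private/report.html"),
]:
    print(agent, url, rules.can_fetch(agent, url))
```

A browser never runs a check like this, which is why a Disallow entry naming one has no effect on human navigation.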
Re:Anyone up for making a few new DNSBLs? by mysidia · 2011-04-13 14:48 · Score: 1

Listing Firefox/MSIE in robots.txt also wouldn't do anything because those are browsers, not web crawlers, so they don't have to even acknowledge the robots.txt standard.
Shouldn't affect users.... but I was thinking some of the 'evil bots' might be using an API/framework for making bots, to which they supply the fake UA field, and that framework might be so gracious as to _force_ the bot application developer to comply (?)
I was also wondering if FF/MSIE might have some auto-crawler features that would be subject to robots.txt.... such as selecting 'save a web page complete' which normally crawls the page and all its dependencies to capture them.
Also.... any link pre-fetching technology is crawling, since the human didn't select the web page to be shown yet, by definition; any pre-fetching of a link disallowed in robots.txt would be breaking the robot exclusion conventions.
Re:Anyone up for making a few new DNSBLs? by Rizimar · 2011-04-13 15:37 · Score: 1

Shouldn't affect users.... but I was thinking some of the 'evil bots' might be using an API/framework for making bots, to which they supply the fake UA field, and that framework might be so gracious as to _force_ the bot application developer to comply (?)

Yeah, there are some frameworks and free-to-use bots all around, but because of the diversity of bots and their uses as well as the functions of various servers, it'd be hard to control their behavior so simply. That's part of the reason why robots.txt is voluntary; it's more so that the good bots will find relevant data and not login screens, user forms, etc.

Also.... any link pre-fetching technology is crawling, since the human didn't select the web page to be shown yet, by definition; any pre-fetching of a link disallowed in robots.txt would be breaking the robot exclusion conventions.

I don't agree with this. Prefetching isn't so far off from regular browsing; downloading all of the images, scripts, objects, etc, that are linked to any common page online would qualify everyone for running a crawler if that were the case. Crawlers move much differently through a site than a regular user, often at a faster pace, and read in a way much unlike our own.
Re:Anyone up for making a few new DNSBLs? by BradleyUffner · 2011-04-13 17:58 · Score: 1

A good place to begin would be to examine the robots.txt of large sites to see what they're blocking. Sometimes they leave helpful comments in the text files as well. The most interesting I've come across so far is Wikipedia's robots.txt file which has comments for every disallow or series of disallows.
After reading this the first thing I thought was, "Now we need a meta-robots.txt file to stop robots from scraping the robots.txt file."
Re:Anyone up for making a few new DNSBLs? by Hognoxious · 2011-04-13 21:17 · Score: 1

Something tells me they don't bother too much about IPs of bots that don't honor and use generic user agents.

Perhaps (unlike some) they're not stupid enough to think there's a 1:1 correspondence between users and IP addresses?

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Re:Anyone up for making a few new DNSBLs? by riondluz · 2011-04-14 05:54 · Score: 1

Thanks for the link. What I try to make work is a 2-fold approach that treats robots.txt as almost irrelevant.
For scrapers, I use logs to count requests and blacklist them at the firewall.
For 'form' spam I use a captcha 1st step that hooks into a back-end RBL checker that runs their IP against (among others)
"pbl.spamhaus.org, sbl-xbl.spamhaus.org, bl.spamcop.net, multi.surbl.org, bl.spameatingmonkey.net"
If they're listed then they're blocked at the captcha.
Also, every file I serve tests that the referrer is my site, though I know it's easy to spoof.
It's the RBL checking that is most effective and I'm surprised I haven't seen it more widely adopted.
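
The log-counting half of that approach could look roughly like the Python sketch below. It assumes an access log where the client IP is the first whitespace-separated field (as in the common log format); the function name, the threshold, and the sample addresses are made up, and actually feeding the result to a firewall is left out.

```python
from collections import Counter


def ips_over_threshold(log_lines, max_requests):
    """Return the sorted client IPs that made more than max_requests requests.

    Assumes the client IP is the first whitespace-separated field on each line.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return sorted(ip for ip, n in counts.items() if n > max_requests)


# Made-up log lines using documentation-only address ranges.
sample_log = (
    ['203.0.113.7 - - [13/Apr/2011:04:50:00 +0000] "GET /a HTTP/1.1" 200 512'] * 5
    + ['198.51.100.2 - - [13/Apr/2011:04:50:01 +0000] "GET / HTTP/1.1" 200 100']
)
print(ips_over_threshold(sample_log, 3))
```

In practice the count would be taken over a time window rather than a whole log file, so that a busy but legitimate visitor is not blocked for ordinary browsing.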

--
resist propaganda

Future Politicians by metlin · 2011-04-13 04:55 · Score: 1

I've always wondered -- how would this work for future politicians from our generation?

All your comments, history etc are probably available in a multitude of places, and anyone with enough motivation can go around digging and find some pretty serious material. Combined with the fact that most people know (or care) little to nothing about privacy, you will have an entire generation of users with a good chunk of their private lives and opinions shared out on the Internet for everyone to see.

And knowing how we all have skeletons in our closets, and how we've all been immature at some point in time or the other in our lives, how many future political candidates can claim to be "squeaky clean"?

I mean, I see this primarily as a problem for the right more than the left, given how their voter base expects them to have "conservative values" or some such nonsense.

Re:Future Politicians by dev.null.matt · 2011-04-13 05:00 · Score: 1

There are already pretty damning video clips of many US politicians that are widely available. They don't seem to have any real impact on their ability to get (re)elected here. Watching the Daily Show for a week, you will come up with numerous examples.
Unless of course you're referring to the effects these sorts of things might have on the political proceedings in smoke filled rooms.
Re:Future Politicians by Anonymous Coward · 2011-04-15 11:29 · Score: 0

Don't you think that anybody who claims to have been "squeaky clean" for her entire life is just lying?
If you agree to this, perhaps it's more sensible to put trust into someone humble enough to admit (and bear with the consequences of) their past wrongdoings or lame comments.
That said, I try not to leave a noisy trail on my online life. But some time ago I saw a post here on /. about the "silent lurkers" and how the Civilization is missing our POVs :)
When my head starts to loop in this way I always resort to this old saying "paranoia pays". Perhaps I won't even hit the POST button. Fuck perhaps /. javascript has already logged everything I have already typed, so the post is "out there" in some obscure database, so I think I will finally post. Oh my! Perhaps there's some keylogger in my firefox or in my opensuse with packman packages that change on a daily basis :(
oh my why the fuck did I turn on the fucking laptop this morning? oh my! I think I'm killing myself after the post. good bye!
Hey someone at slashdot please implement that gmail lab thing that makes you do some math before you are allowed to post preventing drunk-posting. The captcha is not enough, sorry.

Re:"We (/.) ban scrapers..." LOL by Anonymous Coward · 2011-04-13 04:57 · Score: 1

Getting banned sure will though.

Re:"We (/.) ban scrapers..." LOL by Anonymous Coward · 2011-04-13 04:57 · Score: 0

Well, no. That's why they get banned. Also, I think your definition of "decent" is a little skewed in this context. The decent thing for any crawler to do is respect robots.txt files and the rules they contain. Of course, they would first have to look for a robots.txt file, which one would think any decent crawler would do.

Re:"We (/.) ban scrapers..." LOL by Anonymous Coward · 2011-04-13 05:01 · Score: 0

"We ban scrapers like this regularly here simply for not adhering to the rules spelled out in robots.txt."

Hah! robots.txt doesn't stop any decent crawler

Yes... not being stopped by robots.txt is the reason they ban them. Which implies that they're using some form of ban that does not rely on robots.txt (and which may or may not be effective).

for great good luck! by Anonymous Coward · 2011-04-13 05:03 · Score: 0

Your offerings please Anonymous Coward, keep them coming!

If I can read the page by countertrolling · 2011-04-13 05:05 · Score: 1

What's to stop me from 'scraping' the info? What's to stop me from simply downloading the entire site with something like this? Slowly if needed to avoid arousing suspicion..

--
For justice, we must go to Don Corleone

Re:If I can read the page by betterunixthanunix · 2011-04-13 05:19 · Score: 1

Slowly if needed to avoid arousing suspicion..
How slowly? Could you download all Slashdot comments in a profitable amount of time? You would also have to use a download pattern that is not obviously automated (e.g. sequentially requesting each link on a page).

In short, it is not the easiest thing to do. It is like trying to pass the Turing test (which software is getting pretty good at doing, as it so happens).

--
Palm trees and 8
Re:If I can read the page by hoggoth · 2011-04-13 07:07 · Score: 1

Run a separate scraper from different IP addresses for each "category" on Slashdot. Each scraper will read all of the articles in that category and refresh the comments from time to time (random intervals) just like a human would. That would be pretty hard to detect.

--
- For the complete works of Shakespeare: cat /dev/random (may take some time)
Re:If I can read the page by TheRaven64 · 2011-04-13 07:14 · Score: 1

Depends. Am I allowed to use a botnet? From a previous story, I know that you can buy machines on botnets for about five cents each. For a dollar, I could have 20 machines, all grabbing one Slashdot story per minute (probably slow enough not to be seen as a spider). That's about a million Slashdot stories every four days. Maybe make it a million a week to make sure. Spread it over a big botnet and you can get the entire archive in an hour or so, without it looking like anything other than a few hundred thousand users all looking at archived stories.
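
A quick back-of-the-envelope check of those numbers, as a sketch. The function name is made up for illustration; the machine count and rate are the ones given above.

```python
def stories_fetched(machines: int, per_minute: int, days: int) -> int:
    """Total pages fetched by `machines` bots each pulling `per_minute` pages."""
    minutes = days * 24 * 60
    return machines * per_minute * minutes


print(stories_fetched(20, 1, 4))  # 115200
print(stories_fetched(20, 1, 7))  # 201600
```

At one story per minute, 20 machines come to roughly 115,000 stories in four days, so a million in four days would take about 174 machines, which is closer to nine dollars than one at five cents each.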

--
I am TheRaven on Soylent News
Re:If I can read the page by Hognoxious · 2011-04-13 21:21 · Score: 1

That's about a million Slashdot stories every four days.

If you're only interested in unique ones it'd be more like a few thousand.

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."

OK, I Confess! by ackthpt · 2011-04-13 05:05 · Score: 1

I did expect the Spanish Inquisition!

--

A feeling of having made the same mistake before: Deja Foobar

Re:"We (/.) ban scrapers..." LOL by billrp · 2011-04-13 05:06 · Score: 1

I don't think there can be such a "ban" - if humans can browse a website, then crawlers can crawl.

Re:"We (/.) ban scrapers..." LOL by TheSpoom · 2011-04-13 05:06 · Score: 1

robots.txt isn't meant to have any enforcement capability; by its nature it's just an advisory mechanism telling bots what the site will and will not accept. If a bot chooses to ignore it (as pretty much all of the types of bots described in this article do), it's up to the site admins to enforce it via IP bans etc.
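
To show how voluntary the mechanism is, here is a minimal sketch of what a well-behaved crawler does before each fetch, using Python's standard `urllib.robotparser`. The rules, the bot name "ExampleBot" and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; a real crawler would download the file
# from the site it is about to visit.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/private/page"))
print(allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/public/page"))
```

Nothing on the server side enforces the answer; a bot that never calls the check simply fetches the page anyway.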

--
It's better to vote for what you want and not get it than to vote for what you don't want and get it.
- E. Debs

Hello Scraper by Anonymous Coward · 2011-04-13 05:09 · Score: 0

Hoping somebody is scraping this message.

Irony by angloquebecer · 2011-04-13 05:09 · Score: 1

Soon as I click to read the comments, the ad on the right is for a web scraping solution.

Re:"We (/.) ban scrapers..." LOL by betterunixthanunix · 2011-04-13 05:12 · Score: 2

However, there are patterns of browsing that are clearly not human. Humans do not make 100 requests in a 10 second timespan, nor do humans traverse every post made by every user.

Yes, it is imperfect and you might ban an occasional human, but this is essentially the situation we have with spam filtering. It is a bit sad that the Internet is becoming so adversarial, but that is what we face.
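
A minimal sketch of that kind of pattern check, in Python: a per-IP sliding window that flags anything reaching 100 requests in 10 seconds. The class name and thresholds are assumptions for illustration, and it assumes timestamps arrive in non-decreasing order (as they would from a live server or a sorted log).

```python
from collections import defaultdict, deque


class RateFlagger:
    """Flag IPs that reach max_requests within a sliding window of seconds."""

    def __init__(self, max_requests: int = 100, window_seconds: float = 10.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> request timestamps in window

    def record(self, ip: str, timestamp: float) -> bool:
        """Record one request; return True if this IP is now over the limit."""
        hits = self._hits[ip]
        hits.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while hits and timestamp - hits[0] >= self.window_seconds:
            hits.popleft()
        return len(hits) >= self.max_requests
```

As noted above, this will occasionally flag a human (or a whole office behind one NAT address), so a flag is probably better treated as a reason to throttle or challenge than as an automatic permanent ban.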

--
Palm trees and 8

Wait a minute... by Anonymous Coward · 2011-04-13 05:13 · Score: 1

You're telling me that stuff on a public web site is public?

Re:"We (/.) ban scrapers..." LOL by Anonymous Coward · 2011-04-13 05:27 · Score: 0

Made me lol.

Scraping public data to save money for them and us by garcia · 2011-04-13 05:28 · Score: 2

Because the public sector has very little time to handle FOIA requests and they sometimes cost more money to complete than I'm willing to pay (usually because they don't do much of their own data work in-house and have to call on a contractor to do it for me), I use their websites to glean the data I want.

Last week I gave a talk about using SAS to do screen scraping and then perform analysis on the data of jail inmate registries and level 3 sex offenders in MN. I have dashboards of the data available on my website and as I mentioned in my presentation it has even been used to help one county avoid what could have been a serious privacy issue.

So while there are any number of pitfalls to screen scraping (not understanding the meaning of the data and trends, being fed incomplete or purposefully incorrect data, or even being banned outright) screen scraping can be great for learning about and reporting on the public sector when they are physically or financially incapable or simply unwilling to do it themselves.

He's an Idiot with Plenty of Company by RobotRunAmok · 2011-04-13 05:44 · Score: 2

Slashdot is filled to the brim with people who take the time to create an alias and then list their homepage on their profile, which of course, is displayed in a link on the same line as their alias in the post they just made.

I click on those homepages whenever I read something really stupid or ridiculous or inflammatory or completely polar opposite my perspective. Which is to say, I click on them A LOT. I am amazed at how many of these "homepages" are links to commerce sites, or sites advertising some kind of service.

"Why," I inevitably ask myself, "would I ever buy anything from you, you knucklehead, you?"

It's like the guy who walks into a business meeting with a potential new client, someone he's never met before, wearing a big "I Love Obama!" button on his jacket. Or an equally large "Palin/Romney '12" button. Sure, you appreciate their passion -- maybe... if you agree with their POV -- but you immediately question their common sense, maturity, and business acumen.

Re:He's an Idiot with Plenty of Company by plover · 2011-04-13 15:08 · Score: 2

"Why," I inevitably ask myself, "would I ever buy anything from you, you knucklehead, you?"
You aren't supposed to buy from them. The link isn't there for your benefit. It's an SEO trick, part of the strategy in trying to raise the page rank for that site.
If you run a blog, you'll find you'll get commenters that say stuff like "hi, your site is a good understand! one for my book marks." It's flatteringly nice, and obviously English isn't their native tongue, so you thank them for their kind words. And with luck, you may not follow the link in their user name, which you might then discover links to some Russian site, which if you bother to visit with a translator looks like some kind of news aggregator page. "Even weirder", you think.
Eventually, you realize that the comment they posted is utterly generic, and could have applied equally to a cooking site or a fishing tutorial site. But why link to a news aggregator? You can peel the onion further, dig around the news site, and never find anything that appears to be of value. If you look at the collection of them, however, you discover it's but one plot in a link farm that ultimately links to a lot of sister sites, and all of them have links to the companies that paid them for the optimization. You'll finally realize there's a whole fake web of links out there that exist strictly to boost Google's page rank of their customers' sites.
The best way to fight them is to make sure your blog software adds rel="nofollow" to any href tags providing links to user-supplied URLs. Most SEO spammers know that Google won't use those links when computing pagerank, and will hopefully leave your blog alone.
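
If your blog software doesn't do this for you, the idea can be sketched in a few lines of Python. The helper name `user_link` is hypothetical; the sketch escapes both the URL and the link text, and falls back to plain text for anything that isn't an http or https URL.

```python
from html import escape
from urllib.parse import urlsplit


def user_link(url: str, text: str) -> str:
    """Render a user-supplied link with rel="nofollow", escaping both parts."""
    # Refuse schemes such as javascript: and show only the escaped text.
    if urlsplit(url).scheme not in ("http", "https"):
        return escape(text)
    return f'<a href="{escape(url, quote=True)}" rel="nofollow">{escape(text)}</a>'


print(user_link("http://example.com/?a=1&b=2", "my site"))
```

Real blog engines usually apply this in their comment templates or sanitizer rather than in a one-off helper, so treat this as an illustration of the output you want, not a drop-in fix.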

--
John

my profile by Anonymous Coward · 2011-04-13 05:56 · Score: 0

I would like to see the profile they have built of me.

a.c.

I worked for a social scraping company... by sdguero · 2011-04-13 05:59 · Score: 2

The company was SEM/SEO then they moved to social optimization and scraping. It was a black art, like the SEO stuff, and totally dependent on the provider (in this case facebook and twitter) to not change anything. It's the same basic problem as with SEO and Google; if facebook's (or Google's) API coughs the social media scrapers (or SEM/SEO people) get pneumonia. If Facebook wants to stop it, they can do so fairly easily.

Unfortunately for privacy, a huge part of FB's business model (like Google) is selling that data to the scrapers and the scrapers' clients.

Re:I worked for a social scraping company... by Anonymous Coward · 2011-04-13 10:52 · Score: 0

PatientsLikeMe, the company mentioned in the article as being upset about scraping, sells the same data. I'm not convinced it's user privacy that they're worried about.

Re:"We (/.) ban scrapers..." LOL by mgcleveland · 2011-04-13 06:00 · Score: 1

I think the point they're making is that crawlers which do not obey the rules spelled out in robots.txt are blocked.

Marketing is a sham by xanthos · 2011-04-13 06:04 · Score: 1

Face it, the type of people who go into marketing have very little to offer this world. Their whole reason for existence is to hopefully sell something to somebody who might not otherwise buy it. The only redeeming aspect of marketing is that it is a non-violent sinkhole in which to drop money, vs say a war in some God forsaken desert.

Have you ever met a marketing/advertising person who actually liked people?

--
Average Intelligence is a Scary Thing

Re:Marketing is a sham by Anonymous Coward · 2011-04-13 06:31 · Score: 0

As opposed to egomaniacal nerds with no social skills and persecution complexes?
Re:Marketing is a sham by Jeng · 2011-04-13 07:39 · Score: 1

Marketing Marketing Marketing, where the real money from the movie is made!
I was going to post a response agreeing with you, but the more I think on it, well....
Marketing subsidizes my entertainment choices, considering how much Geico spends on advertising I think basic cable would collapse if Geico stopped advertising.
Marketing also helps the company I'm at. Our marketing consists of our catalog and website with our products and pricing. Without that how would our customers know what to buy from us? Some level of marketing is necessary.
Also, the marketing department where I work is full of some real cool people who do indeed like people.

--
Don't know something? Look it up. Still don't know? Then ask.
Re:Marketing is a sham by Anonymous Coward · 2011-04-13 10:23 · Score: 0

Slashdot might not like this, but I think marketing done right is probably closer to a science than computer programming is. It certainly is far more rigorously mathematically based.

Re:"We (/.) ban scrapers..." LOL by Anonymous Coward · 2011-04-13 06:38 · Score: 2, Interesting

Humans do not make 100 requests in a 10 second timespan, nor do humans traverse every post made by every user..

That's what I use a Greasemonkey script for, you insensitive clod!

Stalking? by b4upoo · 2011-04-13 06:41 · Score: 1

Collecting data about others is somewhat an essential freedom. But my view and the modern view differ as most people do not feel the same way. But if we take the usual view any company collecting data about a specific person could be charged with stalking. We usually think of a pervert stalking a child or pretty girl. But stalking is stalking regardless of whether it is a corporation or a pervert. The motive for the stalking is irrelevant. Considering the current mood huge civil suits might take place and even criminal prosecutions might be applied. This is one demonstration of why hacking and social engineering need to be legal. After all, how will you ever know to what degree others are studying you without being able to penetrate their data? Restricting hacking is a path to tyranny that is quite direct and predictable. The natural balance is to allow all people and groups to completely study each other in great depth.

Re:"We (/.) ban scrapers..." LOL by Ares · 2011-04-13 06:42 · Score: 1

iptables -A INPUT -s $Bad_Scraper_IP_Address -j DROP

Re:REGEX + Python or PERL could collect data on yo by Anonymous Coward · 2011-04-13 06:48 · Score: 0

this is where the REGEX work comes into play, and yes, it does work.

This is slashdot. We know what regexp is. We don't need to have it capitalized or explained. We also know what perl and python are. Please stop capitalizing things. Also, while you're at it, please stop putting "random" things in "quotation marks."

And I'm sure you're capable of writing a crawler to go to a /. ID page and compile a list of all their posts. This is not a black magic art. Call us when you can do something special.

Re:"We (/.) ban scrapers..." LOL by Culture20 · 2011-04-13 06:58 · Score: 2

mod_security is pretty handy at spotting crawler patterns (you have to be a really weird human or a well designed crawler to look like something you're not).

EULA should stop this behavior by hrieke · 2011-04-13 06:58 · Score: 1

Add a line in your acceptable use / EULA section stating that you expect the user of the account to be human and that any attempt to scrape the data off of the server is fined at $100,000 per message, plus $10,000 to each message author.

--
III.IIVIVIXIIVIVIIIVVIIIIXVIIIXIIIIIIIIVIIIIVVIIIV IIVIIIIIIVIII...

Re:EULA should stop this behavior by Anonymous Coward · 2011-04-13 07:35 · Score: 0

Yeah, because the likelihood of collecting any such fines is high enough to justify the time you spend adding the line to the EULA. Not.
Re:EULA should stop this behavior by Just+Some+Guy · 2011-04-13 08:04 · Score: 1

Add a line in your acceptable use / EULA section stating that you expect the user of the account to be human and that any attempt to scrape the data off of the server is fined at $100,000 per message, plus $10,000 to each message author.
And also, you reserve the right to sue the Tooth Fairy for lost unicorns.
There is no "legal gray area" in scraping. By publishing data on a public webserver, you give consent to clients for viewing it. And what does "the user of the account to be human" mean, anyway? Presumably, humans will eventually view the data downloaded by the scraper. Challenge of the day: give me a legally watertight definition of "web browser" that includes user agents like Lynx (which downloads data from a remote server and presents it in a manner almost exactly unlike Firefox), and excludes a scraper (which downloads data from a remote server and presents it in a manner almost exactly unlike Firefox). Bonus points if your definition also accounts for screen readers for the blind, HTML-to-WAP gateways, ad-blocking proxies, and iPhones. Go ahead; we'll be waiting.

--
Dewey, what part of this looks like authorities should be involved?
Re:EULA should stop this behavior by hrieke · 2011-04-13 08:17 · Score: 1

Two minutes of your time to insert the HTML?
A day for your lawyer to write up the text, who is either on a retainer or works directly for your company?
That was hard.

--
III.IIVIVIXIIVIVIIIVVIIIIXVIIIXIIIIIIIIVIIIIVVIIIV IIVIIIIIIVIII...
Re:EULA should stop this behavior by hrieke · 2011-04-13 08:28 · Score: 1

Sure- Automated process that stores the results in a database or is otherwise used in a system where the results are aggregated and retrievable for 4th party consumption with a method to tie back to a person.
That wasn't difficult at all. Just because I write something for consumption by the members of a particular web site (assuming that it's NOT out in the public like Slashdot's or any other comment system), I would not expect it to be slurped up and sold by 3rd parties. On a members-only web site, such as the one discussed in the story, the inclusion of my EULA statement would be a strong deterrent against these scrapers.

--
III.IIVIVIXIIVIVIIIVVIIIIXVIIIXIIIIIIIIVIIIIVVIIIV IIVIIIIIIVIII...
Re:EULA should stop this behavior by Anonymous Coward · 2011-04-13 09:30 · Score: 0

Automated process that stores the results in a database or is otherwise used in a system where the results are aggregated and retrievable for 4th party consumption with a method to tie back to a person.
Like Firefox.
Re:EULA should stop this behavior by Anonymous Coward · 2011-04-13 10:05 · Score: 0

Compare those costs to the one in a hundred million chance of collecting your fine, and you'll find it is NOT worth your time.

Re:"We (/.) ban scrapers..." LOL by hoggoth · 2011-04-13 07:04 · Score: 2

A smart, discreet scraper will scrape breadth-first, i.e. scrape 100 websites alternating the next page from each site in turn, instead of the next page on a single site until that site is finished. Some scraping on active sites like Slashdot or just Google's spidering is never done; it just continues on as new content is created. It would be easy for a scraper to act just like a human on Slashdot, just keep clicking 'refresh' every once in a while. An astro-turf post from GNA would really throw the admins off the trail.
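
The alternating order is just round-robin scheduling across per-site queues, which is also how polite crawlers avoid hammering one server. A minimal Python sketch, with a hypothetical `interleave` helper and made-up page names:

```python
from itertools import zip_longest


def interleave(queues: dict[str, list[str]]):
    """Yield pages round-robin: one from each site in turn until all are empty."""
    skip = object()  # marker for sites that have run out of pages
    for batch in zip_longest(*queues.values(), fillvalue=skip):
        for page in batch:
            if page is not skip:
                yield page


order = list(interleave({
    "site-a": ["a1", "a2", "a3"],
    "site-b": ["b1"],
    "site-c": ["c1", "c2"],
}))
print(order)
```

From any single site's point of view, consecutive requests are separated by however long it takes to visit all the other sites, which is why the pattern is hard to tell apart from a slow human reader.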

--
- For the complete works of Shakespeare: cat /dev/random (may take some time)

Reporting Back... by istartedi · 2011-04-13 07:08 · Score: 2

The report is back sir, and the results are disturbing. Almost everybody likes sex, and a lot of them are weird. The ones that don't like sex have very strange hobbies. The ones that don't abuse illegal drugs are abusing legal drugs, and almost nobody weighs what they say or looks like their online picture. What should we do?

(boss pauses for a moment) "Don't hire anybody ever again".

--
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?

Nonsense by Anonymous Coward · 2011-04-13 07:17 · Score: 1

It's ridiculous to expect users to anticipate and thwart privacy invasions. These companies could be shut down overnight (or at least rendered illegal) with common-sense legislation. The problem is not users, it is their bought-and-paid-for "representative" government(s) which sell out their constituents to be deceived and abused by sleazy industries.

Re:Nonsense by plover · 2011-04-13 14:09 · Score: 1

It's ridiculous to expect users to anticipate and thwart privacy invasions. These companies could be shut down overnight (or at least rendered illegal) with common-sense legislation. The problem is not users, it is their bought-and-paid-for "representative" government(s) which sell out their constituents to be deceived and abused by sleazy industries.
It's "ridiculous"? Someone held a gun to your head and told you to post your oh-so-pitiful life story on line? They made you post that picture of you drinking with some friends at a stripper bar, or the story about that time you were snorting coke off a hooker's ass? You think some all-powerful government should come and save your irresponsible neck from someone else trying to make a buck off your drunken stupidity, and do so by censoring your writings from them? And you think that doesn't sound ridiculous?
It's quite simple. If you don't want to share it with the world, DON'T SHARE IT WITH THE FUCKING WORLD.

--
John
Re:Nonsense by Jane+Q.+Public · 2011-04-13 16:43 · Score: 1

"Its ridiculous to expect users to anticipate and thwart privacy invasions. These companies could be shut down overnight (or at least rendered illegal) with common-sense legislation. The problem is not users, it is their bought-and-paid-for "representative" government(s) which sell out their constituents to be deceived and abused by sleazy industries."
Not really. I mean yes, in part. Some of what OP was talking about is completely free (as in freely available to anybody) public information. But OP doesn't like scrapers because (1) if used irresponsibly they can hit servers too hard for comfort, and (2) while the information might be freely available, it takes "normal" people a lot of time to go online and sort through all that information, while a scraper can grab it and sift it in a very short time indeed.

But OP doesn't seem to be accounting for a couple of other situations. For example, a lot of people gathering information automatically might be doing it for academic or other "legitimate" purposes, without any intent to sell information or otherwise violate privacy. It is true that if someone wants to do that, it may not be unreasonable to expect them to contact the site manager and say, "Hey... we want to scrape your site with THIS account, for this purpose, and we will sign a paper saying that personal information will not be gathered and distributed." But on the other hand, that can be a pain, and it can take days to get permission for even one site. If responsible, the site managers might insist on knowing exactly how the information is to be used, etc., taking even more time. Or they may just not bother to respond at all. Easier to just do it.

I do agree with you that proper legislation could help solve the problem. The U.S. Senate is about to debate a law stating that trackers must all allow people to opt out. While that is definitely a step in the right direction, the simple fact is that opt-out still favors the assholes of the corporate world. Tracking problems will never be anywhere near controlled until we have a law saying that "anybody collecting personal information (defined in appendix A) by electronic means, for commercial use, may only collect information from people who have specifically given permission for that information to be gathered." We have such laws about other forms of communication, including electronic. There is old (and good) legal precedent.

In other words, we must have a law specifying opt-in only, not opt-out. Even opt-in will not get rid of all the problems (some will still do it illegally until they are caught), but there is no doubt whatever that it is the right and proper thing to do.

We run a "scraper". by Animats · 2011-04-13 07:26 · Score: 1

Our SiteTruth system does some "scraping". We're looking for the name and address of the company behind the web site, so we can check the business out. We also look for ad links and a few other things, like BBBonline seals, which we check. We use a user agent name of SiteTruth.com site rating system. We don't look very deeply into a site; if after examining the most likely 20 pages, we haven't found out who runs the site, we figure they're not going to tell us. The site is down-rated accordingly.

Our experience is that 0.1% of sites have a "robots.txt" file that tells us to not look at any pages at all. We don't look at those sites, and their SiteTruth rating information says "Blocked". Total exclusion of crawlers is rare. Most sites want some visibility.

One of the more amusing uses of a "robots.txt" file used to be seen on Marchex (the "What you need, when you need it" domainer) pages. The site wasn't blocked from crawling, but the link to the page that told you about Marchex was. That, we suspect, was to keep search engines from noticing that all those domains were really one business. That didn't help Marchex much. Marchex (NASDAQ: MCHX) is still around, stock way down from the peak and reporting a slight loss this quarter.

We do have one exception to obeying the "robots.txt" file. We look at the home page of the site to see if it's a redirect before looking at the "robots.txt" file. Some sites have both a redirect and a "keep out" robots.txt file on the same domain. This is like posting signs that say "Keep Out" and "Please Use Other Door" on the same entrance. That contradiction was apparently a workaround for an old Google crawler bug. Google would index both "example.com" and "www.example.com" separately, then consider them duplicates, which caused some SEO problems.

Actually logging into sites from a crawler is just wrong. I'm amazed that a deep pocket like Nielsen would do that.

Re:We run a "scraper". by Kalriath · 2011-04-13 16:25 · Score: 1

Hmm. Sitetruth seems to be a little flawed. Not the least because it considers itself to be a little questionable, and secondly because it doesn't consider the possibility that a subdomain might have more authoritative information than the main domain (for example, "store.company.com" might have an EV certificate, giving you a high assurance of identity and location, while the main site at "www.company.com" has no high assurance sources). I also notice the complete lack of contact information. Ironic, for a company that claims to be a legitimate scraper performing a valuable service - specifically identifying sites with "questionable" identity.

--
For a site about things like basic rights, Slashdot users sure do like to censor "dissent".
Re:We run a "scraper". by Animats · 2011-04-13 19:57 · Score: 1

for example, "store.company.com" might have an EV certificate, giving you a high assurance of identity and location, while the main site at "www.company.com" has no high assurance sources
It's rare to see that. Know of a significant example? One might expect it for "store.yahoo.com", but that site won't even accept a HTTPS connection. Neither will "disney.go.com". Citibank has separate certs for "www.citibank.com" and "online.citibank.com".
Contact information is on the "about" page.
Re:We run a "scraper". by Kalriath · 2011-04-14 11:12 · Score: 1

Ah, there it is - why didn't I see that email address before. I might email you guys some specific examples now that I can see how.

--
For a site about things like basic rights, Slashdot users sure do like to censor "dissent".

Re:"We (/.) ban scrapers..." LOL by CCarrot · 2011-04-13 07:38 · Score: 1

... nor do humans traverse every post made by every user.

...unless they have a fistful of mod points to spend...heck, sometimes I'm just very interested in a story and want to see what everyone has to say about it. True, that doesn't happen often, and I certainly don't read 10 posts a second, but it does happen...

--
"I love animals! Some are cute, others are tasty, what's not to like?" - Betsy Schroeder, Jeopardy contestant

If already not following the rules by HikingStick · 2011-04-13 07:44 · Score: 1

If the scrapers are already not following the rules laid out in the robots.txt file, what's to say they'll honor your ban? They'll find some way around any technical means of blocking them, in time.

--
I use irony whenever I can, but my shirts are still wrinkled...

Re:If already not following the rules by Bucky24 · 2011-04-13 11:23 · Score: 1

I'm pretty sure by ban he meant an entry in an .htaccess file banning the IP, not a line in a text file saying "please keep out"

--
All the world's a CPU, and all the men and women merely AI agents
Re:If already not following the rules by HikingStick · 2011-04-14 04:26 · Score: 1

Right, but if one IP address (or even a range) is blocked, all they need to do is move to another IP address. There are plenty of ways to spoof IPs, too.

--
I use irony whenever I can, but my shirts are still wrinkled...

Some bad practices in HR that needs to end by yuhong · 2011-04-13 07:53 · Score: 2

On this topic, here are some bad practices in HR that need to end:
1. Hiring based on stereotypes is NOT a good idea.
2. The purpose of HR should not be to minimize legal liability.
3. The illusion that celebrities are perfect needs to end.
4. Filtering people based on health problems to minimize health insurance costs is not a good idea.
5. Not hiring people based on debt creates a paradox for those who have to pay it off.
And as a side note, companies with seriously broken HR often have other problems too.

Re:Some bad practices in HR that needs to end by Jiro · 2011-04-13 07:59 · Score: 1

If you don't try to minimize legal liability, you'll find yourself with more legal liability than you need. And legal liability really hurts.
Re:Some bad practices in HR that needs to end by yuhong · 2011-04-13 08:19 · Score: 1

But it should not be the primary purpose of HR.
Re:Some bad practices in HR that needs to end by Anonymous Coward · 2011-04-13 08:54 · Score: 0

it is not, it is only 1/5th of their purpose.
Re:Some bad practices in HR that needs to end by nastyphil · 2011-04-13 09:08 · Score: 1

If you don't try to minimize legal liability, you'll find yourself with more legal liability than you need. And legal liability really hurts.
Liability only hurts if you have done something actionable.

--
Dialectician. Archology.
Re:Some bad practices in HR that needs to end by Anonymous Coward · 2011-04-13 16:02 · Score: 0

You have lawyers who will fight baseless lawsuits for free? Send me their contact info!
Re:Some bad practices in HR that needs to end by Hognoxious · 2011-04-13 21:32 · Score: 1

Liability only hurts if you have done something actionable.
Anything is actionable, in the sense that somebody can sue you for it. And even if the case is laughed out of court in five minutes you're still looking at a few grand in legal fees, wasted time etc.

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."

Re:"We (/.) ban scrapers..." LOL by sharkey · 2011-04-13 07:55 · Score: 2

Actually, it stops ALL "decent" crawlers. It's the ones that behave indecently that ignore robots.txt.

--
"Outlook not so good." That magic 8-ball knows everything! I'll ask about Exchange Server next.

they may have already gotten you by bityz · 2011-04-13 08:16 · Score: 1

Even though you never post a thing, someone else may post something about you. You may already be tagged in multiple photos on Facebook. You may have loan applications visible on the web. Your information is not entirely under your control - with pervasive digital storage, constant security challenges, and an increasing cultural trend to blurring the line between public and private, there is a growing chance that your information will leak out into the public.

DNA Scraping? by jasno · 2011-04-13 08:29 · Score: 1

Would that be legal? Could I set up a company that collected DNA samples without their owners' permission (say, by tying the hair clippings from a salon to the CC that paid for the cut)? Could I sell that info to the government?

If no one's done it, someone should, if for no other reason than to scare the shit out of people and hopefully wake them up.

--

http://www.masturbateforpeace.com/

Re:DNA Scraping? by King_TJ · 2011-04-13 10:09 · Score: 1

Umm..... yes, someone obviously could do it, but you'd probably have some difficulty linking up the clippings you found to specific individuals. (I mean, would you propose the hair stylists themselves start indexing their customers' hair clippings? They'd be the ones who know their clients' names, addresses and phone numbers since everyone's in their computer system already. If they started acting as the data collectors for this type of operation, it would cause a big loss of business when people started finding out -- so most salons would probably ban the practice, regardless of its legality.)
And just as a somewhat related side note? My g/f is Jewish and brought up the fact that some Jews already believe in not leaving any toenail or fingernail clippings behind. They collect them to destroy them by burning them, etc. Granted, it's based on very old scripture and so doesn't say anything about concerns about people obtaining one's DNA .... but it's interesting that maybe they were onto something anyway!
http://answers.yahoo.com/question/index?qid=20100114112104AAz2PtZ

Screen Scraping's been GOOD to me (& mine) by Anonymous Coward · 2011-04-13 08:35 · Score: 0

I've been doing that for years now, & for a good purpose: To populate a custom HOSTS file with data to block out known bad sites & servers + bad hosts-domains from 7-8 very reputable sources for said material.

I do this as an added layered security approach to things online for myself, my family, & friends... it works for faster & safer online experiences.

(Simply because this file covers ALL webbound apps you have, & it's run @ the TCP/IP stack level in kernel mode (acting merely as a filter for that system to utilize)).

I've just built/co-built/rebuilt a system for that that is better in many ways than its predecessors, in fact...

(3rd one now, 1st was in Borland Delphi Object Pascal, 2nd was Ms-Access SQL (for normalization portion only though), & lately it's Python with REGEX work)...

In fact, that system's running on multiple threads in timings as I type this, and even when I sleep...

(Which takes a burden of 20 minutes work away from me I used to have to do in the a.m. or evenings, before... now? Now I don't HAVE to, anymore, lol, YEA!)

That "all said & aside":

This type of system's NOT THAT TOUGH TO BUILD, not really, because tools like Python, Perl, & even std. *NIX shell commands can be "popped together" with some regex work to do so... pretty damned easily too, once you have the "base toolkit" & process in place for it, & really for ANY KIND OF DATA ONLINE!
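A minimal sketch of that kind of "regex work", assuming the block lists have already been downloaded as plain text in the common "address hostname" layout. The function names, the pattern, and the 0.0.0.0 sink address are all illustrative assumptions here, not the actual system described above:

```python
import re

# Assumed input layout: lines such as "0.0.0.0 ads.example" or
# "127.0.0.1 bad.example # note". Anything else (comments, blanks) is skipped.
ENTRY = re.compile(
    r"^\s*(?:0\.0\.0\.0|127\.0\.0\.1)\s+([A-Za-z0-9.-]+)\s*(?:#.*)?$"
)


def extract_hosts(text):
    """Return the set of lowercased host names found in one block list."""
    hosts = set()
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            name = match.group(1).lower()
            if name != "localhost":  # never blackhole localhost itself
                hosts.add(name)
    return hosts


def build_hosts_file(sources, sink="0.0.0.0"):
    """Merge several block lists into one de-duplicated, sorted HOSTS body."""
    merged = set()
    for text in sources:
        merged |= extract_hosts(text)
    return "\n".join(f"{sink} {host}" for host in sorted(merged)) + "\n"
```

Fetching the lists and scheduling the job would sit around this; the merge and de-duplication step is the part the regex handles.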

Too bad folks are using it for potentially & perhaps more than potentially bogus purposes vs. one another... this is human nature I suppose, the bad side (unless the person's a known killer & such, then I'd think it was fair to warn others perhaps... there's always "shades of gray" in any situation, & I don't like "absolutes")

I guess what I am trying to say here though, is this:

Not ALL "Screen-Scraping" going on, is bad...

(My reasons for that, are from what I consider the most insightful portion of your reply below (really well said man)):

---

"Sadly, the Internet has become more of an adversarial game than a way to unite people." - by betterunixthanunix (980855) on Wednesday April 13, @01:16PM (#35809838)

Sometimes, it seems that way, doesn't it? Especially with articles like this one... really, Really, REALLY "well-put" on your part though!

However - Again/Lastly in closing: Not all of the "screen-scraping" stuff online is for "nefarious purposes", sometimes, it's for the general good of others too (per my reason for doing it myself, noted above @ the top of my reply to you).

APK

P.S.=> Didn't mean to "ramble", or go into "too much detail" (because with a "handle/username" like yours, you probably KNOW the detail I am guessing here... the detail was more for those that don't know this stuff) - I really liked how you closed your post though, made me think a bit is all... apk

Re:Screen Scraping's been GOOD to me (& mine) by Anonymous Coward · 2011-04-13 11:58 · Score: 0

Hey, it's APK, the "hosts file guy" again! How's life in the 90's, man?

Re:REGEX + Python or PERL could collect data on yo by Anonymous Coward · 2011-04-13 08:48 · Score: 0

Please don't respond to Mr One-Note Samba. As you can see, it just encourages him.

Mod him down if you've mod points and you feel so inclined; otherwise just ignore him.

Re:"We (/.) ban scrapers..." LOL by Anonymous Coward · 2011-04-13 09:11 · Score: 0

They ban scrapers precisely because they don't follow robots.txt.

Further trolling by the 'AC troll'? LMAO! apk by Anonymous Coward · 2011-04-13 09:20 · Score: 0

It's funny seeing an ac troll run from replying here http://news.slashdot.org/comments.pl?sid=2082332&cid=35811080

Funny how his "speaking on behalf of /." for everyone seems to have been his undoing... lol!

(After all - he RAN from replying when I asked if HE was "all of /." (which we clearly know, he's not (or was it just you again, trolling as AC?))).

So, go ahead: Hit others with your registered "LUSER" account's effete & useless "down-mod points", instead of facing the music in the URL above.

(Thanks for proving a point here: That you're the TRULY "anonymous coward" here (and I stress, coward))

APK

P.S.=> Yes, I am assuming that you're probably just TomHudson doing this ac trolling of myself, as is per his usual, shown here quoted in his own words, no less:

http://slashdot.org/comments.pl?sid=1646272&cid=32150544

Some people are pitiful... & there's no hiding from statements like that one shown in that URL above... apk

Re:Further trolling by the 'AC troll'? LMAO! apk by Anonymous Coward · 2011-04-13 10:48 · Score: 0

Poor old APK, he has never achieved a mental age of more than 10, and seems to really believe that his hosts file is useful. Plenty of people other than TH and myself have a great time trolling him, he produces the most amusing rants, littered with irrelevant links and childish triumphalism, when he actually makes himself look more and more like a complete and utter tool.
Watch his reaction, it is bound to be entertaining. I encourage everyone to have a go, it's funny as hell!
It's like Slashdot has its own village idiot...... (:
Re:Further trolling by the 'AC troll'? LMAO! apk by Anonymous Coward · 2011-04-14 08:23 · Score: 0

Yes, I am assuming that you're probably just TomHudson...
Then you assume incorrectly... "APK"
I know who you are and how to find you.
If you don't stop pretending to be me, I am going to come to your house and take care of you myself.
You're completely pathetic. ... apk

Pot Color similar to Kettle Color by Anonymous Coward · 2011-04-13 09:36 · Score: 0

CmdrTaco writes: "We ban scrapers like this regularly here simply for not adhering to the rules spelled out in robots.txt."

Well, I put ok.txt in my robots.txt file, and lo and behold the ill-behaved slashdot code went ahead and did a GET on it anyway. Then it had the gall to incorrectly state that I was coming from a proxy.

Re:"We (/.) ban scrapers..." LOL by Anonymous Coward · 2011-04-13 10:19 · Score: 0

"We ban scrapers like this regularly here simply for not adhering to the rules spelled out in robots.txt." Hah! robots.txt doesn't stop any decent crawler

I think you misread that. It doesn't say "we ban scrapers, for not adhering to the rules spelled out, in robots.txt", it says "we ban scrapers for not adhering to the rules (spelled out in robots.txt)". The banning itself will not be done using robots.txt... which really should be obvious.

That Black Dude . JPG by Anonymous Coward · 2011-04-13 11:01 · Score: 0

"You Gonna Get Scraped"

Re:"We (/.) ban scrapers..." LOL by no+known+priors · 2011-04-13 11:40 · Score: 1

When they say ban, they mean IP ban presumably. As in, the robot doesn't follow robots.txt, and because of this, they get their ass kicked, and banned. That makes a lot more sense I think.

--
Appended to the end of comments you post. The maximum is 120 characters.

Re:"We (/.) ban scrapers..." LOL by Anonymous Coward · 2011-04-13 11:47 · Score: 0

...

robots.txt is meant to govern honest crawlers, nothing else. There are measures you can take besides robots.txt to hinder a crawler if they aren't following the rules.
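For reference, this is roughly what "honest" looks like on the crawler side: the check is voluntary and happens in the crawler's own code before it fetches anything. A minimal sketch using Python's standard urllib.robotparser; the rules, the "ExampleBot" user agent, and the site.example URLs are made up for illustration:

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url.

    Nothing enforces this: a crawler that never calls it is not stopped
    by robots.txt at all, which is why sites need other measures too.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for an imaginary site.
RULES = """User-agent: *
Disallow: /private/
"""
```

A well-behaved crawler would call allowed(RULES, "ExampleBot", url) before each request and skip anything that comes back False.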

No gray area at all by Anonymous Coward · 2011-04-13 12:09 · Score: 0

Scraping often is a cat-and-mouse game between websites, which try to protect their data, and the scrapers, who try to outfox their defenses. Scraping itself isn't difficult: Nearly any talented computer programmer can do it. But penetrating a site's defenses can be tough.
One defense familiar to most Internet users involves "captchas," the squiggly letters that many websites require people to type to prove they're human and not a scraping robot. Scrapers sometimes fight back with software that deciphers captchas.

I don't see a gray area at all. If these scrapers are "hacking" their way into web sites, they are cyber-criminals. Plain and simple.

I wonder if this will alter the relationship... by opus_magnum · 2011-04-13 12:47 · Score: 1

...between generations. I'm not sure how children or students will take you seriously once they are able to see every dumb thing you did when you were their age.

I am 110% Secure vs. 950,000++ bad sites here by Anonymous Coward · 2011-04-13 13:11 · Score: 0

So... can you say the same?

HOSTS files are free, versatile, reliable, & easy to work with, for speed, security, & anonymity online as their results... & it's very noticeable on ALL fronts noted, especially in combination.

HOSTS FILES GIVE MYSELF, or ANYONE, THESE BENEFITS:
---

A.) Better speed (vs. adbanners & malware that come with them @ times, & hardcoding your sites so no DNS requests are made)
B.) Better online layered security (vs. malicious sites & bots, phishing mails, + bot "Command & Control Servers", etc./et al)
C.) Better "Intangibility" vs. DNSBL (DNS Block Lists)
D.) Better "Anonymity" vs. DNS Request logs
E.) Better reliability (vs. DNS poisoning/redirects, or DNS servers going down)
F.) Best of all? FULL ABSOLUTE CONTROL of it... simple text file edits!

---

(BOTTOM-LINE - I think of them as helping a PC online be the fastest car on Top Gear, with the proven safety of a Volvo @ the same time, from the same free package... & as for it actually working out that way? It does... &, I don't just *think* it: I KNOW IT/SEE IT/FEEL IT!)

APK

P.S.=> To all of those, & you overlooking their merit, due to my usage of the HOSTS file? Well, I get them... You, by way of comparison? Do not.

It's your money & speed + security online after all... apk

Re:I am 110% Secure vs. 950,000++ bad sites here by Anonymous Coward · 2011-04-13 18:48 · Score: 0

How can you be "110% Secure", if all you do is disable the ability to look up the IP address from the domain name? A hosts file doesn't work at all against IP address links that do not have to be looked up.
Also, do you really have a 950,001-line hosts file?
(same AC as GP)
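To make the point about IP address links concrete, here is a rough sketch in Python of when a hosts file can even be consulted. The helper names and the sample addresses are hypothetical; it only models the name-lookup step, not a real resolver:

```python
import ipaddress
from urllib.parse import urlsplit


def parse_hosts(text):
    """Parse hosts-file text into a {hostname: address} mapping."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        if len(fields) >= 2:
            for name in fields[1:]:
                table[name.lower()] = fields[0]
    return table


def hosts_file_applies(url, table):
    """Return True only if the URL's host is a name listed in the table.

    A literal IP address needs no name lookup, so the hosts file is
    never consulted for it and cannot block it.
    """
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
        return False  # literal IP: no lookup happens
    except ValueError:
        return host in table
```

With a table built from "0.0.0.0 bad.example", a link to http://bad.example/x is caught, while a link straight to http://192.0.2.10/x goes around the file entirely.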

Re:"We (/.) ban scrapers..." LOL by Anonymous Coward · 2011-04-13 14:19 · Score: 0

Robots.txt only stops decent crawlers: Ones whose operators have set them to follow the directives in robots.txt files.

On the other hand, I would guess that blocking the IP addresses of "users" who are bulk downloading multiple discussion threads simultaneously would be approximately 100% effective. Guess which method /. uses to ban scrapers?
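A rough sketch of what IP-based blocking of bulk downloaders could look like, in Python. This is a guess at the general technique (a per-IP sliding window), not Slashdot's actual code; the class name and the thresholds are made up:

```python
from collections import defaultdict, deque


class BulkDownloadDetector:
    """Ban any IP that makes too many requests within a sliding window."""

    def __init__(self, max_requests=30, window_seconds=10.0):
        self.max_requests = max_requests   # assumed threshold
        self.window = window_seconds       # assumed window length
        self.recent = defaultdict(deque)   # ip -> recent request timestamps
        self.banned = set()

    def record(self, ip, now):
        """Log one request at time `now`; return False if the IP is banned."""
        if ip in self.banned:
            return False
        stamps = self.recent[ip]
        stamps.append(now)
        # Forget requests that have fallen out of the window.
        while stamps and now - stamps[0] > self.window:
            stamps.popleft()
        if len(stamps) > self.max_requests:
            self.banned.add(ip)
            del self.recent[ip]
            return False
        return True
```

A reader opening a thread every few seconds never trips it; a script pulling dozens of threads at once does, and stays banned afterwards.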

Let's spend more cash on publicity? by vlueboy · 2011-04-13 15:15 · Score: 1

Open source has an uphill battle educating the masses as more uneducated people join it with zero expectations of passing some required level of readiness prior to being let loose online.

Merge a good version of a "secure" OS, like Debian, say, Ubuntu with a paranoid version out there where your proposed security is ON by default --no need to know where to get Adblock for grandma's firefox. Test and tweak to ensure the security doesn't cripple the top 50 websites, (youtube, facebook, myspace, hotmail, google services, etc) and call it "Securiva 2012" so that the newbies go "hmm, it *must* be good because it's selling a year in *advance* of 2011, like any good new car model" (free discourages people, but good enough things will get pirated anyway). Sell it at the bargain bins next to those 10 dollar games. Next year, do the same battery of tests to remove/add sites, and release "Securiva 2013". Better yet, make it automated by default a la Chrome. Make sure your users understand that their data / programs need to be manually checked between scheduled upgrades, or perhaps charge extra for use of "the cloud" to keep the data safe and just test the programs.

Speaking of forking, I have marveled how forks of Good(TM) Open Source distros are so obscure to even us IT geeks that even if good, they have no chance of getting the attention they deserve and helping out the common unprotected newb. For every, say, 10000 Windows users there may be 1 user of $TOP_BRAND_LINUX, but why doesn't every $TOP_BRAND_LINUX user know and PREFER $NEWERTOP_BRAND_LINUX_FORK? To illustrate more or less, pretend instead of OSs, we're comparing adoption of Google Chrome among geeks to how many geeks even KNOW about Chromium. Let's ignore informed /. geeks --think about your wife's or grandma's "assisted" choices when all they have is US for security consultation.

Re:Let's spend more cash on publicity? by Anonymous Coward · 2011-04-13 15:45 · Score: 0

Jesus. Mod -1 retarded.

He made a GOOD comparison though by Anonymous Coward · 2011-04-13 15:50 · Score: 0

Chrome vs. Chromium (& how many people actually KNOW about the latter's existence, period...)

APK

P.S.=> I use the latter... apk

Re:"We (/.) ban scrapers..." LOL by yacc143 · 2011-04-13 19:29 · Score: 1

Well, considering that there are two additional escalation steps:

*) emulate a human-like access pattern that works at a human-speed.

*) passively record data via a proxy when you normally browse.

Add to this multiple IP addresses, and catching your scraper becomes so much more problematic.

How it works vs. hosts-domain names (& IP) by Anonymous Coward · 2011-04-14 00:30 · Score: 0

"How can you be "110% Secure", if all you do is disable the ability to look up the IP address from the domain name?" - by Anonymous Coward on Thursday April 14, @02:48AM (#35815214)

Most malicious sites aren't driven by IP addresses in attacks. Most attacks are driven by URL's in phishing mails, or malicious links, or bogus adbanners. In fact, the ratio of that is roughly 99.9% in favor of them using hosts-domain/subdomain names, because they pay for them.

I.E.-> You use an IP Address, & once it's "shut down"? It's shut down. HOWEVER, if the authorities shut down a host-domain name of a bogus server/site?? Well - The hacker/crackers can just go to another hosting provider & UP THEY GO AGAIN (either way (ip addy or host-domain name)? The bad guys pay for them, so using hosts/domain names can be RECYCLED & reused again, simply by changing hosting providers/registrars/etc.!).

(The RBN was notorious for this... as is Zeus/SpyEye currently in fact...)

---

"A hosts file doesn't work at all against IP address links that do not have to be looked up." - by Anonymous Coward on Thursday April 14, @02:48AM (#35815214)

Right, which is the "why" of WHY I noted using firewalls too... I put any malicious IP addresses (yes, my sources put those up too) into firewall table rules (in software, or my firewall-router (linksys)).

---

"Also, do you really have a 950,001-line hosts file?" - by Anonymous Coward on Thursday April 14, @02:48AM (#35815214)

Yes, and it's growing right now as I write this... I have been building it up (and taking entries out of it when sites prove clean, my sources DO provide removal lists too) since 1998 or thereabouts in fact...
APK

P.S.=> Would you like to try it? It works... it REALLY, works (for better speed, security, & even some better "anonymity" online)... apk

Re:How it works vs. hosts-domain names (& IP) by thejynxed · 2011-04-15 14:25 · Score: 1

Until you get a virus/trojan that decides to overwrite your HOSTS file first thing after it roots your machine.
Oops.

--
@Mindless Drivel: 100% of Twitter posts ever Tweeted.

Well, tomhudson quoted says QUITE otherwise by Anonymous Coward · 2011-04-14 08:44 · Score: 0

"Then you assume incorrectly... "APK" - by Anonymous Coward on Thursday April 14, @04:23PM (#35821728)

Or, did YOU not say others should stalk & troll me as AC replies, here:

http://slashdot.org/comments.pl?sid=1646272&cid=32150544

You're FLAT-BUSTED, as a stalking ac troller, TomHudson, & funniest part is? BY YOUR OWN WORDS captured in that link above, & this quote from it, verbatim:

"Wait until he starts on another kick, then reply to him as an AC. It's the new meme". by tomhudson (43916) on Sunday May 09 2010, @08:29PM (#32150544) Homepage Journal

No, no - the "new meme" is EXPOSING you as a stalking, trolling, libelling scumbag tomhudson, & YOU'RE DOING THE JOB FOR ME!

APK

P.S.=> Thanks for being SO especially "transparently stupid", most of all, & easily caught in the act + making it easy for me to do so & expose you all in the same "motion" by your own words being your UNDOING... apk

Re:Well, tomhudson quoted says QUITE otherwise by Anonymous Coward · 2011-04-14 18:17 · Score: 0

LMAO.
And no, I most definitely am NOT TomHudson.
And you are not Alexander Peter Kowalski. You've been using my name for at least 10 years, tho. And putting it on your CRAPWARE!!! BUT YOU ARE NOT ME.
Maybe you are Yuri Klastalov?
Don't forget, I know where you live. "...your days are numbered"
You are completely pathetic.
"The REAL" APK.

"Impersonating me" now? Failing THAT too, lol! by Anonymous Coward · 2011-04-14 08:53 · Score: 0

"I know who you are and how to find you. If you don't stop pretending to be me, I am going to come to your house and take care of you myself." - by Anonymous Coward on Thursday April 14, @04:23PM (#35821728)

Additionally, it APPEARS you are trying to "impersonate me"... poorly done job!

(I.E.-> You don't have my posting "style" down, @ ALL... you'd make a LOUSY "forger", that's certain enough!)

APK

P.S.=> Also, lastly? IF you're trying to "threaten or scare" me?? LOL, come to my house then & face me then... we'd see who "scares who" then. Of course, you'd NEVER do it "man to man", this I am certain of based on your ac stalking/trolling b.s. directed MY way here tomhudson!

First off, lol, I wager this much:

You'd be slaughtered BY THIS NEIGHBORHOOD ITSELF first

(Unfortunately, it's one of the most violent crime areas in this nation - 3rd place last time I checked in fact (after larger metros like NYC/L.A. etc), but... nobody "messes with me" in this neighborhood - I took care of THAT, long ago in fact, & the "creeps" know better, word "gets around", fast, once you "park one of that kind good")... apk

Re:"Impersonating me" now? Failing THAT too, lol! by Anonymous Coward · 2011-04-15 06:49 · Score: 0

SO... the best you can do when caught pretending to be me is to make threats you'll never be able to act on? You're going to come to my house, right? Next you'll say you're going to rape my kids, is that right?
Bring it on, buddy.
Why should I try to imitate YOUR posting "style"? You've been trying to make me look like an illiterate moron for the last 10+ years.
The FAKE apk = stagnated.
The FAKE apk = exposed.
Posting bogus Windows crap under MY name... Shit, everyone who knows me in real life knows I'm a lifelong Mac user!!
You are NOT ME.
You are a pathetic FAKE. ...the REAL apk.
(P.S. And no, I am *not* TomBarbaraHudson, either. S/He is ABSOLUTELY telling the truth about that!)

STILL trying to "impersonate me"? Please... by Anonymous Coward · 2011-04-14 23:51 · Score: 0

Anything on arstechnica? Is a lie, or it was my posts being edited by them, and not only on THEIR forums, but on their members' forums. Case in point:

Jeremy Reimer of arstechnica was caught IMPERSONATING ME on his own forums, and admitted it at Windows IT Pro (and on his forums before he moved them to another hosting provider & started it again)... so, you think arstechnica doesn't pull that kind of crap? See this:

"Anyway the "APK" registered here is just an affectionate clone of the original. In fact I prefer him to the original." - Jeremy Reimer - March 25, 2005

http://tech.slashdot.org/comments.pl?sid=1300193&cid=28685295

and here also (Windows IT Pro magazine forums):

http://www.windowsitpro.com/article/internals-and-architecture/the-memory-optimization-hoax#feedbackAnchor

Heck, if you look at the latter one? Reimer even impersonated another person named Martin Meszaros as well. The arstechnica bunch? They have NO problem breaking laws either.

E.G.=> Some of their members (2) even had their websites removed in whole (Jay Little from CrystalTech.com & petitiononline.com) & in part (Jeremy Reimer from Shaw in Canada) for IMPERSONATING MYSELF, email harassing myself, making libellous altered photos of myself, childish songs, DEATH THREATS (this made it serious) & more, which a Det. Felton of B.C. Canada where Reimer lives helped put an end to, finally.

The one that sticks out there, though, is IMPERSONATING ME... which it seems you appear to be trying to do, now, even.

Well, well, judging on past happenings? You must be from ARSTECHNICA then! (the home of the underachiever online...)

---

"And no, I most definitely am NOT TomHudson" - by Anonymous Coward on Friday April 15, @02:17AM (#35825768)

He's the only one that I know of that trolls me as "anonymous coward" postings as he has been caught in it, red-handed, and telling others to do so, here:

http://slashdot.org/comments.pl?sid=1646272&cid=32150544

So, it stands to reason you're just he, doing it yet again here now.

---

"And you are not Alexander Peter Kowalski." - by Anonymous Coward on Friday April 15, @02:17AM (#35825768)

Uhm, last time I checked (when I woke up this a.m. in fact, lol), I was!

---

"Maybe you are Yuri Klastalov?" - by Anonymous Coward on Friday April 15, @02:17AM (#35825768)

WoW... there's a name: Afaik? He is part of the RBN (Russian Business Network) & he hated anyone who was helping to secure Windows for users (which I have actively been doing since 1997 online in fact)... which would make some sense, considering that botnet was floored right around the time he put up his Twitter post saying "Alexander Peter Kowalski can suck my sweaty cock" or something along those profane lines...

---

"You've been using my name for at least 10 years, tho." - by Anonymous Coward on Friday April 15, @02:17AM (#35825768)

No, more like 45++ yrs. (since I came into this world in fact)...

---

"And putting it on your CRAPWARE!!! BUT YOU ARE NOT ME." - by Anonymous Coward on Friday April 15, @02:17AM (#35825768)

Uhm, 1st of all? The link you used?? Dead. Long dead in fact, as I have not had a website @ pixelstation.com outta Trinidad since the late 1990s... what are you, in a time-machine, lol, or WHAT??

---

"Don't forget, I know where you live" - by Anonymous Coward on Friday April 15, @02:17AM (#35825768)

Then, "bring it on", but TO MY FACE then... heck, you'd never even make it to

Y R U ac stalking and trolling + impersonating apk by Anonymous Coward · 2011-04-15 09:40 · Score: 0

Answer the question. We already have a good idea that it's Tom Hudson, based on a quote of his own words saying he was doing that here:

---

"Wait until he starts on another kick, then reply to him as an AC. It's the new meme". - by tomhudson (43916) on Sunday May 09 2010, @08:29PM (#32150544) Homepage Journal

QUOTED FROM -> http://slashdot.org/comments.pl?sid=1646272&cid=32150544

---

So, why are you denying you are merely tomhudson the ac stalker troll of /. then?

Re:Y R U ac stalking and trolling + impersonating by Anonymous Coward · 2011-04-15 19:45 · Score: 0

So, why are you denying you are merely tomhudson the ac stalker troll of /. then?

Because I am not tomhudson, moron.

You are completely pathetic.
