Actually this is not so similar to Google Base, but a more direct (and small-thinking) rip off of Craigslist, as far as I can see. There is the similarity that Microsoft also have a search engine to directly map over this data, but eWeek are going much too far (also in http://www.eweek.com/article2/0,1895,1877217,00.asp, linked) in ignoring the fundamental differences between a community listing site, an auction site (where the role of the provider is much more hands-on) and the need for Google to get their engine to work with sites dynamically generated from a back-end database...
The article you linked to is nothing more than an article showing you how to quickly find your rankings in the search engine.
The article is just the most clearly attributed piece I could find - as several other posters have pointed out, manipulating search engine rankings is one of the services his business offers.
Now, is the guy a scumbucket for screen scraping content? I don't know. He gives links to the locations he took it from. The only difference from RSS feeds is that the content creator may not have wanted their content copied.
Were I a smaller man, I might ask AP and the BBC what they thought of his style of content duplication... he's certainly not using their RSS feeds, or observing their terms.
If "I have the best beatles site in the world!" is the ONLY thing he posted and it was a link to the website, it would be spamming. I'm not saying the site isn't crap, it is, and I'm not saying the guy isn't shady. He's not spamming though.
I do see your point. Link/Blog spam is a well-accepted term, though, and I maintain this is what he's busy with...
Why not write an email to Google about it. I'm sure they will be interested in Page Rank abuse.
More like persuade them that Slashdot should go under the category of blog and its links all treated as potential spam... if they haven't already come to that conclusion.
What follows is my copy/paste response which I'm sure people who haven't dared check out his site yet will be interested in reading
Rather than cut-and-paste my response (that none of the content is original, all scraped), I'd expand on it - Rolling Stone, Top 100 Beauty Sites, Geocities, CGISpy, IconoSearch etc. etc.... not full of adverts, huh?
(And finally, don't make yourself look so stupid posting 'mod parent down' under your real name - grow up, check your facts, and get some mod points...)
The site, though ugly and featuring incredibly poor color choice and CSS effects, contains several screens full of George Harrison biography, and fewer than a couple of dozen links, almost all of which point to other pages on the site and appear to be very George Harrison related.
The first biography is stolen from the Mr Showbiz site (now defunct, but this whole thing is available on the Web in several places).
The second biography is stolen from Associated Press (both are actually attributed, but not linked, if you care to look).
The links, I propose, are trying to solicit reciprocal links (note that there's a dedicated 'Add Our Beatles Link to Your WebSite' link, and that the site has been submitted to various web rings).
As I say, I could re-create this content and similar (including ripping off discographies, and a random set of images via Google) in minutes...
This is the second time today two of his were on the front page, and earlier wasn't the first time either. I suggest we all visit his George Harrison tribute site, there's lots of material there that would take literally seconds to collect on the rest of the web, and lots of useful links to boot...
The link didn't pass with the aggregation on the results I sampled.
About this you're right. Something's driving him to submit all these Slashdot stories though, and if you're right about it being more than just backlinks, I'm guessing it's something to do with this duplication (after all the aggregated stories do link back to the originals and each mention the Beatles at least twice)...
[That "he is a professional spammer"] I have to see proof of.
If this guy goes to every message board on the planet and posts "I have the best beatles site in the world!" as the linking text to his website, and that is the ONLY thing he contributes - he is a spammer
Oh wait, so within the Google results you quoted, he's been making a positive (and Beatles-related) contribution to the Web? Did you even look at what you quoted?
Apart from his 'profession', what makes me suspicious of this person (which I can't believe you don't see) is that his entire George Harrison site is screen-scraped from other sources. There's nothing original there, and I could have made it in a couple of hours...
See elsewhere on this thread - he's on top page of http://www.google.co.uk/search?q=george+harrison, that site is devoid of original content and full of adverts, and he runs a spamming company (and writes articles on spamming search engines)...
Next, Bill Gates will be saying "Look how dangerous the GPL is!"
I know you're probably only going for funny points, but just to point out that if these authors had used a proprietary license to allow only a few select people to copy and make changes, like Microsoft do, Sony would still have been infringing their copyright...
Yes, I buy green arabica, decaffeinated by the Swiss method (as you say, with nothing more than water), and had decided this article had nothing of interest to me before it even appeared on Slashdot...
I'd love to understand what made you use that subject, because this precisely is not Semantic Web!
There are no common ontologies (I can just add whatever concepts and attributes I like without any agreement or documentation) and no means for exposure (like RDF) of the marked-up data - it's all internal to their database and hidden behind an interface that doesn't go far beyond keyword search...
It all boils down to:
- Do we trust AOL and Yahoo [...]
Add: Do we trust AOL and Yahoo to make a valid definition (perhaps this is what you meant by honesty).
Even before they start, 'spyware' is not enough, and 'malware' ill-defined, to define installation of 'hidden extras' I do not want. These are both companies who package things I don't want as default options in their own installers - not a good start, even if they're 'up front' about it (and include separate uninstallation procedures).
If there's to be a 'police' force for this, I'd rather it be someone whose hands are completely clean...
Indeed there is some breadth and some allowed delay in how European Directives are implemented by different member states - you Swedes are notorious for being behind in bringing your IPR law in line with the homogenization we were trying to achieve in Europe (hence the existence of certain BitTorrent sites that we'll not name...) I believe, though, that you pulled up your socks in July, so I suggest again that you go buy yourself a good book on the subject...
You are confusing buying a COPY of a software with buying the COPYRIGHT to a software
No, I'm clarifying the difference between buying the installation media, and having any kind of 'ownership' of (right to execute) the software, rather than a license. You're the one who's confused...
Then read this: http://slashdot.org/comments.pl?sid=169624&pid=14138729
Like the likely PS3-as-PVR -> PSP link.
(PSP as a LocationFree client is already confirmed and shipping...)
Now I'm no expert on search engine spamming, but is it not possible he's capitalising on this?
http://www.google.co.uk/search?q=beatles+%22mit+wireless+campus+tracking+users%22
Not true - he is a professional spammer...See elsewhere on this thread - he's on top page of http://www.google.co.uk/search?q=george+harrison, that site is devoid of original content and full of adverts, and he runs a spamming company (and writes articles on spamming search engines)...
FEC Deciding Future of Political Blogs
Here are the ones (currently indexed by Google) that were:
Wifi Camera Uploads without Computer
Microsoft Adopts Virtual Licenses
Cisco Updates Network Security Technology
Google and Oregon Launch Open Source Initiative
Open-Source Insurance
Archaeological [sic] Uncovers a New Name
New Server Chip Niagara
Sprint Launchings Music to Mobile Downloads
MIT Wireless Campus Tracking Users
Consumer Friendly Downloads?
Paris Accelerates Move to Open Source
Hopefully he's reading (he does), and will now pay closer attention to this submitter.
I hear they're also making the concession of changing colours in the logo - flying the 'white flag of war'!
Both the cars and the insects are beetles!
There are no common ontologies (I can just add whatever concepts and attributes I like without any agreement or documentation) and no means for exposure (like RDF) of the marked-up data - it's all internal to their database and hidden behind an interface that doesn't go far beyond keyword search...
Please tell me if I'm wrong...