The Ham and Spam of Weblogs
An anonymous reader submits: "Will the blogosphere become just as spammy as Usenet? There may be over 10M weblogs out there; most of them seem to be fake spam blogs created to manipulate the search engines. Scott Johnson, CTO at Feedster, complained that 'at times we see upwards of 90% of the traffic from Blogspot being spam,' and the problem is only likely to get worse. Can blog search engines like Technorati, Feedster, and PubSub filter the signal from the torrent of noise? Or will we have to seek new approaches, such as the social filtering used by Del.icio.us or the collaborative filtering used by Findory, to separate the ham from the spam?"
I wish Google had an option to exclude blogs from my searches. Considering that many blogs use b2evolution, phpBB, or whatever, Google could easily determine what IS a blog and what IS NOT and filter accordingly. IMHO, Google would be a much better place if I could exclude blogs and those stupid parked-domain search sites from my queries.
I'm not trying to be flamebait; it would be a nice option, though.
90% of EVERYTHING is crap. It just happens that weblogs trend toward a specific TYPE of crap: SPAM. I mean, you may think JeffK is crap, but some of us find him funny, so anything with actual content isn't crap to somebody (if only the creator). That means all the crap must be content-free.
Mal-2
How is the Riemann zeta function like Trump rallies? Both have an endless number of trivial zeros.
I personally feel that, as with all technologies, as more hands get into the pot, there will be more people wanting and/or innovating. But spam's never gonna cease, silly :)
Oh well, what the hell
By posting comments rather than actual content, I am able to raise awareness of who I am without having to invest the time in actually doing serious writing. I can offer my opinions on any number of topics without having to find those topics. Many times, my opinions do not reflect the current zeitgeist of any blog I comment upon, but that is one way of garnering readers interested in my comments.
Let's face it, blogs are vanity projects. You could just as easily write in your paper diary and keep it under your pillow like a little girl. Instead, you choose to put your diary on the web and open it up to criticism and comments. This is just another way of demanding attention. Unfortunately, that is pretty time-consuming and troublesome. Coming up with original content every day to keep people coming back to your blog is pretty difficult. Imagine having to make yourself heard in the cacophony of spam blogs. It's a fool's errand.
Which, as I mentioned before, is why I stick with commenting in other people's blogs. I gain all the notoriety of providing interesting and insightful content without having to provide actual content.
It's been easier to become popular by doing this than trying to compete with spam blogs.
Look! It's M0nk3y Man!
Check out the rest of the inbred fat ass miscreants on their staff page:
http://technorati.com/about/staff.html
The guy makes a good point...human validation via captcha. If you're going to spend 10 minutes complaining, whining, bragging and/or loathing about something then you can spend 3 seconds typing in the word "uNFsaQ" to prove you're human.
If it takes you less than 10 minutes to write in your dear diary--I mean blog--then it's probably a 1 liner to the effect of "i think she likez me omglolbbq!!!" and you need to get off my internet.
Problem solved. Next?
"blogosphere"? Considering that blogs are probably the dumbest form of communication possible (a linear log of rambling bullshit) I can only hope that the Blogosphere is destroyed by the Vogon Constructor Fleet to make way for a colonic bypass.
With email spam filtering, you have to consider each email separately. A blog has a persistent identity and reputation. In theory, this should make it easier to filter blog spam than email spam. One result of this type of filtering is that it will penalize new blogs in search results, both spammy and real.
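A minimal sketch of reputation-based scoring, under purely hypothetical assumptions: each blog carries an age and counts of good and spam reports, quality is a Laplace-smoothed share of good reports, and a hypothetical 30-day warm-up scales the score down for young blogs. This is not how Technorati, Feedster, or any real engine ranks blogs; it only illustrates why persistent identity ends up penalizing new blogs, spammy or not.

```python
from dataclasses import dataclass


@dataclass
class Blog:
    # Hypothetical per-blog record; real engines track far more signals.
    name: str
    age_days: int
    good_reports: int
    spam_reports: int


def reputation(blog: Blog, warmup_days: int = 30) -> float:
    """Return a score in [0, 1); higher means more trustworthy."""
    # Laplace smoothing: a blog with no reports starts at 0.5, not 0 or 1.
    quality = (blog.good_reports + 1) / (blog.good_reports + blog.spam_reports + 2)
    # Age factor ramps linearly from 0 to 1 over the warm-up period,
    # so every new blog is discounted regardless of its content.
    age_factor = min(blog.age_days, warmup_days) / warmup_days
    return quality * age_factor


new_blog = Blog("fresh-and-honest", age_days=3, good_reports=0, spam_reports=0)
old_blog = Blog("long-running", age_days=900, good_reports=40, spam_reports=1)
spam_blog = Blog("pills-pills-pills", age_days=900, good_reports=0, spam_reports=25)

for b in (new_blog, old_blog, spam_blog):
    print(f"{b.name}: {reputation(b):.3f}")
```

With these made-up numbers the honest three-day-old blog scores only slightly above the established spam blog, which is the new-blog penalty the comment predicts.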
Blog comment spam will remain a problem, of course.
Stop worrying about the risks of nuclear power and start worrying about the risks of not using nuclear power.
what is the difference between a first post and your post?
you fail it.
Too much ham and spam is bad for you. No wonder the email servers are always choking.
Who is going to pay $1 to read about how your boyfriend dumped you last week and you're still crying in bed? Blog comment spam isn't all that hard to get rid of (filter links, filter content, or, if you're just worried about search engines, use rel="nofollow").
Anyone who has a blog that you have to pay to comment on (or to see) isn't going to get much traffic.
I just wanted to point out that so-called "social software" is not social. Person-to-person communication through computers is mediated and indirect. Technology is a barrier to communication as much as it is an enabler. I agree that it is an enabler in situations where it is used to help overcome disabilities and things of that nature; however, technology is used more so by people who are actually avoiding being social. Email is often preferable to a telephone because it creates an additional barrier between ourselves and the "recipient" (aka person).
A prime example of software in a "social" context is the chatter that accompanies networked video games. This does not form real relationships between people. I recently heard a teenager say that his gaming buddies, whom he doesn't even know by name, are like family to him. Technology has helped a whole generation and then some fail to learn what real relationships are. When a teenager can't distinguish between somebody he's only ever watched virtually shoot ze germans and the people who nurtured him before he was able to take care of himself, Houston, we have a problem.
And it's only getting worse. Now we've begun adding "social" in front of all kinds of new web applications. Anything that lets other users see your profile and the items you post and comment on them is seen as a valid replacement for real human contact.
There was a line in a movie I saw recently, Crash, where Don Cheadle's character says to his girlfriend: "It's the sense of touch. Any real city you walk, you know. You brush past people, people bump into you. In L.A., nobody touches you. We're always behind this metal and glass. I think we miss that sense of touch so much, that we crash into each other just so we can feel something." The next time we use the word "social" to describe a new type of web application, I think we should give that some thought first.
putfwd.com - 1GB Free file storage with a twist
This was pretty clever
QuickTrack, check it out.
Your hair look like poop, Bob! - Wanker.
I think he meant that weblog authors should verify by paying a single $1 fee, not necessarily the guys who want to comment.
It was a bit unintuitive how you add sites to the filter list though -- just cut and paste "http://*.whatever.com/*" into your extensions list and any search results from whatever.com will then be greyed out.
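The wildcard entries described above behave much like shell-style globs. The sketch below models that with Python's `fnmatch.fnmatchcase`; the extension itself isn't named in the thread, so the matching rules here are an assumption, not its actual behavior.

```python
from fnmatch import fnmatchcase

# Hypothetical filter list in the "http://*.whatever.com/*" style.
FILTER_LIST = ["http://*.whatever.com/*"]


def is_greyed_out(url: str, patterns=FILTER_LIST) -> bool:
    # Lower-case both sides so "WWW.Whatever.com" matches too.
    # In glob syntax "*" matches any run of characters, including "/" and ".".
    return any(fnmatchcase(url.lower(), p.lower()) for p in patterns)


print(is_greyed_out("http://www.whatever.com/page.html"))  # True
print(is_greyed_out("http://whatever.com/page.html"))      # False: nothing before ".whatever.com"
print(is_greyed_out("http://example.org/"))                 # False
```

Because `*` also crosses `/`, a URL such as `http://evil.example/a.whatever.com/` matches as well; a stricter filter would extract the host with `urllib.parse.urlparse` and compare that instead.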
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
Slashdot is a blog, created in the context of a news site, which we all come to in order to bitch about things we want out of technology, things we think are cool, and things we hate and want everyone to know why.
That being said, Google (along with other large search engines) has already taken a stance on blogging, and each is actively pursuing its own. For most, this means creating their own blog service and shifting their code a bit to make sure blogs don't come out on top. But this isn't an absolute truth.
If you want these things, and Google doesn't offer them, make your own search engine, and do it better. No, seriously, don't look at me like I'm crazy; there have been over a dozen "major" search engines created after Google, some are only in serious use by geeky populations (AlltheWeb, as far as I can tell, fits this), some by the trendy, some by the "I hate Google"ites, etc. etc. It's as simple as that.
One reason I think Google has shied away from taking such a hard line on blogs is simply ease of use. Google doesn't want to complicate life with a million more search options, especially ones you can handle yourself by subtracting out the most offensive sites (-livejournal -blogger -blogspot, etc.).
"Victory means exit strategy, and it's important for the President to explain to us what the exit strategy is." G.W.Bush
Last I checked it was a penny for your thoughts, at most just two cents, now you're telling me you want to charge a DOLLAR!? Well sir, I assure you I am real, and my not sending a dollar proves that I am indeed human.
People are using blogs and forums to post links to their own sites. These links show up as backlinks to Google, and due to Google's ranking procedure that determines which website is the most relevant to each search, each extra backlink pointing to a website can effectively make that website more relevant in the searches.
Luckily, Google is one step ahead of the spammers and has allowed only one link from each forum to count as a valid backlink. Therefore, having 100 forum signatures linking to www.spamdomain.com will no longer earn credit for 100 backlinks; only one backlink will be credited to www.spamdomain.com. The problem is, a lot of people have not realised that Google has done this yet, and as a result, people are still adding 8+ forum signature links to their posts, hoping to cheat the search engine ranking system.
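Google has never published its link-counting rules, so this sketch only models the rule as described above: a backlink is credited once per (source site, target site) pair, however many signatures repeat it. Keying on hostnames is an assumption of the sketch, not a documented Google behavior.

```python
from collections import defaultdict
from urllib.parse import urlparse


def credited_backlinks(links):
    """Count backlinks per target host, crediting each source host once.

    links: iterable of (source_page_url, target_url) pairs.
    """
    sources_by_target = defaultdict(set)
    for source_url, target_url in links:
        source_host = urlparse(source_url).hostname
        target_host = urlparse(target_url).hostname
        # A set silently drops repeat links from the same forum.
        sources_by_target[target_host].add(source_host)
    return {target: len(sources) for target, sources in sources_by_target.items()}


# 100 forum posts, each with a signature link to the same spam domain,
# plus one link from an unrelated blog.
links = [(f"http://forum.example.com/thread/{i}", "http://www.spamdomain.com/")
         for i in range(100)]
links.append(("http://blog.example.net/post/1", "http://www.spamdomain.com/"))

print(credited_backlinks(links))  # {'www.spamdomain.com': 2}
```

Because the key is the target host, several different pages on one domain still earn a single credit from each forum, while links to separate domains are each counted once.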
Valkyrie is about to die! Wizard needs food -- badly!
In the Wired article (I know this isn't about spam, but what the hell):
"Lately, it seems like almost every time you tune into your favorite Blogger-hosted blog to catch up on the latest gossip, meme, political diatribe or cybersnark, you find that the site is frozen in time. Or, there are multiple posts with identical content."
Uh, no, not as far as I can tell. "Frozen in time," perhaps, after someone decided to stop blogging, but I used Blogger for six months and never had a single hitch. Apparently, googling "blogger sucks" gives you thousands of sites bitching about Google's service.
Sometimes there are outages, when you can't get in to alter a post or something similar, but those were few and far between (they happened fewer than half a dozen times in six months, and each lasted only a few hours).
I guess this is a sign of how popular Blogger is. I mean, the only way to reconcile my experience (zero fatal errors in six months) with thousands of complaints is to assume that there are a HELL of a lot of bloggers out there.
Oh, and to those bitching about blogs in general: please shut up. Yes, there are annoying vanity blogs, but Blogger, and the blogging concept, has been a godsend to specialists, as well as to political organizing.
Protect your liberties. Donate to the ACLU
Is it irony you are going for?
The link defeats the Firefox pop-up blocker...
Get your Unix fortune now!
Most people in the world still don't have credit cards. Many don't even have bank accounts, so that rules out PayPal too. Many credit card transactions from 'third world' countries and 'rogue states' are automatically rejected. These are probably the places that need online expression the most.
All you create is a digital divide whereby only those with the facilities can blog. That just plain sucks, because I really couldn't give a shit about reading some square-jawed, straight-laced American boring asshole's blog about his SUV and his daughter's school play.
Sorry
99.9% of blogs are crap by people who can't be bothered to code a web page.
If they spent the time to learn how to create a web page, they'd also learn the value of CONTENT, not blabbering about whatever they ate for breakfast or how someone's opinion affected their enjoyment of that breakfast.
This is not a dream, not a dream...we are transmitting from the year 1-9-9-9.
LiveJournal, for one, has a setting that makes your entries "friends only." Likewise, you can prevent anonymous users from posting, or impose various other restrictions.
This sig no verb.
If you have a few minutes, click on the randomizer button at the top of the screen that reads "Next Blog" a couple of times. I'd be willing to say that at least 2 out of every 10 blogs is a spam farm.
It's just fucking sad.
Web2.0: I love when people Flickr my cuil and digg my boingboing until my google is reddit and I start to yahoo
All these services like Technorati just measure quantity, not quality. That is why they are so easily spammed and abused. Just crawling blogs and counting links and cross-links won't do it.
And since most bloggers simply recycle and rephrase current events you need a different approach.
I hope it goes where 'cyberspace' and 'surfing' went.
On this site, he's the 12th most popular poster according to the automated Slashrank script engine.
Actually, Usenet is doing quite well. The spam battle has been won; there's very little spam in the technical groups. Serious workers in difficult fields are on there. Check out, say, "comp.games.development.programming.algorithms", where the people who write physics engines discuss how to do it. Or "comp.std.c++.moderated", where proposed changes to C++ are discussed. Usenet has far lower advertising content than the Web, where, today, "content" seems to be a little box in the middle of the page, surrounded by blinking ads.
I use Blogger to manage my blog.
The fact that I know how to enter the necessary information for Blogger to SFTP to my server demonstrates that I am not seeking to link farm.
Bottom line? Educate bloggers on how to integrate with alternative service providers, and eschew Blogspot-hosted blogs.
The legit will rise to the top, and the rest will be safely ignored.
Now we've begun adding "social" in front of all kinds of new web applications. Anything that lets other users see your profile and the items you post and comment on them is seen as a valid replacement for real human contact.
Del.icio.us has none of these features, and the words "social filtering" are not used to imply any sort of substitution of human contact. It is a system where you can file bookmarks and can find the most popular bookmarks as tagged by other users. "Social filtering" is the phrase that has stuck to describe this informal voting system. Feel free to suggest an alternative.
Your complaint about modern technology making a poor substitute for actual human contact may be valid, but really has nothing to do with this story or with del.icio.us.
I suspect you were also as disappointed as I was in school when I found out that "Social Studies" wasn't a place to talk to other students.
Simple. Have 8 different domains, or even 8 different URLs from the same domain. One backlink for each in the sig. I'll admit to doing this on one particular forum, but it's a web developer's site, and it's a common practice there.
The real solution, at least as far as search engine rank goes is the new rel=nofollow attribute for links that Google started using a few months ago. The best link that I could find when I was looking at this a couple weeks ago is this one. If it grows in popularity and the major forum and blog sites start using it on comments and signatures, the spam in blogs won't be able to affect search results nearly as strongly as they do today. (Unfortunately, readers will still have to skim past the SPAM comments in the victim blogs)
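A minimal sketch of what blog or forum software could do before displaying user-submitted HTML: force rel="nofollow" onto every link so the link passes no ranking credit. It uses regular expressions for brevity, which assumes comments contain simple, well-formed `<a>` tags (a `>` inside an attribute value would break it); production code should use a real HTML sanitizer instead.

```python
import re

# Opening <a ...> tags; \b keeps tags such as <abbr> from matching.
A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
# Any existing rel attribute, quoted or bare, so spammers cannot supply their own.
REL_ATTR = re.compile(r"""\s+rel\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""", re.IGNORECASE)


def add_nofollow(comment_html: str) -> str:
    def rewrite(match):
        # Drop any user-supplied rel, then append our own.
        attrs = REL_ATTR.sub("", match.group(1))
        return f'<a{attrs} rel="nofollow">'

    return A_TAG.sub(rewrite, comment_html)


print(add_nofollow('Buy <a href="http://spam.example/">cheap pills</a>!'))
# Buy <a href="http://spam.example/" rel="nofollow">cheap pills</a>!
```

This only removes the search-engine payoff; the spam comment itself is still displayed to readers.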
I wish there was a way of stabbing people in the face over the Internet when they use that awful buzzword "blogosphere"!!
Oolite: Elite-like game. For Mac, Linux and Windows
How is this news? And how did they not see this coming? Every single public audience interactive media on the internet thus far has been invaded with spam, ads and other crapware - usenet, irc, email, BBSes, forums, wikis ... Why didn't the blog software writers plan for this when creating their software? Is this not a bit MS-like by putting out software to grab a market and only later bothering with security features?
Caesar si viveret, ad remum dareris. (If Caesar were alive, you'd be chained to an oar.)
Besides, why try to separate ham from spam? Spam is essentially "SPicy hAM", and it is del-icio-us!
I was trying to view Blogspot, and at some point it harassed me for a login (maybe I wandered in the wrong direction). So I used the Firefox BugMeNot plugin, and voilà, I had a blog. Okay, so it was more of a wiki than a blog; I really didn't post anything, and it was always being covered by folks writing crappy poetry.
Then some dweeb from Canuckistan changed the password and now uses it to boost the Google ratings for his pathetic little Torontine blog about getting drunk with Ron Jeremy.
The "random blog" used to show that most of the blogs on blogspot were in fact spam. Where'd that button go?
I just want to point out how much I love Findory. Findory is a website that uses the news stories you clicked on to pick more news stories for you to read, and tends to be very accurate. If you worry about privacy, it even works without a login or cookies, just not as well. Do any Slashdotters use Findory? I just love telling people how much I love using Findory, especially the personalized RSS feeds.
i've asked this before but no one has ever answered me. WHY DO PEOPLE CARE ABOUT BLOGS?!!!! what makes them so interesting? all the blogs i've seen looked like journals only they sucked more.
Who is going to pay $1 to read about how your boyfriend dumped you last week and you're still crying in bed
A *lot* of people... and a lot more than $1 too.
Soaps, reality TV... Big Brother is the ultimate proof of this: a dozen totally uninteresting people sitting in a room for 2 weeks gets top ratings and is simulcast on 3 channels.
Surely it's only a matter of time before we start seeing del.icio.us tags getting link spammed :(
Seems like a reasonable way to do it to me. (Cue Rob rushing out the door to the patent office)
Allow users to directly rate the worth of the sites Google returns in a search. Anything from "Not what I was looking for", "This is a crap site", "Nothing but advertising" to "This is probably illegal".
It would give Google direct stats on the worth of the sites. People marking competitors down could be made difficult through techniques like character recognition.
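A sketch of how such ratings might be tallied, assuming a hypothetical `SiteRatings` store, a fixed label set taken from the categories suggested above, and one vote per user per site as the only defense against competitors marking each other down (the character-recognition check is left out). Dividing complaints by how often a site was shown keeps popular sites from being punished merely for being seen more.

```python
from collections import Counter, defaultdict

# Hypothetical labels, taken from the categories suggested above.
LABELS = {
    "not what I was looking for",
    "crap site",
    "nothing but advertising",
    "probably illegal",
}


class SiteRatings:
    def __init__(self):
        self.impressions = Counter()      # site -> times shown in results
        self.votes = defaultdict(dict)    # site -> {user_id: label}

    def shown(self, site):
        self.impressions[site] += 1

    def rate(self, site, user_id, label):
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        # One vote per user per site: re-rating replaces the earlier label,
        # so a single account cannot stuff the ballot.
        self.votes[site][user_id] = label

    def complaint_rate(self, site):
        shown = self.impressions[site]
        return len(self.votes.get(site, {})) / shown if shown else 0.0

    def breakdown(self, site):
        return Counter(self.votes.get(site, {}).values())


ratings = SiteRatings()
for _ in range(10):
    ratings.shown("spam.example")
ratings.rate("spam.example", "alice", "crap site")
ratings.rate("spam.example", "alice", "nothing but advertising")  # replaces alice's vote
ratings.rate("spam.example", "bob", "crap site")
print(ratings.complaint_rate("spam.example"))  # 0.2
print(ratings.breakdown("spam.example"))
```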
We then get to the huge problem. How do you measure quality? In fact, what is quality? I think slashdot karma is a good attempt at it.
Like email spam, these sites will continue to exist so long as people click on the links, thus supporting the business model.
RichM
Data Center Knowledge
The description from my WebHashcash site:
WebHashcash is a Java-based anti-spam mechanism for collaborative web sites such as weblogs, discussion forums, and wikis, to guard against automated content posting, fake user registration, or ballot stuffing. It adapts the Hashcash email anti-spam system to web forms.
Hashcash is a system designed in 1997 by Adam Back whereby all messages require a modest investment of CPU power in order to generate a "stamp" which will be accepted by the recipient. The CPU processing happens transparently before the user even tries to send his message, and it usually doesn't take more than a few seconds. Thus, a small and easily-verified "postage" is attached to all messages. This cost is negligible for legitimate users, but prohibitive for spammers, thereby destroying the economics of spamming.
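To make the "stamp" idea concrete, here is a minimal proof-of-work sketch. It follows the Hashcash principle (the sender searches for a counter that makes the SHA-1 hash of the stamp start with a required number of zero bits; the recipient checks it with a single hash), but it is not WebHashcash's Java code, and it uses a simplified `resource:salt:counter` layout instead of the real Hashcash stamp fields. A real deployment would also record spent stamps to block replays.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()  # zeros above the highest set bit
        break
    return bits


def mint(resource: str, bits: int = 16) -> str:
    """Sender side: burn CPU until the stamp's SHA-1 has `bits` leading zeros."""
    salt = os.urandom(8).hex()  # random salt so different senders' stamps differ
    counter = 0
    while True:
        stamp = f"{resource}:{salt}:{counter}"
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp
        counter += 1


def verify(stamp: str, resource: str, bits: int = 16) -> bool:
    """Receiver side: a single hash checks the work."""
    if not stamp.startswith(resource + ":"):
        return False
    return leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits


stamp = mint("comment-form-42")  # about 65,000 hashes on average for 16 bits
print(stamp, verify(stamp, "comment-form-42"))
```

Raising `bits` by one doubles the sender's expected work while verification stays a single hash; that asymmetry is what makes the postage negligible for one comment but costly for a million.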
David Schneider-Joseph
It's called "using my brain/eyes" and "communicating with people".
;)
On sites that I already know and like (including some blogs), people mention and link to other sites, like, say, blogs. Since I then go over there for real content, well, guess what, it's not a link farm; it's good.
Problem solved
Who is going to pay $1 to read about how your boyfriend dumped you last week and you're still crying in bed.
No, but I bet there's a nerd market for the contact details of chicks on the rebound.
Ceci n'est pas un sig. (This is not a sig.)
How many times are we going to get this site plugged?
This has happened with almost all technology, from Email to Websites to Forums to anything.
Derive Politics
You can tell by the lack of bad poetry.
Am I the only one who find the term "blogosphere" almost as annoying as "cyberspace" and "e-{insert whatever word you want here}"?
Do not fold, spindle or mutilate.
You can't stop the signal.
There is no God, and Dirac is his prophet.
The verification of a user account needs to happen only once. There is a free UNIX host that does this, and there are a few online games that essentially take a one-time fee. I think that a lot of people would be willing to slap an actual dollar in an envelope and send it.
Your attempt to googlebomb microsoft using a "viral sig" is rather futile since the googlebot does not log in, and therefore does not see the sigs. Just so you know. Carry on.
Last I checked it was a penny for your thoughts, at most just two cents, now you're telling me you want to charge a DOLLAR!?
Well, yes, but you'll understand that we haven't raised the price of thoughts for nearly 200 years now (2c is the price of a thought at the moment, a penny is merely their redemption value after use -- thought recycling is important for the environment). This rise is merely in line with inflation, and you've benefited from cheap thoughts in the meantime.
Google for "new idria, ca"
The first link *is* relevant, and maybe 2 more on the first Google page are as well.
The rest? PURE CRAP. Lawyers in New Idria, CA? Job listings? Home appraisals? All just SPAM.
(FYI, New Idria, CA is a ghost town. It has a population of 3. There are no homes being sold, and thank god, no lawyers there either.)
So, I was looking for further history & photos and I was flooded with marketing garbage. Take a look at some of the URLs. It's clear that they're trying to boost their rank based on city names and not actually relevant content.
The web was around long before blogs. Google was necessary, successful and incredibly useful long before blogs.
Blogs are fine. But 99% of the time, they are useless to me when I'm searching for something. I'm often after technical data or reviews, and blogs are not usually the best source, or the best venue, for such things.
This is just one more argument for not having all these services be free.
If people had to pay even a nominal fee ($12/year) the majority of the spam blogs would disappear. And probably 90% of the crap blogs, too. They'd either quit because it's not worth the cost, or (in a minority of cases) they'd actually start thinking more before blogging (which has to be one of the stupidest words of the last 100 years, right up there with "bling-bling").
Even the most talented bloggers don't always have something interesting to post. Which is the main reason for a daily aggregate blog site like ...
http://www.blogspy.net/ (shameless plug :) )
No. They can censor all they want. Freedom of Speech is something that the government cannot infringe on. Individuals can infringe all they want. As long as I can get the information elsewhere and not through the almighty google, no right has been violated.
I started the new one at Modblogs, and have had much better luck. Mainly, I just post the code at the one there - shell scripts and programs and such.
The rub is, while I would love it if everybody could screen out the garbage, I post (or at least try to post) informative articles: howtos, tips and tricks, guides, simple programs in various languages. People looking for that particular solution have to be able to find it. The catch-22 is that search engines regularly drop my blog from their indexes, and when they pick it up again, people get all the spam in their search.
The solution is, of course, for people to maintain their own sites. Slashdot manages to screen out bots. Many bulletin boards do so. It's not rocket science. I picked blogs because they were a free web presence alternative to a paid site, without the "free web page" hassle of making my readers look at annoying ad banners and pop-ups. If anybody out there has alternative ideas, I'm all ears. It does look like Blogspot has gone from bad to worse to played out entirely.
Blogebrity could run an unlinked z-list.
Honestly, though, this is one real reason for A-lists.
Just another reason for me not to allow comments on my BLog. I really don't need some scumbag spammer trying to pitch their product on my BLog. I look forward to the day when the Patriot Act allows us to throw spammers in Camp Gitmo. Oh wait, that would be too merciful. How about we just declare them enemies of the state and shoot them on sight.
Sorry, couldn't help myself
I do love "!" but not as much as I love "..."...
OK. This is pretty lame, but it must be posted:
I do not like weblogs and spam.
I do not like them RAM-I-am.
Do not like them here or there
I do not like them anywhere.
Not on a site, nor click with a mouse
Not here or there, not anywhere
I do not like weblogs and spam
I do not like them, RAM-I-am
Could you? Would you? With a goatse.cx?
Could you? Would you? On a VPN?
Could you? Would you? Behind a firewall?
Could you? Would you? On a wi-fi?
Not with a goatse.cx. Not on a VPN.
Not behind a firewall. Not on a wi-fi.
Not on a site. Nor click with a mouse.
Oh, no!
Not in a linux box. Not with firefox.
Not in a B-tree. You let me be!
I do not like weblogs and spam!
I do not like them, RAM-I-am!
I do not like weblogs and spam!
"You'll get nothing, and you'll like it!"
I'm just in the middle of launching an easy-to-use outsourced CAPTCHA system at http://botblock.com/ - we'll be evolving it rapidly to try to make it easy as pie for web admins to block comment spam in an outsourced way. The difference between this and a piece of software you download for your blog is that A) you don't need server access to use BotBlock, and B) we can evolve BotBlock to use increasingly sophisticated CAPTCHAs without requiring website operators to change anything. I've already deployed it on my personal blog and brought comment spam down from several hundred a week to zero per week. I'd love to hear what folks think.
David E. Weekly
Code / Think / Teach / Learn
h4x0r for